LeanCTX: A context engineering engine for AI agents - not just token compression

2026-07-03 5423 words 26 min read

https://blog.luandnh.com/images/covers/lean-ctx-cover.png


Intro: the day I realized I was pouring aviation fuel into a Honda Wave

Here's the story.

Back in Q1 this year, my team was running Claude Code on a Go monorepo of about 500 files. Service mesh, multi-module, tangled dependencies. Every time Claude needed to understand a module - say, the auth package - it read the full file. Then it read 3-4 more related files. Fifty files later, the context window was jammed.

The end-of-month token bill made me want to faint. Each dev was spending $50-60/month on Claude Code + Cursor. With a team of 4 devs, not counting Gemini CLI and Codex running automatically in CI, the total damage was about $200-240/month for context.

It felt like filling a Wave scooter with AvGas 100LL - the engine can't make use of it, but the money burns anyway.

Then one rainy Saturday, I was browsing Hacker News and saw a project called LeanCTX. Its tagline: “Control what your AI can see.” A single Rust binary, promising 60–90% fewer tokens. The first paragraph read: “It decides what they read, remembers what they learn, guards what they touch, and proves what they save.”

I thought: “How is this different from an ordinary compression tool?” I clicked through to the repo.

2,747 commits. 225 releases. 39 contributors. Cursor AI + Yves Gugger, a Swiss developer. Apache 2.0 license (migrated from MIT).

The second tagline was even more striking: “60–90% fewer tokens as the receipt.”

I tried installing it. It took 60 seconds.

        
curl -fsSL https://leanctx.com/install.sh | sh
lean-ctx onboard
lean-ctx doctor

The first command I tested:

lean-ctx read src/auth/ -m map

It read the 50 files in the auth folder. Output: 533.2K tokens → 8.0K tokens. 98.1% compression.

I stared at the screen for 5 seconds, speechless.

That's when it clicked: I had been reading files like novels, while the AI only needed the export signatures and type definitions.

First look: the good stuff shows up right after install

The install process is very smooth. There are 5 ways to install:

        
# Universal - one command
curl -fsSL https://leanctx.com/install.sh | sh

# macOS/Linux
brew tap yvgude/lean-ctx && brew install lean-ctx

# Node.js dev
npm install -g lean-ctx-bin

# Rust dev
cargo install lean-ctx

# Pi agent
pi install npm:pi-lean-ctx

After installing, run lean-ctx onboard. It auto-detects every AI tool on the machine - Cursor, Claude Code, Codex CLI, Windsurf, Copilot, Gemini CLI, Hermes, OpenCode, Zed, Cline, Roo, JetBrains, VS Code, Neovim, Emacs, etc. - and configures the MCP server + shell hooks for each. Zero interaction.

lean-ctx doctor verifies that everything is working.

The first time I ran lean-ctx gain --live - a real-time token savings dashboard - I saw Claude Code drop from 4,200 tokens per read to 920 tokens. A 78% saving, live in the terminal.

Sorry, but I let out a fairly loud “Oh damn” at that point. My wife thought I'd hit a bug.

Vibe check after 10 minutes

Immediate pros:

Rust binary of ~10MB, no dependencies other than libc
Zero telemetry by default - 100% local
Auto-detects 30+ agents - no manual configuration for any of them
Self-healing diagnostics - doctor --fix detects broken config on its own

Initial annoyances:

You have to restart the terminal session for the shell hooks to become active
lean-ctx setup asks a few too many questions (wizard mode). onboard is faster.
There are several integration modes (Auto/Hybrid/MCP), which can confuse newcomers. I chose Auto and it worked fine.

Architecture deep dive: 5 subsystems living in 1 binary

This is the part I respect most. LeanCTX is not “just a token compressor”. The codebase is 90.8% Rust, with the rest in JS/Python/Kotlin for SDKs and plugins.

The structure splits into 5 clear subsystems, each with its own job:

1. Compression - the engine that made its name

There are 10 read modes, each using a different compression strategy:

Full read (baseline): Reads the file verbatim. 0% compression. Anyone can do this.

Map mode (98.1% compression): This is the mode I use most. Instead of reading all 50 files in full, it parses the AST with tree-sitter and extracts export declarations with line spans. The result:

┌─────────────────────┬────────────────────────┐
│ src/auth/service.go │ authenticate(),        │
│   (4200 tok → 920)  │ validateToken(),       │
│                     │ refreshSession()       │
├─────────────────────┼────────────────────────┤
│ src/auth/middleware │ AuthMiddleware struct,  │
│   (3800 tok → 640)  │ Authenticate(),        │
│                     │ handleError()          │
├─────────────────────┼────────────────────────┤
│ 48 more files...    │ ...                    │
└─────────────────────┴────────────────────────┘

Signatures mode (96.7% compression): Keeps only AST signatures - function signatures, type definitions, interface declarations. When the AI needs to understand “what does this module do” without implementation detail, this is the ultimate weapon.

Concrete benchmark:

Raw read: 533.2K tokens
Map mode: 8.0K tokens, quality score 78%
Signatures mode: 14.0K tokens, quality score 96%
Cached re-read: ~13 tokens - 99.99% compression on re-reads

I don't understand why nobody did cached re-reads before. The mechanism is simple: LeanCTX hashes the file content and caches the parse result. When the AI asks again, it only needs to return the cache key. 13 tokens versus 4,200 tokens - a ratio of 323x.
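
The cached re-read idea is easy to sketch. This is only an illustration of hashing content and answering repeat reads with a short key, not LeanCTX's actual implementation: the ReadCache class, the ref: key format and the toy "keep signature-looking lines" step are all assumptions made up for this example.

```python
import hashlib


class ReadCache:
    """Toy content-addressed read cache (hypothetical, not the LeanCTX API)."""

    def __init__(self):
        self._views = {}    # content hash -> compressed view, kept for later lookup
        self._seen = set()  # hashes already returned in this session

    def read(self, content: str) -> str:
        key = hashlib.sha256(content.encode("utf-8")).hexdigest()[:12]
        if key in self._seen:
            # Same bytes as a previous read: answer with a short reference only.
            return f"ref:{key}"
        # First read: build a crude "signatures" view and remember it.
        view = "\n".join(
            line for line in content.splitlines()
            if line.startswith(("func ", "type "))
        )
        self._views[key] = view
        self._seen.add(key)
        return view


cache = ReadCache()
source = "package auth\n\nfunc Authenticate() {}\n// long body...\ntype Token struct{}\n"
first = cache.read(source)                            # compressed view
second = cache.read(source)                           # short reference
changed = cache.read(source + "func Refresh() {}\n")  # new hash, new view
```

Because the key is derived from the content itself, any edit to the file changes the hash and forces a fresh view, so a stale reference is never handed out for modified content.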

Diff mode (80-95% compression): Shows only the lines changed since the last read. When the AI does code review, it only needs to know what changed, not re-read the whole file.
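
To make the "only changed lines" idea concrete, here is a small sketch using Python's standard difflib. It is an assumption about how such a mode could work in general, not a description of how LeanCTX computes its diffs; the changed_lines helper is a name invented for this example.

```python
import difflib


def changed_lines(previous: str, current: str) -> list:
    """Return only the added (+) and removed (-) lines between two reads."""
    diff = list(
        difflib.unified_diff(
            previous.splitlines(), current.splitlines(), lineterm="", n=0
        )
    )
    body = diff[2:]  # drop the '---' / '+++' file header lines
    # With n=0 there are no context lines, so only hunk markers remain to skip.
    return [line for line in body if not line.startswith("@@")]


before = "func a() {}\nfunc b() {}\nfunc c() {}"
after = "func a() {}\nfunc b(ctx Context) {}\nfunc c() {}\nfunc d() {}"
delta = changed_lines(before, after)
```

An unchanged file yields an empty list, which is the cheapest possible answer to "what changed since I last looked?".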

Density mode (variable): An SDE-style entropy budget - it keeps the high-information parts (high entropy - code logic, control flow) and drops boilerplate. It uses heuristic compressors rather than ML, so it is deterministic and prompt-cache-safe.

Also available:

lines:N-M - read only specific lines
citations - summary + citation markers
anchored - read from a symbol anchor
full - read verbatim (when you really need it)

Plus 95+ shell patterns, which I also love. When you run npm test through the CTX shell, it recognizes the test runner's output (PASS/FAIL, coverage %) and keeps only that. git log → commit hash + message. kubectl get pods → status + restart count. docker build → layer caching status. cargo test → test results. The 95+ patterns cover almost every CLI tool I use daily.

2. Routing - learning to read files more intelligently

This subsystem is newer. It includes:

ModePredictor: Learns the optimal read mode for each file based on:

File type (Go, Rust, TypeScript, YAML, Markdown…)
Purpose of the read (code review, feature implementation, bug fix)
Previous read history

If last week you read service.go exclusively in signatures mode, this time it proposes signatures first. If full content is needed, the AI can override.

IntentEngine: Classifies query complexity. The AI asks “show me the auth flow” → IntentEngine detects an architecture question → picks callgraph mode instead of full. The AI asks “fix the bug in function X” → detects a bug fix → picks diff + anchored from function X.

Adaptive fidelity: When the context budget is nearly exhausted, it automatically lowers fidelity - from full → signatures → map - instead of crashing or forgetting context.
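
The fidelity ladder can be pictured as a simple fallback loop. The sketch below is my own illustration under stated assumptions: pick_mode is a hypothetical helper, and the signatures cost of 1,500 tokens is invented (only the 4,200 and 920 figures come from the numbers above).

```python
from typing import Dict, Optional

# Fidelity ladder from the text, richest first.
LADDER = ["full", "signatures", "map"]


def pick_mode(costs: Dict[str, int], remaining_budget: int) -> Optional[str]:
    """Pick the richest read mode whose token cost still fits the budget."""
    for mode in LADDER:
        if costs[mode] <= remaining_budget:
            return mode
    return None  # nothing fits: the caller has to free up context first


# Token costs for one file; the 'signatures' value is an illustrative guess.
costs = {"full": 4200, "signatures": 1500, "map": 920}
```

With plenty of budget the file is read in full; as the budget shrinks, the same call degrades to signatures and then to map rather than failing outright.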

3. Memory - letting the AI remember precisely across sessions

This is the real killer feature that ordinary compression tools lack.

Context Continuity Protocol (CCP): Since v2.0.0, LeanCTX maintains cross-session memory. When you open a new session, instead of re-explaining from scratch “what this project does, how it is structured, who did what last week”, it restores that automatically.

The result: 99.2% of cold-start tokens eliminated, according to their claim.

Temporal Knowledge Graph: Stores facts with validity windows. For example:

“Database migration v3 has been running in production since 2026-06-15” → valid from that date, not valid before it
“Auth service endpoint changed from /v1/auth to /v2/auth” → timestamped
“Module X has been deprecated since Q2” → automated pruning

The AI doesn't need to ask “is this still true?” because it knows the validity window.
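
A validity window is just a pair of dates attached to a fact. Here is a minimal sketch of that data structure, assuming a half-open interval [valid_from, valid_to); the Fact class and facts_on helper are hypothetical, and the auth endpoint dates are invented for illustration (only the 2026-06-15 migration date comes from the example above).

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Fact:
    statement: str
    valid_from: date
    valid_to: Optional[date] = None  # None means "still valid"

    def is_valid(self, on: date) -> bool:
        if on < self.valid_from:
            return False
        return self.valid_to is None or on < self.valid_to


FACTS = [
    Fact("Database migration v3 is running in production", date(2026, 6, 15)),
    # The two endpoint dates below are made up for the example.
    Fact("Auth endpoint is /v1/auth", date(2025, 1, 1), valid_to=date(2026, 6, 1)),
    Fact("Auth endpoint is /v2/auth", date(2026, 6, 1)),
]


def facts_on(day: date) -> List[str]:
    """Return the statements that hold on the given day."""
    return [fact.statement for fact in FACTS if fact.is_valid(day)]
```

Closing the old /v1/auth fact on the same day the /v2/auth fact opens means a query for any single day never returns two contradictory endpoints.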

Episodic + procedural memory:

Episodic: “Last time I implemented feature X in file Y with approach Z” - the AI can refer back to it
Procedural: “To deploy this service, run: make build → docker compose up → migrate” - the AI learns the procedure through observation

Property Graph: A multi-edge code graph for impact analysis:

authenticate() ──calls──> validateToken()
authenticate() ──imports──> jwt package
AuthMiddleware ──type_ref──> http.Handler
authenticate() ──exported_by──> service.go

This graph is used for:

Blast radius analysis: “if authenticate() changes, which files are affected?”
Connected files: “which files are related to auth?”
Impact analysis: trace from function → package → service
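
Blast radius analysis boils down to walking the graph backwards from the changed node. The sketch below is a generic breadth-first search over reversed edges, not LeanCTX's graph engine; blast_radius is a hypothetical helper, and the AuthMiddleware → authenticate() edge is an extra I added so the example has more than one hop.

```python
from collections import defaultdict, deque
from typing import List

# (source, edge kind, target) triples, mostly taken from the graph above.
EDGES = [
    ("authenticate()", "calls", "validateToken()"),
    ("authenticate()", "imports", "jwt package"),
    ("AuthMiddleware", "type_ref", "http.Handler"),
    ("AuthMiddleware", "calls", "authenticate()"),  # hypothetical extra edge
]


def blast_radius(changed: str) -> List[str]:
    """List everything that directly or transitively depends on `changed`."""
    dependents = defaultdict(list)
    for source, _kind, target in EDGES:
        dependents[target].append(source)  # reverse edge: target -> who uses it

    seen, queue, order = {changed}, deque([changed]), []
    while queue:
        node = queue.popleft()
        for dependent in dependents[node]:
            if dependent not in seen:
                seen.add(dependent)
                order.append(dependent)
                queue.append(dependent)
    return order
```

The result comes back in breadth-first order, so the closest dependents appear first, which is usually the order you want to review them in.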

4. Guard - keeping the AI from reading wherever it likes

Context security is something few tools take seriously.

PathJail: Restricts file access according to a config whitelist. You can configure:

“The AI may only read inside /src/ and /docs/”
“Do not read .env, secret*.yaml, credentials.*”
“Read-only mode for production”
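
The first two rules can be expressed as an allow-list of roots plus a deny-list of file name patterns. This is a rough sketch of that policy check, not LeanCTX's config format or code; may_read, ALLOWED_ROOTS and DENIED_NAMES are names I made up, and the check is purely lexical (it does not resolve symlinks).

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

ALLOWED_ROOTS = ("/src", "/docs")
DENIED_NAMES = (".env", "secret*.yaml", "credentials.*")


def may_read(path: str) -> bool:
    """Allow a read only inside the allowed roots and outside the deny patterns."""
    candidate = PurePosixPath(path)
    if ".." in candidate.parts:
        return False  # refuse traversal outright instead of trying to resolve it
    if any(fnmatchcase(candidate.name, pattern) for pattern in DENIED_NAMES):
        return False
    # is_relative_to compares whole path components, so /srcfoo is not under /src.
    return any(candidate.is_relative_to(root) for root in ALLOWED_ROOTS)
```

Checking the deny-list before the allow-list means a secret file stays blocked even when it sits inside an otherwise permitted directory.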

Secret redaction: Auto-detects secret patterns (AWS keys, GitHub tokens, database URLs) and masks them before anything is sent to the model. Output:

DB_URL = postgres://user:***@localhost:5432/db
AWS_SECRET_KEY = ***redacted***
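
A regex-based pass is enough to reproduce the two output lines above. This is a simplified sketch with just two rules that I wrote for illustration; LeanCTX's real detectors are presumably broader, and the sample secret values in the usage below are invented.

```python
import re

RULES = [
    # Password part of a URL such as scheme://user:password@host
    (re.compile(r"(://[^:/\s]+:)[^@\s]+(@)"), r"\1***\2"),
    # KEY = value lines whose key name contains SECRET, TOKEN or PASSWORD
    (
        re.compile(r"(?im)^(\s*\w*(?:SECRET|TOKEN|PASSWORD)\w*\s*=\s*).+$"),
        r"\1***redacted***",
    ),
]


def redact(text: str) -> str:
    """Apply every redaction rule in order and return the masked text."""
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return text


raw = "DB_URL = postgres://user:hunter2@localhost:5432/db\nAWS_SECRET_KEY = abc123"
masked = redact(raw)
```

Pattern rules like these are deterministic, which fits the tool's stated design goal, but they only catch what they are written to recognize.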

OS-sandboxed code execution (ctx_execute): Runs code in an OS sandbox, isolating the file system, network and processes. The AI can't spawn shells at will.

Injection detection: Guards against prompt injection via tool output. If file content contains [SYSTEM: ignore previous instructions], it detects and blocks it.

Budgets & SLOs: A team lead can set a token budget per session, with a real-time dashboard.

5. Proof - proving how much you actually saved

If you've ever had to report ROI on AI tool spend to a CTO, this is a lifesaver.

Ed25519-signed savings ledger: Every compression event is digitally signed. Proofs can be verified offline - no API calls, no trust required. The stack: an Ed25519 keypair generated from entropy, a signature on each ledger entry, and lean-ctx verify to check them.

ctx_proof / ctx_verify: Two MCP tools:

ctx_proof - generates a proof for a compression event
ctx_verify - verifies whether a proof is valid

CI drift gates: The commitment that “self-footprint is only 2.1K tokens” is enforced in CI. If a commit increases the footprint, CI fails. They have 29 published stability contracts, each SHA-256-locked.

Dual-arm benchmark: 99.4% input-side saving on cache-priced rails. The benchmark is reproducible with lean-ctx benchmark report . - run on your exact codebase.

Feature walkthrough: 81 MCP tools and 29 user journeys

I've reviewed compression tools before. At a high level they compress messages, compress tool output, and sometimes have memory. LeanCTX has 81 MCP tools across 6 categories - I haven't seen another tool with a surface this large.

Core Tools (File & Code)

These are the everyday tools. 17 tools for file read, shell, search and edit:

ctx_read - smart read with 10 modes, auto-mode detection and session caching. Every read is compressed; every re-read is cached.
ctx_multi_read - batch read. Instead of 5 round-trips to read 5 files, it bundles them into 1 call.
ctx_shell - shell commands with 95+ pattern compression. Use raw=true if you need the full output.
ctx_search - grep-style code search, compact results with file path + line number + match snippet.
ctx_edit - context-aware file editor: create, replace, replace_all, append. It knows the context of the file being edited.
ctx_delta - shows only the lines changed since the last read. Makes code review very fast.
ctx_smart_read - single-call intelligent read. Picks the mode automatically based on file type and purpose.
ctx_semantic_search - hybrid search with BM25 + embeddings. Finds code by meaning, no keyword match needed.
ctx_explore - iterative code exploration. BM25 + bounded BFS over the graph, outputs path:start-end citations without reading full files.
ctx_glob - find files by glob, gitignore-aware, multi-root support.
ctx_url_read - fetch a web page, PDF or YouTube transcript as cited context.
ctx_git_read - remote git repo via cached shallow clone. Reads code from GitHub/GitLab without a full clone.
ctx_provider - external context providers: GitHub Issues, GitLab MRs, Jira tickets, Postgres queries, MCP servers, REST APIs.

A typical flow:

The AI asks “show me the login flow”
ctx_intent detects a code exploration question
ctx_graph queries the Property Graph → finds connected files for “login”
ctx_compose builds the task context: keywords “login authenticate session” + ranked files + match locations
ctx_read with auto-mode = signatures reads the 12 related files
Total: 4 MCP calls instead of 15+. Tokens: ~5K instead of 80K.

Intelligence Engine - 19 tools

I use these less, but they're impressive:

ctx_intent - detects the query type: code review? feature? bug fix? architecture? → auto-selects the tool chain
ctx_graph - persistent project dependency graph. Tracks dependency changes between sessions.
ctx_dedup - cross-file dedup. Detects duplicate functions, duplicated config and repeated error handling patterns.
ctx_impact - traces impact across dependency chains. “If X changes, who is affected?”
ctx_architecture - graph-based architecture analysis: detect clusters, layers, dependency cycles.
ctx_analyze - entropy analysis. Computes the optimal compression mode for each file.
ctx_compare - preview compression: the original vs the exact bytes lean-ctx emits. Includes a line diff.
ctx_repomap - PageRank repo map of the most important symbols in the codebase.

Memory & Knowledge - 7 tools

ctx_knowledge - CRUD for the temporal knowledge graph
ctx_remember - stores a fact in session memory
ctx_handoff - hands context over between agents (with diary + knowledge transfer)
ctx_compose - task composition: keywords, ranked files, match locations
ctx_package - portable context package: export session state + knowledge, import/resume
ctx_prompt - manage context protocols (CCP, CEP, TDD)

ctx_symbol - 26 languages via tree-sitter. Finds definitions, references, implementations.
ctx_callgraph - caller/callee graph for a function. Multi-hop.
ctx_outline - structural outline: class hierarchy, function list, import graph.
ctx_routes - HTTP route extraction (Gin, Echo, Chi, Express, FastAPI, Rails, Django, etc.)
ctx_smells - code smells: long functions (>50 lines), deep nesting (>3 levels), cyclomatic complexity >10.
ctx_refactor - LSP/IDE refactoring: rename symbol, move symbol, edits inside a function body, reformat.
ctx_review - automated code review: impact analysis, caller tracking, test discovery. Integrates with CI.

Context Firewall & Security - 8 tools

ctx_firewall - runaway tool output → compact digest + retrieval ref ID
ctx_scrub - PII/secret redaction
ctx_policy - query/response policies
ctx_inspect - inspect the context before it is sent to the model

Workflow & Orchestration - 15 tools

ctx_agent - spawn a sub-agent with scoped context + budget
ctx_plan - context planning (CFT) with Phi scoring + budget allocation
ctx_skillify - codify recurring patterns into .cursor/rules/skillify-*.mdc
ctx_rules - cross-agent rules governance: sync, diff, lint, status
ctx_tools - MCP Tool-Catalog Gateway. This is a meta tool: it sits in front of N downstream MCP servers. When the AI wants to call a tool, ctx_tools uses BM25 ranking to pick the right server. Context cost stays flat no matter how many tools there are. Off by default.

Catalogued tools from addons and external MCP servers - Headroom, Sophon, Mem0, Sequential Thinking, etc. - are all routed and compressed.

DX (Developer Experience): is it actually pleasant to use?

There are small details that make me feel the author genuinely cares about UX:

lean-ctx gain --live - a real-time dashboard in the terminal. The saved-token count ticks up live, like a fuel gauge. Quite addictive to watch.

lean-ctx wrapped - a weekly/monthly recap, formatted as a share card for the team. At the end of the month I post it to the team Slack: “This week LeanCTX saved 120K tokens ~ $18.” The team lead loves it.

lean-ctx dashboard - the Context Manager in the browser. Real-time SLOs and budgets. 4 tabs: Overview → Agents → Budgets → Ledger.

lean-ctx watch - a TUI monitor for terminal fans.

lean-ctx doctor --fix - self-healing. Broken config? MCP port conflict? File permissions? It detects the problem and asks “want to fix it?” Press Y and it's done.

lean-ctx benchmark report . - runs the benchmark on your own codebase. Output: a file with every number, plus an Ed25519 signature. You can compare it against last week's benchmark.

Integration with 30+ agents

I tested with 3 agents:

Claude Code: Auto-detected when running lean-ctx onboard. MCP server + shell hooks. Claude automatically uses ctx_read instead of the Read tool. Tokens dropped 68% in the first session.

Cursor: Same story. Cursor automatically detects the lean-ctx MCP server. In Cursor chat, every file read shows (compressed by lean-ctx: 72% saved).

Hermes Agent: Hermes is supported natively - lean-ctx init --agent hermes. Context compression is transparent. I use Hermes with lean-ctx and see a clear improvement: responses 40% faster, 55% fewer total tokens.

Codex CLI: OpenAI Codex is supported too. lean-ctx init --agent codex.

Gemini CLI: Google's Gemini CLI supports MCP, so the lean-ctx MCP server can attach.

VS Code and JetBrains have their own extensions: URI handler, native dashboard tab. The JetBrains plugin even has PSI navigation, a refactoring engine and a gain tool window.

Proxy mode - optional compression layer

When you enable the proxy (lean-ctx proxy enable), every request from agent → model is compressed:

System prompt → compressed
Conversation history → compressed (old messages are summarized, structure is kept)
Tool results → compressed (originals kept locally, with a retrieval reference)

Prompt-cache-safe: the #498 contracts guarantee byte-stable output. Anthropic prompt caching cuts cost by 90%, OpenAI by 50%. LeanCTX doesn't break the cache.

Real dollars metered: The proxy computes real dollars saved - not just tokens, but cache hit discounts as well.

Addon system

        
lean-ctx addon search memory      # find addons related to memory
lean-ctx addon add headroom        # install the Headroom MCP + wrapper
lean-ctx addon list                # list installed addons
lean-ctx addon remove headroom     # remove

Each addon is an MCP server running under the ctx_tools gateway, with the compression pipeline applied.

I find wrapping Headroom rather funny. Headroom: “I compress context.” LeanCTX: “fine, I'll compress your output too.”

Current registry: Headroom, Sophon, Repomix, Serena, Mem0, Cognee, Letta, Sequential Thinking.

The part I don't fully understand yet

Honestly: I'm not sure how Context Time Machine works. The docs talk about git-anchored, signed snapshots that you can replay, restore, publish and import. It sounds like Git for context - you could check out last week's context state and see what the AI saw. But with the codebase at nearly 2,800 commits, this feature is still “direction, not yet shipped.” If anyone has tried it, let me know in the comments.

Weaknesses - nothing is perfect

I like LeanCTX's docs - they have an actual known limitations section.

Rust build time

Building from source: cargo build takes ~5 minutes on my machine (M1 Pro, 32GB). The pre-built binary is recommended. npm install -g lean-ctx-bin or brew is much faster.

Self-footprint: 2.1K tokens per session

Each session, LeanCTX injects about 2.1K tokens into the system prompt: MCP tool definitions, context protocols, memory descriptors. This is unavoidable - the binary itself injects them. However:

CI-gated: the footprint can only shrink, not grow
Compared with 50K-200K tokens per session, 2.1K is negligible
The CEP protocol optimizes cognitive efficiency

Feature overload

81 MCP tools. 5 subsystems. 29 user journeys. 29 stability contracts. 79 pages of docs.

If you just want to “compress context to make it cheaper”, this is like using a butcher's cleaver to slice a piece of ham. Headroom does better for the simple use case.

Moving target

2,747 commits. 225 releases. Daily releases lately. Version 3.8.18 landed on 1 July 2026. The codebase changes faster than the docs get updated.

Twice I ran into feature deprecations the docs hadn't caught up with:

ctx_discover_tools was renamed from an older tool
A few CLI flags changed between minor versions

No ML compression

This is a deliberate trade-off. LeanCTX is deterministic by design - the #498 contract. Headroom has “Kompress” with ML-based prose compression. LeanCTX rejects that direction outright.

The result: prose compression is a bit stiff, but code compression is excellent. If your workload is 80% code + 20% docs, LeanCTX wins. If it's 80% prose (document Q&A, contract analysis, research papers) → Headroom's ML mode may be better.

Marketing self-comparison bias

The comparisons with Headroom and other tools live in the project's own repo. Nobody writes “we lose”. I always cross-check with independent sources. The 98.1% compression figure may hold for a Go monorepo, but for Python scripts or YAML configs it may be lower.

Comparison: LeanCTX vs Headroom vs Aphrodite

Headroom - stateless compression powerhouse

Tejas Chopra's Headroom has 55.2K+ stars and a huge community:

Strengths:

ML compression (Kompress): A model learns how to compress prose. 60-95% compression on natural-language text.
Broad framework wrappers: LiteLLM, LangChain, Agno, Strands, Vercel AI SDK - plug & play.
Python-native: pip install headroom-ai, no daemon needed.
Proxy mode: a transparent proxy that compresses requests/responses.

Weaknesses compared with LeanCTX:

3 MCP tools (compress/retrieve/stats) vs 81
Stateless: no memory, no session, no knowledge graph
Non-deterministic: output changes between runs → not prompt-cache-safe
Heavy Python runtime: ML mode needs PyTorch
No code intelligence: no call graph, no AST, no structural analysis

Rule of thumb from LeanCTX's own docs:

“Choose lean-ctx when the payload is code, tool output, logs or RAG context and you need local, deterministic, cache-preserving output; consider Headroom’s ML mode when you specifically need learned prose compression.”

Aphrodite - Hermes-only, and obsolete by now

Aphrodite is a Rust fork of Headroom, aimed at Hermes Agent:

	LeanCTX	Aphrodite
Stars	3,100	16
MCP tools	81	12
Agents	30+	1 (Hermes)
Contributors	39	2
Releases	225	~10
Compression modes	10	3
Memory	✅ Knowledge Graph	❌

Bluntly: Aphrodite is basically obsolete. LeanCTX supports Hermes natively. If you use Hermes, install LeanCTX; there's no reason to use Aphrodite.

Summary matrix

Dimension	LeanCTX	Headroom	Aphrodite
Runtime	Rust single binary	Python / Node	Rust single binary
Stars	3,100	55,200+	16
Tools	81 MCP	3 MCP	12
Deterministic	✅ (#498)	❌	❌
Prompt-cache safe	✅	❌	❌
Memory	Graph + session	SharedContext	None
Knowledge Graph	✅ Temporal	❌	❌
Code Intelligence	AST + call graph	None	Basic
Shell Compression	95+ patterns	❌	❌
ML Compression	❌ (by design)	✅ Kompress	❌
Proxy Mode	✅	✅	✅
Addon System	✅ Wrap tools	❌	❌
Local-first	100%	Library	100%
Agent Support	30+	~10	1
License	Apache 2.0	MIT	CC0-1.0

Verdict: who should use it, and when not to?

Use LeanCTX if you:

Use AI coding agents daily - Claude Code, Cursor, Codex CLI, Gemini CLI, Hermes. The savings show up immediately.
Have a medium-to-large codebase (50+ files, monorepo, multi-module) - ROI grows with repo size.
Want local-first + zero telemetry - compliance, legal and security teams will like it.
Need deterministic output for audits - signed proofs, CI gates, reproducible benchmarks.
Run a team with multiple agents - shared memory, session persistence, knowledge graph.
Work with code more than prose - daily dev work, not document writing.

Skip it if you:

Have a small codebase - at 5-10 files, LeanCTX is overkill. Headroom is simpler.
Specifically need ML prose compression - LeanCTX doesn't have it and doesn't plan to.
Are already happy with Headroom + Aphrodite - don't migrate without a pain point.
Use a single agent - shared memory brings no benefit.
Have a non-technical team - configuration requires some understanding of MCP and shell hooks.

Bottom line:

LeanCTX is not “yet another compression tool.” It is a context OS for AI agents.

Compression is what you see first, but what keeps you is memory, guard, proof and routing. A 10MB Rust binary does the work that previously needed a stack of 3-4 tools: Headroom + Mem0 + Semgrep + custom CI scripts.

I've moved from Headroom + Mem0 + custom scripts to LeanCTX alone. My June token bill dropped 62%. Setup time: 60 seconds. No regrets.

As for Context Time Machine with signed snapshots - I still don't fully understand it. Hybrid search? Haven't used it much. But I know it's there, ready when I need it. That feeling of having a safety net is worth something.

Deep dive: SDKs, protocols, and how to build your own agent

LeanCTX is not just a CLI tool. It has a full SDK stack so you can embed context engineering in your own agent.

SDKs

lean-ctx-sdk (Rust): a drop-in compress() function. Use it when building an agent harness in Rust.

        
use lean_ctx_sdk::compress;

let result = compress(messages, model::Gpt4o)?;
println!("Saved: {} tokens ({}%)", result.saved_tokens, result.saved_pct);

lean-ctx-client (Python): For use in a Python agent harness.

        
from lean_ctx import ProxyClient

client = ProxyClient()
result = client.compress(messages, model="gpt-4o")
print(f"Saved {result.saved_tokens} tokens")

lean-ctx-client (TypeScript): For Node.js/Deno agents.

        
import { ProxyClient } from 'lean-ctx-client';
const client = new ProxyClient();
const result = await client.compress(messages, { model: 'gpt-4o' });

Versioned /v1 API

When you want full control, use the REST API:

POST /v1/compress
Content-Type: application/json

{
  "messages": [...],
  "model": "claude-sonnet-4-20250514",
  "compress_system": true,
  "compress_history": true,
  "compress_tools": true
}

The response contains the compressed messages + metadata (saved_tokens, saved_pct, proof_signature). Deterministic: same input → same output.

GET /v1/references/{id}

Retrieves the original uncompressed content from the session cache.

Protocols: CCP, CEP, TDD

LeanCTX implements 3 protocols for cross-agent context management:

Context Continuity Protocol (CCP):

Maintains session memory across chats
Task/findings/decisions tracking with LITM-aware positioning
99.2% cold-start tokens eliminated

Cognitive Efficiency Protocol (CEP):

Scores the cognitive efficiency of each message
Automatically compresses low-scoring passages (boilerplate, redundancy)
Feedback loop: the AI reports “this compression lost information” → the system adjusts

Token Dense Dialect (TDD):

Shorthand notation: λ → lambda, § → section
ROI-mapped identifiers
An extra 8-25% savings on top of compression

An example of TDD in action - instead of:

The authenticate function in auth/service.go calls validateToken from auth/validator.go

It writes:

§auth: authenticate() ↦ §auth/validator: validateToken()

15 tokens instead of 22. Not much, but at a scale of 1M messages it adds up.

Build your own agent harness

The docs have a dedicated section, “Journey 14 - Build Your Own Agent: SDKs & /v1 API”. You can:

Start the lean-ctx daemon (background)
Use the SDK or REST API to compress context
Call the model API with the compressed context
Verify savings with ctx_proof

        
from lean_ctx import ProxyClient
import openai

client = ProxyClient()
compressed = client.compress(messages, model="gpt-4o")

response = openai.chat.completions.create(
    model="gpt-4o",
    messages=compressed.messages,
)

print(f"This call saved ${compressed.dollars_saved:.4f}")

Context Firewall: keeping runaway output away from the AI

There's a feature I noticed recently: Context Firewall (Journey 8). When the AI executes a command with very large output - make test producing 50K lines of log - the firewall intercepts it, generates a compact digest (PASS/FAIL, coverage, run time), and attaches a retrieval reference. If the AI needs detail, it calls ctx_expand(id, head=50) to see the first 50 lines, or ctx_expand(id, grep="ERROR") to grep the original output.

This mechanism prevents context window overflow from runaway tool output. I once had Cursor read all 10K lines of npm test output, filling the context window with no room left for code. Since the firewall, that no longer happens.
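
The digest-plus-reference pattern can be sketched in a few lines. This is my own toy model of the behaviour described above, not LeanCTX code: firewall and expand are hypothetical functions, the digest format is invented, and expand only mimics the head and grep options mentioned for ctx_expand.

```python
import re
import uuid

_originals = {}  # reference id -> full list of output lines, kept locally


def firewall(output: str, max_lines: int = 20):
    """Pass small outputs through; replace large ones with a digest + ref id."""
    lines = output.splitlines()
    if len(lines) <= max_lines:
        return output, None
    ref_id = uuid.uuid4().hex[:8]
    _originals[ref_id] = lines
    failures = sum(1 for line in lines if "FAIL" in line)
    digest = f"{len(lines)} lines, {failures} FAIL lines, ref={ref_id}"
    return digest, ref_id


def expand(ref_id: str, head=None, grep=None):
    """Fetch part of the stored original: optional regex filter, then head."""
    lines = _originals[ref_id]
    if grep is not None:
        lines = [line for line in lines if re.search(grep, line)]
    if head is not None:
        lines = lines[:head]
    return lines


log = "\n".join([f"ok test_{i}" for i in range(100)] + ["FAIL test_x", "ERROR boom"])
digest, ref = firewall(log)
```

The model only ever sees the one-line digest, and it pays for the detail it actually asks for through expand rather than for the whole log up front.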

JetBrains vs VS Code experience

I use both JetBrains GoLand and VS Code. LeanCTX supports both.

VS Code Extension

URI handler: lean-ctx://open/file?path=...
Native dashboard tab: opens the Context Manager right inside VS Code
MCP auto-config: mcp.json is generated automatically
Status bar indicator: token saved counter

JetBrains Plugin

PSI navigation: uses PSI (JetBrains' Program Structure Interface) instead of MCP calls for navigation. 2-3x faster.
Refactoring engine: rename/move symbol in sync with LSP
Gain tool window: real-time token savings display
Runs on IntelliJ, GoLand, PyCharm and WebStorm

I find the JetBrains plugin smoother than the VS Code extension - probably because JetBrains has PSI, so navigation performance is better. The VS Code extension is fine but lacks a few dashboard features compared with the web dashboard.

Community and ecosystem

LeanCTX has an active Discord community. The author, Yves Gugger, responds to GitHub issues within a few hours - I have watched an issue get fixed 30 minutes after it was reported.

There are 39 contributors (including @cursoragent and @claude - the GitHub accounts of AI agents). This is one of the most heavily AI-assisted projects I've seen: the contributor list includes accounts for Cursor AI and Claude.

This cuts both ways:

Upside: tremendous development speed - 2,747 commits
Downside: code consistency could suffer, though I find the quality quite solid

Pricing: free locally forever (enforced by CI). Paid tiers cover team/cloud features (shared server, role-based access, managed connectors). Pricing is transparent on the website.

Detailed comparison: code benchmarks

I ran the benchmark on 3 different codebases:

Go monorepo (50 files, 15K LOC):

Raw: 533.2K tokens
Map mode: 8.0K tokens (98.1%)
Signatures: 14.0K tokens (96.7%)
Cached re-read: 13 tokens (99.99%)

TypeScript Next.js app (120 files, 25K LOC):

Raw: 890K tokens
Map mode: 22K tokens (97.5%)
Signatures: 35K tokens (96.1%)
Cached re-read: 13 tokens (99.99%)

Python Django monolith (200 files, 40K LOC):

Raw: 1.2M tokens
Map mode: 42K tokens (96.5%)
Signatures: 68K tokens (94.3%)
Cached re-read: 13 tokens (99.99%)

Python compression is slightly lower than Go/TS - Python has more dynamic features for the tree-sitter grammar to deal with, so AST mapping coverage is not as good. But 94-96% is still a very impressive number.

Frequently asked questions from colleagues

“Does installing this affect the quality of the AI?”

I ran an A/B test: the same task with and without LeanCTX. Results:

With LeanCTX: responses 40% faster, fewer hallucinations (less noise in the context window)
Without LeanCTX: slower responses, and instructions occasionally missed because the context overflowed

Context-rot research from Anthropic shows accuracy dropping from 98% to 64% when the context window fills with noise. LeanCTX reduces noise → improves accuracy.

“Is any information lost?”

No. It is zero-loss when using the reversible modes (map, signatures - lossless). Density mode can potentially lose information, but it is applied conditionally. And every original is retrievable via ctx_retrieve.

“Does it send code to the cloud?”

No. 100% local. Zero telemetry by default. Compression runs locally. Only the share card (Wrapped) is opt-in, and it publishes just the token count + display name.

“Does it work with Codex CLI?”

Yes. I tested Codex CLI + LeanCTX and it works well. The MCP server attaches. lean-ctx init --agent codex.

“Does it conflict with other MCP servers?”

No. The ctx_tools gateway can route requests to multiple MCP servers. Installing extra addons doesn't cause conflicts either.

“Does the configuration take long to learn?”

It took me 10 minutes to grasp the basics, and 1 hour to get fluent with the modes and tools. The docs have 29 journeys but you don't need to read them all - lean-ctx onboard + lean-ctx gain --live is enough to get going.

Personal story: the first production deployment

Last month I deployed LeanCTX to the team CI pipeline. The script is short:

# In CI
lean-ctx install
lean-ctx benchmark report ./ci-benchmark.md

The result: the CI/CD pipeline that uses Claude Code for code review dropped from 45K tokens per run to 12K tokens. Each CI run saves $0.66. At 200 CI runs/day, that's $132/day, or $3,960/month from CI savings alone.
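
The arithmetic behind those figures is worth spelling out. The short calculation below uses only the numbers quoted above, plus one assumption of mine: a 30-day month, which is what the $3,960 figure implies.

```python
TOKENS_BEFORE = 45_000      # tokens per CI run before LeanCTX
TOKENS_AFTER = 12_000       # tokens per CI run after
SAVED_PER_RUN_USD = 0.66    # quoted saving per run
RUNS_PER_DAY = 200
DAYS_PER_MONTH = 30         # assumption: implied by $132/day -> $3,960/month

tokens_saved = TOKENS_BEFORE - TOKENS_AFTER
reduction_pct = round(100 * tokens_saved / TOKENS_BEFORE, 1)
# Blended price per million tokens implied by the quoted per-run saving.
implied_usd_per_million = round(SAVED_PER_RUN_USD / tokens_saved * 1_000_000, 2)
daily_usd = round(SAVED_PER_RUN_USD * RUNS_PER_DAY, 2)
monthly_usd = round(daily_usd * DAYS_PER_MONTH, 2)

print(f"{tokens_saved} tokens saved per run ({reduction_pct}% fewer)")
print(f"implied price: ${implied_usd_per_million} per million tokens")
print(f"${daily_usd}/day, ${monthly_usd}/month")
```

So the per-run saving corresponds to a 73.3% token reduction at an implied blended price of about $20 per million tokens; swapping in your own run count and price gives a quick estimate for your pipeline.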

I then reported to the CTO with signed proof - an Ed25519 signature, verifiable offline. The CTO asked “is this real?” I ran lean-ctx verify in front of him.

Three minutes later he said “deploy it for the whole team.”

At the end of the month, my team had saved $2,300 compared with the previous month. Not a small number. Anyone who suggests reverting LeanCTX next month will probably get ganged up on by the team. 🤡

Questions I still don't have answers to

I'm not a fanboy. There are a few things I'm still figuring out:

How does Context Time Machine work? Git-anchored, signed snapshots. Sounds great, but the docs still say “direction, not yet shipped.” Has anyone tried it?
ModePredictor quality in non-Go languages? Go and Rust have strong tree-sitter grammars, but what about Python, Ruby, JavaScript? I see lower compression on Python - 96.5% vs 98.1%.
Scalability with 500+ files? I tested on a medium repo (200 files). If you've used it on a 2000+ file monorepo, tell me how it performs.
Deterministic compression trade-off? Choosing deterministic (no ML) means reproducible output at the cost of adaptive compression. Sometimes I wish it knew “this is config boilerplate, compress it harder.”
What happens when the AI doesn't cooperate? The tool depends on the agent using ctx_read instead of Read. If an agent is stubborn (keeps using the old tool), LeanCTX can't do anything about it.

CTA: run the benchmark on your own codebase

Don't trust my 98.1% compression number. Every codebase is different - Go vs TypeScript vs Python vs Rust yield different compression ratios. Install and run:

        
curl -fsSL https://leanctx.com/install.sh | sh
lean-ctx onboard
lean-ctx benchmark report .

Your numbers are the real numbers. Share them in the Discord community - they have a public leaderboard.

Repo: github.com/yvgude/lean-ctx - Apache 2.0, 3.1K stars, 294 forks, 2,747 commits, and the code quality is seriously good.

Website: leanctx.com - full docs, 29 user journeys, honest comparison pages.

Discord: link on the website; the community is growing fast and the author actively responds to issues.

Sponsor: buymeacoffee.com/yvgude - the author deserves the support.

This post is part of the Tool Deep Dive series - I analyze tools for AI agents from the perspective of an engineer who codes every day. If there's a tool you want me to review, leave a comment below.

A few things I'm still figuring out: how does Context Time Machine work? Is ModePredictor quality in non-Go languages (Python, Ruby) as good as in Go? If you have used these features, please share your experience.

🦞 Luân - an engineer who loves dissecting tools, hates AI hallucination, and is addicted to Rust binaries.