Caveman
A compression skill that shrinks token footprint by roughly 75% without losing task fidelity. Portable, composable, boring in the best way.
- reduction: ~75%
- stars: 37,000
- license: MIT
[thesis] why many token when few do trick.
Caveman is a four-part ecosystem for builders who treat tokens as a resource worth designing around — a compression primitive, a spec-driven workflow, a persistent memory layer, and a coding agent CLI that stacks them all.
Caveman starts as a small compression primitive, becomes a workflow with Cavekit, grows a memory with Cavemem, and lands as a full coding agent CLI with Caveman Code. Each layer is useful on its own. Stacked, they compound.
A compression skill that shrinks token footprint by roughly 75% without losing task fidelity. Portable, composable, boring in the best way.
Spec-driven development, pushed further. Turns a written spec into a structured plan, then into verifiable execution. Opinionated, not clever.
A persistent, cross-agent memory layer. Local SQLite + FTS5 + vector search, Caveman-compressed, exposed via MCP. Your agents stop forgetting.
A next-gen coding agent CLI built around four independent compression layers. Fewer tokens at every hop — prompt, commands, outputs, context.
Caveman is a small, focused compression skill. Give it a prompt, a system message, a long-form document, a CLAUDE.md — it returns something semantically equivalent and dramatically shorter. No retraining. No proprietary runtime. Plug it wherever tokens get spent.
```javascript
// before — 4,820 tokens
const prompt = `You are an expert software engineer. When the user asks a question, think step by step, consider edge cases, and then ...`;

// after — 1,204 tokens (-75%)
const compressed = caveman.compress(prompt, {
  dict: 'eng/v1',
  preserve: ['examples', 'schema'],
});

// fidelity kept, bytes did not survive
```
Cavekit is the workflow layer. Write a spec in prose, let Cavekit turn it into a structured plan, then drive execution against it. It uses Caveman internally so the plan and the context both stay lean.
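As an illustration of the spec-to-plan step — `planFromSpec` and the task shape here are hypothetical, not Cavekit's actual API — a minimal sketch might turn spec headings into verifiable tasks:

```typescript
// Hypothetical sketch of a spec-to-plan step; Cavekit's real API may differ.
interface Task {
  id: string;     // stable id derived from the spec heading
  title: string;  // what to do
  verify: string; // how execution is checked against the spec
}

// Turn markdown-style spec headings into verifiable tasks.
function planFromSpec(spec: string): Task[] {
  return spec
    .split("\n")
    .filter((line) => line.startsWith("## "))
    .map((line, i) => {
      const title = line.slice(3).trim();
      return {
        id: `task-${i + 1}`,
        title,
        verify: `tests cover: ${title}`,
      };
    });
}

const spec = `# SPEC-024 Refunds
## Add refunds table
## Add POST /orders/:id/refund
## Emit refund.processed event`;

console.log(planFromSpec(spec).map((t) => t.title));
// → three tasks: the refunds table, the refund endpoint, the refund.processed event
```

The point of the structured shape is the `verify` field: execution is driven against the plan, not against vibes, so every task carries its own check.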
[plan] refunds table · POST /orders/:id/refund · refund.processed

Cavemem gives coding agents a memory that survives the session. It captures observations, stores them compressed in a local SQLite index, and serves them back through MCP — so Claude Code, Cursor, Codex, or Gemini can all recall what the last agent learned.
[thesis] why agent forget when agent can remember.
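A rough sketch of how a keyword (FTS5-style) score and a vector score might blend into one relevance number — the weights, the crude keyword overlap, and the `hybridScore` helper are illustrative assumptions, not Cavemem's actual ranking:

```typescript
// Illustrative hybrid ranking: blend a keyword (FTS-style) score with
// cosine similarity over embeddings. Weights are assumptions, not Cavemem's.
interface Observation {
  text: string;
  embedding: number[];
}

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Crude keyword overlap as a stand-in for an FTS5 rank.
function keywordScore(query: string, text: string): number {
  const terms = query.toLowerCase().split(/\s+/);
  const hay = text.toLowerCase();
  const hits = terms.filter((t) => hay.includes(t)).length;
  return hits / terms.length;
}

function hybridScore(
  query: string,
  queryEmbedding: number[],
  obs: Observation,
  wKeyword = 0.4,
  wVector = 0.6,
): number {
  return (
    wKeyword * keywordScore(query, obs.text) +
    wVector * cosine(queryEmbedding, obs.embedding)
  );
}
```

In the real store both sides would come out of SQLite — the keyword score from an FTS5 MATCH query, the vector score from an index over stored embeddings — and the blended number would rank what gets served back over MCP.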
Recall runs through three MCP tools: search, timeline, get_observations. <private> tags get stripped.

- SPEC-024 refund flow · 2d ago · 0.94
- orders.test.ts edge cases · 3d ago · 0.81
- CHANGELOG v0.8 refunds · 11d ago · 0.62
- chat stripe webhook · 3w ago · 0.55

Most coding agents pay the token tax everywhere: bloated prompts, verbose tool calls, chatty outputs, sprawling context files. Caveman Code squeezes each of those independently, then stacks them. The result is a CLI that feels fast, cheap, and deliberate.
User prompts are normalized and shrunk before they ever hit the model. Same intent, less surface area.
Tool and command calls are routed through a Reduced Token Kernel — a compact grammar for frequently-used actions.
Model outputs — plans, diffs, explanations — pass through the Caveman primitive before rendering or being fed back in.
Long-form agent context — instructions, repo maps, style guides — is compiled once and cached as a dense Caveman artifact.
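The stacking of those four layers can be sketched as plain function composition — the layer bodies below are toy placeholders standing in for the real transforms, not Caveman Code internals:

```typescript
// Placeholder layers: each is a token-reducing transform on a string.
type Layer = (input: string) => string;

const normalizePrompt: Layer = (s) => s.replace(/\s+/g, " ").trim();
const reducedTokenKernel: Layer = (s) =>
  s.replace(/\brun the tests\b/g, "rtk:test"); // toy grammar rewrite
const compressOutput: Layer = (s) => s; // stand-in for the Caveman primitive
const compiledContext: Layer = (s) => s; // stand-in for the cached context artifact

// Stack the layers: each hop sees fewer tokens than the last.
const stack = (layers: Layer[]): Layer => (input) =>
  layers.reduce((acc, layer) => layer(acc), input);

const pipeline = stack([
  normalizePrompt,
  reducedTokenKernel,
  compressOutput,
  compiledContext,
]);

console.log(pipeline("  please   run the tests   "));
// → "please rtk:test"
```

The design point is independence: because each layer is just a string transform, any one can be swapped out or disabled without touching the others, and the savings multiply rather than interfere.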
```
$ caveman-code "add a refund endpoint per SPEC-024"
▸ load context    CLAUDE.md → 18.4k → 3.9k (L04)
▸ compile prompt  user → L01 → 0.3k
▸ plan            5 tasks · 62% covered by spec
▸ edit            apps/api/orders/refund.ts
▸ run             pnpm test --filter orders ✓ 41 passing
→ done in 00:47 · 4,812 tok (baseline: 21,340)
  saved 77% · fewer words, same work.
```
If tokens were free, bloat would be free. They aren't. Treat each token as a unit of intent and most systems reveal themselves as 75% noise.
Caveman is a bet that the right answer isn't a bigger context window — it's a smaller, sharper one. Compression isn't just a cost story. It's a control story: precision, speed, and focus; a model that thinks inside a tighter frame tends to think better.
[aside] we said it before. why many token when few do trick.
Each project is independently useful. Caveman Code is what you get when you wire them together on purpose.
Token compression primitive. Model-agnostic. Deterministic.
Spec-driven development workflow, built on Caveman.
Cross-agent persistent memory. SQLite + FTS5 + vector, exposed via MCP.
Coding agent CLI with four layers of token compression.
Caveman, Cavekit, and Cavemem are public today — read the source, send a patch. Caveman Code, the flagship CLI, ships soon. Watch the repo to be first on install day.
Commands are sealed until ship day. Caveman Code is still in private development; no packages are live yet.