Caveman
A compression skill that shrinks token footprint by roughly 75% without losing task fidelity. Portable, composable, boring in the best way.
- reduction: ~75%
- stars: 37,000
- license: MIT
[thesis] why many token when few do trick.
Caveman is a four-part ecosystem for builders who treat tokens as a resource worth designing around — a compression primitive, a spec-driven workflow, a persistent memory layer, and a coding agent CLI that stacks them all.
Caveman starts as a small compression primitive, becomes a workflow with Cavekit, grows a memory with Cavemem, and lands as a full coding agent CLI with Caveman Code. Each layer is useful on its own. Stacked, they compound.
A compression skill that shrinks token footprint by roughly 75% without losing task fidelity. Portable, composable, boring in the best way.
Spec-driven development, pushed further. Turns a written spec into a structured plan, then into verifiable execution. Opinionated, not clever.
A persistent, cross-agent memory layer. Local SQLite + FTS5 + vector search, Caveman-compressed, exposed via MCP. Your agents stop forgetting.
A next-gen coding agent CLI built around four independent compression layers. Fewer tokens at every hop — prompt, commands, outputs, context.
Caveman is a small, focused compression skill. Give it a prompt, a system message, a long-form document, a CLAUDE.md — it returns something semantically equivalent and dramatically shorter. No retraining. No proprietary runtime. Plug it wherever tokens get spent.
```javascript
// before — 4,820 tokens
const prompt = `You are an expert software engineer. When the user asks a question, think step by step, consider edge cases, and then ...`;

// after — 1,204 tokens (-75%)
const compressed = caveman.compress(prompt, {
  dict: 'eng/v1',
  preserve: ['examples', 'schema'],
});

// fidelity kept, bytes did not survive
```
Cavekit is the workflow layer. Write a spec in prose, let Cavekit turn it into a structured plan, then drive execution against it. It uses Caveman internally so the plan and the context both stay lean.
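As an illustration of the spec-to-plan step — `planFromSpec` and the task shape here are hypothetical, not Cavekit's actual API — a minimal sketch might turn spec headings into verifiable tasks:

```typescript
// Hypothetical sketch of a spec-to-plan step; Cavekit's real API may differ.
interface Task {
  id: string;     // stable id derived from the spec heading
  title: string;  // what to do
  verify: string; // how execution is checked against the spec
}

// Turn markdown-style spec headings into verifiable tasks.
function planFromSpec(spec: string): Task[] {
  return spec
    .split("\n")
    .filter((line) => line.startsWith("## "))
    .map((line, i) => {
      const title = line.slice(3).trim();
      return {
        id: `task-${i + 1}`,
        title,
        verify: `tests cover: ${title}`,
      };
    });
}

const spec = `# SPEC-024 Refunds
## Add refunds table
## Add POST /orders/:id/refund
## Emit refund.processed event`;

console.log(planFromSpec(spec).map((t) => t.title));
// → three tasks: the refunds table, the refund endpoint, the refund.processed event
```

The point of the structured shape is the `verify` field: execution is driven against the plan, not against vibes, so every task carries its own check.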
[plan] refunds table · POST /orders/:id/refund · refund.processed

Cavemem gives coding agents a memory that survives the session. It captures observations, stores them compressed in a local SQLite index, and serves them back through MCP — so Claude Code, Cursor, Codex, or Gemini can all recall what the last agent learned.
[thesis] why agent forget when agent can remember.
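A rough sketch of how a keyword (FTS5-style) score and a vector score might blend into one relevance number — the weights, the crude keyword overlap, and the `hybridScore` helper are illustrative assumptions, not Cavemem's actual ranking:

```typescript
// Illustrative hybrid ranking: blend a keyword (FTS-style) score with
// cosine similarity over embeddings. Weights are assumptions, not Cavemem's.
interface Observation {
  text: string;
  embedding: number[];
}

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Crude keyword overlap as a stand-in for an FTS5 rank.
function keywordScore(query: string, text: string): number {
  const terms = query.toLowerCase().split(/\s+/);
  const hay = text.toLowerCase();
  const hits = terms.filter((t) => hay.includes(t)).length;
  return hits / terms.length;
}

function hybridScore(
  query: string,
  queryEmbedding: number[],
  obs: Observation,
  wKeyword = 0.4,
  wVector = 0.6,
): number {
  return (
    wKeyword * keywordScore(query, obs.text) +
    wVector * cosine(queryEmbedding, obs.embedding)
  );
}
```

In the real store both sides would come out of SQLite — the keyword score from an FTS5 MATCH query, the vector score from an index over stored embeddings — and the blended number would rank what gets served back over MCP.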
Recall runs through three MCP tools: search, timeline, get_observations. <private> tags get stripped.

- SPEC-024 refund flow · 2d ago · 0.94
- orders.test.ts edge cases · 3d ago · 0.81
- CHANGELOG v0.8 refunds · 11d ago · 0.62
- chat stripe webhook · 3w ago · 0.55

Most coding agents pay the token tax everywhere: bloated prompts, verbose tool calls, chatty outputs, sprawling context files. Caveman Code squeezes each of those independently, then stacks them. The result is a CLI that feels fast, cheap, and deliberate.
User prompts are normalized and shrunk before they ever hit the model. Same intent, less surface area.
Tool and command calls are routed through a Reduced Token Kernel — a compact grammar for frequently-used actions.
Model outputs — plans, diffs, explanations — pass through the Caveman primitive before rendering or being fed back in.
Long-form agent context — instructions, repo maps, style guides — is compiled once and cached as a dense Caveman artifact.
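The stacking of those four layers can be sketched as plain function composition — the layer bodies below are toy placeholders standing in for the real transforms, not Caveman Code internals:

```typescript
// Placeholder layers: each is a token-reducing transform on a string.
type Layer = (input: string) => string;

const normalizePrompt: Layer = (s) => s.replace(/\s+/g, " ").trim();
const reducedTokenKernel: Layer = (s) =>
  s.replace(/\brun the tests\b/g, "rtk:test"); // toy grammar rewrite
const compressOutput: Layer = (s) => s; // stand-in for the Caveman primitive
const compiledContext: Layer = (s) => s; // stand-in for the cached context artifact

// Stack the layers: each hop sees fewer tokens than the last.
const stack = (layers: Layer[]): Layer => (input) =>
  layers.reduce((acc, layer) => layer(acc), input);

const pipeline = stack([
  normalizePrompt,
  reducedTokenKernel,
  compressOutput,
  compiledContext,
]);

console.log(pipeline("  please   run the tests   "));
// → "please rtk:test"
```

The design point is independence: because each layer is just a string transform, any one can be swapped out or disabled without touching the others, and the savings multiply rather than interfere.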
```
$ caveman-code "add a refund endpoint per SPEC-024"
▸ load context    CLAUDE.md → 18.4k → 3.9k (L04)
▸ compile prompt  user → L01 → 0.3k
▸ plan            5 tasks · 62% covered by spec
▸ edit            apps/api/orders/refund.ts
▸ run             pnpm test --filter orders ✓ 41 passing
→ done in 00:47 · 4,812 tok (baseline: 21,340)
  saved 77% · fewer words, same work.
```
If tokens were free, bloat would be free. They aren't. Treat each token as a unit of intent and most systems reveal themselves as 75% noise.
Caveman is a bet that the right answer isn't a bigger context window — it's a smaller, sharper one. Compression isn't just a cost story. It's a control story: precision, speed, and focus; a model that thinks inside a tighter frame tends to think better.
[aside] we said it before. why many token when few do trick.
Each project is independently useful. Caveman Code is what you get when you wire them together on purpose.
Token compression primitive. Model-agnostic. Deterministic.
Spec-driven development workflow, built on Caveman.
Cross-agent persistent memory. SQLite + FTS5 + vector, exposed via MCP.
Coding agent CLI with four layers of token compression.
Caveman, Cavekit, and Cavemem are public today — read the source, send a patch. Caveman Code, the flagship CLI, ships soon. Watch the repo to be first on install day.
Commands are sealed until ship day. Caveman Code is still in private development; no packages are live yet.