Semantic Compression for AI Agents

Replace verbose AI-to-AI communication with structured, compressed blocks. Save 83–96% of tokens in the benchmark scenarios while maintaining 100% semantic fidelity.

83–96% Token Savings

130+ Messages/Chain

4 Model Families


[ARCH:PLAN]

id:api|fw:python

[BUILD:EXEC]

id:m1|target:main

[TEST:RUN]

id:t1|cmd:pytest

Why H2C Exists

⚡

Token Waste

Agent chains written in natural language consume 5,000–50,000 tokens. H2C cuts this to 200–2,000.

🔗

No Structured Protocol

AI agents communicate in unstructured text. H2C provides typed, parseable blocks.

📦

Context Collapse

Natural language chains break after ~40 messages. H2C scales to 130+ messages.

🔄

Cross-Model Fragility

Prompts tuned for one model family often fail on another. H2C works zero-shot across 4 model families.

🤝

No Versioned Handoff

Agents can't resume conversations. H2C includes cycle tracking and versioning.

🎯

Agent Orchestration

Building multi-agent systems requires custom protocols. H2C is a standard wire format.

Core Features

📝

Structured Grammar

Formal BNF grammar with typed fields, lists, and revisions. Self-describing blocks that LLMs parse natively.

🚀

Maximum Compression

Lossless semantic compression. Same information, drastically fewer tokens. Validated across 5 scenarios.

🔌

Universal Transport

Agnostic to transport: stdin/stdout, HTTP, WebSocket, MCP. Integrate with any framework.

🎛️

Context Management

PRUNE, COMPACT, FREEZE commands for handling long agent chains. Scale beyond 130 messages.

📊

Agent Orchestration

Built-in cycle tracking, retry counters, and versioned handoff for version-aware agent choreography.

✅

Zero-Shot Cross-Model

Works across Claude, GPT, Gemini, and Llama without retraining. No model-specific tweaks needed.
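To make the Universal Transport point concrete, here is a minimal Python sketch of reading H2C blocks from any text stream (stdin, a socket wrapper, an HTTP response body). It assumes a block starts at a line of the form `[NAMESPACE:ACTION]` and runs until the next such line; the function name `iter_blocks` is hypothetical and not part of any H2C tooling.

```python
import io
import re
from typing import Iterator, TextIO

# Assumed header shape: [NAMESPACE:ACTION], uppercase letters only.
HEADER = re.compile(r"^\[[A-Z]+:[A-Z]+\]$")


def iter_blocks(stream: TextIO) -> Iterator[str]:
    """Yield one raw block (header plus field lines) at a time from a text stream."""
    pending: list[str] = []
    for raw_line in stream:
        line = raw_line.strip()
        if not line:
            continue  # blank lines only separate blocks visually
        if HEADER.match(line) and pending:
            # A new header closes the block collected so far.
            yield "\n".join(pending)
            pending = []
        pending.append(line)
    if pending:
        yield "\n".join(pending)


# Demo with an in-memory stream; the same function accepts sys.stdin.
demo = io.StringIO("[ARCH:PLAN]\n\nid:api|fw:python\n\n[BUILD:EXEC]\n\nid:m1|target:main\n")
for block in iter_blocks(demo):
    print(repr(block))
```

Because the function only needs an iterable of lines, the same code works unchanged whether the blocks arrive over a pipe or are buffered from a network response.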

Validated Benchmarks

Metric	Natural Language	H2C	Improvement
Architectural Plan	~800 tokens	~50 tokens	94%
Build Outcome	~200 tokens	~15 tokens	93%
3-Agent Cycle	~5,000 tokens	~200 tokens	96%
130-Message Chain	~42,000 tokens	~7,140 tokens	83%
Context Breakpoint	~40 messages	~130 messages	3.25x

Validated on Claude Opus 4.7, DeepSeek V4 Pro, GPT, Gemini, and Llama. See docs/benchmarks for methodology.
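The Improvement column can be reproduced from the two token columns. The short Python sketch below uses the approximate counts from the table and assumes savings are defined as the share of natural-language tokens removed; the function name is illustrative.

```python
# Approximate token counts copied from the benchmark table: (natural language, H2C).
BENCHMARKS = {
    "Architectural Plan": (800, 50),
    "Build Outcome": (200, 15),
    "3-Agent Cycle": (5_000, 200),
    "130-Message Chain": (42_000, 7_140),
}


def savings_percent(natural_tokens: int, h2c_tokens: int) -> float:
    """Percentage of tokens saved relative to the natural-language baseline."""
    return (1 - h2c_tokens / natural_tokens) * 100


for name, (natural, h2c) in BENCHMARKS.items():
    print(f"{name}: {savings_percent(natural, h2c):.1f}% saved")

# The Context Breakpoint row is a ratio of message counts, not a percentage.
print(f"Context Breakpoint: {130 / 40:.2f}x")
```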

Use Cases

Multi-Agent Orchestration

Architect → Builder → Tester pipelines with retry tracking and versioned handoff.

Long-Running Chains

100+ message conversations with intelligent pruning, compaction, and freezing.

LLM-to-LLM Handoff

Structured output from Agent A → direct consumption by Agent B, no parsing overhead.

Cognitive IR

Semantic compression for retrieval-augmented generation and reasoning transport.

Agent Runtime Protocol

Standard wire format for agent hosting platforms and orchestration frameworks.

Framework Integration

Drop-in layer for LangGraph, AutoGen, CrewAI, Semantic Kernel, and MCP.

Core Syntax

Minimal H2C Example

Replace 180 tokens of natural language with just 55 structured tokens.

This minimal example shows how H2C blocks replace verbose AI communication with clean, typed fields.

[ARCH:PLAN]
id:api-weather|fw:python3.11|lib:fastapi,httpx|auth:APIKey|struct:[main.py,services/weather.py]

[BUILD:EXEC]
id:m1|target:main.py|desc:setup_fastapi_app

[BUILD:DONE]
id:m1|diff:[main.py~1]|rev:1

[ORCH:END]
final:complete|est_token:15
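The blocks above are regular enough to parse with a few lines of Python. This is only a sketch under stated assumptions, not the planned v2.0 reference parser: it assumes a `[NAMESPACE:ACTION]` header, `key:value` pairs separated by `|`, bracketed comma-separated lists, and a `~N` suffix marking a revision number. It does not handle nested lists. The names `parse_blocks`, `parse_value`, and `split_revision` are hypothetical.

```python
import re
from typing import Optional

HEADER = re.compile(r"^\[([A-Z]+):([A-Z]+)\]$")

SAMPLE = """\
[ARCH:PLAN]
id:api-weather|fw:python3.11|lib:fastapi,httpx|auth:APIKey|struct:[main.py,services/weather.py]

[BUILD:EXEC]
id:m1|target:main.py|desc:setup_fastapi_app

[BUILD:DONE]
id:m1|diff:[main.py~1]|rev:1

[ORCH:END]
final:complete|est_token:15
"""


def parse_value(raw: str):
    """Turn '[a,b]' into a list of strings; leave anything else as a string."""
    if raw.startswith("[") and raw.endswith("]"):
        inner = raw[1:-1]
        return inner.split(",") if inner else []
    return raw


def split_revision(item: str) -> tuple[str, Optional[int]]:
    """Split 'main.py~1' into ('main.py', 1); return (item, None) without a suffix."""
    name, sep, rev = item.rpartition("~")
    if sep and rev.isdigit():
        return name, int(rev)
    return item, None


def parse_blocks(text: str) -> list[dict]:
    """Parse H2C text into dicts with 'ns', 'action', and 'fields' keys."""
    blocks: list[dict] = []
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        match = HEADER.match(line)
        if match:
            current = {"ns": match.group(1), "action": match.group(2), "fields": {}}
            blocks.append(current)
            continue
        if current is None:
            raise ValueError(f"field line before any block header: {line!r}")
        for pair in line.split("|"):
            # Split on the first colon only, so values may contain colons.
            key, sep, value = pair.partition(":")
            if not sep:
                raise ValueError(f"field without a colon: {pair!r}")
            current["fields"][key] = parse_value(value)
    return blocks


for block in parse_blocks(SAMPLE):
    print(block["ns"], block["action"], block["fields"])
```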

Natural Language ~180 tokens

H2C Blocks ~55 tokens

Savings ~70%

Real-World Examples

65% Savings

🌤️ Weather API Service

Python FastAPI service with caching, rate limiting, and multi-step build orchestration.

59% Savings

📝 TODO Console App

C# .NET 8 application with SQLite, demonstrating H2C in stateful, long-running workflows.

80% Savings

🔄 PRUNE/COMPACT Chain

Complete v1.4 workflow with context management and the CTX:NEGOTIATE handshake, stress-tested to 130+ messages.

78–83% Savings

🧪 Cross-Model Stress Tests

5 complex scenarios on Opus 4.7 and DeepSeek V4 Pro, 130 messages, full benchmark suite.

Challenge (Weather API Service)

Building a Python FastAPI weather service with caching, rate limiting, and multi-step orchestration requires multiple coordination steps and verbose documentation in natural language.

H2C Solution

[ARCH:PLAN]
id:weather-api|fw:python3.11|lib:[fastapi,httpx,cachetools]
auth:APIKey::env(OPENWEATHER_API_KEY)
struct:[main.py,routers/weather.py,services/weather_service.py]
notes:[cache_TTL_10min,rate-limit_60req-min]

[BUILD:EXEC]
id:m1|target:main.py|desc:setup_fastapi_app

[BUILD:DONE]
id:m1|diff:[main.py~1]|rev:1

[TEST:RUN]
id:test_weather|cmd:pytest tests/test_weather.py

[TEST:PASS]
id:test_weather|pass_count:42

[ORCH:END]
final:complete|est_token:2450

Impact

✅ 65% token reduction vs natural language narrative
✅ Maintains all architectural metadata
✅ Machine-parseable for agent coordination
✅ Scales to multi-agent workflows

Challenge (TODO Console App)

A C# .NET 8 console application with a SQLite backend demonstrates how H2C handles stateful, long-running workflows with multiple state transitions and persistent queries.

H2C Solution

[ARCH:PLAN]
id:todo-app|fw:dotnet8|db:sqlite|lib:[EFCore,Spectre.Console]
struct:[Program.cs,Models/,Services/,Data/]

[CTX:UPDATE]
~progress:layer=init,status=in_progress
~next:database_setup
~active_files:[Program.cs~1]

[BUILD:EXEC]
id:b1|target:Program.cs|desc:setup_dependency_injection

[CTX:UPDATE]
~progress:layer=database,status=done
~next:crud_implementation
~active_files:[Program.cs~1,Data/TodoContext.cs~1]

[TEST:RUN]
id:t1|cmd:dotnet test TodoServiceTests.cs

[ORCH:END]
final:complete|est_token:1845

Impact

✅ 59% reduction in state documentation
✅ Clear cycle tracking for debugging
✅ Persistent state management
✅ Cost-effective for long-running apps

Challenge (PRUNE/COMPACT Chain)

Long-running agent chains accumulate context. Efficient compression while preserving semantic meaning is critical for sustained multi-agent coordination.

H2C Solution

[BUILD:DONE]
id:b1|diff:[src/main.py~1]|rev:5

[BUILD:DONE]
id:b2|diff:[src/utils.py~2]|rev:3

[CTX:PRUNE]
keep:[b1,b2]|pruned:[b1,b2]|reason:consolidate_old_builds

[BUILD:DONE]
id:b3|diff:[src/api.py~1]|rev:1

[CTX:COMPACT]
summary:[layer=3,status=done,files:[src/main.py~5,src/utils.py~3,src/api.py~1]]
keep_active:[src/api.py~1]
pruned_history:msg_2_to_5

[ORCH:END]
final:complete|est_token:7140

Impact

✅ 80% reduction in context overhead
✅ Supports 130+ message chains
✅ Lossless semantic preservation
✅ Scales to multi-week workflows

Challenge (Cross-Model Stress Tests)

Validate H2C v1.4 across 5 complex scenarios, 130+ messages, on Claude Opus 4.7 and DeepSeek V4 Pro with full benchmarking and cross-model compatibility verification (GPT, Gemini, Llama).

Test Results

Scenario	Messages	Savings	Models Tested
Architectural Plan	32	94%	4/4
Build Outcome	28	93%	4/4
3-Agent Cycle	45	96%	4/4
Long Chain	130	83%	3/4
Context Mgmt	67	91%	4/4

Key Findings

✅ 83–96% token savings consistently across models
✅ Zero-shot cross-model compatibility
✅ Maintains semantic fidelity at all scales
✅ Production-ready for multi-agent systems
✅ Works across Claude, GPT, Gemini, Llama families

H2C Code Examples

See how H2C blocks replace verbose natural language

ARCH ~65% savings

[ARCH:PLAN]
id:weather-api
fw:python3.11
lib:[fastapi,httpx,cachetools]
auth:APIKey::env(OPENWEATHER_API_KEY)
struct:[main.py,routers/weather.py,services/weather_service.py]
notes:[cache_TTL_10min,rate-limit_60req-min]

Architecture plan with framework, libraries, auth, and structure

BUILD ~70% savings

[BUILD:EXEC]
id:m1
target:main.py
desc:setup_fastapi_app

[BUILD:DONE]
id:m1
diff:[main.py~1]
rev:1

Build execution and completion with revision tracking
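Going the other way, an agent framework can emit blocks from ordinary data structures. The Python sketch below produces the single-line form used in the minimal example; `format_block` and `format_value` are hypothetical helper names, and the `|` separator and bracketed list syntax are inferred from the examples on this page.

```python
def format_value(value) -> str:
    """Render lists as '[a,b]' and everything else with str()."""
    if isinstance(value, (list, tuple)):
        return "[" + ",".join(str(item) for item in value) + "]"
    return str(value)


def format_block(namespace: str, action: str, fields: dict) -> str:
    """Render one block: a header line followed by pipe-separated fields."""
    header = f"[{namespace}:{action}]"
    body = "|".join(f"{key}:{format_value(value)}" for key, value in fields.items())
    return f"{header}\n{body}"


print(format_block("BUILD", "EXEC", {"id": "m1", "target": "main.py", "desc": "setup_fastapi_app"}))
print(format_block("BUILD", "DONE", {"id": "m1", "diff": ["main.py~1"], "rev": 1}))
```

Fields are emitted in dictionary insertion order, so the caller controls the field order in the output.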

TEST ~72% savings

[TEST:RUN]
id:test_weather_endpoint
cmd:pytest tests/test_weather.py

[TEST:PASS]
id:test_weather_endpoint
pass_count:42

Test execution with results and pass count

CTX ~80% savings

[CTX:UPDATE]
~progress:layer=data,status=in_progress
~next:database_setup
~active_files:[main.py~1,models.py~1]

Context update with layer tracking and active files
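A receiving agent might fold a `CTX:UPDATE` block into its working state roughly as follows. This Python sketch assumes that a leading `~` marks a key to overwrite and that `k=v,k=v` values form a small mapping; both readings are inferred from the example above and are not confirmed by the specification. The helper names are hypothetical.

```python
def parse_update_line(line: str) -> tuple[str, object]:
    """Parse '~key:value' into (key, value) with list and mapping support."""
    if not line.startswith("~"):
        raise ValueError(f"expected a '~' prefixed field: {line!r}")
    key, sep, raw = line[1:].partition(":")
    if not sep:
        raise ValueError(f"field without a colon: {line!r}")
    if raw.startswith("[") and raw.endswith("]"):
        # Bracketed value: comma-separated list of strings.
        value: object = raw[1:-1].split(",") if len(raw) > 2 else []
    elif "=" in raw:
        # 'a=1,b=2' style value: treat as a flat mapping.
        value = dict(pair.split("=", 1) for pair in raw.split(","))
    else:
        value = raw
    return key, value


def apply_update(state: dict, lines: list[str]) -> dict:
    """Return a copy of state with each update line applied (last write wins)."""
    new_state = dict(state)
    for line in lines:
        key, value = parse_update_line(line)
        new_state[key] = value
    return new_state


UPDATE = [
    "~progress:layer=data,status=in_progress",
    "~next:database_setup",
    "~active_files:[main.py~1,models.py~1]",
]
print(apply_update({}, UPDATE))
```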

CTX ~85% savings

[CTX:PRUNE]
keep:[m3,m4,t1]|pruned:[m1,m2]|reason:builds_completed

[CTX:COMPACT]
summary:[layer=api,status=done,files:[auth.py~1,routes.py~1]]
keep_active:[auth.py~1,routes.py~1]
pruned_history:msg_2_to_19

Context pruning and compaction for long-running chains
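Applied to an in-memory history, the `CTX:PRUNE` block above could look like this in Python. The history representation (a list of dicts with an `id` key) and the rule that `keep` takes precedence over `pruned` are assumptions made for this sketch; the H2C specification may define pruning differently.

```python
def apply_prune(history: list[dict], keep: list[str], pruned: list[str]) -> list[dict]:
    """Drop blocks whose id is listed in pruned, unless it is also listed in keep."""
    drop = set(pruned) - set(keep)  # assumption: keep wins on conflict
    return [block for block in history if block.get("id") not in drop]


# Hypothetical history matching the ids in the CTX:PRUNE example.
history = [
    {"kind": "BUILD:DONE", "id": "m1"},
    {"kind": "BUILD:DONE", "id": "m2"},
    {"kind": "BUILD:DONE", "id": "m3"},
    {"kind": "BUILD:DONE", "id": "m4"},
    {"kind": "TEST:PASS", "id": "t1"},
]

remaining = apply_prune(history, keep=["m3", "m4", "t1"], pruned=["m1", "m2"])
print([block["id"] for block in remaining])
```

The original list is left untouched, so a coordinator can keep the full history on disk while sending only the pruned view to the next agent.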

ORCH ~78% savings

[ORCH:END]
final:complete
est_token:15420
pass_count:42
fail_count:2

Orchestration completion with token estimate and counters

Natural Language vs H2C

❌ Natural Language (~180 tokens)

I've set up a new FastAPI weather service 
using Python 3.11. The service includes 
multiple endpoints for weather data fetching 
with caching (10 minute TTL) and rate limiting 
at 60 requests per minute. I've structured 
the code with separate routers and service 
layers. Authentication is handled via API key 
stored in environment variables...

✅ H2C (~55 tokens)

[ARCH:PLAN]
id:weather-api|fw:python3.11
lib:[fastapi,httpx,cachetools]
auth:APIKey::env(OPENWEATHER_API_KEY)
struct:[main.py,routers/,services/]
notes:[cache_TTL_10min,rate-limit_60req-min]

Result: ~70% token reduction while maintaining all semantic information

Project Roadmap

✓

v1.0 - Core Grammar

Foundational blocks, base syntax

Released

✓

v1.1 - Context Management

PRUNE/COMPACT, revisions, counters

Released

✓

v1.2 - State Machine

FREEZE, cycle tracking, retry logic

Released

✓

v1.3 - Formal Specification

EBNF ISO 14977, AST model, opcodes

Released

✓

v1.4 - Handshake & Error Recovery

CTX:NEGOTIATE handshake, BUILD:NACK, DAG transitive closure, formal STATE:FINDINGS

Released

→

v2.0 - Reference Implementation

Parser, validator, transpiler

Planned

🔬

v3.0 - Runtime & Compiler

Native MCP transport, agent runtime

Research

Ecosystem Integration

H2C works as the semantic layer for your favorite frameworks

🔌

MCP

Transport H2C blocks via MCP tool calls

🔀

LangGraph

H2C as node output format and state schema

🤖

AutoGen

H2C as agent response protocol

⚙️

Semantic Kernel

H2C for function result serialization

👥

CrewAI

H2C as task output format

🎯

OpenAI Agents SDK

H2C as structured output format

Frequently Asked Questions

Is H2C the same as HTTP/2 h2c?

No. H2C is a semantic compression protocol for AI-to-AI communication, completely unrelated to the HTTP/2 cleartext upgrade mechanism defined in RFC 7540. The name stands for "Human-to-Compiler" / "Head-to-Core": a structured format for AI agent handoff, not a network protocol. If you're looking for HTTP/2 h2c, see RFC 7540 (since obsoleted by RFC 9113).

How much token savings does H2C provide?

H2C delivers 83–96% token savings compared to equivalent natural language communication, validated across 5 scenarios with Claude, GPT, Gemini, and Llama. For example: a 3-agent orchestration cycle drops from ~5,000 to ~200 tokens (96% savings), and 130-message chains compress from ~42,000 to ~7,140 tokens (83% savings). Each scenario maintains 100% semantic fidelity.

Which LLMs does H2C support?

H2C works zero-shot across four major model families: Claude (Sonnet 4.6, Opus 4.7), GPT (4, 4o), Gemini (1.5, 2.5 Pro), and Llama (3, 4). No retraining, fine-tuning, or model-specific tweaks are needed, because H2C blocks are parsed natively by any LLM with an 8K+ context window. Cross-model compatibility has been validated in the benchmark suite.

Do I need any dependencies or libraries to use H2C?

No. H2C is a plain-text protocol with a formal BNF grammar. You can use it immediately with any LLM: no libraries, no SDK, no runtime. Simply include the H2C grammar in your system prompt, and both you and the AI can start exchanging H2C blocks right away. A reference parser, validator, and transpiler are planned for v2.0.

How does H2C compare to JSON or YAML for AI agents?

H2C is purpose-built for AI-to-AI communication, unlike JSON or YAML, which are general-purpose serialization formats. Key differences: (1) H2C blocks include built-in semantics for versioning (rev:), cycle tracking (cycle:), and context management (PRUNE, COMPACT, FREEZE) that JSON/YAML lack; (2) LLMs produce valid H2C more reliably because the grammar is optimized for token prediction, not human readability; (3) H2C is designed to be self-describing and self-documenting, avoiding the syntactic overhead (quotes, braces, repeated keys) that makes JSON verbose.

Ready to Compress Your AI Workflows?

H2C is open-source (MIT), requires zero dependencies, and works with any LLM with an 8K+ context window.
