EsotericPapers

CONCEPT

Knowledge Compression

A knowledge lifecycle where information evolves from raw observations to canonical truth through progressive refinement. The system-level answer to how AI agents maintain coherence across unlimited sessions.

THE PROBLEM

Multi-session AI development creates a knowledge accumulation problem. Each session discovers gotchas, patterns, architectural decisions. The naive approach: append everything to a handoff document.

Result: Session 10 needs to read 100k+ tokens of unorganized discoveries to understand what happened. Knowledge becomes noise. Context windows fill with redundancy.

ACCUMULATION VS COMPRESSION

TRADITIONAL (ACCUMULATION)

Session 01:10 discoveries
Session 02:15 discoveries
Session 03:12 discoveries

37 discoveries

Redundant, unorganized, growing

KNOWLEDGE COMPRESSION

Session 01:Creates 10
Session 02:Refines → 12 better
Session 03:Refines → 10 essential

10 refined discoveries

Organized, essential, improved

Lossless compression: Session 10 doesn't read 100k+ tokens of onboarding. It gets distilled wisdom from all previous sessions.

THE KNOWLEDGE LIFECYCLE

Knowledge isn't static. It evolves through stages of increasing context, synthesis, and trustworthiness. The Epistemic Maturity Model describes this progression from raw data to canonical truth.

EPISTEMIC MATURITY MODEL

Knowledge evolves through progressive refinement

1

DATA

Raw logs, transcripts, signals

TRUST

Low

2

OBSERVATIONS

Noticed patterns, anomalies

TRUST

Low-Med

3

FIELD NOTES

Contextual documentation

TRUST

Medium

4

INSIGHTS

Synthesized understanding

TRUST

High

5

CANON

Vetted, authoritative truth

TRUST

Very High

Data → Observations → Field Notes → Insights → Canon

Movement between levels implies revision or promotion based on new evidence, testing, or consensus. Knowledge can be demoted (canon invalidated by new data) or promoted (observations refined into insights). The lifecycle is bidirectional.

MATERIALIZED IN FOLDER STRUCTURE

In Agent Focus, the knowledge lifecycle isn't just a concept - it's implemented through the folder hierarchy. Each folder represents a different maturity level.

FOLDER

EMM LEVEL

CONTENTS

docs/
CANON

specification.md, architecture decisions

knowledge/
INSIGHTS

handoff.md, codemap.md, tribal wisdom

unstructured/
FIELD NOTES

scratchpad.md, working notes

sessions/
OBSERVATIONS

*.work.log, session transcripts

At handoff, agents promote knowledge upward: working notes become refined insights, insights that prove stable graduate to canonical documentation. The folder structure IS the knowledge lifecycle.

PROGRESSIVE DISCLOSURE

Agents don't read everything. They start with the most refined layer (Insights) and only reach down to rawer data when needed. This keeps context lean while preserving access to full history.

LEVEL 1~40k

Always load

knowledge/handoff.md, codemap.md

LEVEL 2+~10k

Load if confused

sessions/NN-agent.md (previous session)

LEVEL 3Full context

Wake if needed

Past agent with full 150k context

QUERYABLE CANON: SIPS

At the Canon level, the Specification & Implementation Planning System (SIPS) turns Markdown into a structured, queryable document. Catalog-numbered items (1.1.1, 2.3.4) enable targeted reads and updates.

# Query a specific spec item

$ af spec 2.1.3

# Returns just that section, not the whole doc

→ 2.1.3 Authentication Flow: JWT tokens with...

Large specifications become navigable. Agents can grep for catalog numbers, make targeted edits, and reference specific sections without loading entire documents. Markdown becomes a pseudo-database.

HOW REFINEMENT WORKS

Refinement, Not Appending

At handoff, Session 02 doesn't just append to Session 01's handoff.md. It reads Session 01's discoveries, merges related entries, enhances existing ones with new details, removes outdated information, reorganizes for clarity. The result is BETTER documentation, not longer.

Lossless Compression

No information is lost. Session 02 enhances "JWT library fails with RS256" by adding performance data ("~5ms per token"). Session 03 adds where the decision was made ("lib/auth/jwt.ts line 3"). Each session improves the entry without losing the original insight.

Quality Compounds

By Session 10, handoff.md contains the essential wisdom from all previous sessions, organized clearly, enhanced with context, free of redundancy. It's not 10× longer. It's 10× better.

WHY IT MATTERS

This is core to Agent Focus. Without the knowledge lifecycle, multi-session development becomes impossible. Context windows fill with noise. Quality degrades.

The combination - EMM levels materialized in folders, SIPS for queryable canon, progressive disclosure for efficient context loading - creates a system where knowledge improves over time instead of accumulating into entropy.

Compression is intelligence. It's how quality survives across unlimited sessions.