Two systems handle conversations that exceed model context limits:
  1. Context compaction — automatically summarizes conversation history when the context window fills up, preserving recent turns and discarding older ones.
  2. Semantic memory — indexes discarded messages so the agent can retrieve past context on demand via the memory_search tool.
Together they allow agents to maintain coherent multi-turn sessions that exceed any single model’s context limit.

Feature flags

Both systems require compile-time features and per-request enablement.
Feature flag | Cargo feature | What it enables
Compaction strategy | session-compaction | DefaultCompactor in meerkat-session
Memory store backend | memory-store | meerkat-store/memory backend
Memory + compaction wiring | memory-store-session | HnswMemoryStore, MemorySearchDispatcher, agent loop integration
The CLI’s default features include sub-agents and skills but not session-compaction or memory-store. You must enable them explicitly.
cargo build -p meerkat-cli --features session-compaction,memory-store
At runtime, the AgentFactory has an enable_memory flag (default: false). Per-request builds can override it via AgentBuildConfig::override_memory.
let factory = AgentFactory::new(store_path)
    .builtins(true)
    .memory(true);  // enable semantic memory + compaction

// Per-request override:
let mut build_config = AgentBuildConfig::new("claude-sonnet-4-5");
build_config.override_memory = Some(true);

CompactionConfig

Controls when and how compaction runs.
Field | Type | Default | Description
auto_compact_threshold | u64 | 100_000 | Compaction triggers when last_input_tokens >= threshold OR estimated_history_tokens >= threshold.
recent_turn_budget | usize | 4 | Number of recent complete turns to retain after compaction. A turn is a User message followed by all subsequent non-User messages until the next User message.
max_summary_tokens | u32 | 4096 | Maximum tokens the LLM may produce for the compaction summary.
min_turns_between_compactions | u32 | 3 | Minimum turns that must elapse between consecutive compactions (loop guard to prevent runaway compaction).
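The turn definition used by recent_turn_budget can be sketched as a grouping function. This is an illustrative sketch with a hypothetical Role enum, not the actual message type from meerkat-core:

```rust
#[derive(Clone, Copy, PartialEq, Debug)]
enum Role {
    User,
    Assistant,
    Tool,
}

/// Group a flat message history into turns: each turn starts at a User
/// message and runs until (not including) the next User message.
fn group_turns(roles: &[Role]) -> Vec<Vec<Role>> {
    let mut turns: Vec<Vec<Role>> = Vec::new();
    for &role in roles {
        // A User message opens a new turn; leading non-User messages
        // (if any) fall into an initial partial turn.
        if role == Role::User || turns.is_empty() {
            turns.push(Vec::new());
        }
        turns.last_mut().unwrap().push(role);
    }
    turns
}
```

With recent_turn_budget = 4, compaction would keep the last four groups produced by this function and discard the rest.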

How compaction triggers

Compaction is checked at every turn boundary, just before the next LLM call. The decision flow:
  1. Skip turn 0: the first turn always skips compaction.
  2. Loop guard check: if compaction occurred at turn N, no compaction runs until turn N + min_turns_between_compactions.
  3. Dual threshold evaluation: compaction triggers if EITHER:
    • last_input_tokens >= auto_compact_threshold (input tokens from the last LLM response), OR
    • estimated_history_tokens >= auto_compact_threshold (JSON bytes of all messages / 4).
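The decision flow above can be sketched as a pure function over the counters involved. Field names mirror CompactionConfig; the struct and function here are illustrative, not the actual agent-loop code:

```rust
struct CompactionConfig {
    auto_compact_threshold: u64,
    min_turns_between_compactions: u32,
}

/// Decide whether compaction should run before the next LLM call.
fn should_compact(
    cfg: &CompactionConfig,
    turn: u32,
    last_compaction_turn: Option<u32>,
    last_input_tokens: u64,
    history_json_bytes: u64,
) -> bool {
    // 1. Turn 0 never compacts.
    if turn == 0 {
        return false;
    }
    // 2. Loop guard: enforce minimum spacing between compactions.
    if let Some(n) = last_compaction_turn {
        if turn < n + cfg.min_turns_between_compactions {
            return false;
        }
    }
    // 3. Dual threshold: either signal alone is enough.
    let estimated_history_tokens = history_json_bytes / 4;
    last_input_tokens >= cfg.auto_compact_threshold
        || estimated_history_tokens >= cfg.auto_compact_threshold
}
```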

What happens during compaction

When compaction triggers:
  1. Emit CompactionStarted event: emitted with input/estimated token counts and the message count.
  2. Send compaction prompt to the LLM: the current conversation history plus a compaction prompt is sent to the LLM with no tools and max_summary_tokens as the response limit.
  3. Handle the result. On failure, a CompactionFailed event is emitted and the session is not mutated (safe failure). On success, DefaultCompactor::rebuild_history produces new messages:
    • The system prompt is preserved verbatim (if present).
    • A summary message is injected as a User message with the prefix [Context compacted].
    • The last recent_turn_budget complete turns are retained.
    • All other messages become discarded.
  4. Replace session messages: the session messages are replaced with the rebuilt history.
  5. Record usage and emit completion event: compaction usage is recorded against the session and budget, and a CompactionCompleted event is emitted with the summary token count and before/after message counts.
The compactor sends this prompt to the LLM:
You are performing a CONTEXT COMPACTION. Your job is to create a handoff summary so work can continue seamlessly. Include:
  • Current progress and key decisions made
  • Important context, constraints, or user preferences discovered
  • What remains to be done (clear next steps)
  • Any critical data, file paths, examples, or references needed to continue
  • Tool call patterns that worked or failed
Be concise and structured. Prioritize information the next context needs to act, not narrate.
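The rebuild described in step 3 can be sketched as follows. The Msg enum is hypothetical and the real logic lives in DefaultCompactor::rebuild_history; this sketch only illustrates the keep/discard split:

```rust
#[derive(Clone, Debug)]
enum Msg {
    System(String),
    User(String),
    Assistant(String),
}

/// Rebuild history after a successful summary: keep the system prompt,
/// inject the summary as a User message, retain the last
/// `recent_turn_budget` complete turns; everything else is discarded.
/// Returns (rebuilt, discarded).
fn rebuild_history(
    messages: &[Msg],
    summary: &str,
    recent_turn_budget: usize,
) -> (Vec<Msg>, Vec<Msg>) {
    // Split off a leading system prompt, if present.
    let (system, rest) = match messages.split_first() {
        Some((m @ Msg::System(_), rest)) => (Some(m.clone()), rest),
        _ => (None, messages),
    };

    // A turn begins at a User message; find where the last
    // `recent_turn_budget` turns start.
    let user_indices: Vec<usize> = rest
        .iter()
        .enumerate()
        .filter(|(_, m)| matches!(m, Msg::User(_)))
        .map(|(i, _)| i)
        .collect();
    let cut = if user_indices.len() > recent_turn_budget {
        user_indices[user_indices.len() - recent_turn_budget]
    } else {
        0
    };

    let mut rebuilt = Vec::new();
    if let Some(s) = system {
        rebuilt.push(s); // system prompt preserved verbatim
    }
    rebuilt.push(Msg::User(format!("[Context compacted] {summary}")));
    rebuilt.extend_from_slice(&rest[cut..]); // recent turns retained

    let discarded = rest[..cut].to_vec();
    (rebuilt, discarded)
}
```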

Memory indexing after compaction

When both a Compactor and a MemoryStore are wired into the agent, discarded messages are indexed into semantic memory after compaction completes. This happens asynchronously (fire-and-forget via tokio::spawn). For each discarded message:
  • The message’s indexable text content is extracted via message.as_indexable_text().
  • If non-empty, it is indexed with MemoryMetadata containing the session ID, current turn number, and a timestamp.
This means previously discarded conversation content becomes searchable via the memory_search tool.
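The indexing loop can be sketched synchronously as follows. The metadata struct mirrors the fields named above, but the sink and function are stand-ins; the real path runs asynchronously against the MemoryStore:

```rust
/// Metadata stored alongside each indexed memory entry.
struct MemoryMetadata {
    session_id: String,
    turn: u32,
    timestamp: u64,
}

/// Index every discarded message that has non-empty indexable text.
/// `texts` stands in for message.as_indexable_text() per discarded
/// message; `sink` stands in for the memory store. Returns the number
/// of entries written.
fn index_discarded(
    texts: &[&str],
    session_id: &str,
    turn: u32,
    timestamp: u64,
    sink: &mut Vec<(String, MemoryMetadata)>,
) -> usize {
    let mut indexed = 0;
    for text in texts {
        if text.is_empty() {
            continue; // empty extractions are skipped
        }
        sink.push((
            text.to_string(),
            MemoryMetadata {
                session_id: session_id.to_string(),
                turn,
                timestamp,
            },
        ));
        indexed += 1;
    }
    indexed
}
```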

The memory_search tool

When memory is enabled, the agent gains a memory_search tool.

Tool definition

Property | Value
Name | memory_search
Description | Search semantic memory for past conversation content. Memory contains text from earlier conversation turns that were compacted away to save context space.

Parameters

  • query (string, required): Natural language search query describing what you want to recall.
  • limit (integer, default: 5): Maximum number of results to return. Capped at 20.
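The limit handling implied above (default 5, hard cap of 20) can be sketched as a small helper; the function name is illustrative:

```rust
/// Resolve the effective result count for a memory_search call:
/// a missing limit falls back to the default of 5, and any
/// requested value above 20 is capped.
fn effective_limit(requested: Option<u32>) -> u32 {
    requested.unwrap_or(5).min(20)
}
```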

Response format

Returns a JSON array of result objects:
[
  {
    "content": "The text content of the memory entry",
    "score": 0.85,
    "session_id": "019467d9-7e3a-7000-8000-000000000000",
    "turn": 3
  }
]
  • content (string): The text content of the memory entry.
  • score (number): Similarity score from 0.0 (no match) to 1.0 (exact match). Typical useful matches are above 0.7.
  • session_id (string): The session the memory originated from (enables cross-session recall).
  • turn (integer): The turn number within the session when the memory was indexed.

Memory store implementations

The HnswMemoryStore backend uses:
  • hnsw_rs (v0.3) for approximate nearest-neighbor search with cosine distance.
  • redb for persistent metadata and text storage.
Storage layout: .rkat/memory/memory.redb
Key characteristics:
  • Embedding: Bag-of-words TF with hash-based dimensionality reduction (4096-dimensional vectors, L2-normalized). Each word is hashed to a bucket and its presence increments that dimension.
  • Persistence: Data survives process restart. On open(), all existing entries are re-indexed into the HNSW graph from redb.
  • Score conversion: HNSW cosine distance (0 = identical, 2 = opposite) is converted to a 0..1 similarity score: score = 1.0 - (distance / 2.0).
  • Thread safety: Insertions are serialized via a Mutex to couple point ID allocation with successful writes. Searches use a RwLock for concurrent reads.
  • Constants: MAX_NB_CONNECTION = 16, MAX_LAYER = 16, EF_CONSTRUCTION = 200, DEFAULT_MAX_ELEMENTS = 100_000.
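The embedding and score conversion described above can be sketched as follows. This uses the standard library's DefaultHasher as a stand-in for whatever word hash the store actually uses, and omits the HNSW index itself (hnsw_rs handles search in the real implementation):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

const DIM: usize = 4096;

/// Bag-of-words TF embedding: hash each word to one of DIM buckets,
/// increment that dimension per occurrence, then L2-normalize.
fn embed(text: &str) -> Vec<f32> {
    let mut v = vec![0.0f32; DIM];
    for word in text.split_whitespace() {
        let mut h = DefaultHasher::new();
        word.hash(&mut h);
        v[(h.finish() as usize) % DIM] += 1.0;
    }
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm > 0.0 {
        for x in &mut v {
            *x /= norm;
        }
    }
    v
}

/// Convert HNSW cosine distance (0 = identical, 2 = opposite)
/// to a 0..1 similarity score.
fn distance_to_score(distance: f32) -> f32 {
    1.0 - (distance / 2.0)
}
```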

How memory gets wired

When the memory-store-session feature is compiled in and memory is enabled:
  1. An HnswMemoryStore is opened at {store_path}/memory/.
  2. The memory_search tool is added to the agent’s tool set.
  3. A DefaultCompactor is attached (if session-compaction is also enabled).
  4. A built-in memory-retrieval skill is injected into the system prompt, teaching the agent how to use memory search.

Examples

# Build with memory + compaction features
cargo build -p meerkat-cli --features session-compaction,memory-store

# Run with memory enabled (if the CLI surfaces this flag)
rkat run --memory "Analyze this large codebase..."

Custom CompactionConfig

use meerkat_core::CompactionConfig;
use meerkat_session::DefaultCompactor;

let compactor = DefaultCompactor::new(CompactionConfig {
    auto_compact_threshold: 50_000,   // Compact earlier
    recent_turn_budget: 6,            // Keep more recent turns
    max_summary_tokens: 8192,         // Allow longer summaries
    min_turns_between_compactions: 5, // More spacing between compactions
});

See also