- Context compaction: automatically summarizes conversation history when the context window fills up, preserving recent turns and discarding older ones.
- Semantic memory: indexes discarded messages so the agent can retrieve past context on demand via the memory_search tool.
What this guide is for
Use this guide when you want to understand or configure:
- long-horizon conversation behavior
- semantic recall via memory_search
- compaction thresholds
- the relationship between compaction and semantic memory
Feature flags
Compaction and semantic memory have separate build-time and runtime switches.

| Feature flag | Cargo feature | What it enables |
|---|---|---|
| Compaction strategy | session-compaction | DefaultCompactor in meerkat-session |
| Memory store backend | memory-store | CLI convenience feature enabling the memory store backend and session wiring |
| Memory session wiring | memory-store-session | HnswMemoryStore, memory_search, and memory integration with the agent/session path |
The CLI’s default features include the memory backend/session wiring, but not session-compaction. For no-default/custom source builds, enable the memory features explicitly when you need memory_search.
AgentFactory::memory(true) enables the semantic memory path (HnswMemoryStore + memory_search). Compaction is controlled independently by whether the binary is built with session-compaction and whether a compactor is wired into the session path. Because the two switches are independent, three configurations are possible:
- compaction enabled, semantic memory disabled
- semantic memory enabled, compaction unavailable
- both enabled together
CompactionConfig
Controls when and how compaction runs.

| Field | Type | Default | Description |
|---|---|---|---|
| auto_compact_threshold | u64 | 100_000 | Compaction triggers when last_input_tokens >= threshold OR estimated_history_tokens >= threshold. |
| recent_turn_budget | usize | 4 | Number of recent complete turns to retain after compaction. A turn is a User message followed by all subsequent non-User messages until the next User message. |
| max_summary_tokens | u32 | 4096 | Maximum tokens the LLM may produce for the compaction summary. |
| min_turns_between_compactions | u32 | 3 | Minimum turns that must elapse between consecutive compactions (loop guard to prevent runaway compaction). |
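
The table can be read as a small decision function. The sketch below is illustrative only: CompactionConfigSketch, TurnState, and should_compact are hypothetical names, not Meerkat's real types. It assumes the loop guard is evaluated before the token thresholds and that compaction becomes eligible again once the current turn reaches N + min_turns_between_compactions.

```rust
/// Hypothetical mirror of the CompactionConfig table; not the real Meerkat type.
/// Field names, types, and defaults are copied from the table.
#[derive(Debug, Clone, PartialEq)]
pub struct CompactionConfigSketch {
    pub auto_compact_threshold: u64,
    pub recent_turn_budget: usize,
    pub max_summary_tokens: u32,
    pub min_turns_between_compactions: u32,
}

impl Default for CompactionConfigSketch {
    fn default() -> Self {
        Self {
            auto_compact_threshold: 100_000,
            recent_turn_budget: 4,
            max_summary_tokens: 4096,
            min_turns_between_compactions: 3,
        }
    }
}

/// Assumed inputs to the trigger decision (illustrative names).
pub struct TurnState {
    pub current_turn: u32,
    pub last_compaction_turn: Option<u32>,
    pub last_input_tokens: u64,
    pub estimated_history_tokens: u64,
}

/// Returns true when a compaction should run at this turn boundary.
pub fn should_compact(cfg: &CompactionConfigSketch, s: &TurnState) -> bool {
    // Loop guard: after a compaction at turn N, skip until turn
    // N + min_turns_between_compactions (boundary handling is an assumption).
    if let Some(last) = s.last_compaction_turn {
        if s.current_turn < last.saturating_add(cfg.min_turns_between_compactions) {
            return false;
        }
    }
    // Either token measurement reaching the threshold is enough.
    s.last_input_tokens >= cfg.auto_compact_threshold
        || s.estimated_history_tokens >= cfg.auto_compact_threshold
}
```

With the defaults, a session that compacted at turn 8 is not compacted again at turn 10 even if it is over the threshold, but becomes eligible at turn 11.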
How compaction triggers
Compaction is checked at every turn boundary, just before the next LLM call. The decision flow:
Loop guard check
If compaction occurred at turn N, no compaction until turn N + min_turns_between_compactions.
What happens during compaction
When compaction triggers:
Send compaction prompt to LLM
The current conversation history plus a compaction prompt is sent to the LLM with no tools and max_summary_tokens as the response limit.
Handle result
On failure: a CompactionFailed event is emitted and the session is not mutated (safe failure).
On success: DefaultCompactor::rebuild_history produces new messages:
- System prompt is preserved verbatim (if present).
- A summary message is injected as a User message with the prefix [Context compacted].
- The last recent_turn_budget complete turns are retained.
- All other messages become discarded.
Index discarded memory
If semantic memory is enabled, discarded messages are indexed before the compacted history is committed.
Replace session messages
The session messages are replaced with the rebuilt history after memory indexing accepts the discarded content.
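
The history rebuild described in the steps above can be approximated as follows. This is a minimal sketch, not DefaultCompactor::rebuild_history itself: Role, Msg, and rebuild_history_sketch are hypothetical names, the exact formatting of the summary after the [Context compacted] prefix is assumed, and the sketch treats every User message as the start of a turn without checking whether the final turn is complete.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Role {
    System,
    User,
    Assistant,
    Tool,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Msg {
    pub role: Role,
    pub text: String,
}

/// Returns (rebuilt_history, discarded_messages).
pub fn rebuild_history_sketch(
    messages: &[Msg],
    summary: &str,
    recent_turn_budget: usize,
) -> (Vec<Msg>, Vec<Msg>) {
    // A leading System message, if present, is kept verbatim.
    let body_start = usize::from(matches!(messages.first(), Some(m) if m.role == Role::System));

    // A turn starts at each User message.
    let turn_starts: Vec<usize> = messages
        .iter()
        .enumerate()
        .skip(body_start)
        .filter(|(_, m)| m.role == Role::User)
        .map(|(i, _)| i)
        .collect();

    // Everything from `cut` onward is retained; everything between the
    // system prompt and `cut` is discarded.
    let cut = if recent_turn_budget == 0 {
        messages.len()
    } else if turn_starts.len() > recent_turn_budget {
        turn_starts[turn_starts.len() - recent_turn_budget]
    } else {
        body_start
    };

    let mut rebuilt = Vec::new();
    rebuilt.extend_from_slice(&messages[..body_start]);
    // The summary is injected as a User message (prefix format assumed).
    rebuilt.push(Msg {
        role: Role::User,
        text: format!("[Context compacted] {summary}"),
    });
    rebuilt.extend_from_slice(&messages[cut..]);

    let discarded = messages[body_start..cut].to_vec();
    (rebuilt, discarded)
}
```

Returning the discarded slice separately keeps it available for memory indexing before the rebuilt history replaces the session messages.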
The compaction prompt
The compactor sends this prompt to the LLM:
You are performing a CONTEXT COMPACTION. Your job is to create a handoff summary so work can continue seamlessly. Include:
- Current progress and key decisions made
- Important context, constraints, or user preferences discovered
- What remains to be done (clear next steps)
- Any critical data, file paths, examples, or references needed to continue
- Tool call patterns that worked or failed
Be concise and structured. Prioritize information the next context needs to act, not narrate.
Memory indexing after compaction
When both a Compactor and a MemoryStore are wired into the agent, discarded messages are indexed into semantic memory before compacted history is committed. If the memory store rejects indexing, Meerkat preserves the original history, emits a CompactionFailed event, and skips that compaction attempt instead of dropping the only authoritative copy of the discarded text.
For each discarded message:
- The message’s indexable text content is extracted via message.as_indexable_text().
- If non-empty, it is indexed with MemoryMetadata containing the session ID, the typed source handle (the offset range of the source message(s)), and a timestamp.
Indexed content can then be retrieved through the memory_search tool.
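
The index-before-commit ordering can be illustrated with a small sketch. Everything here is hypothetical: IndexSink, VecStore, Outcome, and index_then_commit are stand-ins for illustration, not Meerkat's MemoryStore API, and the timestamp from MemoryMetadata is omitted for brevity. The point is only that the history is replaced after every discarded message has been accepted by the store.

```rust
/// Hypothetical stand-in for a memory store; the real trait may differ.
pub trait IndexSink {
    fn index(
        &mut self,
        session_id: &str,
        source_range: (usize, usize), // half-open [start, end) offsets
        text: &str,
    ) -> Result<(), String>;
}

/// Toy in-memory store that can be told to reject writes.
pub struct VecStore {
    pub entries: Vec<(String, (usize, usize), String)>,
    pub reject: bool,
}

impl IndexSink for VecStore {
    fn index(&mut self, session_id: &str, range: (usize, usize), text: &str) -> Result<(), String> {
        if self.reject {
            return Err("store rejected indexing".to_string());
        }
        self.entries.push((session_id.to_string(), range, text.to_string()));
        Ok(())
    }
}

#[derive(Debug, PartialEq)]
pub enum Outcome {
    Committed,
    CompactionFailed(String),
}

/// Indexes discarded messages first; only then replaces the history.
/// `discarded` pairs each message's original offset with its indexable text.
pub fn index_then_commit<S: IndexSink>(
    store: &mut S,
    session_id: &str,
    history: &mut Vec<String>,
    discarded: &[(usize, String)],
    rebuilt: Vec<String>,
) -> Outcome {
    for (offset, text) in discarded {
        if text.is_empty() {
            continue; // nothing indexable in this message
        }
        if let Err(e) = store.index(session_id, (*offset, *offset + 1), text) {
            // History is left untouched, so no text is lost.
            return Outcome::CompactionFailed(e);
        }
    }
    *history = rebuilt;
    Outcome::Committed
}
```

A production store would also need to decide what happens to entries indexed before a mid-batch rejection; this sketch simply leaves them in place.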
The memory_search tool
When memory is enabled, the agent gains a memory_search tool.
Tool definition
| Property | Value |
|---|---|
| Name | memory_search |
| Description | Search semantic memory for past conversation content. Memory contains text from earlier conversation turns that were compacted away to save context space. Use this to recall information from earlier in the conversation or from previous sessions. |
Parameters
Natural language search query describing what you want to recall.
Maximum number of results to return. Capped at 20.
Response format
Returns a JSON array of result objects:
The text content of the memory entry.
Similarity score from 0.0 (no match) to 1.0 (exact match). Typical useful matches are above 0.7.
The half-open [start, end) offset range of the source message(s) the entry was indexed from.
Memory is scoped to a single session; results do not carry a session_id, and there is no cross-session recall.
Memory store implementations
- HnswMemoryStore (production)
- SimpleMemoryStore (test-only)
Uses:
- hnsw_rs (v0.3) for approximate nearest-neighbor search with cosine distance.
- SQLite for persistent metadata and text storage.
Database file: {store_path}/memory/memory.sqlite3
Key characteristics:
- Embedding: Bag-of-words TF with hash-based dimensionality reduction (4096-dimensional vectors, L2-normalized). Each word is hashed to a bucket and its presence increments that dimension.
- Persistence: Data survives process restart. On open(), all existing entries are re-indexed into per-session HNSW graphs from SQLite.
- Scoping: One HNSW index per session owner; indexes are sized from the surviving entry count for that scope.
- Score conversion: HNSW cosine distance (0 = identical, 2 = opposite) is converted to a 0..1 similarity score: score = 1.0 - (distance / 2.0).
- Thread safety: Insertions are serialized via a Mutex to couple point ID allocation with successful writes. The scoped-index map sits behind a RwLock for concurrent searches.
- Parameters (HnswParams defaults): max_nb_connection = 16, max_layer = 16, ef_construction = 200, ef_search = 200.
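
To make the embedding and score conversion concrete, here is a minimal sketch of hashed bag-of-words vectors and the distance-to-score mapping. It is not the HnswMemoryStore implementation: the tokenizer (lowercased whitespace split) and the hash function (the standard library's DefaultHasher) are assumptions chosen for illustration, and embed, cosine_distance, and score_from_distance are hypothetical names.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Vector width stated in the list above.
pub const DIM: usize = 4096;

/// Hashed term-frequency embedding, L2-normalized.
pub fn embed(text: &str) -> Vec<f32> {
    let mut v = vec![0.0f32; DIM];
    for word in text.split_whitespace() {
        // Assumed tokenization and hash; the real store may use others.
        let mut h = DefaultHasher::new();
        word.to_lowercase().hash(&mut h);
        let bucket = (h.finish() % DIM as u64) as usize;
        v[bucket] += 1.0;
    }
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm > 0.0 {
        for x in &mut v {
            *x /= norm;
        }
    }
    v
}

/// Cosine distance for unit-length vectors: 0 = identical, 2 = opposite.
pub fn cosine_distance(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    1.0 - dot
}

/// Maps a cosine distance in [0, 2] to a similarity score in [0, 1].
pub fn score_from_distance(distance: f32) -> f32 {
    1.0 - (distance / 2.0)
}
```

Because term counts are never negative, two such vectors cannot have a negative dot product, so under these assumptions texts with no shared buckets score 0.5 rather than 0.0.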
How memory gets wired
When the memory-store-session feature is compiled in and memory is enabled:
- An HnswMemoryStore is opened at {store_path}/memory/.
- The memory_search tool is added to the agent’s tool set.
- A DefaultCompactor is attached only if session-compaction is also enabled.
- A built-in memory-retrieval skill is injected into the system prompt, teaching the agent how to use memory search.
Examples
- CLI
- SDK
Custom CompactionConfig
See also
- Configuration: memory and compaction - config file settings
- Architecture - how compaction fits into the agent loop
