# Memory and compaction

> Automatic context compaction and semantic memory for long-running conversations

Two systems handle conversations that exceed model context limits:

1. **Context compaction** -- automatically summarizes conversation history when the context window fills up, preserving recent turns and discarding older ones.
2. **Semantic memory** -- indexes discarded messages so the agent can retrieve past context on demand via the `memory_search` tool.

Together they allow agents to maintain coherent multi-turn sessions that exceed any single model's context limit.

## What this guide is for

Use this guide when you want to understand or configure:

* long-horizon conversation behavior
* semantic recall via `memory_search`
* compaction thresholds
* the relationship between compaction and semantic memory

## Feature flags

Compaction and semantic memory have separate build-time and runtime switches.

| Capability            | Cargo feature          | What it enables                                                                        |
| --------------------- | ---------------------- | -------------------------------------------------------------------------------------- |
| Compaction strategy   | `session-compaction`   | `DefaultCompactor` in `meerkat-session`                                                |
| Memory store backend  | `memory-store`         | CLI convenience feature enabling the memory store backend and session wiring           |
| Memory session wiring | `memory-store-session` | `HnswMemoryStore`, `memory_search`, and memory integration with the agent/session path |

<Note>
  The CLI's default features include the memory backend and session wiring, but **not** `session-compaction`. If you build from source with default features disabled or with a custom feature set, enable the memory features explicitly when you need `memory_search`.
</Note>

```bash theme={null}
cargo build -p rkat --features session-compaction,memory-store
```

At runtime, `AgentFactory::memory(true)` enables the semantic memory path (`HnswMemoryStore` + `memory_search`). Compaction is controlled independently by whether the binary is built with `session-compaction` and whether a compactor is wired into the session path.

```rust theme={null}
let factory = AgentFactory::new(store_path)
    .builtins(true)
    .memory(true);  // enable semantic memory

// Per-request override:
let mut build_config = AgentBuildConfig::new("claude-sonnet-4-6");
build_config.override_memory = ToolCategoryOverride::Enable;
```

That means these cases are all valid:

* compaction enabled, semantic memory disabled
* semantic memory enabled, compaction unavailable
* both enabled together

## CompactionConfig

Controls when and how compaction runs.

| Field                           | Type    | Default   | Description                                                                                                                                                    |
| ------------------------------- | ------- | --------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `auto_compact_threshold`        | `u64`   | `100_000` | Compaction triggers when `last_input_tokens >= threshold` OR `estimated_history_tokens >= threshold`.                                                          |
| `recent_turn_budget`            | `usize` | `4`       | Number of recent complete turns to retain after compaction. A turn is a User message followed by all subsequent non-User messages until the next User message. |
| `max_summary_tokens`            | `u32`   | `4096`    | Maximum tokens the LLM may produce for the compaction summary.                                                                                                 |
| `min_turns_between_compactions` | `u32`   | `3`       | Minimum turns that must elapse between consecutive compactions (loop guard to prevent runaway compaction).                                                     |

## How compaction triggers

Compaction is checked at every turn boundary, just before the next LLM call.

The decision flow:

<Steps>
  <Step title="Skip turn 0">
    The first turn always skips compaction.
  </Step>

  <Step title="Loop guard check">
    If compaction occurred at turn N, no further compaction runs until turn N + `min_turns_between_compactions`.
  </Step>

  <Step title="Dual threshold evaluation">
    Compaction triggers if EITHER:

    * `last_input_tokens >= auto_compact_threshold` (input tokens from the last LLM response), OR
    * `estimated_history_tokens >= auto_compact_threshold` (JSON bytes of all messages / 4).
  </Step>
</Steps>
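
The decision flow above can be sketched as a single predicate. This is a minimal illustration, not Meerkat's actual code: `TriggerConfig`, `TurnState`, and `should_compact` are hypothetical names, and the sketch assumes compaction is allowed again exactly at turn N + `min_turns_between_compactions`.

```rust
/// Hypothetical subset of the documented `CompactionConfig` fields.
struct TriggerConfig {
    auto_compact_threshold: u64,
    min_turns_between_compactions: u32,
}

/// Hypothetical snapshot of what is inspected at a turn boundary.
struct TurnState {
    current_turn: u32,
    last_compaction_turn: Option<u32>,
    last_input_tokens: u64,
    history_json_bytes: u64,
}

fn should_compact(cfg: &TriggerConfig, state: &TurnState) -> bool {
    // Step 1: the first turn always skips compaction.
    if state.current_turn == 0 {
        return false;
    }
    // Step 2: loop guard. Assumption: turn N + min is the first eligible turn.
    if let Some(last) = state.last_compaction_turn {
        let next_allowed = last.saturating_add(cfg.min_turns_between_compactions);
        if state.current_turn < next_allowed {
            return false;
        }
    }
    // Step 3: dual threshold. History tokens are estimated as JSON bytes / 4.
    let estimated_history_tokens = state.history_json_bytes / 4;
    state.last_input_tokens >= cfg.auto_compact_threshold
        || estimated_history_tokens >= cfg.auto_compact_threshold
}
```

With the default threshold of `100_000`, a history that serializes to 400,000 JSON bytes is estimated at 100,000 tokens and would trigger compaction even if the last reported input token count were lower.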

## What happens during compaction

When compaction triggers:

<Steps>
  <Step title="Emit CompactionStarted event">
    Emitted with input/estimated token counts and message count.
  </Step>

  <Step title="Send compaction prompt to LLM">
    The current conversation history plus a compaction prompt is sent to the LLM with no tools and `max_summary_tokens` as the response limit.
  </Step>

  <Step title="Handle result">
    **On failure**: a `CompactionFailed` event is emitted and the session is not mutated (safe failure).

    **On success**: `DefaultCompactor::rebuild_history` produces new messages:

    * System prompt is preserved verbatim (if present).
    * A summary message is injected as a User message with the prefix `[Context compacted]`.
    * The last `recent_turn_budget` complete turns are retained.
    * All other messages become `discarded`.
  </Step>

  <Step title="Index discarded memory">
    If semantic memory is enabled, discarded messages are indexed before the compacted history is committed.
  </Step>

  <Step title="Replace session messages">
    The session messages are replaced with the rebuilt history after memory indexing accepts the discarded content.
  </Step>

  <Step title="Record usage and emit completion event">
    Compaction usage is recorded against the session and budget. A `CompactionCompleted` event is emitted with summary token count and before/after message counts.
  </Step>
</Steps>
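
To make the success path concrete, here is a simplified sketch of how a history could be rebuilt. It is not the real `DefaultCompactor::rebuild_history`: the `Role` and `Msg` types are hypothetical stand-ins, the single space after the `[Context compacted]` prefix is an assumption, and every User-started run of messages is treated as a complete turn.

```rust
#[derive(Clone, Debug, PartialEq)]
enum Role {
    System,
    User,
    Assistant,
}

#[derive(Clone, Debug, PartialEq)]
struct Msg {
    role: Role,
    text: String,
}

/// Returns `(rebuilt_history, discarded_messages)`.
fn rebuild_history(
    messages: &[Msg],
    summary: &str,
    recent_turn_budget: usize,
) -> (Vec<Msg>, Vec<Msg>) {
    // Preserve a leading system prompt verbatim, if present.
    let (system, rest) = match messages.first() {
        Some(m) if m.role == Role::System => (Some(m.clone()), &messages[1..]),
        _ => (None, messages),
    };

    // A turn starts at each User message and runs until the next one.
    let turn_starts: Vec<usize> = rest
        .iter()
        .enumerate()
        .filter(|(_, m)| m.role == Role::User)
        .map(|(i, _)| i)
        .collect();

    // Index of the first retained message.
    let keep_from = match turn_starts.len().checked_sub(recent_turn_budget) {
        // Enough turns exist: keep only the last `recent_turn_budget` of them.
        Some(i) => turn_starts.get(i).copied().unwrap_or(rest.len()),
        // Fewer turns than the budget: keep every turn.
        None => turn_starts.first().copied().unwrap_or(rest.len()),
    };

    let discarded = rest[..keep_from].to_vec();

    let mut rebuilt = Vec::new();
    rebuilt.extend(system);
    // The summary is injected as a User message with the documented prefix.
    rebuilt.push(Msg {
        role: Role::User,
        text: format!("[Context compacted] {}", summary),
    });
    rebuilt.extend_from_slice(&rest[keep_from..]);
    (rebuilt, discarded)
}
```

Returning the discarded messages alongside the rebuilt history lets the caller index them into semantic memory before the new history is committed.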

<Accordion title="The compaction prompt">
  The compactor sends this prompt to the LLM:

  > You are performing a CONTEXT COMPACTION. Your job is to create a handoff summary so work can continue seamlessly.
  >
  > Include:
  >
  > * Current progress and key decisions made
  > * Important context, constraints, or user preferences discovered
  > * What remains to be done (clear next steps)
  > * Any critical data, file paths, examples, or references needed to continue
  > * Tool call patterns that worked or failed
  >
  > Be concise and structured. Prioritize information the next context needs to act, not narrate.
</Accordion>

## Memory indexing after compaction

When both a `Compactor` and a `MemoryStore` are wired into the agent, discarded messages are indexed into semantic memory before compacted history is committed. If the memory store rejects indexing, Meerkat preserves the original history, emits a `CompactionFailed` event, and skips that compaction attempt instead of dropping the only authoritative copy of the discarded text.

For each discarded message:

* The message's indexable text content is extracted via `message.as_indexable_text()`.
* If non-empty, it is indexed with `MemoryMetadata` containing the session ID, the typed source handle (the offset range of the source message(s)), and a timestamp.

This means previously discarded conversation content becomes searchable via the `memory_search` tool.
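
The index-before-commit ordering described above can be sketched as follows. The `Indexer` trait, `RecordingStore`, and `commit_compaction` are hypothetical stand-ins for illustration; the real `MemoryStore` API, metadata handling, and error types will differ.

```rust
/// Hypothetical stand-in for a memory store.
trait Indexer {
    fn index(&mut self, text: &str) -> Result<(), String>;
}

/// In-memory double used only to demonstrate the flow.
struct RecordingStore {
    entries: Vec<String>,
    reject: bool,
}

impl Indexer for RecordingStore {
    fn index(&mut self, text: &str) -> Result<(), String> {
        if self.reject {
            return Err("memory store rejected indexing".to_string());
        }
        self.entries.push(text.to_string());
        Ok(())
    }
}

/// Index every non-empty discarded text, then commit the rebuilt history.
/// If indexing fails, `session` is left untouched and the error is returned.
fn commit_compaction<I: Indexer>(
    store: &mut I,
    session: &mut Vec<String>,
    rebuilt: Vec<String>,
    discarded: &[String],
) -> Result<(), String> {
    for text in discarded {
        // Messages with no indexable text are skipped.
        if text.trim().is_empty() {
            continue;
        }
        store.index(text)?;
    }
    // Only reached when every discarded message was accepted.
    *session = rebuilt;
    Ok(())
}
```

Because the session is replaced only after the loop finishes, a rejected index call leaves the original history as the authoritative copy, which matches the failure behavior described above.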

## The `memory_search` tool

When memory is enabled, the agent gains a `memory_search` tool.

### Tool definition

| Property    | Value                                                                                                                                                                                                                                                 |
| ----------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Name        | `memory_search`                                                                                                                                                                                                                                       |
| Description | Search semantic memory for past conversation content. Memory contains text from earlier conversation turns that were compacted away to save context space. Use this to recall information from earlier in the conversation or from previous sessions. |

### Parameters

<ParamField path="query" type="string" required>
  Natural language search query describing what you want to recall.
</ParamField>

<ParamField path="limit" type="integer" default="5">
  Maximum number of results to return. Capped at 20.
</ParamField>
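
As a rough illustration of how the `limit` default and cap combine, the following sketch resolves the effective limit. The constant and function names are hypothetical, and how the real tool treats a limit of zero is not covered here.

```rust
/// Documented default when `limit` is omitted.
const DEFAULT_LIMIT: usize = 5;
/// Documented upper bound on `limit`.
const MAX_LIMIT: usize = 20;

/// Hypothetical helper: apply the default, then clamp to the cap.
fn effective_limit(requested: Option<usize>) -> usize {
    requested.unwrap_or(DEFAULT_LIMIT).min(MAX_LIMIT)
}
```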

### Response format

Returns a JSON array of result objects:

```json theme={null}
[
  {
    "content": "The text content of the memory entry",
    "score": 0.85,
    "source_range": { "start": 3, "end": 4 }
  }
]
```

<ResponseField name="content" type="string">
  The text content of the memory entry.
</ResponseField>

<ResponseField name="score" type="number">
  Similarity score from 0.0 (no match) to 1.0 (exact match). Typical useful matches are above 0.7.
</ResponseField>

<ResponseField name="source_range" type="object">
  The half-open `[start, end)` offset range of the source message(s) the entry
  was indexed from. Memory is scoped to a single session; results do not carry a
  `session_id`, and there is no cross-session recall.
</ResponseField>
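
A caller consuming these results might filter on `score` and interpret `source_range` as shown below. The struct and function names are hypothetical mirrors of the JSON shape above, and the sketch assumes the offsets count messages; the types Meerkat actually uses may differ.

```rust
/// Hypothetical mirror of the `source_range` object.
struct SourceRange {
    start: usize,
    end: usize,
}

impl SourceRange {
    /// Number of source messages covered by the half-open range `[start, end)`.
    fn message_count(&self) -> usize {
        self.end.saturating_sub(self.start)
    }
}

/// Hypothetical mirror of one result object.
struct MemoryResult {
    content: String,
    score: f32,
    source_range: SourceRange,
}

/// Keep only results at or above a similarity cutoff (for example 0.7).
fn useful_results(results: &[MemoryResult], min_score: f32) -> Vec<&MemoryResult> {
    results.iter().filter(|r| r.score >= min_score).collect()
}
```

For the sample response above, the range with `start` 3 and `end` 4 covers exactly one message, because the end offset is exclusive.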

## Memory store implementations

<Tabs>
  <Tab title="HnswMemoryStore (production)">
    Uses:

    * **hnsw\_rs** (v0.3) for approximate nearest-neighbor search with cosine distance.
    * **SQLite** for persistent metadata and text storage.

    Storage layout: `{store_path}/memory/memory.sqlite3`

    Key characteristics:

    * **Embedding**: Bag-of-words TF with hash-based dimensionality reduction (4096-dimensional vectors, L2-normalized). Each word is hashed to a bucket and its presence increments that dimension.
    * **Persistence**: Data survives process restart. On `open()`, all existing entries are re-indexed into per-session HNSW graphs from SQLite.
    * **Scoping**: One HNSW index per session owner; indexes are sized from the surviving entry count for that scope.
    * **Score conversion**: HNSW cosine distance (0 = identical, 2 = opposite) is converted to a 0..1 similarity score: `score = 1.0 - (distance / 2.0)`.
    * **Thread safety**: Insertions are serialized via a `Mutex` to couple point ID allocation with successful writes. The scoped-index map sits behind a `RwLock` for concurrent searches.
    * **Parameters** (`HnswParams` defaults): `max_nb_connection = 16`, `max_layer = 16`, `ef_construction = 200`, `ef_search = 200`.
  </Tab>

  <Tab title="SimpleMemoryStore (test-only)">
    In-memory substring matching -- not suitable for production.

    * Stores entries in a `Vec<MemoryEntry>` behind a `RwLock`.
    * Search: lowercases both query and content, counts matching words, scores as `matching_words / total_query_words`.
    * No persistence -- data is lost when the process exits.
  </Tab>
</Tabs>
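
The scoring rules for both stores can be sketched with the standard library alone. This is an illustration under stated assumptions, not the stores' actual code: whitespace tokenization, lowercasing before hashing, and the use of `DefaultHasher` are all assumptions, and the function names are hypothetical. Only the 4096 dimensions, the L2 normalization, the `1.0 - (distance / 2.0)` conversion, and the `matching_words / total_query_words` ratio come from the descriptions above.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

const DIMS: usize = 4096;

/// Hashed bag-of-words term-frequency embedding, L2-normalized.
fn embed(text: &str) -> Vec<f32> {
    let mut v = vec![0.0f32; DIMS];
    for word in text.split_whitespace() {
        let mut hasher = DefaultHasher::new();
        word.to_lowercase().hash(&mut hasher);
        // Each word increments the bucket it hashes to.
        v[(hasher.finish() % DIMS as u64) as usize] += 1.0;
    }
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm > 0.0 {
        for x in &mut v {
            *x /= norm;
        }
    }
    v
}

/// Cosine distance for unit-length vectors: 1 minus the dot product.
fn cosine_distance(a: &[f32], b: &[f32]) -> f32 {
    1.0 - a.iter().zip(b).map(|(x, y)| x * y).sum::<f32>()
}

/// Documented conversion from cosine distance (0..2) to similarity (0..1).
fn distance_to_score(distance: f32) -> f32 {
    1.0 - distance / 2.0
}

/// Test-store style score: fraction of query words found in the content.
fn word_match_score(query: &str, content: &str) -> f32 {
    let content = content.to_lowercase();
    let query = query.to_lowercase();
    let words: Vec<&str> = query.split_whitespace().collect();
    if words.is_empty() {
        return 0.0;
    }
    let matching = words.iter().filter(|w| content.contains(**w)).count();
    matching as f32 / words.len() as f32
}
```

Under this conversion, a distance of 0 maps to a score of 1.0 and a distance of 1 (orthogonal vectors, no shared buckets) maps to 0.5, so scores from the HNSW store cluster in the upper half of the range for any text with overlapping vocabulary.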

## How memory gets wired

When the `memory-store-session` feature is compiled in and memory is enabled:

1. An `HnswMemoryStore` is opened at `{store_path}/memory/`.
2. The `memory_search` tool is added to the agent's tool set.
3. A `DefaultCompactor` is attached only if `session-compaction` is also enabled.
4. A built-in `memory-retrieval` skill is injected into the system prompt, teaching the agent how to use memory search.

## Examples

<Tabs>
  <Tab title="CLI">
    ```bash theme={null}
    # Build with memory + compaction features
    cargo build -p rkat --features session-compaction,memory-store

    # Enable the richer tool preset so memory_search is available
    rkat run --tools full "Analyze this large codebase..."
    ```
  </Tab>

  <Tab title="SDK">
    ```rust theme={null}
    use meerkat::{AgentFactory, AgentBuildConfig};
    use meerkat_store;

    let realm = meerkat_store::realm_paths("team-alpha");
    let factory = AgentFactory::new(realm.root.clone())
        .runtime_root(realm.root)
        .project_root(".")
        .builtins(true)
        .memory(true);

    let mut build_config = AgentBuildConfig::new("claude-sonnet-4-6");
    // Memory + compaction are wired automatically when:
    //   1. The factory has memory(true)
    //   2. The binary is compiled with session-compaction + memory-store-session features

    // `config` is the loaded Meerkat configuration; constructing it is not shown here.
    let agent = factory.build_agent(build_config, &config).await?;
    ```
  </Tab>
</Tabs>

### Custom CompactionConfig

```rust theme={null}
use meerkat_core::CompactionConfig;
use meerkat_session::DefaultCompactor;

let compactor = DefaultCompactor::new(CompactionConfig {
    auto_compact_threshold: 50_000,   // Compact earlier
    recent_turn_budget: 6,            // Keep more recent turns
    max_summary_tokens: 8192,         // Allow longer summaries
    min_turns_between_compactions: 5, // More spacing between compactions
});
```

## See also

* [Configuration: memory and compaction](/concepts/configuration) - config file settings
* [Architecture](/reference/architecture) - how compaction fits into the agent loop