# Memory and compaction

> Automatic context compaction and semantic memory for long-running conversations

Two systems handle conversations that exceed model context limits:

1. **Context compaction** -- automatically summarizes conversation history when the context window fills up, preserving recent turns and discarding older ones.
2. **Semantic memory** -- indexes discarded messages so the agent can retrieve past context on demand via the `memory_search` tool.

Together they allow agents to maintain coherent multi-turn sessions that exceed any single model's context limit.

## What this guide is for

Use this guide when you want to understand or configure:

* long-horizon conversation behavior
* semantic recall via `memory_search`
* compaction thresholds
* the relationship between compaction and semantic memory

## Feature flags

Compaction and semantic memory have separate build-time and runtime switches.

| Capability            | Cargo feature          | What it enables                                                                        |
| --------------------- | ---------------------- | -------------------------------------------------------------------------------------- |
| Compaction strategy   | `session-compaction`   | `DefaultCompactor` in `meerkat-session`                                                |
| Memory store backend  | `memory-store`         | CLI convenience feature enabling the memory store backend and session wiring           |
| Memory session wiring | `memory-store-session` | `HnswMemoryStore`, `memory_search`, and memory integration with the agent/session path |

<Note>
  The CLI's default features include `skills` but **not** `session-compaction` or memory features. You must enable them explicitly for source builds.
</Note>

```bash theme={null}
cargo build -p rkat --features session-compaction,memory-store
```

At runtime, `AgentFactory::memory(true)` enables the semantic memory path (`HnswMemoryStore` + `memory_search`). Compaction is controlled independently by whether the binary is built with `session-compaction` and whether a compactor is wired into the session path.

```rust theme={null}
let factory = AgentFactory::new(store_path)
    .builtins(true)
    .memory(true);  // enable semantic memory

// Per-request override:
let mut build_config = AgentBuildConfig::new("claude-sonnet-4-6");
build_config.override_memory = ToolCategoryOverride::Enable;
```

Because the switches are independent, all of these combinations are valid:

* compaction enabled, semantic memory disabled
* semantic memory enabled, compaction unavailable
* both enabled together

## CompactionConfig

Controls when and how compaction runs.

| Field                           | Type    | Default   | Description                                                                                                                                                    |
| ------------------------------- | ------- | --------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `auto_compact_threshold`        | `u64`   | `100_000` | Compaction triggers when `last_input_tokens >= threshold` OR `estimated_history_tokens >= threshold`.                                                          |
| `recent_turn_budget`            | `usize` | `4`       | Number of recent complete turns to retain after compaction. A turn is a User message followed by all subsequent non-User messages until the next User message. |
| `max_summary_tokens`            | `u32`   | `4096`    | Maximum tokens the LLM may produce for the compaction summary.                                                                                                 |
| `min_turns_between_compactions` | `u32`   | `3`       | Minimum turns that must elapse between consecutive compactions (loop guard to prevent runaway compaction).                                                     |

## How compaction triggers

Compaction is checked at every turn boundary, just before the next LLM call.

The decision flow:

<Steps>
  <Step title="Skip turn 0">
    The first turn always skips compaction.
  </Step>

  <Step title="Loop guard check">
    If compaction last ran at turn N, no further compaction runs before turn N + `min_turns_between_compactions`.
  </Step>

  <Step title="Dual threshold evaluation">
    Compaction triggers if EITHER:

    * `last_input_tokens >= auto_compact_threshold` (input tokens from the last LLM response), OR
    * `estimated_history_tokens >= auto_compact_threshold` (JSON bytes of all messages / 4).
  </Step>
</Steps>
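
The decision flow above can be sketched as a single predicate. This is a minimal illustration, not Meerkat's implementation: `TriggerInputs`, `Thresholds`, and `should_compact` are hypothetical names, and the sketch assumes the loop guard allows compaction again exactly at turn N + `min_turns_between_compactions`.

```rust
/// Hypothetical snapshot of session state at a turn boundary.
pub struct TriggerInputs {
    pub current_turn: u32,
    /// Turn at which compaction last ran, if it ever did.
    pub last_compaction_turn: Option<u32>,
    /// Input tokens reported by the last LLM response.
    pub last_input_tokens: u64,
    /// Size in bytes of all messages serialized as JSON.
    pub history_json_bytes: u64,
}

/// The two config fields that affect the trigger decision.
pub struct Thresholds {
    pub auto_compact_threshold: u64,
    pub min_turns_between_compactions: u32,
}

/// History estimate described above: JSON bytes divided by 4.
pub fn estimate_history_tokens(json_bytes: u64) -> u64 {
    json_bytes / 4
}

pub fn should_compact(inp: &TriggerInputs, t: &Thresholds) -> bool {
    // Step 1: the first turn never compacts.
    if inp.current_turn == 0 {
        return false;
    }
    // Step 2: loop guard (assumed inclusive at N + min_turns_between_compactions).
    if let Some(last) = inp.last_compaction_turn {
        if inp.current_turn < last.saturating_add(t.min_turns_between_compactions) {
            return false;
        }
    }
    // Step 3: either threshold is enough to trigger.
    inp.last_input_tokens >= t.auto_compact_threshold
        || estimate_history_tokens(inp.history_json_bytes) >= t.auto_compact_threshold
}
```

With the default threshold of `100_000`, a history of 400,000 JSON bytes is estimated at exactly 100,000 tokens and therefore triggers compaction even when `last_input_tokens` is small.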

## What happens during compaction

When compaction triggers:

<Steps>
  <Step title="Emit CompactionStarted event">
    The event carries the input and estimated token counts along with the message count.
  </Step>

  <Step title="Send compaction prompt to LLM">
    The current conversation history and a compaction prompt are sent to the LLM with no tools, with `max_summary_tokens` as the response limit.
  </Step>

  <Step title="Handle result">
    **On failure**: a `CompactionFailed` event is emitted and the session is not mutated (safe failure).

    **On success**: `DefaultCompactor::rebuild_history` produces new messages:

    * System prompt is preserved verbatim (if present).
    * A summary message is injected as a User message with the prefix `[Context compacted]`.
    * The last `recent_turn_budget` complete turns are retained.
    * All other messages become `discarded`.
  </Step>

  <Step title="Index discarded memory">
    If semantic memory is enabled, discarded messages are indexed before the compacted history is committed.
  </Step>

  <Step title="Replace session messages">
    Once memory indexing has accepted the discarded content (when semantic memory is enabled), the session messages are replaced with the rebuilt history.
  </Step>

  <Step title="Record usage and emit completion event">
    Compaction usage is recorded against the session and budget. A `CompactionCompleted` event is emitted with the summary token count and the before/after message counts.
  </Step>
</Steps>
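
The history rebuild in the success path can be sketched as follows. This is an illustration under stated assumptions, not the real `DefaultCompactor::rebuild_history`: the `Msg` enum and `Rebuilt` struct are hypothetical, the system prompt is assumed to be the first message, and the single space after the `[Context compacted]` prefix is a guess.

```rust
/// Hypothetical message type; the real session message type is richer.
#[derive(Debug, Clone, PartialEq)]
pub enum Msg {
    System(String),
    User(String),
    Assistant(String),
    ToolResult(String),
}

pub struct Rebuilt {
    pub kept: Vec<Msg>,
    pub discarded: Vec<Msg>,
}

pub fn rebuild_history(messages: &[Msg], summary: &str, recent_turn_budget: usize) -> Rebuilt {
    // Preserve the system prompt verbatim if present (assumed to come first).
    let (system, body) = match messages.first() {
        Some(Msg::System(_)) => (Some(messages[0].clone()), &messages[1..]),
        _ => (None, messages),
    };
    // A turn starts at each User message and runs until the next one.
    let turn_starts: Vec<usize> = body
        .iter()
        .enumerate()
        .filter(|(_, m)| matches!(m, Msg::User(_)))
        .map(|(i, _)| i)
        .collect();
    // Index in `body` where the retained tail begins.
    let cut = if recent_turn_budget == 0 {
        body.len()
    } else if turn_starts.len() > recent_turn_budget {
        turn_starts[turn_starts.len() - recent_turn_budget]
    } else {
        turn_starts.first().copied().unwrap_or(body.len())
    };
    let mut kept = Vec::new();
    if let Some(s) = system {
        kept.push(s);
    }
    // The summary is injected as a User message with the documented prefix.
    kept.push(Msg::User(format!("[Context compacted] {}", summary)));
    kept.extend_from_slice(&body[cut..]);
    Rebuilt { kept, discarded: body[..cut].to_vec() }
}
```

Cutting at a turn boundary keeps each retained User message together with the assistant and tool messages that answered it.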

<Accordion title="The compaction prompt">
  The compactor sends this prompt to the LLM:

  > You are performing a CONTEXT COMPACTION. Your job is to create a handoff summary so work can continue seamlessly.
  >
  > Include:
  >
  > * Current progress and key decisions made
  > * Important context, constraints, or user preferences discovered
  > * What remains to be done (clear next steps)
  > * Any critical data, file paths, examples, or references needed to continue
  > * Tool call patterns that worked or failed
  >
  > Be concise and structured. Prioritize information the next context needs to act, not narrate.
</Accordion>

## Memory indexing after compaction

When both a `Compactor` and a `MemoryStore` are wired into the agent, discarded messages are indexed into semantic memory before compacted history is committed. If the memory store rejects indexing, Meerkat preserves the original history, emits a `CompactionFailed` event, and skips that compaction attempt instead of dropping the only authoritative copy of the discarded text.

For each discarded message:

* The message's indexable text content is extracted via `message.as_indexable_text()`.
* If non-empty, it is indexed with `MemoryMetadata` containing the session ID, current turn number, and a timestamp.

This means previously discarded conversation content becomes searchable via the `memory_search` tool.
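
The ordering guarantee (index first, commit second) can be sketched with a small trait. Everything here is hypothetical: `MemoryStoreSketch`, `VecStore`, and `commit_compaction` are illustrative names, messages are reduced to plain strings, and the timestamp in `MemoryMetadata` is omitted to keep the sketch deterministic.

```rust
/// Hypothetical stand-in for the memory store interface.
pub trait MemoryStoreSketch {
    fn index(&mut self, text: &str, session_id: &str, turn: u32) -> Result<(), String>;
}

/// Toy store that can be told to reject every entry.
pub struct VecStore {
    pub entries: Vec<(String, String, u32)>,
    pub reject: bool,
}

impl MemoryStoreSketch for VecStore {
    fn index(&mut self, text: &str, session_id: &str, turn: u32) -> Result<(), String> {
        if self.reject {
            return Err("store rejected entry".to_string());
        }
        self.entries.push((text.to_string(), session_id.to_string(), turn));
        Ok(())
    }
}

pub enum Outcome {
    Committed { indexed: usize },
    Failed(String),
}

/// Index discarded text first; replace the history only if indexing succeeds.
pub fn commit_compaction<S: MemoryStoreSketch>(
    history: &mut Vec<String>,
    rebuilt: Vec<String>,
    discarded: &[String],
    store: &mut S,
    session_id: &str,
    turn: u32,
) -> Outcome {
    let mut indexed = 0;
    for text in discarded {
        // Messages with no indexable text are skipped.
        if text.is_empty() {
            continue;
        }
        if let Err(e) = store.index(text, session_id, turn) {
            // The original history is left untouched on failure.
            return Outcome::Failed(e);
        }
        indexed += 1;
    }
    *history = rebuilt;
    Outcome::Committed { indexed }
}
```

The sketch does not roll back entries indexed before a failure; how the real store handles partial indexing is not covered here.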

## The `memory_search` tool

When memory is enabled, the agent gains a `memory_search` tool.

### Tool definition

| Property    | Value                                                                                                                                                      |
| ----------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Name        | `memory_search`                                                                                                                                            |
| Description | Search semantic memory for past conversation content. Memory contains text from earlier conversation turns that were compacted away to save context space. |

### Parameters

<ParamField path="query" type="string" required>
  Natural language search query describing what you want to recall.
</ParamField>

<ParamField path="limit" type="integer" default="5">
  Maximum number of results to return. Capped at 20.
</ParamField>
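
The `limit` rule (default 5, capped at 20) amounts to a one-line clamp. The function and constant names below are hypothetical, and how the real tool treats a limit of 0 is not specified here.

```rust
pub const DEFAULT_LIMIT: usize = 5;
pub const MAX_LIMIT: usize = 20;

/// Apply the documented default and cap to a caller-supplied limit.
pub fn effective_limit(requested: Option<usize>) -> usize {
    requested.unwrap_or(DEFAULT_LIMIT).min(MAX_LIMIT)
}
```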

### Response format

Returns a JSON array of result objects:

```json theme={null}
[
  {
    "content": "The text content of the memory entry",
    "score": 0.85,
    "session_id": "019467d9-7e3a-7000-8000-000000000000",
    "turn": 3
  }
]
```

<ResponseField name="content" type="string">
  The text content of the memory entry.
</ResponseField>

<ResponseField name="score" type="number">
  Similarity score from 0.0 (no match) to 1.0 (exact match). Typical useful matches are above 0.7.
</ResponseField>

<ResponseField name="session_id" type="string">
  The session the memory originated from (enables cross-session recall).
</ResponseField>

<ResponseField name="turn" type="integer">
  The turn number within the session when the memory was indexed.
</ResponseField>

## Memory store implementations

<Tabs>
  <Tab title="HnswMemoryStore (production)">
    Uses:

    * **hnsw\_rs** (v0.3) for approximate nearest-neighbor search with cosine distance.
    * **SQLite** for persistent metadata and text storage.

    Storage layout: `.rkat/memory/memory.sqlite3`

    Key characteristics:

    * **Embedding**: Bag-of-words TF with hash-based dimensionality reduction (4096-dimensional vectors, L2-normalized). Each word is hashed to a bucket and its presence increments that dimension.
    * **Persistence**: Data survives process restart. On `open()`, all existing entries are re-indexed into the HNSW graph from SQLite.
    * **Score conversion**: HNSW cosine distance (0 = identical, 2 = opposite) is converted to a 0..1 similarity score: `score = 1.0 - (distance / 2.0)`.
    * **Thread safety**: Insertions are serialized via a `Mutex` to couple point ID allocation with successful writes. Searches use a `RwLock` for concurrent reads.
    * **Constants**: `MAX_NB_CONNECTION = 16`, `MAX_LAYER = 16`, `EF_CONSTRUCTION = 200`, `DEFAULT_MAX_ELEMENTS = 100_000`.
  </Tab>

  <Tab title="SimpleMemoryStore (test-only)">
    In-memory substring matching -- not suitable for production.

    * Stores entries in a `Vec<MemoryEntry>` behind a `RwLock`.
    * Search: lowercases both query and content, counts matching words, scores as `matching_words / total_query_words`.
    * No persistence -- data is lost when the process exits.
  </Tab>
</Tabs>
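
To make the `HnswMemoryStore` embedding and score conversion concrete, here is a minimal sketch of a hashed bag-of-words embedding. The tokenization (whitespace split, lowercasing) and the hash function (`DefaultHasher`) are assumptions chosen for illustration; only the 4096 dimensions, the L2 normalization, and the `score = 1.0 - (distance / 2.0)` formula come from the description above.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

pub const DIMS: usize = 4096;

/// Hash each word to a bucket, count occurrences, then L2-normalize.
pub fn embed(text: &str) -> Vec<f32> {
    let mut v = vec![0.0f32; DIMS];
    for word in text.split_whitespace() {
        // Assumption: case-insensitive words hashed with the std hasher.
        let mut h = DefaultHasher::new();
        word.to_lowercase().hash(&mut h);
        let bucket = (h.finish() % DIMS as u64) as usize;
        v[bucket] += 1.0;
    }
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm > 0.0 {
        for x in v.iter_mut() {
            *x /= norm;
        }
    }
    v
}

/// Cosine distance for unit-length vectors: 1 - dot product.
pub fn cosine_distance(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    1.0 - dot
}

/// Documented conversion from distance (0..2) to similarity (0..1).
pub fn score_from_distance(distance: f32) -> f32 {
    1.0 - distance / 2.0
}
```

Under this sketch the vectors have no negative components, so the distance between two embeddings stays within 0..1 and the resulting scores fall between 0.5 and 1.0.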

## How memory gets wired

When the `memory-store-session` feature is compiled in and memory is enabled:

1. An `HnswMemoryStore` is opened at `{store_path}/memory/`.
2. The `memory_search` tool is added to the agent's tool set.
3. A `DefaultCompactor` is attached only if `session-compaction` is also enabled.
4. A built-in `memory-retrieval` skill is injected into the system prompt, teaching the agent how to use memory search.

## Examples

<Tabs>
  <Tab title="CLI">
    ```bash theme={null}
    # Build with memory + compaction features
    cargo build -p rkat --features session-compaction,memory-store

    # Enable the richer tool preset so memory_search is available
    rkat run --tools full "Analyze this large codebase..."
    ```
  </Tab>

  <Tab title="SDK">
    ```rust theme={null}
    use meerkat::{AgentFactory, AgentBuildConfig};
    use meerkat_store;

    let realm = meerkat_store::realm_paths("team-alpha");
    let factory = AgentFactory::new(realm.root.clone())
        .runtime_root(realm.root)
        .project_root(".")
        .builtins(true)
        .memory(true);

    let mut build_config = AgentBuildConfig::new("claude-sonnet-4-6");
    // Memory and compaction are wired automatically when:
    //   1. The factory has memory(true)
    //   2. The binary is compiled with the session-compaction and memory-store-session features

    // `config` is assumed to be loaded elsewhere; this call must run in an async context.
    let agent = factory.build_agent(build_config, &config).await?;
    ```
  </Tab>
</Tabs>

### Custom CompactionConfig

```rust theme={null}
use meerkat_core::CompactionConfig;
use meerkat_session::DefaultCompactor;

let compactor = DefaultCompactor::new(CompactionConfig {
    auto_compact_threshold: 50_000,   // Compact earlier
    recent_turn_budget: 6,            // Keep more recent turns
    max_summary_tokens: 8192,         // Allow longer summaries
    min_turns_between_compactions: 5, // More spacing between compactions
});
```

## See also

* [Configuration: memory and compaction](/concepts/configuration) - config file settings
* [Architecture](/reference/architecture) - how compaction fits into the agent loop
