# Memory & compaction

> Semantic memory indexing, context compaction, and budget controls.

For the full guide, see [Memory](/guides/memory).

<Note>
  The runtime toggles shown below only take effect in a memory-enabled build. If the binary was compiled without the relevant memory features, `enable_memory` and `--tools full` do not add semantic memory on their own.
</Note>

## Enable semantic memory

When memory is enabled, discarded conversation turns are indexed into a semantic store. The agent gains a `memory_search` tool to recall past context on demand.

<Tabs>
  <Tab title="CLI">
    ```bash theme={null}
    rkat run "Analyze the full codebase" \
      --tools full
    ```
  </Tab>

  <Tab title="JSON-RPC">
    ```json theme={null}
    {
      "jsonrpc": "2.0", "id": 1,
      "method": "session/create",
      "params": {
        "prompt": "Analyze the full codebase",
        "enable_memory": true,
        "enable_builtins": true
      }
    }
    ```
  </Tab>

  <Tab title="REST">
    ```bash theme={null}
    curl -X POST http://127.0.0.1:8080/sessions \
      -H "Content-Type: application/json" \
      -d '{
        "prompt": "Analyze the full codebase",
        "enable_memory": true,
        "enable_builtins": true
      }'
    ```
  </Tab>

  <Tab title="MCP">
    ```json theme={null}
    {
      "name": "meerkat_run",
      "arguments": {
        "prompt": "Analyze the full codebase",
        "enable_builtins": true,
        "enable_memory": true
      }
    }
    ```
  </Tab>

  <Tab title="Python">
    ```python theme={null}
    result = await client.create_session(
        "Analyze the full codebase",
        enable_memory=True,
        enable_builtins=True,
    )
    ```
  </Tab>

  <Tab title="TypeScript">
    ```typescript theme={null}
    const result = await client.createSession("Analyze the full codebase", {
      enableMemory: true,
      enableBuiltins: true,
    });
    ```
  </Tab>

  <Tab title="Rust">
    ```rust theme={null}
    let factory = AgentFactory::new(store_path)
        .builtins(true)
        .memory(true);

    let agent = factory
        .build_agent(AgentBuildConfig::new("claude-sonnet-4-6"), &config)
        .await?;
    ```
  </Tab>
</Tabs>

## Memory search

When memory is enabled, the agent automatically gains the `memory_search` tool. It calls this tool when it needs to recall information from earlier, compacted-away turns.

The tool call appears in the event stream like any other tool invocation:

```json theme={null}
{
  "type": "tool_call_requested",
  "name": "memory_search",
  "args": {
    "query": "database migration steps from earlier",
    "limit": 5
  }
}
```

Results are returned as a JSON array. Each entry carries a similarity score and
a typed source handle: the half-open offset range of the source message(s) the
entry was indexed from. Memory is scoped to a single session, so results do not
carry a `session_id` and there is no cross-session recall:

```json theme={null}
[
  {
    "content": "We agreed to use sequential migrations with...",
    "score": 0.87,
    "source_range": { "start": 3, "end": 4 }
  }
]
```

<Note>
  Scores range from 0.0 (no match) to 1.0 (exact match). Useful results are typically above 0.7.
</Note>
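
If you post-process `memory_search` results yourself, the score guidance above can be applied as a simple filter. This is a minimal sketch: `useful_results` and `MIN_USEFUL_SCORE` are hypothetical names, and the second sample entry is made-up data for illustration only.

```python
import json

# Hypothetical cutoff based on the "typically above 0.7" guidance; tune for your data.
MIN_USEFUL_SCORE = 0.7


def useful_results(raw_json, min_score=MIN_USEFUL_SCORE):
    """Parse a memory_search result array and keep entries at or above min_score, best first."""
    results = json.loads(raw_json)
    kept = [r for r in results if r["score"] >= min_score]
    return sorted(kept, key=lambda r: r["score"], reverse=True)


# Illustrative payload: the first entry mirrors the example above, the second is invented.
sample = json.dumps([
    {"content": "We agreed to use sequential migrations with...",
     "score": 0.87, "source_range": {"start": 3, "end": 4}},
    {"content": "Unrelated note about logging",
     "score": 0.41, "source_range": {"start": 9, "end": 10}},
])

for hit in useful_results(sample):
    span = hit["source_range"]
    # source_range is half-open: start is included, end is excluded.
    print(f'{hit["score"]:.2f} messages [{span["start"]}, {span["end"]}): {hit["content"]}')
```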

## Compaction

Context compaction triggers automatically when the conversation history exceeds a token threshold. The compactor summarizes older turns, keeps recent ones, and indexes discarded messages into memory.

The compaction cycle emits events into the session stream:

```json theme={null}
// CompactionStarted -- context has grown past threshold
{
  "type": "compaction_started",
  "input_tokens": 105000,
  "estimated_history_tokens": 112000,
  "message_count": 48
}

// CompactionCompleted -- history rebuilt with summary
{
  "type": "compaction_completed",
  "summary_tokens": 2048,
  "messages_before": 48,
  "messages_after": 12
}

// CompactionFailed -- session is not mutated on failure
{
  "type": "compaction_failed",
  "reason": { "kind": "empty_summary" }
}
```
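
A client reading the session stream can branch on the `type` field of these events. The sketch below assumes events arrive as already-parsed dictionaries with the fields shown above; `describe_compaction_event` is a hypothetical helper, not part of any SDK.

```python
def describe_compaction_event(event):
    """Return a one-line summary for a compaction event, or None for any other event."""
    kind = event.get("type")
    if kind == "compaction_started":
        return (
            f"compaction started: ~{event['estimated_history_tokens']} history tokens "
            f"across {event['message_count']} messages"
        )
    if kind == "compaction_completed":
        return (
            f"compaction completed: {event['messages_before']} -> "
            f"{event['messages_after']} messages, summary of {event['summary_tokens']} tokens"
        )
    if kind == "compaction_failed":
        # The session is not mutated on failure, so the history is still intact.
        return f"compaction failed ({event['reason']['kind']}); history unchanged"
    return None


events = [
    {"type": "compaction_started", "input_tokens": 105000,
     "estimated_history_tokens": 112000, "message_count": 48},
    {"type": "compaction_completed", "summary_tokens": 2048,
     "messages_before": 48, "messages_after": 12},
    {"type": "compaction_failed", "reason": {"kind": "empty_summary"}},
]

for event in events:
    line = describe_compaction_event(event)
    if line is not None:
        print(line)
```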

Set custom thresholds through the Rust SDK:

<Tabs>
  <Tab title="Rust">
    ```rust theme={null}
    let compactor = DefaultCompactor::new(CompactionConfig {
        auto_compact_threshold: 50_000,
        recent_turn_budget: 6,
        max_summary_tokens: 8192,
        min_turns_between_compactions: 5,
    });
    ```
  </Tab>
</Tabs>
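
The field names suggest how these settings might interact. The following Python sketch is an illustration only, not the runtime's logic. It assumes compaction fires when history tokens exceed `auto_compact_threshold` and at least `min_turns_between_compactions` turns have passed, and that the last `recent_turn_budget` turns are kept verbatim. `should_compact` and `split_turns` are hypothetical helpers.

```python
from dataclasses import dataclass


@dataclass
class CompactionConfig:
    # Python mirror of the Rust struct above, for illustration only.
    auto_compact_threshold: int
    recent_turn_budget: int
    max_summary_tokens: int
    min_turns_between_compactions: int


def should_compact(cfg, history_tokens, turns_since_last_compaction):
    """Assumed trigger: history exceeds the threshold and enough turns have passed."""
    return (
        history_tokens > cfg.auto_compact_threshold
        and turns_since_last_compaction >= cfg.min_turns_between_compactions
    )


def split_turns(cfg, turns):
    """Assumed split: (older turns to summarize, recent turns kept verbatim)."""
    keep_from = max(len(turns) - cfg.recent_turn_budget, 0)
    return turns[:keep_from], turns[keep_from:]


cfg = CompactionConfig(
    auto_compact_threshold=50_000,
    recent_turn_budget=6,
    max_summary_tokens=8192,
    min_turns_between_compactions=5,
)

older, recent = split_turns(cfg, list(range(10)))
print(should_compact(cfg, 60_000, 5), older, recent)
```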

## Budget limits

Cap resource usage per session with token, time, and tool-call limits. When a budget is exhausted, the agent loop terminates gracefully.

<Tabs>
  <Tab title="CLI">
    ```bash theme={null}
    rkat run "Research this topic" \
      --max-tokens 10000 \
      --max-duration 30s \
      --max-tool-calls 5
    ```
  </Tab>

  <Tab title="JSON-RPC">
    ```json theme={null}
    {
      "jsonrpc": "2.0", "id": 1,
      "method": "session/create",
      "params": {
        "prompt": "Research this topic",
        "budget_limits": {
          "max_tokens": 10000,
          "max_tool_calls": 5
        }
      }
    }
    ```
  </Tab>

  <Tab title="REST">
    ```bash theme={null}
    curl -X POST http://127.0.0.1:8080/sessions \
      -H "Content-Type: application/json" \
      -d '{
        "prompt": "Research this topic",
        "budget_limits": {
          "max_tokens": 10000,
          "max_tool_calls": 5
        }
      }'
    ```
  </Tab>

  <Tab title="MCP">
    ```json theme={null}
    {
      "name": "meerkat_run",
      "arguments": {
        "prompt": "Research this topic",
        "enable_builtins": true,
        "budget_limits": {
          "max_tokens": 10000,
          "max_tool_calls": 5
        }
      }
    }
    ```
  </Tab>

  <Tab title="Python">
    ```python theme={null}
    result = await client.create_session(
        "Research this topic",
        budget_limits={
            "max_tokens": 10000,
            "max_tool_calls": 5,
        },
    )
    ```
  </Tab>

  <Tab title="TypeScript">
    ```typescript theme={null}
    const result = await client.createSession("Research this topic", {
      budgetLimits: {
        max_tokens: 10000,
        max_tool_calls: 5,
      },
    });
    ```
  </Tab>

  <Tab title="Rust">
    ```rust theme={null}
    let build_opts = SessionBuildOptions {
        budget_limits: Some(BudgetLimits {
            max_tokens: Some(10_000),
            max_duration: Some(Duration::from_secs(30)),
            max_tool_calls: Some(5),
        }),
        ..Default::default()
    };
    ```
  </Tab>
</Tabs>

## Budget events

When consumption nears a limit, a warning event is emitted before the budget is fully exhausted:

```json theme={null}
{
  "type": "budget_warning",
  "budget_type": "tokens",
  "used": 8500,
  "limit": 10000,
  "percent": 85.0
}
```
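
A client can use the warning to work out how much headroom is left. This is a minimal sketch that assumes the event has been parsed into a dictionary with the fields shown above; `budget_status` is a hypothetical helper.

```python
def budget_status(event):
    """Summarize a budget_warning event: remaining headroom and percent used."""
    used = event["used"]
    limit = event["limit"]
    return {
        "budget_type": event["budget_type"],
        "remaining": max(limit - used, 0),
        # Recomputed from used/limit rather than trusting the reported field.
        "percent": round(100.0 * used / limit, 1),
    }


warning = {
    "type": "budget_warning",
    "budget_type": "tokens",
    "used": 8500,
    "limit": 10000,
    "percent": 85.0,
}

status = budget_status(warning)
print(f'{status["budget_type"]}: {status["remaining"]} left ({status["percent"]}% used)')
```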

## Retry policy

Transient LLM errors (rate limits, network timeouts) trigger automatic retries with exponential backoff. Each retry attempt emits an event carrying the typed failure and the scheduled retry plan:

```json theme={null}
{
  "type": "retrying",
  "retry": {
    "failure": {
      "provider": "anthropic",
      "kind": "rate_limited",
      "retry_after_ms": 4000,
      "message": "rate limit exceeded"
    },
    "plan": {
      "attempt": 2,
      "max_retries": 5,
      "computed_delay_ms": 2000,
      "selected_delay_ms": 4000,
      "rate_limit_floor_applied": true,
      "budget_capped": false
    }
  }
}
```

<Note>
  Retries consume time budget but not token budget. If the time budget expires during a backoff wait, the agent terminates with a budget-exhausted error.
</Note>
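
The sample event is consistent with a delay that doubles per attempt and is then raised to the provider's `retry_after_ms`. The sketch below reproduces that plan under stated assumptions: a 1000 ms base delay, a doubling factor, and a cap at the remaining time budget. None of these are confirmed by the runtime, and `plan_delay` is a hypothetical helper.

```python
def plan_delay(attempt, retry_after_ms=None, base_delay_ms=1000, remaining_budget_ms=None):
    """Sketch of retry delay selection; the base delay and cap rule are assumptions."""
    # Exponential backoff: the delay doubles with each attempt (attempt starts at 1).
    computed = base_delay_ms * 2 ** (attempt - 1)
    selected = computed

    # Never retry sooner than the provider asked for.
    floor_applied = retry_after_ms is not None and retry_after_ms > selected
    if floor_applied:
        selected = retry_after_ms

    # Assumed behavior: do not wait longer than the time budget that is left.
    budget_capped = remaining_budget_ms is not None and selected > remaining_budget_ms
    if budget_capped:
        selected = remaining_budget_ms

    return {
        "computed_delay_ms": computed,
        "selected_delay_ms": selected,
        "rate_limit_floor_applied": floor_applied,
        "budget_capped": budget_capped,
    }


plan = plan_delay(attempt=2, retry_after_ms=4000)
print(plan)
```

With `attempt=2` and `retry_after_ms=4000`, the computed delay is 2000 ms and the selected delay is 4000 ms, matching the `plan` object in the event above.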

## Next steps

* [Examples: Structured output](/examples/structured-output)
* [Examples: Comms](/examples/comms)