Memory & compaction

For the full guide, see Memory.

The runtime toggles shown below still depend on a memory-enabled build. If the binary was compiled without the relevant memory features, neither enable_memory nor --tools full adds semantic memory on its own.

Enable semantic memory

When memory is enabled, discarded conversation turns are indexed into a semantic store. The agent gains a memory_search tool to recall past context on demand.

rkat run "Analyze the full codebase" \
  --tools full

{
  "jsonrpc": "2.0", "id": 1,
  "method": "session/create",
  "params": {
    "prompt": "Analyze the full codebase",
    "enable_memory": true,
    "enable_builtins": true
  }
}

curl -X POST http://127.0.0.1:8080/sessions \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Analyze the full codebase",
    "enable_memory": true,
    "enable_builtins": true
  }'

{
  "name": "meerkat_run",
  "arguments": {
    "prompt": "Analyze the full codebase",
    "enable_builtins": true,
    "enable_memory": true
  }
}

result = await client.create_session(
    "Analyze the full codebase",
    enable_memory=True,
    enable_builtins=True,
)

const result = await client.createSession("Analyze the full codebase", {
  enableMemory: true,
  enableBuiltins: true,
});

let factory = AgentFactory::new(store_path)
    .builtins(true)
    .memory(true);

let agent = factory
    .build_agent(AgentBuildConfig::new("claude-sonnet-4-6"), &config)
    .await?;

Memory search

When memory is enabled, the agent automatically gains the memory_search tool. It calls this tool when it needs to recall information from earlier, compacted-away turns. The tool call appears in the event stream like any other tool invocation:

{
  "type": "tool_call_requested",
  "name": "memory_search",
  "args": {
    "query": "database migration steps from earlier",
    "limit": 5
  }
}

Results are returned as a JSON array. Each entry carries a similarity score and a typed source handle: the half-open offset range of the source message(s) the entry was indexed from. Memory is scoped to a single session, so results do not carry a session_id and there is no cross-session recall:

[
  {
    "content": "We agreed to use sequential migrations with...",
    "score": 0.87,
    "source_range": { "start": 3, "end": 4 }
  }
]

Scores range from 0.0 (no match) to 1.0 (exact match). Useful results are typically above 0.7.
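To make the score cutoff and the half-open range concrete, here is a minimal sketch of post-processing a memory_search result on the client side. It assumes the result has already been received as the JSON text shown above. The helper name `useful_hits` and the `MIN_SCORE` constant are illustrative, not part of any SDK, and 0.7 is only the rule of thumb from the text.

```python
import json

# Example payload copied from the result shown above.
raw = """[
  {
    "content": "We agreed to use sequential migrations with...",
    "score": 0.87,
    "source_range": { "start": 3, "end": 4 }
  }
]"""

MIN_SCORE = 0.7  # rule-of-thumb cutoff from the text, not an API constant


def useful_hits(results, min_score=MIN_SCORE):
    """Keep entries scoring above min_score and expand each half-open range."""
    hits = []
    for entry in results:
        if entry["score"] <= min_score:
            continue
        rng = entry["source_range"]
        # Half-open range: start is included, end is excluded.
        offsets = list(range(rng["start"], rng["end"]))
        hits.append(
            {"content": entry["content"], "score": entry["score"], "offsets": offsets}
        )
    return hits


hits = useful_hits(json.loads(raw))
print(hits[0]["offsets"])  # [3]
```

Because the range is half-open, `{ "start": 3, "end": 4 }` covers exactly one offset, 3.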

Compaction

Context compaction triggers automatically when the conversation history exceeds a token threshold. The compactor summarizes older turns, keeps recent ones, and indexes discarded messages into memory. The compaction cycle emits events into the session stream:

// CompactionStarted -- context has grown past threshold
{
  "type": "compaction_started",
  "input_tokens": 105000,
  "estimated_history_tokens": 112000,
  "message_count": 48
}

// CompactionCompleted -- history rebuilt with summary
{
  "type": "compaction_completed",
  "summary_tokens": 2048,
  "messages_before": 48,
  "messages_after": 12
}

// CompactionFailed -- session is not mutated on failure
{
  "type": "compaction_failed",
  "reason": { "kind": "empty_summary" }
}
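A client consuming the session stream might fold these three events into log lines. The following is a minimal sketch that assumes events arrive as already-parsed dictionaries with exactly the fields shown above. `describe_compaction` is a hypothetical helper, not an SDK function.

```python
def describe_compaction(event):
    """Return a one-line summary for a compaction event, or None for other events."""
    kind = event.get("type")
    if kind == "compaction_started":
        return (
            f"compaction started: {event['message_count']} messages, "
            f"~{event['estimated_history_tokens']} history tokens"
        )
    if kind == "compaction_completed":
        return (
            f"compaction completed: {event['messages_before']} -> "
            f"{event['messages_after']} messages, "
            f"summary of {event['summary_tokens']} tokens"
        )
    if kind == "compaction_failed":
        # Per the text above, the session is not mutated on failure.
        return f"compaction failed ({event['reason']['kind']}); history unchanged"
    return None


# Sample events copied from the payloads shown above.
events = [
    {"type": "compaction_started", "input_tokens": 105000,
     "estimated_history_tokens": 112000, "message_count": 48},
    {"type": "compaction_completed", "summary_tokens": 2048,
     "messages_before": 48, "messages_after": 12},
    {"type": "compaction_failed", "reason": {"kind": "empty_summary"}},
]

for ev in events:
    print(describe_compaction(ev))
```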

Custom thresholds via the Rust SDK:

let compactor = DefaultCompactor::new(CompactionConfig {
    auto_compact_threshold: 50_000,
    recent_turn_budget: 6,
    max_summary_tokens: 8192,
    min_turns_between_compactions: 5,
});

Budget limits

Cap resource usage per session with token, time, and tool-call limits. When a budget is exhausted, the agent loop terminates gracefully.

rkat run "Research this topic" \
  --max-tokens 10000 \
  --max-duration 30s \
  --max-tool-calls 5

{
  "jsonrpc": "2.0", "id": 1,
  "method": "session/create",
  "params": {
    "prompt": "Research this topic",
    "budget_limits": {
      "max_tokens": 10000,
      "max_tool_calls": 5
    }
  }
}

curl -X POST http://127.0.0.1:8080/sessions \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Research this topic",
    "budget_limits": {
      "max_tokens": 10000,
      "max_tool_calls": 5
    }
  }'

{
  "name": "meerkat_run",
  "arguments": {
    "prompt": "Research this topic",
    "enable_builtins": true,
    "budget_limits": {
      "max_tokens": 10000,
      "max_tool_calls": 5
    }
  }
}

result = await client.create_session(
    "Research this topic",
    budget_limits={
        "max_tokens": 10000,
        "max_tool_calls": 5,
    },
)

const result = await client.createSession("Research this topic", {
  budgetLimits: {
    max_tokens: 10000,
    max_tool_calls: 5,
  },
});

let build_opts = SessionBuildOptions {
    budget_limits: Some(BudgetLimits {
        max_tokens: Some(10_000),
        max_duration: Some(Duration::from_secs(30)),
        max_tool_calls: Some(5),
    }),
    ..Default::default()
};
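Conceptually, each limit is optional and the loop stops as soon as any configured limit is reached. The sketch below illustrates that check in Python. The `Limits` and `Usage` classes and the `exhausted` function are hypothetical stand-ins, not the runtime's actual types, and the labels "time" and "tool_calls" are assumed (only "tokens" appears in this page).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Limits:
    """Hypothetical mirror of the three limits; None means unlimited."""
    max_tokens: Optional[int] = None
    max_duration_s: Optional[float] = None
    max_tool_calls: Optional[int] = None


@dataclass
class Usage:
    """Hypothetical running totals for one session."""
    tokens: int = 0
    elapsed_s: float = 0.0
    tool_calls: int = 0


def exhausted(limits: Limits, usage: Usage) -> Optional[str]:
    """Return the label of the first exhausted budget, or None if all have room."""
    checks = [
        ("tokens", usage.tokens, limits.max_tokens),
        ("time", usage.elapsed_s, limits.max_duration_s),
        ("tool_calls", usage.tool_calls, limits.max_tool_calls),
    ]
    for label, used, limit in checks:
        if limit is not None and used >= limit:
            return label
    return None


limits = Limits(max_tokens=10_000, max_duration_s=30.0, max_tool_calls=5)
print(exhausted(limits, Usage(tokens=4_000, elapsed_s=12.0, tool_calls=5)))  # tool_calls
```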

Budget events

When consumption nears a limit, a warning event is emitted before the budget is fully exhausted:

{
  "type": "budget_warning",
  "budget_type": "tokens",
  "used": 8500,
  "limit": 10000,
  "percent": 85.0
}
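The percent field is consistent with used divided by limit, expressed as a percentage. The sketch below reproduces the payload above under that reading. The 80% warning threshold and the `budget_warning` helper are assumptions for illustration; the actual threshold is not stated here.

```python
WARN_AT_PERCENT = 80.0  # hypothetical threshold, not taken from the source


def budget_warning(budget_type, used, limit, warn_at=WARN_AT_PERCENT):
    """Build a budget_warning payload once usage crosses warn_at but is below 100%."""
    percent = round(100 * used / limit, 1)
    if percent < warn_at or percent >= 100.0:
        return None
    return {
        "type": "budget_warning",
        "budget_type": budget_type,
        "used": used,
        "limit": limit,
        "percent": percent,
    }


warning = budget_warning("tokens", 8500, 10000)
print(warning["percent"])  # 85.0
```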

Retry policy

Transient LLM errors (rate limits, network timeouts) trigger automatic retries with exponential backoff. Each retry attempt emits an event carrying the typed failure and the scheduled retry plan:

{
  "type": "retrying",
  "retry": {
    "failure": {
      "provider": "anthropic",
      "kind": "rate_limited",
      "retry_after_ms": 4000,
      "message": "rate limit exceeded"
    },
    "plan": {
      "attempt": 2,
      "max_retries": 5,
      "computed_delay_ms": 2000,
      "selected_delay_ms": 4000,
      "rate_limit_floor_applied": true,
      "budget_capped": false
    }
  }
}

Retries consume time budget but not token budget. If the time budget expires during a backoff wait, the agent terminates with a budget-exhausted error.
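The plan fields suggest how the delay is chosen: an exponentially growing computed delay, raised to the provider's retry-after hint when that is larger, and possibly capped by the remaining time budget. The sketch below is one way those fields could fit together, not the runtime's actual algorithm. The 1000 ms base delay, the doubling factor, the capping behaviour, and the `plan_retry` helper are all assumptions, chosen so that the example reproduces the payload above.

```python
def plan_retry(attempt, max_retries, retry_after_ms=None,
               remaining_budget_ms=None, base_delay_ms=1000):
    """Compute a retry plan shaped like the "plan" object above (assumed logic)."""
    # Exponential backoff: base, 2x base, 4x base, ... (assumed base and factor).
    computed = base_delay_ms * 2 ** (attempt - 1)
    selected = computed

    # Never retry sooner than the provider's rate-limit hint.
    floor_applied = retry_after_ms is not None and retry_after_ms > selected
    if floor_applied:
        selected = retry_after_ms

    # Assumed: do not schedule a wait longer than the remaining time budget.
    capped = remaining_budget_ms is not None and selected > remaining_budget_ms
    if capped:
        selected = remaining_budget_ms

    return {
        "attempt": attempt,
        "max_retries": max_retries,
        "computed_delay_ms": computed,
        "selected_delay_ms": selected,
        "rate_limit_floor_applied": floor_applied,
        "budget_capped": capped,
    }


plan = plan_retry(attempt=2, max_retries=5, retry_after_ms=4000)
print(plan["computed_delay_ms"], plan["selected_delay_ms"])  # 2000 4000
```

With a 4000 ms hint on attempt 2, the computed 2000 ms delay is raised to 4000 ms and rate_limit_floor_applied is true, matching the event above.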
