For the full guide, see Memory.

Enable semantic memory

When memory is enabled, discarded conversation turns are indexed into a semantic store. The agent gains a memory_search tool to recall past context on demand.
rkat run "Analyze the full codebase" \
  --enable-memory \
  --enable-builtins
The agent calls memory_search when it needs to recall information from earlier turns that have been compacted away. The call appears in the event stream like any other tool invocation:
{
  "type": "tool_call_requested",
  "name": "memory_search",
  "args": {
    "query": "database migration steps from earlier",
    "limit": 5
  }
}
Results are returned as a JSON array with similarity scores:
[
  {
    "content": "We agreed to use sequential migrations with...",
    "score": 0.87,
    "session_id": "019467d9-7e3a-7000-8000-000000000000",
    "turn": 3
  }
]
Scores range from 0.0 (no match) to 1.0 (exact match). Useful results are typically above 0.7.
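As a rough illustration of applying the 0.7 guideline on the client side, the sketch below filters and ranks hits by score. The `MemoryHit` struct and `useful_hits` function are hypothetical names that only mirror the JSON fields shown above; they are not part of the SDK, and the cutoff is the caller's choice.

```rust
/// One entry of a `memory_search` result, mirroring the JSON fields above.
/// Hypothetical local type, not an SDK type.
#[derive(Debug, Clone, PartialEq)]
pub struct MemoryHit {
    pub content: String,
    pub score: f64,
    pub session_id: String,
    pub turn: u32,
}

/// Keep hits scoring at or above `min_score`, best match first.
/// `min_score` is a caller-chosen cutoff (e.g. 0.7), not an rkat setting.
pub fn useful_hits(mut hits: Vec<MemoryHit>, min_score: f64) -> Vec<MemoryHit> {
    hits.retain(|h| h.score >= min_score);
    // total_cmp gives a total order on f64, so sorting never panics.
    hits.sort_by(|a, b| b.score.total_cmp(&a.score));
    hits
}
```

Calling `useful_hits(hits, 0.7)` on a parsed result array would drop weak matches before they are shown to a user or fed into another step.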

Compaction

Context compaction triggers automatically when the conversation history exceeds a token threshold. The compactor summarizes older turns, keeps recent ones, and indexes discarded messages into memory. The compaction cycle emits events into the session stream:
// CompactionStarted -- context has grown past threshold
{
  "type": "compaction_started",
  "input_tokens": 105000,
  "estimated_history_tokens": 112000,
  "message_count": 48
}

// CompactionCompleted -- history rebuilt with summary
{
  "type": "compaction_completed",
  "summary_tokens": 2048,
  "messages_before": 48,
  "messages_after": 12
}

// CompactionFailed -- session is not mutated on failure
{
  "type": "compaction_failed",
  "error": "LLM returned empty summary"
}
Set custom thresholds via the Rust SDK:
let compactor = DefaultCompactor::new(CompactionConfig {
    auto_compact_threshold: 50_000,
    recent_turn_budget: 6,
    max_summary_tokens: 8192,
    min_turns_between_compactions: 5,
});
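To make the interplay of two of these settings concrete, here is a minimal sketch of a trigger check. It assumes that compaction runs only when the estimated history exceeds `auto_compact_threshold` and at least `min_turns_between_compactions` turns have passed since the last run. The second condition is inferred from the field name, and `CompactionSettings` and `should_compact` are hypothetical stand-ins, not SDK items.

```rust
/// Local stand-in for a subset of the settings above (not the SDK's type).
pub struct CompactionSettings {
    pub auto_compact_threshold: u64,
    pub min_turns_between_compactions: u32,
}

/// Assumed trigger rule: the history is over the token threshold and
/// enough turns have elapsed since the previous compaction.
pub fn should_compact(
    settings: &CompactionSettings,
    estimated_history_tokens: u64,
    turns_since_last_compaction: u32,
) -> bool {
    estimated_history_tokens > settings.auto_compact_threshold
        && turns_since_last_compaction >= settings.min_turns_between_compactions
}
```

Under this reading, a lower threshold compacts sooner, while a higher minimum turn gap keeps compactions from running back to back.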

Budget limits

Cap resource usage per session with token, time, and tool-call limits. When a budget is exhausted, the agent loop terminates gracefully.
rkat run "Research this topic" \
  --max-total-tokens 10000 \
  --max-duration 30s \
  --max-tool-calls 5
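
The three limits can be pictured as independent checks made between steps of the agent loop. The sketch below is one possible shape for such a check, not the actual implementation: `Limits`, `Usage`, and `exhausted` are hypothetical names, and treating `None` as "unlimited" is an assumption of this sketch.

```rust
use std::time::Duration;

/// Limits matching the three CLI flags above.
/// `None` means "no limit" in this sketch (an assumption).
#[derive(Debug, Default)]
pub struct Limits {
    pub max_total_tokens: Option<u64>,
    pub max_duration: Option<Duration>,
    pub max_tool_calls: Option<u32>,
}

/// Running totals for one session.
#[derive(Debug, Default)]
pub struct Usage {
    pub tokens: u64,
    pub elapsed: Duration,
    pub tool_calls: u32,
}

/// Returns the name of the first exhausted limit, or `None` if the
/// loop may continue.
pub fn exhausted(limits: &Limits, usage: &Usage) -> Option<&'static str> {
    if matches!(limits.max_total_tokens, Some(max) if usage.tokens >= max) {
        return Some("max_total_tokens");
    }
    if matches!(limits.max_duration, Some(max) if usage.elapsed >= max) {
        return Some("max_duration");
    }
    if matches!(limits.max_tool_calls, Some(max) if usage.tool_calls >= max) {
        return Some("max_tool_calls");
    }
    None
}
```

Checking between steps rather than mid-step is what would let the loop stop gracefully instead of aborting a call in flight.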

Budget events

When consumption nears a limit, a warning event is emitted before the budget is fully exhausted:
{
  "type": "budget_warning",
  "budget_type": "max_tokens",
  "used": 8500,
  "limit": 10000,
  "percent": 85.0
}
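The `percent` field is simply `used` relative to `limit`. The sketch below reproduces that arithmetic and adds a warning check. The warning cutoff (`warn_at_percent`) is a hypothetical parameter, since the page does not state the point at which the warning fires.

```rust
/// Percentage of a budget consumed, as in the `percent` field above.
pub fn percent_used(used: u64, limit: u64) -> f64 {
    if limit == 0 {
        // Treat a zero limit as already fully consumed (assumption).
        return 100.0;
    }
    // Multiply before dividing so values like 8500/10000 come out exact.
    used as f64 * 100.0 / limit as f64
}

/// Warn once usage reaches `warn_at_percent` but before the budget is
/// exhausted. `warn_at_percent` is a hypothetical cutoff.
pub fn should_warn(used: u64, limit: u64, warn_at_percent: f64) -> bool {
    percent_used(used, limit) >= warn_at_percent && used < limit
}
```

With the values from the event above, `percent_used(8500, 10000)` gives 85.0, and `should_warn(8500, 10000, 80.0)` is true.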

Retry policy

Transient LLM errors (rate limits, network timeouts) trigger automatic retries with exponential backoff. Each retry attempt emits an event:
{
  "type": "retrying",
  "attempt": 2,
  "max_attempts": 5,
  "error": "rate_limit_exceeded",
  "delay_ms": 4000
}
Retries consume time budget but not token budget. If the time budget expires during a backoff wait, the agent terminates with a budget-exhausted error.
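
The interaction between backoff and the time budget can be sketched as follows. The base delay, cap, and doubling schedule are assumptions, chosen so that attempt 2 yields the 4000 ms shown in the event above; the real schedule may differ. `backoff_delay_ms`, `RetryStep`, and `plan_retry` are hypothetical names. The sketch also predicts budget exhaustion up front instead of discovering it during the wait.

```rust
/// Hypothetical schedule parameters, not documented rkat defaults.
pub const BASE_MS: u64 = 2_000;
pub const CAP_MS: u64 = 60_000;

/// Delay before retry `attempt` (1-based): base * 2^(attempt - 1), capped.
pub fn backoff_delay_ms(attempt: u32, base_ms: u64, cap_ms: u64) -> u64 {
    let exponent = attempt.saturating_sub(1).min(63);
    base_ms.saturating_mul(1u64 << exponent).min(cap_ms)
}

#[derive(Debug, PartialEq)]
pub enum RetryStep {
    /// Sleep for `delay_ms`, then retry.
    Wait { delay_ms: u64 },
    /// The time budget would expire during the backoff wait.
    BudgetExhausted,
    /// No attempts left.
    AttemptsExhausted,
}

/// Decide what to do before retry `attempt`, given the time budget left.
pub fn plan_retry(attempt: u32, max_attempts: u32, remaining_ms: u64) -> RetryStep {
    if attempt > max_attempts {
        return RetryStep::AttemptsExhausted;
    }
    let delay_ms = backoff_delay_ms(attempt, BASE_MS, CAP_MS);
    if delay_ms >= remaining_ms {
        RetryStep::BudgetExhausted
    } else {
        RetryStep::Wait { delay_ms }
    }
}
```

Because waiting counts against the time budget, a tight `--max-duration` can end a session during backoff even when attempts remain.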