For the full guide, see Memory.
Enable semantic memory
When memory is enabled, conversation turns discarded during compaction are indexed into a semantic store, and the agent gains a memory_search tool for recalling that context on demand.
The same request in each interface, in order: CLI, JSON-RPC, REST, MCP, Python, TypeScript, Rust.
rkat run "Analyze the full codebase" \
--enable-memory \
--enable-builtins
{
"jsonrpc": "2.0", "id": 1,
"method": "session/create",
"params": {
"prompt": "Analyze the full codebase",
"enable_memory": true,
"enable_builtins": true
}
}
curl -X POST http://127.0.0.1:8080/sessions \
-H "Content-Type: application/json" \
-d '{
"prompt": "Analyze the full codebase",
"enable_memory": true,
"enable_builtins": true
}'
{
"name": "meerkat_run",
"arguments": {
"prompt": "Analyze the full codebase",
"enable_builtins": true,
"enable_memory": true
}
}
result = client.create_session(
"Analyze the full codebase",
enable_memory=True,
enable_builtins=True,
)
const result = await client.createSession({
prompt: "Analyze the full codebase",
enableMemory: true,
enableBuiltins: true,
});
let factory = AgentFactory::new(store_path)
.builtins(true)
.memory(true);
let agent = factory
.build_agent(AgentBuildConfig::new("claude-sonnet-4-5"), &config)
.await?;
Memory search
When memory is enabled, the agent automatically gains the memory_search tool. It calls this tool when it needs to recall information from earlier, compacted-away turns.
The tool call appears in the event stream like any other tool invocation:
{
"type": "tool_call_requested",
"name": "memory_search",
"args": {
"query": "database migration steps from earlier",
"limit": 5
}
}
Results are returned as a JSON array with similarity scores:
[
{
"content": "We agreed to use sequential migrations with...",
"score": 0.87,
"session_id": "019467d9-7e3a-7000-8000-000000000000",
"turn": 3
}
]
Scores range from 0.0 (no match) to 1.0 (exact match). Useful results are typically above 0.7.
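A client that post-processes these results might apply the 0.7 guideline as a filter. The sketch below assumes the result shape shown above; the helper name `useful_results` and the second sample entry are hypothetical, added only for illustration.

```python
import json

# Sample payload in the documented shape. The second entry is invented
# for illustration so the filter has something to discard.
raw_results = """
[
  {"content": "We agreed to use sequential migrations with...", "score": 0.87,
   "session_id": "019467d9-7e3a-7000-8000-000000000000", "turn": 3},
  {"content": "Unrelated note about logging", "score": 0.41,
   "session_id": "019467d9-7e3a-7000-8000-000000000000", "turn": 7}
]
"""

def useful_results(results, min_score=0.7):
    """Keep results scoring above min_score, best match first."""
    kept = [r for r in results if r["score"] > min_score]
    return sorted(kept, key=lambda r: r["score"], reverse=True)

hits = useful_results(json.loads(raw_results))
for hit in hits:
    print(f'{hit["score"]:.2f} turn={hit["turn"]} {hit["content"]}')
```

The 0.7 cutoff is a rule of thumb rather than a hard boundary, so it is exposed as a parameter here.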
Compaction
Context compaction triggers automatically when the conversation history exceeds a token threshold. The compactor summarizes older turns, keeps recent ones, and indexes discarded messages into memory.
The compaction cycle emits events into the session stream:
// CompactionStarted -- context has grown past threshold
{
"type": "compaction_started",
"input_tokens": 105000,
"estimated_history_tokens": 112000,
"message_count": 48
}
// CompactionCompleted -- history rebuilt with summary
{
"type": "compaction_completed",
"summary_tokens": 2048,
"messages_before": 48,
"messages_after": 12
}
// CompactionFailed -- session is not mutated on failure
{
"type": "compaction_failed",
"error": "LLM returned empty summary"
}
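A consumer of the session stream could turn these three events into log lines along the following lines. This is a minimal sketch that assumes events arrive as dictionaries with exactly the fields shown above; `describe_compaction` is a hypothetical helper, not part of any SDK.

```python
def describe_compaction(event):
    """Return a one-line description of a compaction event, or None for other events."""
    kind = event.get("type")
    if kind == "compaction_started":
        return (f'compaction started: {event["message_count"]} messages, '
                f'about {event["estimated_history_tokens"]} history tokens')
    if kind == "compaction_completed":
        # Net reduction in history length after the summary is written.
        fewer = event["messages_before"] - event["messages_after"]
        return (f'compaction completed: {fewer} fewer messages, '
                f'{event["summary_tokens"]}-token summary')
    if kind == "compaction_failed":
        # On failure the session history is left untouched.
        return f'compaction failed (session unchanged): {event["error"]}'
    return None

events = [
    {"type": "compaction_started", "input_tokens": 105000,
     "estimated_history_tokens": 112000, "message_count": 48},
    {"type": "compaction_completed", "summary_tokens": 2048,
     "messages_before": 48, "messages_after": 12},
    {"type": "compaction_failed", "error": "LLM returned empty summary"},
]
lines = [describe_compaction(e) for e in events]
for line in lines:
    print(line)
```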
Set custom thresholds via the Rust SDK:
let compactor = DefaultCompactor::new(CompactionConfig {
auto_compact_threshold: 50_000,
recent_turn_budget: 6,
max_summary_tokens: 8192,
min_turns_between_compactions: 5,
});
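To make the role of these settings concrete, here is a rough Python model of a trigger check. The field names mirror the Rust struct above, but the semantics are assumptions inferred from the names: that `auto_compact_threshold` is compared against estimated history tokens, and that `min_turns_between_compactions` enforces spacing between runs. The actual compactor may decide differently.

```python
from dataclasses import dataclass

@dataclass
class CompactionConfig:
    # Mirrors the Rust field names; the meaning of each field is assumed.
    auto_compact_threshold: int = 50_000
    recent_turn_budget: int = 6
    max_summary_tokens: int = 8192
    min_turns_between_compactions: int = 5

def should_compact(cfg, history_tokens, turns_since_last_compaction):
    """Assumed rule: history is over the threshold and enough turns have passed."""
    over_threshold = history_tokens > cfg.auto_compact_threshold
    spaced_out = turns_since_last_compaction >= cfg.min_turns_between_compactions
    return over_threshold and spaced_out

cfg = CompactionConfig()
print(should_compact(cfg, history_tokens=62_000, turns_since_last_compaction=7))
print(should_compact(cfg, history_tokens=62_000, turns_since_last_compaction=2))
print(should_compact(cfg, history_tokens=40_000, turns_since_last_compaction=7))
```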
Budget limits
Cap resource usage per session with token, time, and tool-call limits. When a budget is exhausted, the agent loop terminates gracefully.
The same limits in each interface, in order: CLI, JSON-RPC, REST, MCP, Python, TypeScript, Rust.
rkat run "Research this topic" \
--max-total-tokens 10000 \
--max-duration 30s \
--max-tool-calls 5
{
"jsonrpc": "2.0", "id": 1,
"method": "session/create",
"params": {
"prompt": "Research this topic",
"budget_limits": {
"max_total_tokens": 10000,
"max_duration_ms": 30000,
"max_tool_calls": 5
}
}
}
curl -X POST http://127.0.0.1:8080/sessions \
-H "Content-Type: application/json" \
-d '{
"prompt": "Research this topic",
"budget_limits": {
"max_total_tokens": 10000,
"max_duration_ms": 30000,
"max_tool_calls": 5
}
}'
{
"name": "meerkat_run",
"arguments": {
"prompt": "Research this topic",
"enable_builtins": true,
"budget_limits": {
"max_tokens": 10000,
"max_duration_secs": 30,
"max_tool_calls": 5
}
}
}
result = client.create_session(
"Research this topic",
budget_limits={
"max_total_tokens": 10000,
"max_duration_ms": 30000,
"max_tool_calls": 5,
},
)
const result = await client.createSession({
prompt: "Research this topic",
budgetLimits: {
maxTotalTokens: 10000,
maxDurationMs: 30000,
maxToolCalls: 5,
},
});
use std::time::Duration;

let build_opts = SessionBuildOptions {
budget_limits: Some(BudgetLimits {
max_tokens: Some(10_000),
max_duration: Some(Duration::from_secs(30)),
max_tool_calls: Some(5),
}),
..Default::default()
};
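As the examples above are written, the MCP arguments use `max_tokens` and `max_duration_secs`, while the JSON-RPC, REST, and Python forms use `max_total_tokens` and `max_duration_ms`. If you generate requests for several interfaces from one config, a small adapter can keep them consistent. The sketch below is a hypothetical helper based only on the field names shown in those examples; check the names against your installed version before relying on it.

```python
def to_mcp_budget(budget_limits):
    """Convert JSON-RPC/REST style budget_limits to the MCP-style keys shown above."""
    mcp = {}
    if "max_total_tokens" in budget_limits:
        mcp["max_tokens"] = budget_limits["max_total_tokens"]
    if "max_duration_ms" in budget_limits:
        # Round up to whole seconds so the converted limit is never shorter.
        mcp["max_duration_secs"] = -(-budget_limits["max_duration_ms"] // 1000)
    if "max_tool_calls" in budget_limits:
        mcp["max_tool_calls"] = budget_limits["max_tool_calls"]
    return mcp

rpc_limits = {"max_total_tokens": 10000, "max_duration_ms": 30000, "max_tool_calls": 5}
mcp_limits = to_mcp_budget(rpc_limits)
print(mcp_limits)
```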
Budget events
When consumption nears a limit, a warning event is emitted before the budget is fully exhausted:
{
"type": "budget_warning",
"budget_type": "max_tokens",
"used": 8500,
"limit": 10000,
"percent": 85.0
}
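A client can use the warning to report how much headroom is left. The following sketch assumes the event fields shown above and recomputes the percentage from `used` and `limit`; `summarize_budget_warning` is a hypothetical helper. The point at which the warning fires is not specified here, so the code does not assume one.

```python
def summarize_budget_warning(event):
    """Derive remaining headroom from a budget_warning event."""
    used, limit = event["used"], event["limit"]
    return {
        "budget_type": event["budget_type"],
        "remaining": limit - used,
        "percent": round(100.0 * used / limit, 1),
    }

warning = {
    "type": "budget_warning",
    "budget_type": "max_tokens",
    "used": 8500,
    "limit": 10000,
    "percent": 85.0,
}
summary = summarize_budget_warning(warning)
print(f'{summary["budget_type"]}: {summary["remaining"]} left '
      f'({summary["percent"]}% used)')
```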
Retry policy
Transient LLM errors (rate limits, network timeouts) trigger automatic retries with exponential backoff. Each retry attempt emits an event:
{
"type": "retrying",
"attempt": 2,
"max_attempts": 5,
"error": "rate_limit_exceeded",
"delay_ms": 4000
}
Retries consume time budget but not token budget. If the time budget expires during a backoff wait, the agent terminates with a budget-exhausted error.
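The interaction between backoff and the time budget can be modeled in a few lines. This is an illustrative sketch, not the library's scheduler: the 2000 ms base delay is an assumption chosen so that attempt 2 yields the 4000 ms shown in the event above, and the cap, function names, and budget check are likewise hypothetical.

```python
def backoff_delay_ms(attempt, base_ms=2000, cap_ms=60_000):
    """Delay before retry number `attempt` (1-based): base, 2x base, 4x base, capped."""
    return min(cap_ms, base_ms * 2 ** (attempt - 1))

def plan_retries(max_attempts, time_budget_ms, elapsed_ms=0):
    """Return the delays that fit in the time budget and whether the budget ran out."""
    delays = []
    for attempt in range(1, max_attempts + 1):
        delay = backoff_delay_ms(attempt)
        if elapsed_ms + delay > time_budget_ms:
            # The wait would outlast the time budget, so the run stops here.
            return delays, True
        elapsed_ms += delay  # waiting spends time budget, not token budget
        delays.append(delay)
    return delays, False

# 30 s time budget, with 1 s already spent on the failed request.
delays, exhausted = plan_retries(max_attempts=5, time_budget_ms=30_000, elapsed_ms=1_000)
print(delays, exhausted)
```

With these assumed numbers, the fourth wait (16 s) no longer fits in the remaining budget, so the run ends as budget-exhausted before all five attempts are used.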