Documentation Index Fetch the complete documentation index at: https://docs.rkat.ai/llms.txt
Use this file to discover all available pages before exploring further.
Self-hosted models are configured as first-class Meerkat model IDs. Once an
alias such as gemma-4-31b is registered, users can pass it anywhere they would
pass a hosted model:
rkat run -m gemma-4-31b "Summarize this repository"
rkat run --resume -m gemma-4-e4b "Keep going, but use the faster local model"
rkat models
rkat doctor
This guide covers the general self-hosting contract and the Gemma 4 worked
example. There is no separate Gemma page: Gemma 4 is one family under the same
self-hosted model system as Ollama, LM Studio, vLLM, or another private
OpenAI-compatible endpoint.
Model
Every self-hosted setup has two pieces:
Config section Owns self_hosted.servers.<server_id>Transport, base URL, API style, and credentials for one serving endpoint self_hosted.models.<alias>The Meerkat-facing model ID, display metadata, capabilities, and upstream remote_model
The alias is the name users type. The remote_model is the model name returned
or expected by the serving stack.
Prefer bearer_token_env over bearer_token so secrets stay out of checked-in
configuration.
Server Shape
For Ollama, LM Studio, vLLM, and most private gateways, use the
OpenAI-compatible transport:
[ self_hosted . servers . local ]
transport = "openai_compatible"
base_url = "http://127.0.0.1:11434"
api_style = "chat_completions"
bearer_token_env = "LOCAL_LLM_TOKEN"
api_style = "chat_completions" is the conservative default for Gemma 4 and
other self-hosted tool-calling models. Use another API style only after the
specific serving stack has been validated with Meerkat tools, structured output,
and multimodal input.
Alias Shape
[ self_hosted . models . gemma-4-31b ]
server = "local"
remote_model = "gemma4:31b"
display_name = "Gemma 4 31B"
family = "gemma-4"
tier = "supported"
context_window = 256000
max_output_tokens = 8192
vision = true
image_tool_results = true
inline_video = false
supports_temperature = true
supports_thinking = true
supports_reasoning = true
call_timeout_secs = 600
supports_thinking and supports_reasoning describe the behavior Meerkat
should expose through the configured transport. Gemma 4 is reasoning-capable,
but normalized reasoning controls and trace streaming still vary by serving
stack, so validate the behavior you plan to rely on.
Meerkat’s current self-hosted path does not expose self-hosted Gemma audio or a
self-hosted realtime transport. Treat these aliases as text, image-input, and
tool-capable models unless a dedicated self-hosted realtime path is documented.
Gemma 4 Aliases
Recommended aliases:
Alias Good default use gemma-4-e2bLowest-footprint local experiments gemma-4-e4bFast local iteration with more headroom gemma-4-26b-a4bStronger quality on a serious local or remote GPU setup gemma-4-31bBest quality of the four, usually best on a dedicated server
Ollama
Use Ollama when the model runs on the same machine as Meerkat and you want the
lightest local setup.
Serve the model
ollama pull gemma4:31b
ollama list
Register Ollama
[ self_hosted . servers . ollama ]
transport = "openai_compatible"
base_url = "http://127.0.0.1:11434"
api_style = "chat_completions"
Add aliases
[ self_hosted . models . gemma-4-e2b ]
server = "ollama"
remote_model = "gemma4:e2b"
display_name = "Gemma 4 E2B"
family = "gemma-4"
tier = "supported"
context_window = 128000
max_output_tokens = 8192
vision = true
image_tool_results = true
inline_video = false
supports_temperature = true
supports_thinking = true
supports_reasoning = true
[ self_hosted . models . gemma-4-31b ]
server = "ollama"
remote_model = "gemma4:31b"
display_name = "Gemma 4 31B"
family = "gemma-4"
tier = "supported"
context_window = 256000
max_output_tokens = 8192
vision = true
image_tool_results = true
inline_video = false
supports_temperature = true
supports_thinking = true
supports_reasoning = true
call_timeout_secs = 600
LM Studio
Use LM Studio when you want a desktop-managed OpenAI-compatible server.
Start the local server
Load the Gemma 4 model in LM Studio, then start the local server.
Register LM Studio
[ self_hosted . servers . lmstudio ]
transport = "openai_compatible"
base_url = "http://127.0.0.1:1234"
api_style = "chat_completions"
Alias the served model
Use the model name LM Studio exposes in its /v1/models output. [ self_hosted . models . gemma-4-e4b ]
server = "lmstudio"
remote_model = "google/gemma-4-e4b"
display_name = "Gemma 4 E4B"
family = "gemma-4"
tier = "supported"
context_window = 128000
max_output_tokens = 8192
vision = true
image_tool_results = true
inline_video = false
supports_temperature = true
supports_thinking = true
supports_reasoning = true
vLLM
Use vLLM when you want a private server with more deployment control.
Launch vLLM
Start vLLM with the Gemma 4 model you want to expose.
Register the server
[ self_hosted . servers . vllm ]
transport = "openai_compatible"
base_url = "http://my-gpu-box:8000"
api_style = "chat_completions"
bearer_token_env = "VLLM_API_TOKEN"
Add remote aliases
Point aliases at the exact model names your vLLM endpoint exposes. [ self_hosted . models . gemma-4-26b-a4b ]
server = "vllm"
remote_model = "google/gemma-4-26b-a4b"
display_name = "Gemma 4 26B A4B"
family = "gemma-4"
tier = "supported"
context_window = 256000
max_output_tokens = 8192
vision = true
image_tool_results = true
inline_video = false
supports_temperature = true
supports_thinking = true
supports_reasoning = true
call_timeout_secs = 600
[ self_hosted . models . gemma-4-31b ]
server = "vllm"
remote_model = "google/gemma-4-31b"
display_name = "Gemma 4 31B"
family = "gemma-4"
tier = "supported"
context_window = 256000
max_output_tokens = 8192
vision = true
image_tool_results = true
inline_video = false
supports_temperature = true
supports_thinking = true
supports_reasoning = true
call_timeout_secs = 600
Validation
Run these after adding or editing self-hosted model config:
rkat models
rkat doctor
rkat run -m gemma-4-31b "Say hello in one sentence."
The expected result:
rkat models shows a self_hosted provider group and the aliases you added.
rkat doctor reports the server as reachable.
rkat run -m ... works without an explicit --provider.
The alias points at the exact upstream remote_model exposed by the server.
See Also
Providers Hosted and self-hosted provider model.
CLI configuration Config file locations and model settings.
CLI commands Commands for running, diagnosing, and inspecting models.