## Overview
This guide demonstrates the most common usage patterns for switchAILocal, from simple chat completions to multi-provider routing.
## Simple Chat Completion

The most basic usage: send a message and get a response.
```bash
curl http://localhost:18080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-test-123" \
  -d '{
    "model": "gemini-2.5-pro",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```
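An OpenAI-compatible server answers with the standard chat-completion envelope. The values below are illustrative only; the actual `id`, content, and token counts will differ:

```json
{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "model": "gemini-2.5-pro",
  "choices": [
    {
      "index": 0,
      "message": { "role": "assistant", "content": "Hello! How can I help you today?" },
      "finish_reason": "stop"
    }
  ],
  "usage": { "prompt_tokens": 9, "completion_tokens": 10, "total_tokens": 19 }
}
```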
## Auto-Routing (No Provider Prefix)
Let switchAILocal automatically select the best available provider:
```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:18080/v1",
    api_key="sk-test-123",
)

# No prefix = auto-routing to any logged-in provider
completion = client.chat.completions.create(
    model="gemini-2.5-pro",  # switchAILocal picks: geminicli, gemini API, or switchAI
    messages=[{"role": "user", "content": "What is the meaning of life?"}],
)
```
Auto-routing prioritizes, in order:

1. CLI providers (if authenticated)
2. API providers (if keys configured)
3. Local providers (Ollama, LM Studio)
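The priority order above can be pictured as a simple first-match fallback chain. This is a hypothetical sketch for intuition, not switchAILocal's actual router code:

```python
def pick_provider(available):
    """Return the first usable provider class, in priority order.

    `available` maps a provider class ("cli", "api", "local") to whether
    it is currently usable (CLI authenticated, API key configured, or
    local server running). Returns None when nothing is available.
    """
    priority = ["cli", "api", "local"]
    for kind in priority:
        if available.get(kind):
            return kind
    return None
```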
## Explicit Provider Selection
Force routing to a specific provider using prefixes:
**Gemini CLI**

```python
completion = client.chat.completions.create(
    model="geminicli:gemini-2.5-pro",  # Force Gemini CLI
    messages=[{"role": "user", "content": "Hello!"}],
)
```

**Ollama (Local)**

```python
completion = client.chat.completions.create(
    model="ollama:llama3.2",  # Force Ollama local model
    messages=[{"role": "user", "content": "Hello!"}],
)
```

**switchAI Cloud**

```python
completion = client.chat.completions.create(
    model="switchai:switchai-fast",  # Force switchAI cloud
    messages=[{"role": "user", "content": "Hello!"}],
)
```

**Claude CLI**

```python
completion = client.chat.completions.create(
    model="claudecli:claude-sonnet-4",  # Force Claude CLI
    messages=[{"role": "user", "content": "Hello!"}],
)
```
## List Available Models
Discover all models from all configured providers:
```bash
curl http://localhost:18080/v1/models \
  -H "Authorization: Bearer sk-test-123"
```
**Example output:**

```text
geminicli:gemini-2.5-pro    (google)
ollama:llama3.2             (ollama)
switchai:switchai-fast      (traylinx)
switchai:switchai-reasoner  (traylinx)
claudecli:claude-sonnet-4   (anthropic)
```
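Because every model ID carries its provider prefix, the list can be grouped client-side with a plain string split. This small helper is our own convenience function, not part of any SDK:

```python
def group_by_prefix(model_ids):
    """Group prefixed model IDs like 'ollama:llama3.2' by provider prefix."""
    groups = {}
    for model_id in model_ids:
        prefix, _, name = model_id.partition(":")
        groups.setdefault(prefix, []).append(name)
    return groups
```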
## Multi-turn Conversations
Maintain conversation context across multiple turns:
```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:18080/v1",
    api_key="sk-test-123",
)

messages = [
    {"role": "system", "content": "You are a helpful coding assistant."},
    {"role": "user", "content": "Write a Python function to calculate factorial"},
]

# First turn
response = client.chat.completions.create(
    model="gemini-2.5-pro",
    messages=messages,
)

# Add assistant response to history
messages.append({
    "role": "assistant",
    "content": response.choices[0].message.content,
})

# Second turn
messages.append({
    "role": "user",
    "content": "Now add error handling",
})
response = client.chat.completions.create(
    model="gemini-2.5-pro",
    messages=messages,
)
print(response.choices[0].message.content)
```
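The append-then-resend pattern above can be wrapped in a small class that owns the history. This is our own convenience wrapper, not part of the OpenAI SDK:

```python
class Chat:
    """Keep conversation history and replay it on every turn."""

    def __init__(self, create_fn, model, system=None):
        self.create_fn = create_fn  # e.g. client.chat.completions.create
        self.model = model
        self.messages = []
        if system:
            self.messages.append({"role": "system", "content": system})

    def send(self, text):
        """Append a user turn, call the model, record and return the reply."""
        self.messages.append({"role": "user", "content": text})
        response = self.create_fn(model=self.model, messages=self.messages)
        reply = response.choices[0].message.content
        self.messages.append({"role": "assistant", "content": reply})
        return reply
```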
## Temperature Control
Adjust creativity and randomness:
```python
# Low temperature (0.0-0.3) = focused, deterministic
code_response = client.chat.completions.create(
    model="gemini-2.5-pro",
    messages=[{"role": "user", "content": "Write a sorting algorithm"}],
    temperature=0.2,  # Precise, consistent code
)

# High temperature (0.7-1.0) = creative, varied
story_response = client.chat.completions.create(
    model="gemini-2.5-pro",
    messages=[{"role": "user", "content": "Write a short story"}],
    temperature=0.9,  # Creative, diverse outputs
)
```
## Max Tokens Limit
Control response length:
```python
completion = client.chat.completions.create(
    model="gemini-2.5-pro",
    messages=[{"role": "user", "content": "Explain quantum computing"}],
    max_tokens=200,  # Limit to ~200 tokens (roughly 150 words)
)
```
## System Messages
Set the assistant’s behavior and personality:
```python
completion = client.chat.completions.create(
    model="gemini-2.5-pro",
    messages=[
        {
            "role": "system",
            "content": "You are a senior Go developer. Always provide idiomatic Go code with error handling.",
        },
        {
            "role": "user",
            "content": "Show me how to read a JSON file",
        },
    ],
)
```
## Error Handling
```python
from openai import OpenAI, APIConnectionError, APIStatusError

client = OpenAI(
    base_url="http://localhost:18080/v1",
    api_key="sk-test-123",
)

try:
    completion = client.chat.completions.create(
        model="gemini-2.5-pro",
        messages=[{"role": "user", "content": "Hello!"}],
    )
    print(completion.choices[0].message.content)
except APIConnectionError as e:
    print(f"Connection error: {e}")
except APIStatusError as e:
    # APIStatusError (not the APIError base class) carries status_code
    print(f"API error: {e.status_code} - {e.message}")
```
## Provider Prefix Reference

| Prefix | Provider | Type | Example |
|---|---|---|---|
| `geminicli:` | Google Gemini CLI | CLI Tool | `geminicli:gemini-2.5-pro` |
| `claudecli:` | Anthropic Claude CLI | CLI Tool | `claudecli:claude-sonnet-4` |
| `codex:` | OpenAI Codex CLI | CLI Tool | `codex:gpt-4` |
| `vibe:` | Mistral Vibe CLI | CLI Tool | `vibe:mistral-large` |
| `ollama:` | Ollama | Local | `ollama:llama3.2` |
| `lmstudio:` | LM Studio | Local | `lmstudio:mistral-7b` |
| `switchai:` | Traylinx switchAI | Cloud API | `switchai:switchai-fast` |
| `gemini:` | Google AI Studio | Cloud API | `gemini:gemini-2.5-pro` |
| `claude:` | Anthropic API | Cloud API | `claude:claude-3-5-sonnet` |
| `openai:` | OpenAI API | Cloud API | `openai:gpt-4` |
**No prefix** means auto-routing: switchAILocal selects the best available provider automatically.
## Next Steps

- **Streaming**: Real-time streaming responses
- **Multi-Provider**: Advanced multi-provider patterns
- **Intelligent Routing**: Auto-routing with Cortex Router
- **Python SDK**: Complete Python SDK reference