Overview
Auto-routing enables switchAILocal to automatically select the best available provider for your request. Omit the provider prefix from your model name to activate intelligent routing.
How It Works
Basic Auto-Routing
Simply use the model name without a provider prefix:
```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:18080/v1",
    api_key="sk-test-123",
)

# Auto-routing: no provider prefix
response = client.chat.completions.create(
    model="gemini-2.5-pro",  # Not "geminicli:gemini-2.5-pro"
    messages=[{"role": "user", "content": "Hello!"}],
)
```
Routing Algorithm
switchAILocal evaluates providers in this order:
1. Provider Availability: check whether the provider supports the requested model
2. Provider Health: skip unhealthy or quota-exceeded providers
3. Priority Order: follow the configured priority preferences
4. Cost Optimization: prefer CLI and local providers (free)
5. Success Rate: favor providers with better historical performance
6. Fallback: try alternative providers if the primary fails
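The evaluation order above can be sketched in a few lines of Python. This is illustrative only: the `Provider` fields and the ranking function are assumptions for the sketch, not switchAILocal's internal data structures.

```python
from dataclasses import dataclass

@dataclass
class Provider:
    name: str
    models: set
    healthy: bool = True       # False when unhealthy or quota-exceeded
    priority: int = 0          # lower = preferred (configured priority order)
    cost: float = 0.0          # 0.0 for CLI and local providers
    success_rate: float = 1.0  # learned historical performance

def rank_providers(providers, model):
    """Mirror the evaluation order: availability, health, then priority/cost/success rate."""
    candidates = [
        p for p in providers
        if model in p.models   # 1. availability: provider supports the model
        and p.healthy          # 2. health: skip unhealthy / quota-exceeded
    ]
    # 3-5. sort by configured priority, then cost, then success rate (descending)
    candidates.sort(key=lambda p: (p.priority, p.cost, -p.success_rate))
    return candidates          # 6. callers fall back down this list on failure

providers = [
    Provider("gemini", {"gemini-2.5-pro"}, priority=3, cost=0.01),
    Provider("geminicli", {"gemini-2.5-pro"}, priority=0),
    Provider("ollama", {"llama3.2"}, priority=1),
]
print([p.name for p in rank_providers(providers, "gemini-2.5-pro")])  # ['geminicli', 'gemini']
```

The free CLI provider wins on priority and cost; the paid API stays in the list as the fallback for step 6.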
Routing Priority
Default Priority
By default, switchAILocal prioritizes in this order:
1. CLI Providers (geminicli:, claudecli:, etc.): uses your paid subscriptions
2. Local Providers (ollama:, lmstudio:): free and private
3. switchAI (switchai:): unified gateway with auto-selection
4. API Providers (gemini:, claude:, etc.): direct API access
Custom Priority
Override the default priority in config.yaml:
```yaml
routing:
  priority:
    - ollama      # Try local models first
    - geminicli   # Then CLI providers
    - switchai    # Then switchAI
    - gemini      # Finally direct APIs
```
Intelligent Features
Health-Based Routing
switchAILocal monitors provider health and automatically routes away from failing providers:
```yaml
heartbeat:
  enabled: true
  interval: 60   # Check every 60 seconds
  providers:
    - geminicli
    - ollama
    - switchai
```
Unhealthy providers are automatically skipped during routing.
Quota-Aware Routing
When a provider exceeds quota, switchAILocal automatically fails over:
```python
# First request succeeds with geminicli
response1 = client.chat.completions.create(
    model="gemini-2.5-pro",
    messages=[{"role": "user", "content": "Hello!"}],
)
# Used: geminicli

# If geminicli hits its quota, routing automatically switches
response2 = client.chat.completions.create(
    model="gemini-2.5-pro",
    messages=[{"role": "user", "content": "Hello!"}],
)
# Used: gemini (API fallback)
```
Success Rate Optimization
With Memory system enabled, switchAILocal learns which providers perform best:
```yaml
memory:
  enabled: true
  provider_selection:
    enabled: true
    min_samples: 10   # Learn after 10 requests
```
Providers with higher success rates are preferred in future requests.
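The min_samples threshold can be read as a guard on the learned statistic: until enough requests have been observed, the provider is treated neutrally. A minimal sketch (hypothetical scoring, not the Memory system's actual internals):

```python
def effective_success_rate(observed_rate: float, samples: int, min_samples: int = 10) -> float:
    """Only trust the learned success rate once min_samples requests have been observed."""
    if samples < min_samples:
        return 1.0  # neutral default until enough data is collected
    return observed_rate

# Too few samples: a low observed rate is not yet held against the provider
print(effective_success_rate(0.5, samples=4))    # 1.0
# Enough samples: the learned rate drives provider preference
print(effective_success_rate(0.98, samples=50))  # 0.98
```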
Model Mapping
Automatically map unavailable models to alternatives:
```yaml
ampcode:
  model_mappings:
    - from: "gpt-5"
      to: "gemini-2.5-pro"
      regex: false
    - from: "claude-opus-4"
      to: "claude-sonnet-4"
      regex: false
```
Requests for unavailable models are automatically redirected.
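Resolution against that table can be sketched as follows. This mirrors the config above for illustration; it is not switchAILocal's internal implementation, and the regex branch assumes full-string matching.

```python
import re

# Mirrors the model_mappings config above
model_mappings = [
    {"from": "gpt-5", "to": "gemini-2.5-pro", "regex": False},
    {"from": "claude-opus-4", "to": "claude-sonnet-4", "regex": False},
]

def resolve_model(model: str) -> str:
    """Redirect unavailable models per the mapping table; pass everything else through."""
    for m in model_mappings:
        matched = re.fullmatch(m["from"], model) if m["regex"] else model == m["from"]
        if matched:
            return m["to"]
    return model

print(resolve_model("gpt-5"))     # gemini-2.5-pro
print(resolve_model("llama3.2"))  # llama3.2 (unmapped, unchanged)
```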
Configuration
Enable Auto-Routing
```yaml
routing:
  auto_routing: true       # Default: true
  fallback_enabled: true   # Try alternatives on failure
  max_retries: 3           # Retry attempts per provider
```
Provider Weights
Assign weights to providers for load distribution:
```yaml
routing:
  weights:
    geminicli: 0.5   # 50% of requests
    ollama: 0.3      # 30% of requests
    switchai: 0.2    # 20% of requests
```
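The weight table amounts to a weighted random draw over providers. A sketch of the resulting distribution (illustrative only, not the gateway's actual scheduler):

```python
import random

weights = {"geminicli": 0.5, "ollama": 0.3, "switchai": 0.2}

def pick_provider(rng):
    """Draw one provider with probability proportional to its configured weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]

# Over many draws, the empirical shares approach the configured weights
rng = random.Random(42)
counts = {name: 0 for name in weights}
for _ in range(10_000):
    counts[pick_provider(rng)] += 1
print({name: count / 10_000 for name, count in counts.items()})
```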
Exclude Providers
Exclude specific providers from auto-routing:
```yaml
routing:
  exclude:
    - expensive-provider
    - slow-provider
```
Examples
Cost-Optimized Routing
```yaml
routing:
  priority:
    - ollama      # Free local models
    - geminicli   # Free CLI (with subscription)
    - switchai    # Paid unified gateway
```

```python
# Automatically uses the cheapest available provider
response = client.chat.completions.create(
    model="llama3.2",  # Available in Ollama
    messages=[{"role": "user", "content": "Hello!"}],
)
# Used: ollama (free)
```
Performance-Optimized Routing

```yaml
routing:
  priority:
    - switchai    # Fast cloud API
    - gemini      # Fast Google API
    - geminicli   # Slower CLI
    - ollama      # Depends on hardware
```
Privacy-Optimized Routing
```yaml
routing:
  priority:
    - ollama      # Fully local
    - geminicli   # Local CLI execution
  exclude:
    - switchai    # Cloud service
    - gemini      # Cloud service
    - claude      # Cloud service
```
Hybrid Strategy
Combine local and cloud for best of both:
```python
def route_by_task(task_type):
    if task_type == "simple":
        # Use local models for simple tasks
        return "llama3.2"  # Routed to Ollama
    elif task_type == "complex":
        # Use the cloud for complex tasks
        return "gemini-2.5-pro"  # Routed to the best Gemini provider
    elif task_type == "coding":
        # Use the CLI for coding (supports attachments)
        return "geminicli:gemini-2.5-pro"

response = client.chat.completions.create(
    model=route_by_task("complex"),
    messages=[{"role": "user", "content": "Explain quantum computing"}],
)
```
Routing Transparency
Check which provider was used via response headers:
```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:18080/v1",
    api_key="sk-test-123",
)

# with_raw_response exposes the HTTP headers alongside the parsed body
raw = client.chat.completions.with_raw_response.create(
    model="gemini-2.5-pro",
    messages=[{"role": "user", "content": "Hello!"}],
)
response = raw.parse()

# Check the routing decision (header names are implementation-dependent)
print(raw.headers)
print(f"Provider used: {response.model}")  # May include a provider prefix
```
Logs
View routing decisions in logs:
```shell
tail -f logs/main.log | grep routing
```

```
[INFO] Auto-routing: selected 'geminicli' for model 'gemini-2.5-pro'
[INFO] Provider 'geminicli' quota exceeded, trying fallback
[INFO] Auto-routing: selected 'gemini' for model 'gemini-2.5-pro'
```
Management API
Query routing decisions:
```shell
curl http://localhost:18080/v0/management/analytics \
  -H "X-Management-Key: your-secret-key"
```

```json
{
  "routing": {
    "total_requests": 1000,
    "provider_usage": {
      "geminicli": 650,
      "gemini": 200,
      "switchai": 100,
      "ollama": 50
    },
    "fallback_rate": 0.15
  }
}
```
Advanced Patterns
Conditional Routing
Route based on request attributes:
```python
def smart_route(messages, needs_tools=False, needs_vision=False):
    if needs_vision:
        return "geminicli:gemini-2.5-pro"  # Best vision support
    elif needs_tools:
        return "switchai:auto"  # Best tool support
    elif len(str(messages)) > 10000:
        return "gemini-2.5-pro"  # Large context (auto-routed)
    else:
        return "ollama:llama3.2"  # Fast local model

response = client.chat.completions.create(
    model=smart_route(messages, needs_vision=True),
    messages=messages,
)
```
Time-Based Routing
Route differently based on time of day:
```python
from datetime import datetime

def time_based_route(model):
    hour = datetime.now().hour
    if 9 <= hour <= 17:  # Business hours
        # Use paid APIs for better performance
        return f"switchai:{model}"
    else:  # Off-hours
        # Fall back to free providers
        return model  # Auto-routed to free providers

response = client.chat.completions.create(
    model=time_based_route("gemini-2.5-pro"),
    messages=[{"role": "user", "content": "Hello!"}],
)
```
Budget-Based Routing
```python
class BudgetRouter:
    def __init__(self, daily_budget):
        self.daily_budget = daily_budget
        self.spent_today = 0

    def route(self, model):
        if self.spent_today >= self.daily_budget:
            # Budget exceeded: use free providers only
            return model  # Auto-routed to geminicli, ollama
        else:
            # Budget available: allow paid APIs
            return f"switchai:{model}"

    def record_cost(self, cost):
        self.spent_today += cost

router = BudgetRouter(daily_budget=10.0)  # $10/day
response = client.chat.completions.create(
    model=router.route("gemini-2.5-pro"),
    messages=[{"role": "user", "content": "Hello!"}],
)
router.record_cost(0.01)  # Track usage
```
Monitoring
Usage Statistics
Track provider usage:
```shell
curl http://localhost:18080/v0/management/usage \
  -H "X-Management-Key: your-secret-key"
```

```json
{
  "total_requests": 5000,
  "by_provider": {
    "geminicli": {
      "requests": 3000,
      "tokens": 1500000,
      "cost": 0
    },
    "gemini": {
      "requests": 1500,
      "tokens": 750000,
      "cost": 15.50
    },
    "ollama": {
      "requests": 500,
      "tokens": 250000,
      "cost": 0
    }
  }
}
```
Provider Health
Monitor provider availability:
```shell
curl http://localhost:18080/v0/management/heartbeat/status \
  -H "X-Management-Key: your-secret-key"
```

```json
{
  "providers": [
    {
      "id": "geminicli",
      "status": "healthy",
      "last_check": "2026-03-09T10:30:00Z",
      "response_time_ms": 250,
      "success_rate": 0.98
    },
    {
      "id": "gemini",
      "status": "quota_exceeded",
      "last_check": "2026-03-09T10:30:00Z",
      "response_time_ms": 500,
      "success_rate": 0.75
    }
  ]
}
```
Troubleshooting
No Providers Available
Error: No providers available for model 'gemini-2.5-pro'

Solutions:
- Verify that providers are configured and authenticated
- Check provider status: GET /v1/providers
- Try explicit routing: geminicli:gemini-2.5-pro
- Check the logs for provider initialization errors
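The provider-status check can be scripted once you have the response body. The payload shape below is an assumption for the sketch, so adapt the field names to what GET /v1/providers actually returns:

```python
import json

# Sample payload shaped like a hypothetical GET /v1/providers response
payload = json.loads("""
{
  "providers": [
    {"id": "geminicli", "models": ["gemini-2.5-pro"], "healthy": true},
    {"id": "gemini", "models": ["gemini-2.5-pro"], "healthy": false},
    {"id": "ollama", "models": ["llama3.2"], "healthy": true}
  ]
}
""")

def available_providers(model: str) -> list:
    """List healthy providers that advertise the given model."""
    return [
        p["id"] for p in payload["providers"]
        if model in p["models"] and p["healthy"]
    ]

print(available_providers("gemini-2.5-pro"))  # ['geminicli']
```

An empty result means no configured, healthy provider serves the model, which matches the error above.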
All Providers Failing
Error: All providers failed for model 'gemini-2.5-pro'

Solutions:
- Check provider health: GET /v0/management/heartbeat/status
- Verify that API keys are valid
- Check quota limits
- Try a different model: GET /v1/models
Unexpected Provider Used
Issue: the wrong provider is selected during auto-routing

Solutions:
- Check the routing priority: review config.yaml
- Verify provider health: unhealthy providers are skipped
- Use explicit routing: add a provider prefix
- Check the logs: review the routing decisions
Best Practices
Use Auto-Routing by Default
Start with auto-routing and only use explicit prefixes when needed:

```python
# Good: auto-routing by default
model = "gemini-2.5-pro"

# Use an explicit prefix only when necessary
if needs_cli_features:
    model = "geminicli:gemini-2.5-pro"
```
Use Heartbeat for automatic failover:

```yaml
heartbeat:
  enabled: true
  interval: 60
```
Track provider usage to optimize costs:

```shell
curl http://localhost:18080/v0/management/usage
```
Next Steps
- Provider Prefixes: learn about explicit provider routing
- Heartbeat: configure provider health monitoring
- Memory: enable success-rate optimization
- Configuration: configure routing preferences