Overview
Auto-routing enables switchAILocal to automatically select the best available provider for your request. Omit the provider prefix from your model name to activate intelligent routing.
How It Works
Basic Auto-Routing
Simply use the model name without a provider prefix:
```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:18080/v1",
    api_key="sk-test-123",
)

# Auto-routing: no provider prefix
response = client.chat.completions.create(
    model="gemini-2.5-pro",  # Not "geminicli:gemini-2.5-pro"
    messages=[{"role": "user", "content": "Hello!"}],
)
```
Routing Algorithm
switchAILocal evaluates providers in this order:
1. Provider Availability: check whether the provider supports the requested model
2. Provider Health: skip unhealthy or quota-exceeded providers
3. Priority Order: follow the configured priority preferences
4. Cost Optimization: prefer CLI and local providers (free)
5. Success Rate: favor providers with better historical performance
6. Fallback: try alternative providers if the primary fails
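The evaluation order above can be sketched in a few lines of Python. This is illustrative only: the `Provider` fields and the ranking function are assumptions for the sketch, not switchAILocal's internal data structures.

```python
from dataclasses import dataclass

@dataclass
class Provider:
    name: str
    models: set
    healthy: bool = True       # False when unhealthy or quota-exceeded
    priority: int = 0          # lower = preferred (configured priority order)
    cost: float = 0.0          # 0.0 for CLI and local providers
    success_rate: float = 1.0  # learned historical performance

def rank_providers(providers, model):
    """Mirror the evaluation order: availability, health, then priority/cost/success rate."""
    candidates = [
        p for p in providers
        if model in p.models   # 1. availability: provider supports the model
        and p.healthy          # 2. health: skip unhealthy / quota-exceeded
    ]
    # 3-5. sort by configured priority, then cost, then success rate (descending)
    candidates.sort(key=lambda p: (p.priority, p.cost, -p.success_rate))
    return candidates          # 6. callers fall back down this list on failure

providers = [
    Provider("gemini", {"gemini-2.5-pro"}, priority=3, cost=0.01),
    Provider("geminicli", {"gemini-2.5-pro"}, priority=0),
    Provider("ollama", {"llama3.2"}, priority=1),
]
print([p.name for p in rank_providers(providers, "gemini-2.5-pro")])  # ['geminicli', 'gemini']
```

The free CLI provider wins on priority and cost; the paid API stays in the list as the fallback for step 6.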
Routing Priority
Default Priority
By default, switchAILocal prioritizes in this order:
1. CLI Providers (geminicli:, claudecli:, etc.): uses your paid subscriptions
2. Local Providers (ollama:, lmstudio:): free and private
3. switchAI (switchai:): unified gateway with auto-selection
4. API Providers (gemini:, claude:, etc.): direct API access
Custom Priority
Override the default priority in config.yaml:
```yaml
routing:
  priority:
    - ollama      # Try local models first
    - geminicli   # Then CLI providers
    - switchai    # Then switchAI
    - gemini      # Finally direct APIs
```
Intelligent Features
Health-Based Routing
switchAILocal monitors provider health and automatically routes away from failing providers:
```yaml
heartbeat:
  enabled: true
  interval: 60   # Check every 60 seconds
  providers:
    - geminicli
    - ollama
    - switchai
```
Unhealthy providers are automatically skipped during routing.
Quota-Aware Routing
When a provider exceeds quota, switchAILocal automatically fails over:
```python
# First request succeeds with geminicli
response1 = client.chat.completions.create(
    model="gemini-2.5-pro",
    messages=[{"role": "user", "content": "Hello!"}],
)
# Used: geminicli

# If geminicli hits its quota, routing automatically switches
response2 = client.chat.completions.create(
    model="gemini-2.5-pro",
    messages=[{"role": "user", "content": "Hello!"}],
)
# Used: gemini (API fallback)
```
Success Rate Optimization
With Memory system enabled, switchAILocal learns which providers perform best:
```yaml
memory:
  enabled: true
  provider_selection:
    enabled: true
    min_samples: 10   # Learn after 10 requests
```
Providers with higher success rates are preferred in future requests.
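The min_samples threshold can be read as a guard on the learned statistic: until enough requests have been observed, the provider is treated neutrally. A minimal sketch (hypothetical scoring, not the Memory system's actual internals):

```python
def effective_success_rate(observed_rate: float, samples: int, min_samples: int = 10) -> float:
    """Only trust the learned success rate once min_samples requests have been observed."""
    if samples < min_samples:
        return 1.0  # neutral default until enough data is collected
    return observed_rate

# Too few samples: a low observed rate is not yet held against the provider
print(effective_success_rate(0.5, samples=4))    # 1.0
# Enough samples: the learned rate drives provider preference
print(effective_success_rate(0.98, samples=50))  # 0.98
```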
Model Mapping
Automatically map unavailable models to alternatives:
```yaml
ampcode:
  model_mappings:
    - from: "gpt-5"
      to: "gemini-2.5-pro"
      regex: false
    - from: "claude-opus-4"
      to: "claude-sonnet-4"
      regex: false
```
Requests for unavailable models are automatically redirected.
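Resolution against that table can be sketched as follows. This mirrors the config above for illustration; it is not switchAILocal's internal implementation, and the regex branch assumes full-string matching.

```python
import re

# Mirrors the model_mappings config above
model_mappings = [
    {"from": "gpt-5", "to": "gemini-2.5-pro", "regex": False},
    {"from": "claude-opus-4", "to": "claude-sonnet-4", "regex": False},
]

def resolve_model(model: str) -> str:
    """Redirect unavailable models per the mapping table; pass everything else through."""
    for m in model_mappings:
        matched = re.fullmatch(m["from"], model) if m["regex"] else model == m["from"]
        if matched:
            return m["to"]
    return model

print(resolve_model("gpt-5"))     # gemini-2.5-pro
print(resolve_model("llama3.2"))  # llama3.2 (unmapped, unchanged)
```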
Configuration
Enable Auto-Routing
```yaml
routing:
  auto_routing: true       # Default: true
  fallback_enabled: true   # Try alternatives on failure
  max_retries: 3           # Retry attempts per provider
```
Provider Weights
Assign weights to providers for load distribution:
```yaml
routing:
  weights:
    geminicli: 0.5   # 50% of requests
    ollama: 0.3      # 30% of requests
    switchai: 0.2    # 20% of requests
```
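The weight table amounts to a weighted random draw over providers. A sketch of the resulting distribution (illustrative only, not the gateway's actual scheduler):

```python
import random

weights = {"geminicli": 0.5, "ollama": 0.3, "switchai": 0.2}

def pick_provider(rng):
    """Draw one provider with probability proportional to its configured weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]

# Over many draws, the empirical shares approach the configured weights
rng = random.Random(42)
counts = {name: 0 for name in weights}
for _ in range(10_000):
    counts[pick_provider(rng)] += 1
print({name: count / 10_000 for name, count in counts.items()})
```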
Exclude Providers
Exclude specific providers from auto-routing:
```yaml
routing:
  exclude:
    - expensive-provider
    - slow-provider
```
Examples
Cost-Optimized Routing
```yaml
routing:
  priority:
    - ollama      # Free local models
    - geminicli   # Free CLI (with subscription)
    - switchai    # Paid unified gateway
```

```python
# Automatically uses the cheapest available provider
response = client.chat.completions.create(
    model="llama3.2",  # Available in Ollama
    messages=[{"role": "user", "content": "Hello!"}],
)
# Used: ollama (free)
```
Performance-Optimized Routing

```yaml
routing:
  priority:
    - switchai    # Fast cloud API
    - gemini      # Fast Google API
    - geminicli   # Slower CLI
    - ollama      # Depends on hardware
```
Privacy-Optimized Routing
```yaml
routing:
  priority:
    - ollama      # Fully local
    - geminicli   # Local CLI execution
  exclude:
    - switchai    # Cloud service
    - gemini      # Cloud service
    - claude      # Cloud service
```
Hybrid Strategy
Combine local and cloud for best of both:
```python
def route_by_task(task_type):
    if task_type == "simple":
        # Use local models for simple tasks
        return "llama3.2"  # Routed to Ollama
    elif task_type == "complex":
        # Use the cloud for complex tasks
        return "gemini-2.5-pro"  # Routed to the best Gemini provider
    elif task_type == "coding":
        # Use the CLI for coding (supports attachments)
        return "geminicli:gemini-2.5-pro"

response = client.chat.completions.create(
    model=route_by_task("complex"),
    messages=[{"role": "user", "content": "Explain quantum computing"}],
)
```
Routing Transparency
Check which provider was used via response headers:
```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:18080/v1",
    api_key="sk-test-123",
)

# with_raw_response exposes the HTTP headers alongside the parsed body
raw = client.chat.completions.with_raw_response.create(
    model="gemini-2.5-pro",
    messages=[{"role": "user", "content": "Hello!"}],
)
response = raw.parse()

# Check the routing decision (header names are implementation-dependent)
print(raw.headers)
print(f"Provider used: {response.model}")  # May include a provider prefix
```
Logs
View routing decisions in logs:
```shell
tail -f logs/main.log | grep routing
```

```
[INFO] Auto-routing: selected 'geminicli' for model 'gemini-2.5-pro'
[INFO] Provider 'geminicli' quota exceeded, trying fallback
[INFO] Auto-routing: selected 'gemini' for model 'gemini-2.5-pro'
```
Management API
Query routing decisions:
```shell
curl http://localhost:18080/v0/management/analytics \
  -H "X-Management-Key: your-secret-key"
```

```json
{
  "routing": {
    "total_requests": 1000,
    "provider_usage": {
      "geminicli": 650,
      "gemini": 200,
      "switchai": 100,
      "ollama": 50
    },
    "fallback_rate": 0.15
  }
}
```
Advanced Patterns
Conditional Routing
Route based on request attributes:
```python
def smart_route(messages, needs_tools=False, needs_vision=False):
    if needs_vision:
        return "geminicli:gemini-2.5-pro"  # Best vision support
    elif needs_tools:
        return "switchai:auto"  # Best tool support
    elif len(str(messages)) > 10000:
        return "gemini-2.5-pro"  # Large context (auto-routed)
    else:
        return "ollama:llama3.2"  # Fast local model

response = client.chat.completions.create(
    model=smart_route(messages, needs_vision=True),
    messages=messages,
)
```
Time-Based Routing
Route differently based on time of day:
```python
from datetime import datetime

def time_based_route(model):
    hour = datetime.now().hour
    if 9 <= hour <= 17:  # Business hours
        # Use paid APIs for better performance
        return f"switchai:{model}"
    else:  # Off-hours
        # Fall back to free providers
        return model  # Auto-routed to free providers

response = client.chat.completions.create(
    model=time_based_route("gemini-2.5-pro"),
    messages=[{"role": "user", "content": "Hello!"}],
)
```
Budget-Based Routing
```python
class BudgetRouter:
    def __init__(self, daily_budget):
        self.daily_budget = daily_budget
        self.spent_today = 0

    def route(self, model):
        if self.spent_today >= self.daily_budget:
            # Budget exceeded: use free providers only
            return model  # Auto-routed to geminicli, ollama
        else:
            # Budget available: allow paid APIs
            return f"switchai:{model}"

    def record_cost(self, cost):
        self.spent_today += cost

router = BudgetRouter(daily_budget=10.0)  # $10/day
response = client.chat.completions.create(
    model=router.route("gemini-2.5-pro"),
    messages=[{"role": "user", "content": "Hello!"}],
)
router.record_cost(0.01)  # Track usage
```
Monitoring
Usage Statistics
Track provider usage:
```shell
curl http://localhost:18080/v0/management/usage \
  -H "X-Management-Key: your-secret-key"
```

```json
{
  "total_requests": 5000,
  "by_provider": {
    "geminicli": {
      "requests": 3000,
      "tokens": 1500000,
      "cost": 0
    },
    "gemini": {
      "requests": 1500,
      "tokens": 750000,
      "cost": 15.50
    },
    "ollama": {
      "requests": 500,
      "tokens": 250000,
      "cost": 0
    }
  }
}
```
Provider Health
Monitor provider availability:
```shell
curl http://localhost:18080/v0/management/heartbeat/status \
  -H "X-Management-Key: your-secret-key"
```

```json
{
  "providers": [
    {
      "id": "geminicli",
      "status": "healthy",
      "last_check": "2026-03-09T10:30:00Z",
      "response_time_ms": 250,
      "success_rate": 0.98
    },
    {
      "id": "gemini",
      "status": "quota_exceeded",
      "last_check": "2026-03-09T10:30:00Z",
      "response_time_ms": 500,
      "success_rate": 0.75
    }
  ]
}
```
Troubleshooting
No Providers Available
Error: No providers available for model 'gemini-2.5-pro'

Solutions:
- Verify that providers are configured and authenticated
- Check provider status: GET /v1/providers
- Try explicit routing: geminicli:gemini-2.5-pro
- Check the logs for provider initialization errors
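The provider-status check can be scripted once you have the response body. The payload shape below is an assumption for the sketch, so adapt the field names to what GET /v1/providers actually returns:

```python
import json

# Sample payload shaped like a hypothetical GET /v1/providers response
payload = json.loads("""
{
  "providers": [
    {"id": "geminicli", "models": ["gemini-2.5-pro"], "healthy": true},
    {"id": "gemini", "models": ["gemini-2.5-pro"], "healthy": false},
    {"id": "ollama", "models": ["llama3.2"], "healthy": true}
  ]
}
""")

def available_providers(model: str) -> list:
    """List healthy providers that advertise the given model."""
    return [
        p["id"] for p in payload["providers"]
        if model in p["models"] and p["healthy"]
    ]

print(available_providers("gemini-2.5-pro"))  # ['geminicli']
```

An empty result means no configured, healthy provider serves the model, which matches the error above.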
All Providers Failing
Error: All providers failed for model 'gemini-2.5-pro'

Solutions:
- Check provider health: GET /v0/management/heartbeat/status
- Verify that API keys are valid
- Check quota limits
- Try a different model: GET /v1/models
Unexpected Provider Used
Issue: the wrong provider is selected during auto-routing

Solutions:
- Check the routing priority: review config.yaml
- Verify provider health: unhealthy providers are skipped
- Use explicit routing: add a provider prefix
- Check the logs: review the routing decisions
Best Practices
Use Auto-Routing by Default
Start with auto-routing and only use explicit prefixes when needed:

```python
# Good: auto-routing by default
model = "gemini-2.5-pro"

# Use an explicit prefix only when necessary
if needs_cli_features:
    model = "geminicli:gemini-2.5-pro"
```
Use Heartbeat for automatic failover:

```yaml
heartbeat:
  enabled: true
  interval: 60
```
Track provider usage to optimize costs:

```shell
curl http://localhost:18080/v0/management/usage
```
Next Steps
- Provider Prefixes: learn about explicit provider routing
- Heartbeat: configure provider health monitoring
- Memory: enable success-rate optimization
- Configuration: configure routing preferences