Overview

Routing in switchAILocal determines which credential and provider handle each incoming request. The system supports multiple routing strategies, intelligent fallback, and per-model quota management.

Routing Configuration

Configure routing behavior in config.yaml:
routing:
  # Primary strategy: "round-robin" or "fill-first"
  strategy: "round-robin"
  
  # Optional: Priority list for auto model resolution
  # auto-model-priority:
  #   - "ollama:gpt-oss:120b-cloud"
  #   - "switchai-chat"
  #   - "gemini-2.5-flash"

Routing Strategies

The routing strategy determines how multiple credentials for the same provider are selected.

Round-Robin

Distributes requests evenly across all available credentials.
The RoundRobinSelector (sdk/switchailocal/auth/selector.go) maintains per-model cursors:
type RoundRobinSelector struct {
    mu      sync.Mutex
    cursors map[string]int  // "provider:model" -> cursor
}

func (s *RoundRobinSelector) Pick(..., auths []*Auth) (*Auth, error) {
    available, err := getAvailableAuths(auths, provider, model, now)
    if err != nil {
        return nil, err
    }
    key := provider + ":" + model
    s.mu.Lock()
    index := s.cursors[key]
    s.cursors[key] = index + 1
    s.mu.Unlock()
    return available[index % len(available)], nil
}
Behavior:
  1. First request to gpt-4 uses credential A
  2. Second request to gpt-4 uses credential B
  3. Third request to gpt-4 uses credential C
  4. Fourth request to gpt-4 wraps back to credential A
Use case: Distribute load evenly, maximize quota utilization.
Example:
routing:
  strategy: "round-robin"

codex-api-key:
  - api-key: "sk-proj-A..."
  - api-key: "sk-proj-B..."
  - api-key: "sk-proj-C..."
Round-robin is tracked per model. Requests to gpt-4 and gpt-3.5-turbo maintain independent cursors.
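The per-model cursor behavior can be sketched as a minimal, self-contained version (names here are illustrative, not the SDK's exact API):

```go
package main

import (
	"fmt"
	"sync"
)

// roundRobin keeps an independent cursor per "provider:model" key,
// mirroring the per-model rotation described above.
type roundRobin struct {
	mu      sync.Mutex
	cursors map[string]int
}

func (r *roundRobin) pick(provider, model string, creds []string) string {
	key := provider + ":" + model
	r.mu.Lock()
	i := r.cursors[key]
	r.cursors[key]++
	r.mu.Unlock()
	return creds[i%len(creds)]
}

func main() {
	rr := &roundRobin{cursors: map[string]int{}}
	creds := []string{"A", "B", "C"}
	for i := 0; i < 4; i++ {
		fmt.Println(rr.pick("openai", "gpt-4", creds)) // A, B, C, A
	}
}
```

Because the cursor map is keyed by `provider:model`, interleaved requests to gpt-4 and gpt-3.5-turbo never advance each other's cursor.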

Fill-First

Uses the first available credential until it’s exhausted or in cooldown, then moves to the next.
The FillFirstSelector always picks the first available credential:
type FillFirstSelector struct{}

func (s *FillFirstSelector) Pick(..., auths []*Auth) (*Auth, error) {
    available, err := getAvailableAuths(auths, provider, model, now)
    if err != nil {
        return nil, err
    }
    // Always return first (auths are sorted by ID for consistency)
    return available[0], nil
}
Behavior:
  1. All requests use credential A
  2. When A hits quota → switch to credential B
  3. When B hits quota → switch to credential C
  4. When A recovers → switch back to A
Use case: Stagger subscription caps, optimize for rolling time windows.
Example:
routing:
  strategy: "fill-first"

claude-api-key:
  - api-key: "sk-ant-primary..."
  - api-key: "sk-ant-backup..."
Fill-first works well with providers that have daily/monthly quotas rather than per-minute rate limits.

Credential Selection Process

The Auth Manager follows a multi-step process to select credentials:

1. Provider Matching

func (m *Manager) Execute(ctx, providers []string, req Request, opts) {
    // Normalize and rotate provider list
    normalized := m.normalizeProviders(providers)
    rotated := m.rotateProviders(req.Model, normalized)
    // ... credentials are then gathered per provider in rotated order
}
Provider names are normalized (lowercased, deduplicated).
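Normalization can be sketched as a small standalone function (an assumption about the behavior, based on the "lowercased, deduplicated" description above):

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeProviders lowercases names and drops duplicates while
// preserving first-seen order.
func normalizeProviders(providers []string) []string {
	seen := make(map[string]bool)
	var out []string
	for _, p := range providers {
		p = strings.ToLower(strings.TrimSpace(p))
		if p == "" || seen[p] {
			continue
		}
		seen[p] = true
		out = append(out, p)
	}
	return out
}

func main() {
	fmt.Println(normalizeProviders([]string{"OpenAI", "openai", " Gemini "}))
	// [openai gemini]
}
```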

2. Model Support Filtering

for _, candidate := range m.auths {
    if candidate.Provider != provider || candidate.Disabled {
        continue
    }
    // Check model registry
    if !registryRef.ClientSupportsModel(candidate.ID, modelKey) {
        continue
    }
    candidates = append(candidates, candidate)
}
Credentials are filtered based on model support from the registry.

3. Status Filtering

func getAvailableAuths(auths, provider, model, now) ([]*Auth, error) {
    available, cooldownCount, earliest := collectAvailable(auths, model, now)
    
    if len(available) == 0 {
        if cooldownCount == len(auths) && !earliest.IsZero() {
            resetIn := earliest.Sub(now)
            return nil, newModelCooldownError(model, provider, resetIn)
        }
        return nil, &Error{Code: "auth_unavailable"}
    }
    return available, nil
}
Checks:
  • Not disabled (auth.Disabled == false)
  • Not unavailable (auth.Unavailable == false)
  • Past retry time (auth.NextRetryAfter < now)
  • Model-specific state (if tracked)

4. Strategy Application

The selected strategy picks one credential from the available pool:
auth, err := m.selector.Pick(ctx, provider, model, opts, candidates)

Multi-Provider Routing

You can specify multiple providers for the same model:
routing:
  strategy: "round-robin"

# Same model available from multiple providers
codex-api-key:
  - api-key: "sk-proj-openai..."
    models:
      - name: "gpt-4o"

openai-compatibility:
  - name: "openrouter"
    prefix: "or"
    api-key-entries:
      - api-key: "sk-or-v1..."
    # Also provides gpt-4o
When both providers support gpt-4o:
  1. Manager tries OpenAI provider first
  2. If OpenAI is in cooldown → tries OpenRouter
  3. Rotates starting provider on next request
Per-model provider rotation ensures even distribution when multiple providers offer the same model.
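The per-model rotation of the starting provider can be sketched like this (a hypothetical helper, not the SDK's `rotateProviders` itself):

```go
package main

import (
	"fmt"
	"sync"
)

// providerRotator rotates which provider is tried first for a given
// model, so consecutive requests start from different providers.
type providerRotator struct {
	mu      sync.Mutex
	offsets map[string]int
}

func (r *providerRotator) rotate(model string, providers []string) []string {
	r.mu.Lock()
	off := r.offsets[model] % len(providers)
	r.offsets[model]++
	r.mu.Unlock()
	// Start at off, then wrap around to the beginning.
	return append(append([]string{}, providers[off:]...), providers[:off]...)
}

func main() {
	r := &providerRotator{offsets: map[string]int{}}
	ps := []string{"openai", "openrouter"}
	fmt.Println(r.rotate("gpt-4o", ps)) // [openai openrouter]
	fmt.Println(r.rotate("gpt-4o", ps)) // [openrouter openai]
}
```

The rotated list is the order in which providers are *tried*; a provider in cooldown still falls through to the next entry.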

Intelligent Routing (Cortex Phase 2)

When Intelligence is enabled, routing becomes content-aware:
intelligence:
  enabled: true
  router-model: "ollama:gpt-oss:20b-cloud"
  
  matrix:
    coding: "switchai-chat"
    reasoning: "switchai-reasoner"
    fast: "switchai-fast"
    secure: "ollama:llama3.2"
    vision: "ollama:qwen3-vl:235b-instruct-cloud"
  
  semantic-tier:
    enabled: true
    confidence-threshold: 0.85

Classification Flow

The Intelligence Service uses the router model to classify requests:
type Classification struct {
    Intent     string   // "coding", "reasoning", "fast", etc.
    Confidence float64  // 0.0 to 1.0
    Model      string   // Resolved model from matrix
}

func (s *Service) Classify(ctx, req) (*Classification, error) {
    // 1. Check semantic cache
    // 2. Try semantic matching with embeddings
    // 3. Fall back to LLM classification
    // 4. Apply confidence thresholds
}
Intent mapping:
  • coding: Code generation, debugging, refactoring
  • reasoning: Complex problem-solving, math, logic
  • fast: Simple queries, casual conversation
  • secure: Privacy-sensitive, runs locally only
  • vision: Image analysis, OCR, visual tasks
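Resolving a classification to a model is then a matrix lookup; the sketch below assumes low-confidence and unknown intents fall back to a default model (the fallback rule is an assumption, not documented SDK behavior):

```go
package main

import "fmt"

// resolveModel maps a classified intent to a model via the
// intelligence matrix, falling back to a default when the intent is
// unknown or confidence is below the configured threshold.
func resolveModel(matrix map[string]string, intent string, confidence, threshold float64, fallback string) string {
	if confidence < threshold {
		return fallback
	}
	if m, ok := matrix[intent]; ok {
		return m
	}
	return fallback
}

func main() {
	matrix := map[string]string{"coding": "switchai-chat", "fast": "switchai-fast"}
	fmt.Println(resolveModel(matrix, "coding", 0.92, 0.85, "switchai-fast")) // switchai-chat
	fmt.Println(resolveModel(matrix, "coding", 0.60, 0.85, "switchai-fast")) // switchai-fast
}
```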

Quota Management

Quota tracking prevents retry storms when providers hit rate limits.

Quota States

Each credential tracks quota status:
type QuotaState struct {
    Exceeded      bool      // Currently over quota
    Reason        string    // "quota", "rate_limit", etc.
    NextRecoverAt time.Time // When quota resets
    BackoffLevel  int       // Exponential backoff level
}

Backoff Schedule

Exponential backoff prevents hammering rate-limited providers:
func nextQuotaCooldown(prevLevel int) (time.Duration, int) {
    cooldown := quotaBackoffBase * time.Duration(1<<prevLevel)
    // Level 0: 1 second
    // Level 1: 2 seconds
    // Level 2: 4 seconds
    // Level 3: 8 seconds
    // ...
    // Max: 30 minutes
    if cooldown >= quotaBackoffMax {
        return quotaBackoffMax, prevLevel
    }
    return cooldown, prevLevel + 1
}
Set quota-exceeded.switch-project: true to automatically switch to another credential when quota is hit.

Model-Level Quotas

Quotas are tracked per-model for fine-grained control:
// An API key may have exhausted its gpt-4 quota while gpt-3.5-turbo still works
type Auth struct {
    ModelStates map[string]*ModelState
}

type ModelState struct {
    Quota QuotaState  // Per-model quota tracking
}
Behavior:
# Request 1: gpt-4 with key-A → Success
# Request 2: gpt-4 with key-A → 429 Too Many Requests
# Request 3: gpt-4 with key-A → Skipped (in cooldown)
# Request 4: gpt-4 with key-B → Success (different key)
# Request 5: gpt-3.5-turbo with key-A → Success (different model)

Retry Logic

Configurable retry behavior for transient failures:
request-retry: 3  # Retry up to 3 times

streaming:
  bootstrap-retries: 2  # Retries before first byte
func (m *Manager) shouldRetryAfterError(err error, attempt, maxAttempts int, 
                                        providers []string, model string, 
                                        maxWait time.Duration) (time.Duration, bool) {
    // No retry on last attempt
    if attempt >= maxAttempts-1 {
        return 0, false
    }
    
    // Check if any credential will recover soon
    wait, found := m.closestCooldownWait(providers, model)
    if !found || wait > maxWait {
        return 0, false
    }
    
    return wait, true
}
Retry conditions:
  • Not the final attempt
  • At least one credential will recover within maxWait
  • Error is retryable (408, 429, 500, 502, 503, 504)
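The retryable-status check reduces to membership in that set; a minimal sketch:

```go
package main

import "fmt"

// isRetryableStatus reports whether an HTTP status code is in the
// retryable set listed above: timeouts, rate limits, and
// transient server errors.
func isRetryableStatus(code int) bool {
	switch code {
	case 408, 429, 500, 502, 503, 504:
		return true
	}
	return false
}

func main() {
	fmt.Println(isRetryableStatus(429)) // true
	fmt.Println(isRetryableStatus(401)) // false
}
```

Client errors such as 400 or 401 are not retried, since repeating the same request cannot succeed.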

Fallback Chains

Automatic fallback when quota is exceeded:
quota-exceeded:
  switch-project: true       # Try next credential
  switch-preview-model: true # Try preview/alternative models
Fallback order:
  1. Try next credential for same model
  2. Try preview model with same credential
  3. Try preview model with next credential
  4. Return cooldown error
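The fallback order above can be expressed as a list of (credential, model) attempts; the helper below is a hypothetical sketch of that ordering, with "preview" naming following the switch-preview-model option:

```go
package main

import "fmt"

// attempt pairs a credential with a model to try.
type attempt struct{ cred, model string }

// fallbackOrder lists attempts in the documented order:
// next credentials with the same model, then the preview model with
// the current credential, then the preview model with the others.
func fallbackOrder(current string, others []string, model, preview string) []attempt {
	var out []attempt
	for _, c := range others {
		out = append(out, attempt{c, model}) // 1. next credential, same model
	}
	out = append(out, attempt{current, preview}) // 2. preview, same credential
	for _, c := range others {
		out = append(out, attempt{c, preview}) // 3. preview, next credential
	}
	return out
}

func main() {
	for _, a := range fallbackOrder("A", []string{"B"}, "gpt-4", "gpt-4-preview") {
		fmt.Printf("%s → %s\n", a.cred, a.model)
	}
}
```

If every attempt in the list is in cooldown, the manager returns the cooldown error (step 4).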

Load Balancing

Distribute requests across providers:
openai-compatibility:
  # Multiple providers for same models
  - name: "groq"
    prefix: "groq"
    api-key-entries:
      - api-key: "gsk-A..."
      - api-key: "gsk-B..."
  
  - name: "openrouter"  
    prefix: "or"
    api-key-entries:
      - api-key: "sk-or-v1-A..."
      - api-key: "sk-or-v1-B..."
With round-robin strategy:
  • Request 1 → groq key-A
  • Request 2 → groq key-B
  • Request 3 → openrouter key-A
  • Request 4 → openrouter key-B
  • Request 5 → groq key-A (rotation)

Custom Selectors

Implement custom routing logic:
type MyCustomSelector struct {
    // Your state
}

func (s *MyCustomSelector) Pick(ctx context.Context, 
                                provider, model string,
                                opts executor.Options,
                                auths []*Auth) (*Auth, error) {
    // Filter by metadata
    for _, auth := range auths {
        if region, ok := auth.Metadata["region"].(string); ok {
            if region == "us-west" {
                return auth, nil
            }
        }
    }
    // Fallback to first
    return auths[0], nil
}

// Register
service.CoreManager().SetSelector(&MyCustomSelector{})
Custom selectors receive only available credentials (already filtered by status and cooldown).

Next Steps

Authentication

Learn about credential lifecycle and refresh

Providers

Configure provider-specific settings

Intelligence

Enable semantic routing and classification

Configuration

Complete routing configuration reference