
Overview

Shannon automatically selects the optimal LLM for each task based on:
  1. Task complexity (analyzed during decomposition)
  2. Explicit tier requests (model_tier parameter)
  3. Model/provider overrides (model_override, provider_override)
  4. Priority rankings (defined in config/models.yaml)
  5. Budget constraints and token limits
This guide explains how model selection works and how to control it.

Model Tiers

Shannon organizes models into three tiers:
| Tier   | Target Usage | Characteristics                       | Cost Range               |
|--------|--------------|---------------------------------------|--------------------------|
| Small  | 50%          | Fast, cost-optimized, basic reasoning | $0.0001-0.0002/1K input  |
| Medium | 40%          | Balanced capability/cost              | $0.002-0.006/1K input    |
| Large  | 10%          | Heavy reasoning, complex tasks        | $0.02-0.025/1K input     |
Note: Percentages are target distributions, not enforced quotas. Actual usage depends on your workload.

Selection Flow

(Diagram: model selection flow)

Priority Ranking

Within each tier, models are ranked by priority (lower number = higher priority). Shannon attempts models in priority order until one succeeds. Example from config/models.yaml:
model_tiers:
  small:
    providers:
      - provider: openai
        model: gpt-5-nano-2025-08-07
        priority: 1  # Try first
      - provider: anthropic
        model: claude-haiku-4-5-20251001
        priority: 2  # Fallback if OpenAI fails
      - provider: xai
        model: grok-3-mini
        priority: 3  # Default small tier model for xAI
      - provider: xai
        model: grok-4-fast-non-reasoning
        priority: 4  # Alternative fast model
Fallback Behavior:
  • If priority 1 fails (rate limit, API error), Shannon tries priority 2
  • Continues until a model succeeds or all options are exhausted
  • Failures are logged to orchestrator logs
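In pseudocode, the fallback behaves roughly like this. This is an illustrative sketch, not Shannon's internals; call_model and ProviderError are hypothetical stand-ins for a real provider client and its failure modes:

import logging

log = logging.getLogger("selection")

class ProviderError(Exception):
    """Stands in for rate-limit / API-error exceptions from a provider."""

def call_model(provider: str, model: str, prompt: str) -> str:
    """Stub; a real implementation would invoke the provider's API."""
    raise ProviderError("simulated failure")

def select_and_call(tier_providers: list[dict], prompt: str) -> str:
    # Walk the tier's providers in ascending priority order; first success wins.
    for entry in sorted(tier_providers, key=lambda e: e["priority"]):
        try:
            return call_model(entry["provider"], entry["model"], prompt)
        except ProviderError as err:
            log.warning("Falling back past %s: %s", entry["model"], err)
    raise RuntimeError("All providers in tier exhausted")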

Parameter Precedence

When multiple parameters specify model selection, the precedence is:
  1. model_override (highest priority) → Forces specific model
  2. provider_override → Limits to one provider’s models
  3. model_tier → Uses requested tier
  4. Auto-detected complexity (lowest priority) → Default behavior
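Concretely, the cascade can be read as the sketch below. Parameter names match the API above, but the resolution code is a paraphrase, and tier_from_complexity is a hypothetical stand-in for Shannon's internal analyzer:

def tier_from_complexity(query: str) -> str:
    return "small"  # hypothetical stub for the internal complexity analyzer

def resolve_selection(req: dict) -> dict:
    # Top-level keys win over the same keys nested under "context" (see next section).
    p = {**req.get("context", {}), **{k: v for k, v in req.items() if k != "context"}}
    if p.get("model_override"):        # 1. forces one exact model
        return {"model": p["model_override"]}
    tier = p.get("model_tier") or tier_from_complexity(p["query"])  # 3. / 4.
    if p.get("provider_override"):     # 2. restricts the tier to a single provider
        return {"tier": tier, "provider": p["provider_override"]}
    return {"tier": tier}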

Top-Level vs Context Parameters

Top-level parameters always override context parameters:
{
  "query": "Analyze data",
  "model_tier": "large",           // Top-level (WINS)
  "context": {
    "model_tier": "small"           // Context (IGNORED)
  }
}

Usage Examples

Auto-Selection (Default)

curl -X POST http://localhost:8080/api/v1/tasks \
  -H "X-API-Key: sk_test_123456" \
  -d '{"query": "What is 2+2?"}'
Shannon analyzes complexity → Selects small tier → Uses gpt-5-nano-2025-08-07 (priority 1 in small tier)

Force Specific Tier

curl -X POST http://localhost:8080/api/v1/tasks \
  -H "X-API-Key: sk_test_123456" \
  -d '{
    "query": "Complex analysis task",
    "model_tier": "large"
  }'
Uses large tier → gpt-5.1-2025-11-01 (priority 1 in large tier)

Override to Specific Model

curl -X POST http://localhost:8080/api/v1/tasks \
  -H "X-API-Key: sk_test_123456" \
  -d '{
    "query": "Analysis",
    "model_override": "claude-sonnet-4-5-20250929"
  }'
Forces Anthropic Claude Sonnet, ignoring tier/priority.

Force Provider

curl -X POST http://localhost:8080/api/v1/tasks \
  -H "X-API-Key: sk_test_123456" \
  -d '{
    "query": "Analysis",
    "model_tier": "medium",
    "provider_override": "anthropic"
  }'
Uses medium tier but only Anthropic models → claude-sonnet-4-5-20250929

Python SDK Examples

from shannon import ShannonClient

client = ShannonClient(api_key="sk_test_123456")

# Auto-selection
task = client.tasks.submit(query="Simple task")

# Force tier
task = client.tasks.submit(
    query="Complex analysis",
    model_tier="large"
)

# Force model
task = client.tasks.submit(
    query="Research task",
    model_override="gpt-5.1-2025-11-01"
)

# Force provider + tier
task = client.tasks.submit(
    query="Analysis",
    model_tier="medium",
    provider_override="openai"
)

Cost Optimization Strategies

1. Start Small, Escalate if Needed

# Try small tier first
task = client.tasks.submit(query="Analyze Q4 data", model_tier="small")
status = client.tasks.get(task.task_id)  # poll until the task completes

# If the result is insufficient, retry with large.
# not_satisfactory() stands in for your own quality check.
if not_satisfactory(status.result):
    task = client.tasks.submit(query="Analyze Q4 data deeply", model_tier="large")

2. Provider-Specific Optimization

# Use cheaper provider for bulk tasks
for item in bulk_data:
    client.tasks.submit(
        query=f"Summarize {item}",
        model_tier="small",
        provider_override="deepseek"  # Cheaper than OpenAI
    )

3. Session-Based Escalation

session_id = "analysis-session-123"

# Start with small model
client.tasks.submit(
    query="Initial analysis",
    session_id=session_id,
    model_tier="small"
)

# Follow-up with larger model (inherits context)
client.tasks.submit(
    query="Deeper insights",
    session_id=session_id,
    model_tier="large"
)

Research Tiered Model Architecture

Added in Shannon v0.3.0. This feature achieves 50-70% cost reduction compared to uniform large-model usage for research workflows.
Shannon’s research workflows automatically assign different model tiers to different stages of execution, using expensive models only where quality matters most.

How It Works

| Research Stage                 | Default Tier          | Role                 | Rationale                                                                               |
|--------------------------------|-----------------------|----------------------|-----------------------------------------------------------------------------------------|
| Exploration / research agents  | small                 | research_agent       | Gathering information is breadth-oriented; small models are sufficient and cost-effective |
| Synthesis / final output       | large (configurable)  | synthesis_agent      | Combining and reasoning over findings requires stronger models                          |
| Quick strategy                 | small                 | quick_research_agent | Forces parallel execution with small models for maximum speed                           |

Configuration

The synthesis tier is configurable via synthesis_model_tier in your task request or config/shannon.yaml:
# config/shannon.yaml
research:
  default_research_tier: small       # Tier for exploration agents
  synthesis_model_tier: large        # Tier for final synthesis
You can also override per-request:
curl -X POST http://localhost:8080/api/v1/tasks \
  -H "X-API-Key: sk_test_123456" \
  -d '{
    "query": "Research the impact of LLM scaling laws",
    "strategy": "research",
    "context": {
      "synthesis_model_tier": "medium"
    }
  }'
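The same override through the Python SDK might look like this (assuming the SDK accepts a context dict alongside strategy, mirroring the REST body above):

task = client.tasks.submit(
    query="Research the impact of LLM scaling laws",
    strategy="research",
    context={"synthesis_model_tier": "medium"},  # context kwarg assumed; exploration agents keep the default tier
)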

Quick Research Strategy

The quick strategy forces all research agents to run in parallel using small models, optimizing for speed and cost over depth:
task = client.tasks.submit(
    query="Quick overview of recent ML papers",
    strategy="quick"  # Parallel execution, small models, quick_research_agent role
)

Cost Comparison

Example: 5-agent research workflow
| Approach        | Model Usage            | Estimated Cost |
|-----------------|------------------------|----------------|
| Uniform large   | 5 × large              | ~$0.50         |
| Tiered (v0.3.0) | 4 × small + 1 × large  | ~$0.15-0.25    |
| Quick strategy  | 5 × small (parallel)   | ~$0.05         |
Result: 50-70% cost reduction with tiered architecture, maintaining synthesis quality.
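As a back-of-envelope check of that claim (the per-agent costs below are assumptions chosen to match the table, not measured numbers):

uniform = 5 * 0.10            # 5 agents x ~$0.10 each on a large model = $0.50
tiered = 4 * 0.02 + 1 * 0.10  # 4 small exploration agents + 1 large synthesis = $0.18
print(f"{1 - tiered / uniform:.0%} saved")  # ~64%, inside the quoted 50-70% range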

Complexity Analysis

Shannon analyzes task complexity using several factors:
  • Query length and specificity
  • Number of sub-tasks identified
  • Tool usage requirements
  • Context depth needed
  • Reasoning intensity (keywords like “analyze”, “compare”, “synthesize”)
Complexity Thresholds (configurable):
  • < 0.3 → Small tier (simple Q&A, basic tasks)
  • 0.3 - 0.7 → Medium tier (multi-step, moderate reasoning)
  • > 0.7 → Large tier (complex research, heavy reasoning)
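As code, the mapping is just a threshold check (the cutoffs come from the list above; the scoring itself is internal to Shannon):

def tier_for(score: float) -> str:
    # Default thresholds; both are configurable.
    if score < 0.3:
        return "small"   # simple Q&A, basic tasks
    if score <= 0.7:
        return "medium"  # multi-step, moderate reasoning
    return "large"       # complex research, heavy reasoning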

Monitoring & Debugging

Check Which Model Was Used

TASK_ID="task-abc123"
curl http://localhost:8080/api/v1/tasks/$TASK_ID \
  -H "X-API-Key: sk_test_123456" | jq '{model_used, provider, usage}'
Response:
{
  "model_used": "gpt-5-nano-2025-08-07",
  "provider": "openai",
  "usage": {
    "total_tokens": 245,
    "input_tokens": 150,
    "output_tokens": 95,
    "estimated_cost": 0.000053
  }
}
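The SDK exposes the same information (field names here are assumed to mirror the JSON response above):

status = client.tasks.get("task-abc123")
print(status.model_used, status.provider, status.usage)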

Prometheus Metrics

# Model usage by tier
shannon_llm_requests_total{tier="small"}
shannon_llm_requests_total{tier="medium"}
shannon_llm_requests_total{tier="large"}

# Provider distribution
shannon_llm_requests_total{provider="openai"}
shannon_llm_requests_total{provider="anthropic"}

# Tier drift (when requested tier unavailable)
shannon_tier_drift_total{requested="large", actual="medium"}
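These counters can also be pulled programmatically through Prometheus's standard HTTP query API (the localhost:9090 address is an assumption about where your Prometheus instance listens):

import requests

resp = requests.get(
    "http://localhost:9090/api/v1/query",
    params={"query": "sum by (tier) (shannon_llm_requests_total)"},
)
for series in resp.json()["data"]["result"]:
    print(series["metric"]["tier"], series["value"][1])  # tier name, request count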

Orchestrator Logs

docker compose -f deploy/compose/docker-compose.yml logs orchestrator | grep "Model selected"
Look for:
  • "Model selected: gpt-5-nano-2025-08-07 (small tier, priority 1)"
  • "Falling back to priority 2: claude-haiku-4-5-20251001"
  • "Falling back to priority 3: grok-3-mini (xAI)"
  • "Tier override: user requested large → using gpt-5.1-2025-11-01"

Configuration

Model tiers and priorities are defined in config/models.yaml:
model_tiers:
  small:
    providers:
      - provider: openai
        model: gpt-5-nano-2025-08-07
        priority: 1
      - provider: anthropic
        model: claude-haiku-4-5-20251001
        priority: 2

selection_strategy:
  mode: priority  # priority | round-robin | least-cost
  fallback_enabled: true
  max_retries: 3
Selection Modes:
  • priority (default): Try models in priority order
  • round-robin: Distribute load evenly across same-priority models
  • least-cost: Always select cheapest model in tier
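A sketch of how the three modes differ when choosing among a tier's providers (illustrative only; cost_per_1k is a hypothetical field, not part of the documented schema):

import itertools

_rr = itertools.count()  # module-level counter providing round-robin state

def pick(providers: list[dict], mode: str) -> dict:
    if mode == "least-cost":
        return min(providers, key=lambda p: p["cost_per_1k"])  # hypothetical cost field
    ranked = sorted(providers, key=lambda p: p["priority"])
    if mode == "round-robin":
        tied = [p for p in ranked if p["priority"] == ranked[0]["priority"]]
        return tied[next(_rr) % len(tied)]  # rotate among same-priority models
    return ranked[0]  # "priority" mode: lowest priority number wins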

Troubleshooting

Issue: Wrong tier selected

Symptoms: Task uses the medium tier when you expected small.
Solutions:
  1. Explicitly set model_tier: "small" in request
  2. Check complexity score in orchestrator logs
  3. Verify query isn’t triggering complexity heuristics (avoid words like “analyze deeply”)

Issue: Specific model not used

Symptoms: You request model_override: "gpt-5.1-2025-11-01" but a different model is used.
Solutions:
  1. Verify model is in config/models.yaml under model_catalog
  2. Check API key for provider is set in .env (e.g., OPENAI_API_KEY, XAI_API_KEY)
  3. Verify model ID uses canonical name (not alias)
  4. Check orchestrator logs for fallback messages
  5. Ensure the model is available in your API plan (e.g., GPT-5.1 requires appropriate tier)

Issue: High costs

Symptoms: Costs are higher than expected.
Solutions:
  1. Check actual tier distribution via Prometheus
  2. Add explicit model_tier: "small" to requests
  3. Review shannon_tier_drift_total for unwanted escalations
  4. Set MAX_COST_PER_REQUEST in .env to enforce budget

Issue: Rate limiting

Symptoms: Frequent 429 errors and a slow fallback cascade.
Solutions:
  1. Add more providers to tier priority list
  2. Enable round-robin mode to distribute load
  3. Increase RATE_LIMIT_WINDOW for affected providers
  4. Consider cheaper providers (DeepSeek, Groq) as fallbacks

Best Practices

  1. Default to Auto-Selection: Let Shannon’s complexity analysis work
  2. Override Sparingly: Use model_override only when required
  3. Start Small: Set model_tier: "small" for cost-sensitive workloads
  4. Monitor Distribution: Track tier usage via metrics
  5. Configure Fallbacks: Ensure each tier has 3+ providers
  6. Test Priority Order: Verify your preferred models are priority 1
  7. Budget Enforcement: Set MAX_COST_PER_REQUEST for safety

Related Pages

  • Models API: List available models and pricing
  • Submit Task: Task submission with model parameters
  • Configuration: Environment variables and YAML config
  • Cost Tracking: View model usage and costs