# Model Selection & Routing

> How Shannon selects models based on tiers, priority, and overrides

## Overview

Shannon automatically selects an LLM for each task based on:

1. **Task complexity** (analyzed during decomposition)
2. **Explicit tier requests** (`model_tier` parameter)
3. **Model/provider overrides** (`model_override`, `provider_override`)
4. **Priority rankings** (defined in `config/models.yaml`)
5. **Budget constraints** and token limits

This guide explains how model selection works and how to control it.

## Model Tiers

Shannon organizes models into three tiers:

| Tier       | Target Usage | Characteristics                       | Cost Range               |
| ---------- | ------------ | ------------------------------------- | ------------------------ |
| **Small**  | 50%          | Fast, cost-optimized, basic reasoning | \$0.0001-0.0002/1K input |
| **Medium** | 40%          | Balanced capability/cost              | \$0.002-0.006/1K input   |
| **Large**  | 10%          | Heavy reasoning, complex tasks        | \$0.02-0.025/1K input    |

**Note**: Percentages are target distributions, not enforced quotas. Actual usage depends on your workload.
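To see what the target distribution implies for spend, the blended input cost per 1K tokens can be estimated from the table above. This is a rough sketch, not a Shannon feature: it assumes the midpoint of each tier's cost range and equally sized requests in every tier, and the function names are illustrative.

```python
# Tier table from above: (target share, low $/1K input, high $/1K input)
TIERS = {
    "small": (0.50, 0.0001, 0.0002),
    "medium": (0.40, 0.002, 0.006),
    "large": (0.10, 0.02, 0.025),
}


def blended_cost_per_1k(tiers):
    """Weighted average input cost per 1K tokens, using range midpoints (an assumption)."""
    total = 0.0
    for share, low, high in tiers.values():
        midpoint = (low + high) / 2
        total += share * midpoint
    return total


def cost_share_by_tier(tiers):
    """Fraction of the blended cost contributed by each tier."""
    blended = blended_cost_per_1k(tiers)
    return {
        name: share * (low + high) / 2 / blended
        for name, (share, low, high) in tiers.items()
    }


print(f"Blended cost: ${blended_cost_per_1k(TIERS):.6f} per 1K input tokens")
for name, fraction in cost_share_by_tier(TIERS).items():
    print(f"{name}: {fraction:.1%} of input spend")
```

Under these assumptions the large tier accounts for more than half of input spend even though it handles only 10% of requests, which is why unwanted escalation to `large` is worth monitoring.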

## Selection Flow

<img src="https://mintcdn.com/ptmind-3aa3d4e1/2Bzddmlzr-QTR0Yc/en/tutorials/assets/model-selection-flow.svg?fit=max&auto=format&n=2Bzddmlzr-QTR0Yc&q=85&s=91c077e9b8b903e9ad6e3d2684bb3685" alt="Model Selection Flow" width="850" height="850" data-path="en/tutorials/assets/model-selection-flow.svg" />

## Priority Ranking

Within each tier, models are ranked by priority (lower number = higher priority). Shannon attempts models in priority order until one succeeds.

**Example from `config/models.yaml`**:

```yaml theme={null}
model_tiers:
  small:
    providers:
      - provider: openai
        model: gpt-5-nano-2025-08-07
        priority: 1  # Try first
      - provider: anthropic
        model: claude-haiku-4-5-20251001
        priority: 2  # Fallback if OpenAI fails
      - provider: xai
        model: grok-3-mini
        priority: 3  # Default small tier model for xAI
      - provider: xai
        model: grok-4-fast-non-reasoning
        priority: 4  # Alternative fast model
```

**Fallback Behavior**:

* If the priority 1 model fails (rate limit, API error), Shannon tries priority 2
* This continues until a model succeeds or all options are exhausted
* Failures are written to the orchestrator logs
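The fallback behavior described above can be sketched in a few lines of Python. This is an illustration of the idea, not Shannon's orchestrator code: `call_with_fallback` and `flaky_call` are hypothetical names, and the candidate list mirrors part of the YAML example.

```python
import logging

logger = logging.getLogger("model_selection_sketch")


def call_with_fallback(candidates, call_model):
    """Try candidates in ascending priority order; return (candidate, result).

    `candidates` is a list of dicts with provider, model, and priority keys.
    `call_model` is a hypothetical callable that raises on rate limits or API errors.
    """
    errors = []
    for candidate in sorted(candidates, key=lambda c: c["priority"]):
        try:
            return candidate, call_model(candidate)
        except Exception as exc:  # a real system would catch narrower error types
            logger.warning("Model %s failed: %s", candidate["model"], exc)
            errors.append((candidate["model"], exc))
    raise RuntimeError(f"All {len(errors)} models in the tier failed")


SMALL_TIER = [
    {"provider": "anthropic", "model": "claude-haiku-4-5-20251001", "priority": 2},
    {"provider": "openai", "model": "gpt-5-nano-2025-08-07", "priority": 1},
    {"provider": "xai", "model": "grok-3-mini", "priority": 3},
]


def flaky_call(candidate):
    """Stand-in for a provider call: pretend OpenAI is rate limited."""
    if candidate["provider"] == "openai":
        raise ConnectionError("429 rate limit")
    return f"response from {candidate['model']}"


chosen, result = call_with_fallback(SMALL_TIER, flaky_call)
print(chosen["model"], "->", result)
```

Because the simulated priority 1 call fails, the sketch moves on to the priority 2 model, regardless of the order in which the entries were listed.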

## Parameter Precedence

When multiple parameters specify model selection, the precedence is:

1. **`model_override`** (highest priority) → Forces specific model
2. **`provider_override`** → Limits to one provider's models
3. **`model_tier`** → Uses requested tier
4. **Auto-detected complexity** (lowest priority) → Default behavior
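One way to read the precedence list is as a resolution function over a model catalog. The sketch below is a simplified model of that logic, not Shannon's implementation: `resolve_candidates` and the abbreviated `CATALOG` are illustrative, and raising an error for an unknown `model_override` is an assumption.

```python
def resolve_candidates(catalog, auto_tier, model_override=None,
                       provider_override=None, model_tier=None):
    """Return the ordered list of (provider, model) pairs to try.

    `catalog` maps a tier name to a list of dicts with provider, model, priority.
    `auto_tier` is the tier suggested by complexity analysis.
    """
    # 1. model_override forces one specific model and skips tier/priority logic.
    if model_override is not None:
        for entries in catalog.values():
            for entry in entries:
                if entry["model"] == model_override:
                    return [(entry["provider"], entry["model"])]
        raise ValueError(f"Unknown model: {model_override}")

    # 3 and 4. An explicit model_tier beats the auto-detected tier.
    tier = model_tier or auto_tier
    entries = sorted(catalog[tier], key=lambda e: e["priority"])

    # 2. provider_override narrows the tier to a single provider's models.
    if provider_override is not None:
        entries = [e for e in entries if e["provider"] == provider_override]

    return [(e["provider"], e["model"]) for e in entries]


# Abbreviated catalog built from model IDs that appear in this guide.
CATALOG = {
    "small": [
        {"provider": "openai", "model": "gpt-5-nano-2025-08-07", "priority": 1},
        {"provider": "anthropic", "model": "claude-haiku-4-5-20251001", "priority": 2},
        {"provider": "xai", "model": "grok-3-mini", "priority": 3},
        {"provider": "xai", "model": "grok-4-fast-non-reasoning", "priority": 4},
    ],
    "large": [
        {"provider": "openai", "model": "gpt-5.1-2025-11-01", "priority": 1},
    ],
}

print(resolve_candidates(CATALOG, "small"))
print(resolve_candidates(CATALOG, "small", model_tier="large"))
print(resolve_candidates(CATALOG, "small", provider_override="xai"))
```

Note that `provider_override` and `model_tier` combine rather than compete: the tier picks the pool and the provider filter narrows it.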

### Top-Level vs Context Parameters

Top-level parameters **always override** context parameters. In the request below, the top-level `model_tier` (`large`) wins and the `model_tier` inside `context` (`small`) is ignored:

```json theme={null}
{
  "query": "Analyze data",
  "model_tier": "large",
  "context": {
    "model_tier": "small"
  }
}
```
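The top-level-wins rule can be expressed as a small lookup. This is a sketch under one assumption not stated explicitly in this guide: that a parameter found only in `context` is still honored when no top-level value is present. `effective_selection` is a hypothetical helper name.

```python
SELECTION_KEYS = ("model_tier", "model_override", "provider_override")


def effective_selection(request):
    """Pick each selection parameter from the top level, else from context."""
    context = request.get("context") or {}
    resolved = {}
    for key in SELECTION_KEYS:
        if request.get(key) is not None:
            resolved[key] = request[key]      # top-level value wins
        elif context.get(key) is not None:
            resolved[key] = context[key]      # used only when no top-level value exists
    return resolved


request = {
    "query": "Analyze data",
    "model_tier": "large",
    "context": {"model_tier": "small", "provider_override": "anthropic"},
}
print(effective_selection(request))
```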

## Usage Examples

### Auto-Selection (Default)

```bash theme={null}
curl -X POST http://localhost:8080/api/v1/tasks \
  -H "X-API-Key: sk_test_123456" \
  -H "Content-Type: application/json" \
  -d '{"query": "What is 2+2?"}'
```

Shannon analyzes complexity → Selects small tier → Uses `gpt-5-nano-2025-08-07` (priority 1 in small tier)

### Force Specific Tier

```bash theme={null}
curl -X POST http://localhost:8080/api/v1/tasks \
  -H "X-API-Key: sk_test_123456" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "Complex analysis task",
    "model_tier": "large"
  }'
```

Uses large tier → `gpt-5.1-2025-11-01` (priority 1 in large tier)

### Override to Specific Model

```bash theme={null}
curl -X POST http://localhost:8080/api/v1/tasks \
  -H "X-API-Key: sk_test_123456" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "Analysis",
    "model_override": "claude-sonnet-4-5-20250929"
  }'
```

Forces Anthropic Claude Sonnet, ignoring tier/priority.

### Force Provider

```bash theme={null}
curl -X POST http://localhost:8080/api/v1/tasks \
  -H "X-API-Key: sk_test_123456" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "Analysis",
    "model_tier": "medium",
    "provider_override": "anthropic"
  }'
```

Uses medium tier **but only Anthropic models** → `claude-sonnet-4-5-20250929`

### Python SDK Examples

```python theme={null}
from shannon import ShannonClient

client = ShannonClient(api_key="sk_test_123456")

# Auto-selection
task = client.tasks.submit(query="Simple task")

# Force tier
task = client.tasks.submit(
    query="Complex analysis",
    model_tier="large"
)

# Force model
task = client.tasks.submit(
    query="Research task",
    model_override="gpt-5.1-2025-11-01"
)

# Force provider + tier
task = client.tasks.submit(
    query="Analysis",
    model_tier="medium",
    provider_override="openai"
)
```

## Cost Optimization Strategies

### 1. Start Small, Escalate if Needed

```python theme={null}
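# Try small tier first (reuses the `client` created in the SDK examples above)
task = client.tasks.submit(query="Analyze Q4 data", model_tier="small")
# Fetch the task; in practice, poll until the task has finished before reading the result
status = client.tasks.get(task.task_id)

# not_satisfactory() is a placeholder for your own quality check
if not_satisfactory(status.result):
    task = client.tasks.submit(query="Analyze Q4 data deeply", model_tier="large")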
```

### 2. Provider-Specific Optimization

```python theme={null}
# Use cheaper provider for bulk tasks
for item in bulk_data:
    client.tasks.submit(
        query=f"Summarize {item}",
        model_tier="small",
        provider_override="deepseek"  # Cheaper than OpenAI
    )
```

### 3. Session-Based Escalation

```python theme={null}
session_id = "analysis-session-123"

# Start with small model
client.tasks.submit(
    query="Initial analysis",
    session_id=session_id,
    model_tier="small"
)

# Follow-up with larger model (inherits context)
client.tasks.submit(
    query="Deeper insights",
    session_id=session_id,
    model_tier="large"
)
```

## Research Tiered Model Architecture

<Note>
  Added in Shannon v0.3.0. Tiering is estimated to cut research workflow costs by **50-70%** compared to using large models for every stage (see Cost Comparison).
</Note>

Shannon's research workflows automatically assign different model tiers to different stages of execution, using expensive models only where quality matters most.

### How It Works

| Research Stage                    | Default Tier           | Role                   | Rationale                                                                                 |
| --------------------------------- | ---------------------- | ---------------------- | ----------------------------------------------------------------------------------------- |
| **Exploration / Research agents** | `small`                | `research_agent`       | Gathering information is breadth-oriented; small models are sufficient and cost-effective |
| **Synthesis / Final output**      | `large` (configurable) | `synthesis_agent`      | Combining and reasoning over findings requires stronger models                            |
| **Quick strategy**                | `small`                | `quick_research_agent` | Forces parallel execution with small models for maximum speed                             |

### Configuration

The synthesis tier is configurable via `synthesis_model_tier` in your task request or `config/shannon.yaml`:

```yaml theme={null}
# config/shannon.yaml
research:
  default_research_tier: small       # Tier for exploration agents
  synthesis_model_tier: large        # Tier for final synthesis
```

You can also override per-request:

```bash theme={null}
curl -X POST http://localhost:8080/api/v1/tasks \
  -H "X-API-Key: sk_test_123456" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "Research the impact of LLM scaling laws",
    "strategy": "research",
    "context": {
      "synthesis_model_tier": "medium"
    }
  }'
```

### Quick Research Strategy

The `quick` strategy forces all research agents to run in parallel using small models, optimizing for speed and cost over depth:

```python theme={null}
task = client.tasks.submit(
    query="Quick overview of recent ML papers",
    strategy="quick"  # Parallel execution, small models, quick_research_agent role
)
```

### Cost Comparison

**Example: 5-agent research workflow**

| Approach        | Model Usage           | Estimated Cost |
| --------------- | --------------------- | -------------- |
| Uniform `large` | 5 x large             | \~\$0.50       |
| Tiered (v0.3.0) | 4 x small + 1 x large | \~\$0.15-0.25  |
| Quick strategy  | 5 x small (parallel)  | \~\$0.05       |

Result: the tiered architecture costs an estimated **50-70% less** than uniform `large` usage, while still using a large model for synthesis.
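The reduction figures follow directly from the estimates in the table. The short calculation below makes the arithmetic explicit; it uses only the table's numbers, and the `reduction` helper is illustrative.

```python
UNIFORM_LARGE = 0.50          # 5 x large
TIERED_RANGE = (0.15, 0.25)   # 4 x small + 1 x large
QUICK = 0.05                  # 5 x small (parallel)


def reduction(baseline, cost):
    """Fractional saving of `cost` relative to `baseline`."""
    return (baseline - cost) / baseline


best = reduction(UNIFORM_LARGE, TIERED_RANGE[0])
worst = reduction(UNIFORM_LARGE, TIERED_RANGE[1])
print(f"Tiered: {worst:.0%} to {best:.0%} cheaper than uniform large")
print(f"Quick: {reduction(UNIFORM_LARGE, QUICK):.0%} cheaper than uniform large")
```

The quick strategy saves more than tiering in this example, at the expense of running synthesis on a small model.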

## Complexity Analysis

Shannon analyzes task complexity using several factors:

* **Query length** and specificity
* **Number of sub-tasks** identified
* **Tool usage** requirements
* **Context depth** needed
* **Reasoning intensity** (keywords like "analyze", "compare", "synthesize")

**Complexity Thresholds** (configurable):

* `< 0.3` → Small tier (simple Q\&A, basic tasks)
* `0.3 - 0.7` → Medium tier (multi-step, moderate reasoning)
* `> 0.7` → Large tier (complex research, heavy reasoning)
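The thresholds above amount to a simple mapping from score to tier. The sketch below assumes the score lies in the range 0 to 1 and that the boundary values 0.3 and 0.7 both fall in the medium tier, as the inequalities suggest; `tier_for_complexity` is a hypothetical name and does not reproduce how Shannon computes the score itself.

```python
def tier_for_complexity(score, small_max=0.3, large_min=0.7):
    """Map a complexity score in [0, 1] to a tier name.

    Scores below `small_max` are small, scores above `large_min` are large,
    and everything in between (boundaries included) is medium.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"Complexity score out of range: {score}")
    if score < small_max:
        return "small"
    if score > large_min:
        return "large"
    return "medium"


for score in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(score, "->", tier_for_complexity(score))
```

Passing different `small_max` and `large_min` values models the effect of reconfiguring the thresholds.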

## Monitoring & Debugging

### Check Which Model Was Used

```bash theme={null}
TASK_ID="task-abc123"
curl -s "http://localhost:8080/api/v1/tasks/$TASK_ID" \
  -H "X-API-Key: sk_test_123456" | jq '{model_used, provider, usage}'
```

Response:

```json theme={null}
{
  "model_used": "gpt-5-nano-2025-08-07",
  "provider": "openai",
  "usage": {
    "total_tokens": 245,
    "input_tokens": 150,
    "output_tokens": 95,
    "estimated_cost": 0.000053
  }
}
```
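If you post-process these responses in a script, a quick sanity check on the usage block can catch accounting surprises. The sketch below parses the sample response shown above with the standard library; `summarize_usage` is a hypothetical helper, and the field names are taken from that sample only.

```python
import json

SAMPLE_RESPONSE = """
{
  "model_used": "gpt-5-nano-2025-08-07",
  "provider": "openai",
  "usage": {
    "total_tokens": 245,
    "input_tokens": 150,
    "output_tokens": 95,
    "estimated_cost": 0.000053
  }
}
"""


def summarize_usage(payload):
    """Check token accounting and derive an effective cost per 1K tokens."""
    usage = payload["usage"]
    total = usage["total_tokens"]
    return {
        "model": f'{payload["provider"]}/{payload["model_used"]}',
        "tokens_consistent": usage["input_tokens"] + usage["output_tokens"] == total,
        "cost_per_1k_tokens": usage["estimated_cost"] / total * 1000 if total else 0.0,
    }


summary = summarize_usage(json.loads(SAMPLE_RESPONSE))
print(summary)
```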

### Prometheus Metrics

```promql theme={null}
# Model usage by tier
shannon_llm_requests_total{tier="small"}
shannon_llm_requests_total{tier="medium"}
shannon_llm_requests_total{tier="large"}

# Provider distribution
shannon_llm_requests_total{provider="openai"}
shannon_llm_requests_total{provider="anthropic"}

# Tier drift (when requested tier unavailable)
shannon_tier_drift_total{requested="large", actual="medium"}
```
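Once you have per-tier request counts from `shannon_llm_requests_total`, comparing them with the target distribution from the Model Tiers table is straightforward. The sketch below uses made-up counter values purely for illustration, and the helper names are hypothetical; how you read the counters (Prometheus API, dashboard export) is left out.

```python
# Target distribution from the Model Tiers table
TARGET_SHARE = {"small": 0.50, "medium": 0.40, "large": 0.10}


def tier_shares(request_counts):
    """Convert per-tier request counts into fractions of total traffic."""
    total = sum(request_counts.values())
    if total == 0:
        return {tier: 0.0 for tier in request_counts}
    return {tier: count / total for tier, count in request_counts.items()}


def drift_from_target(request_counts, targets=TARGET_SHARE):
    """Actual share minus target share for each tier (positive = over target)."""
    shares = tier_shares(request_counts)
    return {tier: shares.get(tier, 0.0) - target for tier, target in targets.items()}


# Made-up counter values, as if read from shannon_llm_requests_total{tier=...}
counts = {"small": 620, "medium": 300, "large": 80}

for tier, delta in drift_from_target(counts).items():
    print(f"{tier}: {delta:+.0%} vs target")
```

Since the percentages are targets rather than quotas, a deviation is a prompt to look at the workload, not necessarily a misconfiguration.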

### Orchestrator Logs

```bash theme={null}
docker compose -f deploy/compose/docker-compose.yml logs orchestrator | grep "Model selected"
```

Look for:

* `"Model selected: gpt-5-nano-2025-08-07 (small tier, priority 1)"`
* `"Falling back to priority 2: claude-haiku-4-5-20251001"`
* `"Falling back to priority 3: grok-3-mini (xAI)"`
* `"Tier override: user requested large → using gpt-5.1-2025-11-01"`
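For a quick tally of selections and fallbacks, the messages above can be matched with regular expressions. This sketch assumes the message text appears in each log line exactly as quoted above; real log lines may carry timestamps or structured fields around it, which `search` tolerates. The `summarize` helper is hypothetical.

```python
import re
from collections import Counter

SELECTED = re.compile(
    r"Model selected: (?P<model>\S+) \((?P<tier>\w+) tier, priority (?P<priority>\d+)\)"
)
FALLBACK = re.compile(r"Falling back to priority (?P<priority>\d+): (?P<model>\S+)")


def summarize(lines):
    """Count selections per (tier, model) and fallbacks per model."""
    selected = Counter()
    fallbacks = Counter()
    for line in lines:
        match = SELECTED.search(line)
        if match:
            selected[(match["tier"], match["model"])] += 1
            continue
        match = FALLBACK.search(line)
        if match:
            fallbacks[match["model"]] += 1
    return selected, fallbacks


sample_lines = [
    "Model selected: gpt-5-nano-2025-08-07 (small tier, priority 1)",
    "Falling back to priority 2: claude-haiku-4-5-20251001",
    "Falling back to priority 3: grok-3-mini (xAI)",
    "Model selected: gpt-5-nano-2025-08-07 (small tier, priority 1)",
]
selected, fallbacks = summarize(sample_lines)
print(dict(selected))
print(dict(fallbacks))
```

A high fallback count for one model relative to its selections suggests rate limiting or an API key problem with that provider.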

## Configuration

Model tiers and priorities are defined in `config/models.yaml`:

```yaml theme={null}
model_tiers:
  small:
    providers:
      - provider: openai
        model: gpt-5-nano-2025-08-07
        priority: 1
      - provider: anthropic
        model: claude-haiku-4-5-20251001
        priority: 2

selection_strategy:
  mode: priority  # priority | round-robin | least-cost
  fallback_enabled: true
  max_retries: 3
```

**Selection Modes**:

* `priority` (default): Try models in priority order
* `round-robin`: Distribute load evenly across same-priority models
* `least-cost`: Always select cheapest model in tier
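The difference between the three modes is easiest to see side by side. The sketch below is a simplified model, not Shannon's selector: `make_selector`, the placeholder model names, and the cost figures are all hypothetical, and round-robin is interpreted as rotating among the models that share the best priority.

```python
import itertools


def make_selector(mode, entries, cost_per_1k=None):
    """Return a zero-argument function that picks the model to try first."""
    ordered = sorted(entries, key=lambda e: e["priority"])
    if mode == "priority":
        return lambda: ordered[0]
    if mode == "round-robin":
        top_priority = ordered[0]["priority"]
        peers = itertools.cycle([e for e in ordered if e["priority"] == top_priority])
        return lambda: next(peers)
    if mode == "least-cost":
        if cost_per_1k is None:
            raise ValueError("least-cost mode needs a cost table")
        cheapest = min(ordered, key=lambda e: cost_per_1k[e["model"]])
        return lambda: cheapest
    raise ValueError(f"Unknown selection mode: {mode}")


# Placeholder tier: two models share priority 1 so round-robin has peers to rotate.
ENTRIES = [
    {"provider": "provider-x", "model": "model-a", "priority": 1},
    {"provider": "provider-y", "model": "model-b", "priority": 1},
    {"provider": "provider-z", "model": "model-c", "priority": 2},
]
# Made-up relative costs, for illustration only.
COSTS = {"model-a": 3.0, "model-b": 2.0, "model-c": 1.0}

rotate = make_selector("round-robin", ENTRIES)
print("priority:", make_selector("priority", ENTRIES)()["model"])
print("round-robin:", [rotate()["model"] for _ in range(4)])
print("least-cost:", make_selector("least-cost", ENTRIES, COSTS)()["model"])
```

In this toy tier, `least-cost` picks a model that `priority` would only reach as a fallback, which illustrates why the mode choice interacts with how you rank models.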

## Troubleshooting

### Issue: Wrong tier selected

**Symptoms**: Task uses medium tier when you expected small

**Solutions**:

1. Explicitly set `model_tier: "small"` in request
2. Check complexity score in orchestrator logs
3. Verify query isn't triggering complexity heuristics (avoid words like "analyze deeply")

### Issue: Specific model not used

**Symptoms**: You request `model_override: "gpt-5.1-2025-11-01"` but a different model is used

**Solutions**:

1. Verify model is in `config/models.yaml` under `model_catalog`
2. Check API key for provider is set in `.env` (e.g., `OPENAI_API_KEY`, `XAI_API_KEY`)
3. Verify model ID uses canonical name (not alias)
4. Check orchestrator logs for fallback messages
5. Ensure the model is available on your provider API plan (e.g., GPT-5.1 requires an appropriate provider account tier)

### Issue: High costs

**Symptoms**: Costs higher than expected

**Solutions**:

1. Check actual tier distribution via Prometheus
2. Add explicit `model_tier: "small"` to requests
3. Review `shannon_tier_drift_total` for unwanted escalations
4. Set `MAX_COST_PER_REQUEST` in `.env` to enforce budget

### Issue: Rate limiting

**Symptoms**: Frequent 429 errors, slow fallback cascade

**Solutions**:

1. Add more providers to tier priority list
2. Enable `round-robin` mode to distribute load
3. Increase `RATE_LIMIT_WINDOW` for affected providers
4. Consider cheaper providers (DeepSeek, Groq) as fallbacks

## Best Practices

1. **Default to Auto-Selection**: Let Shannon's complexity analysis work
2. **Override Sparingly**: Use `model_override` only when required
3. **Start Small**: Set `model_tier: "small"` for cost-sensitive workloads
4. **Monitor Distribution**: Track tier usage via metrics
5. **Configure Fallbacks**: Ensure each tier has 3+ providers
6. **Test Priority Order**: Verify your preferred models are priority 1
7. **Budget Enforcement**: Set `MAX_COST_PER_REQUEST` for safety

## Related Documentation

<CardGroup cols={2}>
  <Card title="Models API" icon="list" href="/en/api/models/overview">
    List available models and pricing
  </Card>

  <Card title="Submit Task" icon="paper-plane" href="/en/api/rest/submit-task">
    Task submission with model parameters
  </Card>

  <Card title="Configuration" icon="gear" href="/en/quickstart/configuration">
    Environment variables and YAML config
  </Card>

  <Card title="Cost Tracking" icon="dollar-sign" href="/en/api/rest/get-status">
    View model usage and costs
  </Card>
</CardGroup>
