Overview
Shannon automatically selects the optimal LLM model for each task based on:

- Task complexity (analyzed during decomposition)
- Explicit tier requests (`model_tier` parameter)
- Model/provider overrides (`model_override`, `provider_override`)
- Priority rankings (defined in `config/models.yaml`)
- Budget constraints and token limits
Model Tiers
Shannon organizes models into three tiers:

| Tier | Target Usage | Characteristics | Cost Range |
|---|---|---|---|
| Small | 50% | Fast, cost-optimized, basic reasoning | $0.0001-0.0002/1K input |
| Medium | 40% | Balanced capability/cost | $0.002-0.006/1K input |
| Large | 10% | Heavy reasoning, complex tasks | $0.02-0.025/1K input |
Selection Flow
Priority Ranking
Within each tier, models are ranked by priority (lower number = higher priority). Shannon attempts models in priority order until one succeeds. Example from `config/models.yaml`:
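The fragment below is an illustrative sketch, not the actual contents of `config/models.yaml` (the key names are assumptions; the model IDs and priorities match the examples elsewhere on this page):

```yaml
# Hypothetical layout: a small-tier priority list.
small:
  - model: gpt-5-nano-2025-08-07
    priority: 1          # tried first
  - model: claude-haiku-4-5-20251001
    priority: 2          # fallback on rate limit or API error
  - model: grok-3-mini
    priority: 3          # last resort (xAI)
```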
- If priority 1 fails (rate limit, API error), Shannon tries priority 2
- Continues until a model succeeds or all options exhausted
- Failures are logged to orchestrator logs
Parameter Precedence
When multiple parameters specify model selection, the precedence is:

1. `model_override` (highest priority) → forces a specific model
2. `provider_override` → limits selection to one provider's models
3. `model_tier` → uses the requested tier
4. Auto-detected complexity (lowest priority) → default behavior
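The precedence rules can be sketched as a small resolver. The parameter names come from this page; the function itself is illustrative, not Shannon's actual implementation:

```python
# Resolve model selection from request parameters, highest precedence first.
def resolve_selection(params: dict, detected_tier: str) -> tuple:
    if params.get("model_override"):
        return ("model", params["model_override"])      # exact model wins
    if params.get("provider_override"):
        return ("provider", params["provider_override"])  # restrict to provider
    if params.get("model_tier"):
        return ("tier", params["model_tier"])           # requested tier
    return ("tier", detected_tier)  # fall back to auto-detected complexity

print(resolve_selection({"model_tier": "small"}, "medium"))  # ('tier', 'small')
print(resolve_selection({}, "medium"))                       # ('tier', 'medium')
```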
Top-Level vs Context Parameters
Top-level parameters always override context parameters.

Usage Examples
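As a sketch of that rule, consider a request carrying `model_tier` both at the top level and inside its context (the exact request shape and the `context` field name are assumptions here):

```python
# Top-level model_tier wins over the one nested in context.
request = {
    "query": "Classify this ticket",
    "model_tier": "small",               # top-level: takes effect
    "context": {"model_tier": "large"},  # context: ignored for selection
}

effective_tier = request.get("model_tier") or request["context"].get("model_tier")
print(effective_tier)  # small
```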
Auto-Selection (Default)
Result: `gpt-5-nano-2025-08-07` (priority 1 in the small tier)
Force Specific Tier
Result: `gpt-5.1-2025-11-01` (priority 1 in the large tier)
Override to Specific Model
Force Provider
Result: `claude-sonnet-4-5-20250929`
Python SDK Examples
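The real SDK's client class and method names are not shown on this page, so `ShannonClient.submit_task` below is a stand-in stub; only the parameter names (`model_tier`, `model_override`, `provider_override`) come from this documentation:

```python
class ShannonClient:
    """Stand-in for the real SDK client (hypothetical)."""

    def submit_task(self, query, model_tier=None, model_override=None,
                    provider_override=None):
        # A real client would submit the task to the orchestrator; the stub
        # just echoes the model-selection parameters for illustration.
        return {
            "query": query,
            "model_tier": model_tier,
            "model_override": model_override,
            "provider_override": provider_override,
        }

client = ShannonClient()

# Auto-selection (default): no model parameters
client.submit_task("What is the capital of France?")

# Force a tier for a cost-sensitive workload
client.submit_task("Summarize these release notes", model_tier="small")

# Pin an exact model
client.submit_task("Draft a formal proof outline",
                   model_override="gpt-5.1-2025-11-01")
```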
Cost Optimization Strategies
1. Start Small, Escalate if Needed
2. Provider-Specific Optimization
3. Session-Based Escalation
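The "start small, escalate if needed" pattern can be sketched as follows. The confidence check and the submit callable are illustrative stand-ins; only the `model_tier` parameter comes from this page:

```python
def run_with_escalation(submit, query, threshold=0.7):
    """Try the small tier first; rerun on the large tier if confidence is low."""
    result = submit(query, model_tier="small")
    if result["confidence"] < threshold:
        result = submit(query, model_tier="large")  # escalate once
    return result

# Fake submit function for illustration: the small tier is "unsure" here.
def fake_submit(query, model_tier):
    confidence = 0.4 if model_tier == "small" else 0.9
    return {"tier": model_tier, "confidence": confidence}

print(run_with_escalation(fake_submit, "Analyze this contract")["tier"])  # large
```

This spends small-tier tokens on every request but only pays large-tier prices when the cheap attempt is not good enough.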
Research Tiered Model Architecture
Added in Shannon v0.3.0. This feature achieves 50-70% cost reduction compared to uniform large-model usage for research workflows.
How It Works
| Research Stage | Default Tier | Role | Rationale |
|---|---|---|---|
| Exploration / Research agents | small | research_agent | Gathering information is breadth-oriented; small models are sufficient and cost-effective |
| Synthesis / Final output | large (configurable) | synthesis_agent | Combining and reasoning over findings requires stronger models |
| Quick strategy | small | quick_research_agent | Forces parallel execution with small models for maximum speed |
Configuration
The synthesis tier is configurable via `synthesis_model_tier` in your task request or `config/shannon.yaml`:
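A minimal sketch of the YAML form, assuming a `research` section (the surrounding key is an assumption; `synthesis_model_tier` is the documented setting):

```yaml
# Hypothetical shannon.yaml fragment.
research:
  synthesis_model_tier: large   # tier used by the synthesis_agent
```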
Quick Research Strategy
The `quick` strategy forces all research agents to run in parallel using small models, optimizing for speed and cost over depth:
Cost Comparison
Example: 5-agent research workflow

| Approach | Model Usage | Estimated Cost |
|---|---|---|
| Uniform large | 5 x large | ~$0.50 |
| Tiered (v0.3.0) | 4 x small + 1 x large | ~$0.15-0.25 |
| Quick strategy | 5 x small (parallel) | ~$0.05 |
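The arithmetic behind the table, under assumed per-agent costs of 10 cents for a large-model run and 1 cent for a small-model run (illustrative figures consistent with the estimates above, not measured values):

```python
# Per-agent costs in cents (assumed, for illustration).
LARGE_CENTS, SMALL_CENTS = 10, 1

uniform = 5 * LARGE_CENTS                 # 5 x large
tiered = 4 * SMALL_CENTS + LARGE_CENTS    # 4 x small + 1 x large
quick = 5 * SMALL_CENTS                   # 5 x small (parallel)

print(uniform, tiered, quick)  # 50 14 5  -> ~$0.50, ~$0.14, ~$0.05
```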
Complexity Analysis
Shannon analyzes task complexity using several factors:

- Query length and specificity
- Number of sub-tasks identified
- Tool usage requirements
- Context depth needed
- Reasoning intensity (keywords like “analyze”, “compare”, “synthesize”)
The resulting complexity score maps to a tier:

- `< 0.3` → Small tier (simple Q&A, basic tasks)
- `0.3 - 0.7` → Medium tier (multi-step, moderate reasoning)
- `> 0.7` → Large tier (complex research, heavy reasoning)
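The thresholds above can be sketched as a mapping function (the 0.3 and 0.7 cutoffs come from this page; the function itself is illustrative):

```python
def tier_for_complexity(score: float) -> str:
    """Map a complexity score in [0, 1] to a model tier."""
    if score < 0.3:
        return "small"
    if score <= 0.7:
        return "medium"
    return "large"

print(tier_for_complexity(0.15), tier_for_complexity(0.5), tier_for_complexity(0.9))
# small medium large
```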
Monitoring & Debugging
Check Which Model Was Used
Prometheus Metrics
Orchestrator Logs
"Model selected: gpt-5-nano-2025-08-07 (small tier, priority 1)""Falling back to priority 2: claude-haiku-4-5-20251001""Falling back to priority 3: grok-3-mini (xAI)""Tier override: user requested large → using gpt-5.1-2025-11-01"
Configuration
Model tiers and priorities are defined in `config/models.yaml`:
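An illustrative sketch of how tiers, priorities, and a selection strategy might be laid out (the real `config/models.yaml` schema is not reproduced on this page, so the structure below is an assumption built from the tiers and strategies it describes):

```yaml
# Hypothetical models.yaml fragment.
model_tiers:
  small:
    selection_strategy: priority       # or round-robin / least-cost
    models:
      - id: gpt-5-nano-2025-08-07
        priority: 1
      - id: claude-haiku-4-5-20251001
        priority: 2
  large:
    models:
      - id: gpt-5.1-2025-11-01
        priority: 1
```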
Selection strategies:

- `priority` (default): Try models in priority order
- `round-robin`: Distribute load evenly across same-priority models
- `least-cost`: Always select the cheapest model in the tier
Troubleshooting
Issue: Wrong tier selected
Symptoms: Task uses the medium tier when you expected small

Solutions:

- Explicitly set `model_tier: "small"` in the request
- Check the complexity score in orchestrator logs
- Verify the query isn't triggering complexity heuristics (avoid words like "analyze deeply")
Issue: Specific model not used
Symptoms: Request sets `model_override: "gpt-5.1-2025-11-01"` but a different model is used
Solutions:
- Verify the model is in `config/models.yaml` under `model_catalog`
- Check that the provider's API key is set in `.env` (e.g., `OPENAI_API_KEY`, `XAI_API_KEY`)
- Verify the model ID uses the canonical name (not an alias)
- Check orchestrator logs for fallback messages
- Ensure the model is available in your API plan (e.g., GPT-5.1 requires an appropriate tier)
Issue: High costs
Symptoms: Costs higher than expected

Solutions:

- Check the actual tier distribution via Prometheus
- Add explicit `model_tier: "small"` to requests
- Review `shannon_tier_drift_total` for unwanted escalations
- Set `MAX_COST_PER_REQUEST` in `.env` to enforce a budget
Issue: Rate limiting
Symptoms: Frequent 429 errors, slow fallback cascade

Solutions:

- Add more providers to the tier priority list
- Enable `round-robin` mode to distribute load
- Increase `RATE_LIMIT_WINDOW` for affected providers
- Consider cheaper providers (DeepSeek, Groq) as fallbacks
Best Practices
- Default to Auto-Selection: Let Shannon's complexity analysis work
- Override Sparingly: Use `model_override` only when required
- Start Small: Set `model_tier: "small"` for cost-sensitive workloads
- Monitor Distribution: Track tier usage via metrics
- Configure Fallbacks: Ensure each tier has 3+ providers
- Test Priority Order: Verify your preferred models are priority 1
- Budget Enforcement: Set `MAX_COST_PER_REQUEST` for safety
Related Documentation
- Models API: List available models and pricing
- Submit Task: Task submission with model parameters
- Configuration: Environment variables and YAML config
- Cost Tracking: View model usage and costs