> ## Documentation Index
> Fetch the complete documentation index at: https://docs.shannon.run/llms.txt
> Use this file to discover all available pages before exploring further.

# Cost Control

> Managing token budgets and minimizing LLM costs in Shannon

## Overview

Shannon provides comprehensive cost control features to prevent unexpected LLM charges and optimize spending. With built-in budget enforcement and intelligent routing, teams often see **significant cost reductions (60–90%)** versus naive implementations (workload‑dependent).

## Setting Budgets

Budgets are configured at the platform level (not per request via REST). Use environment variables in `.env`:

```bash theme={null}
# LLM service budget guards
MAX_TOKENS_PER_REQUEST=10000    # Max tokens per request
MAX_COST_PER_REQUEST=0.50       # Max cost per request (USD)

# Apply changes
docker compose restart
```

<Note>
  Shannon enforces budgets during execution. When limits are reached, the system halts further spending and returns the best available result or an error depending on context.
</Note>

## Model Tiers

Shannon categorizes models into tiers based on capability and cost:

| Tier       | Models                        | Cost per 1M tokens | Use Case                          |
| ---------- | ----------------------------- | ------------------ | --------------------------------- |
| **SMALL**  | gpt-5-nano<br />claude-haiku  | $0.15 - $0.25      | Simple queries, high volume       |
| **MEDIUM** | gpt-5-mini<br />claude-sonnet | $3.00 - $15.00     | General purpose tasks             |
| **LARGE**  | gpt-5.1<br />claude-opus      | $15.00 - $75.00    | Complex reasoning, critical tasks |

### Explicit Tier Preference

Set a preferred default tier via environment variable:

```bash theme={null}
DEFAULT_MODEL_TIER=small   # small | medium | large
```

## Intelligent Router

Shannon's learning router automatically selects the cheapest model capable of handling each task.

### How It Works

1. **Task Analysis**: Analyzes complexity, required capabilities
2. **Model Selection**: Starts with smallest viable model
3. **Quality Check**: Validates output quality
4. **Learning**: Remembers successful model-task pairings

### Cost Savings

```python theme={null}
# Without intelligent routing
Traditional: Always use GPT-5 → $0.50 per task

# With Shannon's routing
Shannon:
  - 70% routed to gpt-5-nano → $0.01
  - 25% routed to gpt-5-mini → $0.15
  - 5% routed to gpt-5.1 → $0.50
Example average: $0.05 per task (~90% savings)
```

### Monitoring Router Decisions

Use the dashboard or SDK status to review costs:

```python theme={null}
status = client.wait(handle.task_id)
if status.token_usage:
    print(f"Cost (USD): {status.token_usage.cost_usd:.4f}")
```

## Response Caching

Shannon caches LLM responses to eliminate redundant API calls:

### Cache Strategy

* **Key**: SHA256 hash of `(messages + model + parameters)`
* **TTL**: Configurable (often \~1 hour via Redis TTL)
* **Storage**: In-memory LRU + optional Redis for distributed caching
* **Hit Rate**: Typical 30-50% for production workloads

### Cache Benefits

```bash theme={null}
# First call: Cache miss
Task 1: "What is Python?" → $0.002 (LLM call)

# Second call: Cache hit
Task 2: "What is Python?" → $0.000 (cached)
```

### Monitoring Cache Performance

```python theme={null}
status = client.wait(handle.task_id)
if status.token_usage:
    if status.token_usage.cost_usd == 0:
        print("Likely served from cache (no LLM cost)")
    else:
        print(f"Cost: ${status.token_usage.cost_usd:.4f}")
```

## Provider Rate Limits

Shannon respects provider rate limits automatically:

### Configured Limits

From `config/models.yaml`:

```yaml theme={null}
providers:
  openai:
    rpm: 10000  # Requests per minute
    tpm: 2000000  # Tokens per minute
  anthropic:
    rpm: 4000
    tpm: 400000
```

### Automatic Throttling

When approaching limits:

1. Queues requests
2. Spreads load over time
3. Falls back to alternative providers if available

## Cost Monitoring

### Track Spending Per Task

```python theme={null}
status = client.wait(handle.task_id)
if status.token_usage:
    print(f"Tokens used: {status.token_usage.total_tokens}")
    print(f"Cost: ${status.token_usage.cost_usd:.4f}")
```

### Aggregate Metrics

Shannon tracks cumulative costs via metrics pipelines:

* Total spend by day/week/month
* Cost per user/team
* Cost per cognitive pattern
* Token usage trends

Use Prometheus/Grafana (once configured) or your observability stack to visualize these metrics. The Desktop App Runs view can also help you inspect per-task costs.

## Best Practices

### 1. Always Set Budgets

Never run production tasks without budget limits:

```python theme={null}
# ❌ Bad: No budget limits
client.submit_task(query="...")

# ✅ Good: Budget protection
client.submit_task(
    query="...",
    # Budget configured via .env}
)
```

### 2. Use Simple Mode When Possible

Complex patterns cost more:

```python theme={null}
# Simple query: Use simple mode
client.submit_task(
    query="What is the capital of France?",
    # Mode auto-selected  # Single agent, minimal tokens
)

# Complex query: Use standard/complex mode
client.submit_task(
    query="Research and compare 5 database technologies",
    # Mode auto-selected  # Task decomposition justified
)
```

### 3. Leverage Caching

For repeated queries, use consistent phrasing to maximize cache hits:

```python theme={null}
# ❌ Bad: Different phrasing prevents cache hits
client.submit_task(query="What's Python?")
client.submit_task(query="Tell me about Python")
client.submit_task(query="Explain Python")

# ✅ Good: Consistent queries hit cache
standard_query = "What is Python?"
client.submit_task(query=standard_query)  # Cache miss
client.submit_task(query=standard_query)  # Cache hit
client.submit_task(query=standard_query)  # Cache hit
```

### 4. Monitor and Optimize

Review cost metrics regularly:

```python theme={null}
# Enable detailed logging
import logging
logging.basicConfig(level=logging.INFO)

total_cost = 0.0
for t in tasks:
    st = client.wait(t.task_id)
    if st.token_usage:
        total_cost += st.token_usage.cost_usd
        print(f"Task: ${st.token_usage.cost_usd:.4f}, Running total: ${total_cost:.4f}")
```

### 5. Use Smaller Models First

Let the intelligent router prove when larger models are needed:

```python theme={null}
# Let Shannon choose
client.submit_task(query="...")  # Auto-selects tier

# Model tier is selected by the platform router. No per-request model tier or budget parameters are accepted by the SDK.
```

## Cost Optimization Checklist

<Accordion title="Optimization Checklist">
  * [ ] Set budget env vars (`MAX_COST_PER_REQUEST`, `MAX_TOKENS_PER_REQUEST`)
  * [ ] Use `# Mode auto-selected` for straightforward queries
  * [ ] Enable response caching (default: enabled)
  * [ ] Use `model_tier="small"` when appropriate
  * [ ] Standardize query phrasing for cache hits
  * [ ] Monitor cost metrics in dashboard
  * [ ] Set up budget alerts (via Prometheus)
  * [ ] Review and optimize prompt templates
  * [ ] Use session context to reduce token usage
  * [ ] Enable learning router (default: enabled)
</Accordion>

## Example: Cost-Optimized Workflow

```python theme={null}
from shannon import ShannonClient

client = ShannonClient()

# High-volume, simple queries
simple_tasks = [
    "Classify sentiment: Great product!",
    "Classify sentiment: Terrible experience",
    "Classify sentiment: It's okay"
]

total_cost = 0
for query in simple_tasks:
    handle = client.submit_task(query=query)
    status = client.wait(handle.task_id)
    if status.token_usage:
        total_cost += status.token_usage.cost_usd

print(f"Total cost for 3 tasks: ${total_cost:.4f}")
# Example: ~$0.006 (~90% savings vs always GPT-5)
```

## Next Steps

<CardGroup cols={2}>
  <Card title="Streaming Events" icon="stream" href="/en/quickstart/concepts/streaming">
    Real-time task monitoring
  </Card>

  <Card title="Configuration" icon="gear" href="/en/quickstart/configuration">
    Advanced cost settings
  </Card>

  <Card title="API Overview" icon="code" href="/en/api/overview">
    Endpoints and usage
  </Card>

  <Card title="Monitoring" icon="chart-line" href="/en/quickstart/concepts/monitoring">
    Cost monitoring UI
  </Card>
</CardGroup>
