## Endpoint

`GET /providers/models`
## Description
Returns all models currently configured in Shannon, organized by provider. This endpoint queries the Python LLM service directly and reflects the models defined in `config/models.yaml`.
## Authentication

**Required:** No (internal service endpoint). For production deployments, access should be restricted to internal networks only.

## Request
### Query Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| `tier` | string | No | Filter by tier: `small`, `medium`, or `large` |
### Headers

None required for internal access.

## Response
### Success Response

**Status:** `200 OK`

**Body:**
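The example body did not survive in this copy. Below is an illustrative shape only: one provider is shown, and the context window and cost values are placeholders, not actual catalog data. Each configured provider appears as a top-level key, and the field names match the Response Structure table below.

```json
{
  "openai": [
    {
      "id": "gpt-5-nano-2025-08-07",
      "name": "gpt-5-nano-2025-08-07",
      "tier": "small",
      "context_window": 128000,
      "cost_per_1k_prompt_tokens": 0.0001,
      "cost_per_1k_completion_tokens": 0.0004,
      "supports_tools": true,
      "supports_streaming": true,
      "available": true
    }
  ]
}
```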
### Response Structure

The response is organized by provider, with each provider key mapping to an array of model objects:

| Field | Type | Description |
|---|---|---|
| `id` | string | Model identifier (canonical name) |
| `name` | string | Display name (same as `id`) |
| `tier` | string | Size tier: `small`, `medium`, or `large` |
| `context_window` | integer | Maximum context length in tokens |
| `cost_per_1k_prompt_tokens` | float | Cost per 1K input tokens (USD) |
| `cost_per_1k_completion_tokens` | float | Cost per 1K output tokens (USD) |
| `supports_tools` | boolean | Function calling support |
| `supports_streaming` | boolean | Real-time streaming support |
| `available` | boolean | Currently available for use |
## Examples
### List All Models
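The request example did not survive in this copy. A minimal standard-library sketch, assuming the LLM service is reachable at its default internal address `localhost:8000`; the `count_by_provider` helper is an illustrative addition, not part of Shannon:

```python
import json
from urllib.request import urlopen

# Assumed default address of the internal LLM service (port 8000 per this page).
LLM_SERVICE_URL = "http://localhost:8000"

def list_all_models(base_url=LLM_SERVICE_URL):
    """GET /providers/models and return the provider -> models mapping."""
    with urlopen(f"{base_url}/providers/models") as resp:
        return json.load(resp)

def count_by_provider(catalog):
    """Summarize how many models each provider currently exposes."""
    return {provider: len(models) for provider, models in catalog.items()}
```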
### Filter by Tier
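A sketch of the tier-filtered request using the `tier` query parameter; the `models_url` helper and its tier validation are illustrative additions, assuming the default internal address `localhost:8000`:

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

LLM_SERVICE_URL = "http://localhost:8000"  # assumed default internal address
VALID_TIERS = ("small", "medium", "large")

def models_url(tier=None, base_url=LLM_SERVICE_URL):
    """Build the request URL, appending ?tier=... when a filter is given."""
    if tier is not None and tier not in VALID_TIERS:
        raise ValueError(f"tier must be one of {VALID_TIERS}, got {tier!r}")
    query = "?" + urlencode({"tier": tier}) if tier else ""
    return f"{base_url}/providers/models{query}"

def list_models(tier=None):
    """Fetch the catalog, optionally restricted to a single tier."""
    with urlopen(models_url(tier)) as resp:
        return json.load(resp)
```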
### Python Example
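The original Python sample was lost in extraction. A minimal sketch using only the standard library, assuming the service runs at `localhost:8000` and returns the fields documented above; `cheapest_tool_capable` is a hypothetical consumer of the response, not part of Shannon:

```python
import json
from urllib.request import urlopen

LLM_SERVICE_URL = "http://localhost:8000"  # assumed default internal address

def fetch_catalog(base_url=LLM_SERVICE_URL):
    """Fetch the full provider -> models mapping from the LLM service."""
    with urlopen(f"{base_url}/providers/models") as resp:
        return json.load(resp)

def cheapest_tool_capable(catalog):
    """Pick the cheapest available model that supports function calling,
    ranked by prompt-token cost. Returns (provider, model_dict) or None."""
    candidates = [
        (provider, model)
        for provider, models in catalog.items()
        for model in models
        if model["available"] and model["supports_tools"]
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda pm: pm[1]["cost_per_1k_prompt_tokens"])
```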
## Model Tiers

Models are organized into three tiers based on capability and cost.

### Small Tier (Priority for 50% of workload)
Fast, cost-optimized models for basic tasks:

- OpenAI: gpt-5-nano-2025-08-07
- Anthropic: claude-haiku-4-5-20251001
- xAI: grok-3-mini
- Google: gemini-2.5-flash-lite
- DeepSeek: deepseek-chat
### Medium Tier (Priority for 40% of workload)

Balanced capability/cost models:

- OpenAI: gpt-5-mini-2025-08-07
- Anthropic: claude-sonnet-4-5-20250929
- xAI: grok-4-fast-non-reasoning
- Google: gemini-2.5-flash
- Meta: llama-4-scout
### Large Tier (Priority for 10% of workload)

Heavy reasoning models for complex tasks:

- OpenAI: gpt-4.1-2025-04-14, gpt-5-pro-2025-10-06
- Anthropic: claude-opus-4-1-20250805
- Google: gemini-2.5-pro
- DeepSeek: deepseek-r1
- xAI: grok-4-fast-reasoning
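The 50/40/10 workload split above can be illustrated with a small weighted tier picker. This is a sketch of the target distribution only, not Shannon's actual routing logic (see the Model Selection Guide for that):

```python
import random

# Illustrative target workload split from the tier descriptions above.
TIER_WEIGHTS = {"small": 0.5, "medium": 0.4, "large": 0.1}

def pick_tier(rng=random):
    """Choose a tier according to the target workload distribution."""
    tiers = list(TIER_WEIGHTS)
    weights = [TIER_WEIGHTS[t] for t in tiers]
    return rng.choices(tiers, weights=weights, k=1)[0]
```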
## Configuration Source

Models are defined in `config/models.yaml`: the catalog lives under `model_catalog`, and costs live under `pricing.models`.
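The YAML excerpts were lost in extraction. A loose sketch of the implied layout follows; every key name and value below is an assumption inferred from the response fields above, not the real schema:

```yaml
# Illustrative only: key names and values are guesses, not the actual catalog.
model_catalog:
  openai:
    - id: gpt-5-nano-2025-08-07
      tier: small
      context_window: 128000
      supports_tools: true
      supports_streaming: true

pricing:
  models:
    openai:
      gpt-5-nano-2025-08-07:
        prompt_per_1k: 0.0001      # assumed field name
        completion_per_1k: 0.0004  # assumed field name
```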
## Use Cases

1. Discover Available Models

## Notes
- **Static Configuration**: Models are loaded from `config/models.yaml`, not dynamically discovered from provider APIs
- **Hot Reload**: Changes to `models.yaml` require a service restart to take effect
- **Empty Providers**: If a provider returns `[]`, check that the API key is set in `.env`
- **Pricing Centralization**: All costs come from the `pricing` section in the YAML, ensuring consistency across the Go/Rust/Python services
- **Internal Endpoint**: The `/providers/models` endpoint is on the LLM service (port 8000). For external access, use the Gateway's OpenAI-compatible `/v1/models` endpoint (port 8080); see OpenAI-Compatible API
## Environment Variables

Override model selections with environment variables.

## Troubleshooting
### Empty provider arrays

- Verify the API key is set: `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, etc.
- Check that `config/models.yaml` has entries under `model_catalog.<provider>`
- Ensure `MODELS_CONFIG_PATH` points to the correct file
- Verify that the YAML syntax is valid
- Check for typos in model IDs

### Incorrect pricing

- Pricing comes from the `pricing.models.<provider>` section
- Update `config/models.yaml` and restart services
- Verify that the Go/Rust services also read the same config file
## Related Documentation

- **Model Selection Guide**: How tier routing and fallback works
- **Configuration**: Environment variables and config files
- **Submit Task**: Use `model_tier` or `model_override`
- **Centralized Pricing**: Pricing architecture details