Overview
This guide covers common configuration issues, how to diagnose them, and proven solutions.
Quick Diagnostics
Check Environment Variables
# View all environment variables for a service
docker compose exec orchestrator env | sort
# Check specific variable
docker compose exec orchestrator env | grep MAX_COST_PER_REQUEST
# Check if variable is set
docker compose exec orchestrator printenv MAX_COST_PER_REQUEST
Verify Configuration Files
# Check if config file exists
docker compose exec orchestrator ls -la ./config/
# View config file contents
docker compose exec orchestrator cat ./config/features.yaml
# Check for syntax errors (-T disables the pseudo-TTY so the output pipes cleanly)
docker compose exec -T orchestrator cat ./config/models.yaml | yq .
Check Service Health
# Gateway health
curl http://localhost:8080/health
# Orchestrator metrics
curl http://localhost:2112/metrics
# Agent Core health
grpcurl -plaintext localhost:50051 list
Common Issues
1. Services Won’t Start
Missing Environment Variables
Symptoms:
Service crashes immediately
Logs show “variable not set” errors
Container exits with code 1
Diagnosis:
docker compose logs orchestrator | grep -i "not set\|missing\|required"
Solution:
# Check .env file exists
ls -la .env
# Verify required variables are set
grep -E "OPENAI_API_KEY|POSTGRES" .env
# Copy from example only if .env is missing (a plain cp would overwrite an existing file)
[ -f .env ] || cp .env.example .env
nano .env # Fill in required values
# Restart services
docker compose restart
Required Variables:
At least one LLM provider key (OPENAI_API_KEY, ANTHROPIC_API_KEY, etc.)
Database credentials (POSTGRES_*)
Redis connection (REDIS_*)
Invalid Configuration Syntax
Symptoms:
“Failed to parse config” errors
YAML syntax errors
Service fails to start
Diagnosis:
# Check YAML syntax (-T disables the pseudo-TTY so the output pipes cleanly)
docker compose exec -T orchestrator cat ./config/features.yaml | yq .
Solution:
# Validate YAML locally
yq eval ./config/features.yaml
# List top-level keys and list items with line numbers to spot indentation mistakes
grep -nE "^\s+- |^\w+:" ./config/features.yaml
# Reset to defaults
cp ./config/features.yaml.example ./config/features.yaml
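One frequent cause of YAML parse errors is a tab used for indentation, which YAML does not allow. A minimal sketch of a pre-check that can run before yq; find_tab_indent is a hypothetical helper name, not part of Shannon:

```shell
# find_tab_indent FILE: print the lines (with line numbers) whose
# indentation contains a tab. Exits non-zero when no such line exists.
find_tab_indent() {
  tab=$(printf '\t')
  grep -n "^ *${tab}" "$1"
}

# Usage:
# find_tab_indent ./config/features.yaml && echo "Replace tabs with spaces"
```

Any line it reports should be re-indented with spaces before re-running the syntax check.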
2. Authentication Failures
Gateway Returns 401 Unauthorized
Symptoms:
All requests return 401
“Unauthorized” error
API key rejected
Diagnosis:
# Check if auth is enabled
docker compose exec gateway env | grep GATEWAY_SKIP_AUTH
# Test with curl
curl -v http://localhost:8080/api/v1/tasks \
-H "X-API-Key: sk_test_123456" 2>&1 | grep "401"
Solution 1: Disable auth for development
# Add to .env
GATEWAY_SKIP_AUTH=1
# Restart gateway
docker compose restart gateway
# Test
curl http://localhost:8080/api/v1/tasks
Solution 2: Use a valid API key
# Insert API key in database
docker compose exec postgres psql -U shannon -d shannon -c "
INSERT INTO auth.api_keys (key, user_id, tenant_id, name, enabled)
VALUES ('sk_test_123456', gen_random_uuid(), gen_random_uuid(), 'Test Key', true);
"
# Test with key
curl -H "X-API-Key: sk_test_123456" \
http://localhost:8080/api/v1/tasks
JWT Secret Not Set
Symptoms:
“JWT secret not configured” error
Authentication middleware fails
Solution:
# Generate secure secret
JWT_SECRET=$(openssl rand -base64 32)
# Add to .env
echo "JWT_SECRET=$JWT_SECRET" >> .env
# Restart gateway
docker compose restart gateway
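Appending with echo leaves two JWT_SECRET lines in .env if the variable was already defined. A minimal sketch of a replace-or-append helper that avoids duplicates; set_env_var is a hypothetical name, not part of Shannon:

```shell
# set_env_var FILE KEY VALUE: replace KEY's definition in FILE,
# or append it if FILE does not define KEY yet.
set_env_var() {
  file=$1 key=$2 value=$3
  tmp=$(mktemp) || return 1
  # Copy every line except existing definitions of KEY.
  grep -v "^${key}=" "$file" > "$tmp" 2>/dev/null || true
  printf '%s=%s\n' "$key" "$value" >> "$tmp"
  mv "$tmp" "$file"
}

# Usage:
# set_env_var .env JWT_SECRET "$(openssl rand -base64 32)"
```

The file is rewritten through a temporary file, so an interrupted run does not leave a half-written .env.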
3. Database Connection Issues
Cannot Connect to PostgreSQL
Symptoms:
“connection refused” errors
“dial tcp: connect: connection refused”
Services crash on startup
Diagnosis:
# Check if PostgreSQL is running
docker compose ps postgres
# Check PostgreSQL logs
docker compose logs postgres --tail=50
# Test connection
docker compose exec postgres pg_isready -U shannon
Solution 1: PostgreSQL not started
# Start PostgreSQL
docker compose up -d postgres
# Wait for ready
docker compose exec postgres pg_isready -U shannon
# Restart dependent services
docker compose restart gateway orchestrator
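A single pg_isready call reports the state at that instant and can fail while PostgreSQL is still starting. A minimal sketch of a polling helper that retries a command until it succeeds; wait_until is a hypothetical name, and the attempt count and delay in the usage line are assumptions to tune:

```shell
# wait_until ATTEMPTS DELAY CMD...: run CMD up to ATTEMPTS times,
# sleeping DELAY seconds between tries. Returns 0 on the first success.
wait_until() {
  attempts=$1 delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    "$@" >/dev/null 2>&1 && return 0
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}

# Usage (service and user names as in the commands above):
# wait_until 30 2 docker compose exec -T postgres pg_isready -U shannon \
#   && docker compose restart gateway orchestrator
```

The same helper works for the Redis check by passing redis-cli ping as the command.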
Solution 2: Wrong credentials
# Verify .env settings
grep POSTGRES .env
# Should match docker-compose.yml
docker compose exec postgres psql -U shannon -d shannon -c "SELECT 1;"
# If password wrong, update .env and restart
docker compose down
docker compose up -d
Solution 3: Port conflict
# Check if port 5432 is in use
lsof -i :5432
# If there is a conflict, change the port in .env
POSTGRES_PORT=5433
# Update the port mapping in docker-compose.yml to match
# Restart
docker compose down
docker compose up -d
Database Schema Not Initialized
Symptoms:
“table does not exist” errors
“column not found” errors
SQL errors in logs
Solution:
# Run migrations
docker compose exec orchestrator make migrate
# Or reset the database (⚠️ DESTRUCTIVE: -v removes all volumes and the data in them)
docker compose down -v
docker compose up -d
4. Redis Connection Issues
Cannot Connect to Redis
Symptoms:
“connection refused” to Redis
Session state not persisting
Cache misses
Diagnosis:
# Check Redis status
docker compose ps redis
# Test connection
docker compose exec redis redis-cli ping
# Check logs
docker compose logs redis --tail=20
Solution:
# Start Redis
docker compose up -d redis
# Test connection
docker compose exec redis redis-cli ping
# Should return: PONG
# Restart dependent services
docker compose restart gateway orchestrator llm-service
Redis Authentication Failed
Symptoms:
“NOAUTH Authentication required”
Connection works but commands fail
Solution:
# Check if password is set
docker compose exec redis redis-cli CONFIG GET requirepass
# If a password is required, add it to .env
REDIS_PASSWORD=your-password
# Or disable auth (development only)
docker compose exec redis redis-cli CONFIG SET requirepass ""
# Restart services
docker compose restart
5. LLM Provider Issues
API Key Invalid or Expired
Symptoms:
“Invalid API key” errors
401 from LLM provider
Tasks fail immediately
Diagnosis:
# Check which provider is configured
docker compose exec llm-service env | grep API_KEY
# Test OpenAI key
curl https://api.openai.com/v1/models \
  -H "Authorization: Bearer $OPENAI_API_KEY"
# Test Anthropic key (the API also requires the anthropic-version header)
curl https://api.anthropic.com/v1/messages \
  -H "X-API-Key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "Content-Type: application/json" \
  -d '{"model":"claude-3-haiku-20240307","max_tokens":10,"messages":[{"role":"user","content":"Hi"}]}'
Solution:
# Update key in .env
OPENAI_API_KEY=sk-...new-key...
# Restart LLM service
docker compose restart llm-service
# Verify
docker compose logs llm-service | grep "API key"
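When testing keys with curl, the HTTP status code alone usually identifies the problem. A minimal sketch that maps a status code to the likely cause discussed in this section; explain_status is a hypothetical helper, and the wording of each message is an assumption rather than provider output:

```shell
# explain_status CODE: map an HTTP status returned by a provider API
# to the most likely cause.
explain_status() {
  case "$1" in
    200) echo "key accepted" ;;
    401) echo "key invalid or expired" ;;
    429) echo "rate limit or quota exceeded" ;;
    5??) echo "provider-side error" ;;
    *)   echo "unexpected status $1" ;;
  esac
}

# Usage: fetch only the status code, then explain it.
# code=$(curl -s -o /dev/null -w '%{http_code}' \
#   -H "Authorization: Bearer $OPENAI_API_KEY" https://api.openai.com/v1/models)
# explain_status "$code"
```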
Rate Limit Exceeded
Symptoms:
429 errors from LLM provider
“Rate limit exceeded” in logs
Tasks timeout or fail
Solution 1: Wait for rate limit reset
# Look for rate-limit messages in the logs
docker compose logs llm-service | grep -i "rate"
# Per-minute limits usually reset within about 60 seconds; check your provider's documentation
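For scripts that call a rate-limited endpoint, waiting can be automated with exponential backoff. A minimal sketch, assuming the wrapped command exits non-zero when it is rate limited (for example curl with -f); retry_backoff is a hypothetical helper name:

```shell
# retry_backoff MAX_TRIES BASE_DELAY CMD...: rerun CMD until it succeeds,
# doubling the wait after each failure. Gives up after MAX_TRIES attempts.
retry_backoff() {
  max=$1 delay=$2
  shift 2
  try=1
  while :; do
    "$@" && return 0
    [ "$try" -ge "$max" ] && return 1
    sleep "$delay"
    delay=$((delay * 2))
    try=$((try + 1))
  done
}

# Usage: up to 5 attempts, waiting 2, 4, 8, then 16 seconds between them.
# retry_backoff 5 2 curl -f -s -o /dev/null http://localhost:8080/health
```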
Solution 2: Configure rate limiting
# Add to .env
RATE_LIMIT_REQUESTS=50 # Lower than provider limit
RATE_LIMIT_WINDOW=60
# Restart
docker compose restart llm-service
Solution 3: Use multiple providers
# Configure fallback providers in models.yaml
providers:
  - id: openai
    primary: true
  - id: anthropic
    fallback: true
Quota Exceeded
Symptoms:
“insufficient_quota” errors
“You exceeded your current quota”
All LLM calls fail
Solution:
# Check quota
# OpenAI: https://platform.openai.com/account/usage
# Anthropic: https://console.anthropic.com/settings/limits
# Add credits or upgrade plan
# Or switch to a different provider: clear the exhausted key and set another
OPENAI_API_KEY=
ANTHROPIC_API_KEY=sk-ant-...
# Restart
docker compose restart llm-service
6. Model Configuration Issues
Model Not Found
Symptoms:
“model not found” errors
“invalid model” errors
Tasks fail with model errors
Diagnosis:
# Check configured models
docker compose exec llm-service cat ./config/models.yaml | grep "id:"
# Check environment variables
docker compose exec orchestrator env | grep MODEL
Solution:
# Use valid model IDs in .env
DEFAULT_MODEL_TIER=small
COMPLEXITY_MODEL_ID=gpt-5 # Verify this exists
DECOMPOSITION_MODEL_ID=claude-sonnet-4-5-20250929
# Or configure in models.yaml
docker compose exec orchestrator cat ./config/models.yaml
# Restart
docker compose restart orchestrator llm-service
7. Budget and Cost Issues
Tasks Exceed Budget
Symptoms:
“Budget exceeded” errors
Tasks fail with cost errors
MAX_COST_PER_REQUEST exceeded
Solution:
# Increase budget limits
# In .env:
MAX_COST_PER_REQUEST=1.00 # Increase from 0.50
MAX_TOKENS_PER_REQUEST=20000 # Increase from 10000
# Restart
docker compose restart orchestrator
# Or use cheaper models
DEFAULT_MODEL_TIER=small # Use GPT-5-nano instead of GPT-5
Budget Enforcement Not Working
Symptoms:
Costs exceed limits
No budget errors
Diagnosis:
# Check budget enforcement
docker compose exec orchestrator env | grep LLM_DISABLE_BUDGETS
Solution:
# Enable budget enforcement
LLM_DISABLE_BUDGETS=1 # Orchestrator enforces budgets
# Set limits
MAX_COST_PER_REQUEST=0.50
MAX_TOKENS_PER_REQUEST=10000
# Restart
docker compose restart orchestrator
8. Performance Issues
Slow Task Execution
Symptoms:
Tasks take 2-3x expected time
High latency
Timeouts
Diagnosis:
# Check resource usage
docker stats
# Check worker concurrency
docker compose exec orchestrator env | grep WORKER
# Check tool parallelism
docker compose exec orchestrator env | grep TOOL_PARALLELISM
Solution 1: Increase parallelism
# In .env:
TOOL_PARALLELISM=10 # Increase from 5
WORKER_ACT_CRITICAL=20 # Increase from 10
# Restart
docker compose restart orchestrator
Solution 2: Enable caching
# In .env:
ENABLE_CACHE=true
CACHE_SIMILARITY_THRESHOLD=0.95
# Restart
docker compose restart llm-service
Solution 3: Optimize model selection
# Use faster models
DEFAULT_MODEL_TIER=small # Smaller models such as GPT-5-nano respond faster than GPT-5
High Memory Usage
Symptoms:
OOM errors
Container restarts
High swap usage
Diagnosis:
# Check per-container memory usage
docker stats --no-stream
Solution:
# Reduce cache sizes
HISTORY_WINDOW_MESSAGES=25 # Reduce from 50
STREAMING_RING_CAPACITY=500 # Reduce from 1000
# Limit tool parallelism
TOOL_PARALLELISM=3 # Reduce from 5
# Restart
docker compose restart
9. Streaming Issues
SSE Connection Drops
Symptoms:
SSE stream disconnects
Events stop mid-task
“Connection closed” errors
Solution 1: Increase timeouts
# In nginx/proxy config:
proxy_read_timeout 600s;
proxy_connect_timeout 600s;
# In docker-compose.yml for gateway:
GATEWAY_READ_TIMEOUT=600
Solution 2: Handle reconnection
# Client-side reconnection; stream_events() and process() stand in for your client code
import time

while True:
    try:
        for event in stream_events(task_id):
            process(event)
        break  # Stream ended normally: task completed
    except ConnectionError:
        time.sleep(2)  # Wait, then reconnect
Events Not Received
Symptoms:
No events in stream
Empty SSE response
Stream connects but no data
Diagnosis:
# Check if events are being created
docker compose exec postgres psql -U shannon -d shannon -c "
SELECT COUNT(*) FROM event_logs WHERE workflow_id = 'task_abc123';
"
# Check Redis streams
docker compose exec redis redis-cli XLEN "stream:task_abc123"
Solution:
# Verify admin server is running
docker compose ps orchestrator
# Check admin server endpoint
curl http://localhost:8081/health
# Restart orchestrator
docker compose restart orchestrator
10. Tool Execution Issues
Python Code Execution Fails
Symptoms:
“WASI interpreter not found”
Python code tools fail
Sandbox errors
Solution:
# Download Python WASI interpreter
./scripts/setup_python_wasi.sh
# Or manual download
wget https://github.com/vmware-labs/webassembly-language-runtimes/releases/download/python%2F3.11.4%2B20230908-ba7c2cf/python-3.11.4.wasm
mkdir -p ./wasm-interpreters
mv python-3.11.4.wasm ./wasm-interpreters/
# Verify path in .env
PYTHON_WASI_WASM_PATH=./wasm-interpreters/python-3.11.4.wasm
# Restart
docker compose restart agent-core llm-service
Tool Execution Timeouts
Symptoms:
“Tool execution timeout” errors
Tools hang indefinitely
WASI timeout errors
Solution:
# Increase timeouts
WASI_TIMEOUT_SECONDS=120 # Increase from 60
ENFORCE_TIMEOUT_SECONDS=180 # Increase from 90
# Restart
docker compose restart agent-core
Configuration Validation
Validate All Settings
#!/bin/bash
echo "=== Shannon Configuration Validation ==="
# Check .env file
if [ ! -f .env ]; then
  echo "❌ .env file not found"
  exit 1
fi
echo "✓ .env file exists"
# Check required variables
required_vars=(
  "POSTGRES_HOST"
  "REDIS_HOST"
  "TEMPORAL_HOST"
)
for var in "${required_vars[@]}"; do
  if grep -q "^${var}=" .env; then
    echo "✓ $var is set"
  else
    echo "❌ $var is missing"
  fi
done
# Check at least one LLM provider
if grep -qE "^(OPENAI|ANTHROPIC|GOOGLE)_API_KEY=.+" .env; then
  echo "✓ LLM provider configured"
else
  echo "❌ No LLM provider API key set"
fi
# Check services are running
echo ""
echo "=== Service Health ==="
services=("postgres" "redis" "temporal" "qdrant" "orchestrator" "agent-core" "llm-service" "gateway")
# List running services once; matching whole names avoids depending on the STATUS column wording
running=$(docker compose ps --status running --services)
for service in "${services[@]}"; do
  if echo "$running" | grep -qx "$service"; then
    echo "✓ $service is running"
  else
    echo "❌ $service is not running"
  fi
done
echo ""
echo "=== Endpoint Tests ==="
# Test Gateway
if curl -f -s http://localhost:8080/health > /dev/null; then
  echo "✓ Gateway health check passed"
else
  echo "❌ Gateway health check failed"
fi
# Test Orchestrator metrics
if curl -f -s http://localhost:2112/metrics > /dev/null; then
  echo "✓ Orchestrator metrics available"
else
  echo "❌ Orchestrator metrics failed"
fi
echo ""
echo "=== Configuration Validation Complete ==="
Best Practices
1. Use Environment-Specific Configs
# Development (.env.development)
ENVIRONMENT=dev
DEBUG=true
GATEWAY_SKIP_AUTH=1
# Production (.env.production)
ENVIRONMENT=prod
DEBUG=false
GATEWAY_SKIP_AUTH=0
JWT_SECRET=<secure-secret>
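With one file per environment, Docker Compose can be pointed at the right one through its --env-file option. A minimal sketch that maps an environment name to the file names used above; env_file_for is a hypothetical helper, and the short aliases dev and prod are assumptions:

```shell
# env_file_for NAME: print the env file for a deployment name,
# or fail with a message on stderr for an unknown name.
env_file_for() {
  case "$1" in
    dev|development) echo ".env.development" ;;
    prod|production) echo ".env.production" ;;
    *) echo "unknown environment: $1" >&2; return 1 ;;
  esac
}

# Usage:
# docker compose --env-file "$(env_file_for prod)" up -d
```

Failing on unknown names prevents a typo from silently starting the stack with the wrong settings.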
2. Document Custom Settings
# In .env, add comments
# Custom rate limit for high-volume API
RATE_LIMIT_REQUESTS=500 # Increased for enterprise tier
3. Version Control
# .gitignore
.env
.env.local
# Commit templates
.env.example
.env.template
4. Regular Validation
# Add to CI/CD
./scripts/validate-config.sh
5. Monitor Configuration
# Track configuration changes
git diff .env.example
# Alert on critical changes
# Monitor environment variables in production
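Because .env itself is not committed, it can drift from the template without any diff showing it. A minimal sketch that lists variables defined in .env.example but absent from .env; env_keys and missing_keys are hypothetical helper names, not part of Shannon:

```shell
# env_keys FILE: print the sorted, unique variable names defined in FILE.
# Comment lines and blank lines are ignored.
env_keys() {
  grep -E '^[A-Za-z_][A-Za-z0-9_]*=' "$1" | cut -d= -f1 | sort -u
}

# missing_keys TEMPLATE ENVFILE: print names defined in TEMPLATE
# that ENVFILE does not define.
missing_keys() {
  t=$(mktemp) e=$(mktemp)
  env_keys "$1" > "$t"
  env_keys "$2" > "$e"
  comm -23 "$t" "$e"
  rm -f "$t" "$e"
}

# Usage:
# missing_keys .env.example .env
```

Running this after pulling template changes shows which new variables still need a value locally.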
Quick Fixes Checklist
When things go wrong, try these in order:
Getting Help
If issues persist:
Collect logs:
docker compose logs > shannon-logs.txt
Export configuration (with secrets filtered out):
docker compose exec -T orchestrator env | grep -viE "API_KEY|SECRET|PASSWORD|TOKEN" > config.txt
Check GitHub issues: https://github.com/Kocoro-lab/Shannon/issues
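When sharing an exported configuration, masking secret values instead of dropping the lines keeps it visible which variables were set. A minimal sketch, assuming sensitive variables have KEY, SECRET, PASSWORD, or TOKEN in their names; redact_env is a hypothetical helper, not part of Shannon:

```shell
# redact_env: read NAME=value lines on stdin and mask the value of any
# variable whose name contains KEY, SECRET, PASSWORD, or TOKEN.
redact_env() {
  sed -E 's/^([^=]*(KEY|SECRET|PASSWORD|TOKEN)[^=]*)=.*/\1=<redacted>/'
}

# Usage:
# docker compose exec -T orchestrator env | redact_env > config.txt
```

Review the output before posting it, since a secret stored under an unusual variable name would pass through unmasked.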