Overview

This guide covers common configuration issues, how to diagnose them, and proven solutions.

Quick Diagnostics

Check Environment Variables

# View all environment variables for a service
docker compose exec orchestrator env | sort

# Check specific variable
docker compose exec orchestrator env | grep MAX_COST_PER_REQUEST

# Check if variable is set
docker compose exec orchestrator printenv MAX_COST_PER_REQUEST

Verify Configuration Files

# Check if config file exists
docker compose exec orchestrator ls -la ./config/

# View config file contents
docker compose exec orchestrator cat ./config/features.yaml

# Check for syntax errors
docker compose exec orchestrator cat ./config/models.yaml | yq .

Check Service Health

# Gateway health
curl http://localhost:8080/health

# Orchestrator metrics
curl http://localhost:2112/metrics

# Agent Core health
grpcurl -plaintext localhost:50051 list

Common Issues

1. Services Won’t Start

Missing Environment Variables

Symptoms:
  • Service crashes immediately
  • Logs show “variable not set” errors
  • Container exits with code 1
Diagnosis:
docker compose logs orchestrator | grep -i "not set\|missing\|required"
Solution:
# Check .env file exists
ls -la .env

# Verify required variables are set
grep -E "OPENAI_API_KEY|POSTGRES" .env

# Copy from example if missing
cp .env.example .env
nano .env  # Fill in required values

# Recreate services so they pick up the new .env values
# (docker compose restart does not reload environment changes)
docker compose up -d
Required Variables:
  • At least one LLM provider key (OPENAI_API_KEY, ANTHROPIC_API_KEY, etc.)
  • Database credentials (POSTGRES_*)
  • Redis connection (REDIS_*)
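
The required-variable groups above can be checked in one pass. The following is a minimal sketch, not part of Shannon: the function names `has_key` and `check_env` are hypothetical, and the provider key names are taken from the validation script later in this guide.

```shell
# has_key FILE PATTERN -> succeeds if FILE contains a non-empty assignment
# whose key matches the extended-regex PATTERN.
has_key() {
  grep -Eq "^($2)=.+" "$1"
}

# check_env FILE -> prints one line per missing group; returns 1 if any is missing.
check_env() {
  _file=$1
  _missing=0
  has_key "$_file" '(OPENAI|ANTHROPIC|GOOGLE)_API_KEY' || { echo "missing: LLM provider key"; _missing=1; }
  has_key "$_file" 'POSTGRES_[A-Z_]+' || { echo "missing: POSTGRES_* settings"; _missing=1; }
  has_key "$_file" 'REDIS_[A-Z_]+' || { echo "missing: REDIS_* settings"; _missing=1; }
  return "$_missing"
}
```

Run it as `check_env .env`; a non-zero exit status means at least one group has no value set.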

Invalid Configuration Syntax

Symptoms:
  • “Failed to parse config” errors
  • YAML syntax errors
  • Service fails to start
Diagnosis:
# Check YAML syntax
docker compose exec orchestrator cat ./config/features.yaml | yq .
Solution:
# Validate YAML locally
yq eval ./config/features.yaml

# List top-level keys and list items to spot indentation problems
grep -nE "^[[:space:]]+- |^[[:alnum:]_]+:" ./config/features.yaml

# Reset to defaults
cp ./config/features.yaml.example ./config/features.yaml
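
One frequent cause of YAML parse failures is tab indentation, which YAML does not allow. The sketch below shows one way to find such lines before resetting the file; `find_tab_indent` is a hypothetical helper name, not a Shannon script.

```shell
# find_tab_indent FILE -> prints "LINE:content" for every line whose leading
# whitespace contains a tab; exits 0 only when at least one such line exists.
find_tab_indent() {
  _tab=$(printf '\t')
  grep -n "^ *${_tab}" "$1"
}
```

For example, `find_tab_indent ./config/features.yaml` prints nothing and exits non-zero when the indentation is tab-free.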

2. Authentication Failures

Gateway Returns 401 Unauthorized

Symptoms:
  • All requests return 401
  • “Unauthorized” error
  • API key rejected
Diagnosis:
# Check if auth is enabled
docker compose exec gateway env | grep GATEWAY_SKIP_AUTH

# Test with curl
curl -v http://localhost:8080/api/v1/tasks \
  -H "X-API-Key: sk_test_123456" 2>&1 | grep "401"
Solution 1: Disable auth for development
# Add to .env
GATEWAY_SKIP_AUTH=1

# Recreate the gateway so it picks up the new .env value
docker compose up -d gateway

# Test
curl http://localhost:8080/api/v1/tasks
Solution 2: Use valid API key
# Insert API key in database
docker compose exec postgres psql -U shannon -d shannon -c "
INSERT INTO auth.api_keys (key, user_id, tenant_id, name, enabled)
VALUES ('sk_test_123456', gen_random_uuid(), gen_random_uuid(), 'Test Key', true);
"

# Test with key
curl -H "X-API-Key: sk_test_123456" \
  http://localhost:8080/api/v1/tasks

JWT Secret Not Set

Symptoms:
  • “JWT secret not configured” error
  • Authentication middleware fails
Solution:
# Generate secure secret
JWT_SECRET=$(openssl rand -base64 32)

# Add to .env
echo "JWT_SECRET=$JWT_SECRET" >> .env

# Recreate the gateway so it picks up the new .env value
docker compose up -d gateway
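
Appending with `>>` adds a second `JWT_SECRET=` line each time the command is rerun. A minimal sketch of an idempotent alternative follows; `set_env_var` is a hypothetical helper, not something Shannon ships.

```shell
# set_env_var FILE KEY VALUE -> replaces an existing KEY= line or appends one,
# so repeated runs never leave duplicate entries.
set_env_var() {
  _file=$1 _key=$2 _value=$3
  _tmp=$(mktemp)
  touch "$_file"
  # Copy every line except existing assignments of KEY, then add the new one.
  grep -v "^${_key}=" "$_file" > "$_tmp" || true
  printf '%s=%s\n' "$_key" "$_value" >> "$_tmp"
  # Write back with cat rather than mv to keep the file's permissions.
  cat "$_tmp" > "$_file"
  rm -f "$_tmp"
}
```

Usage: `set_env_var .env JWT_SECRET "$(openssl rand -base64 32)"`. Note that the new entry is written at the end of the file, so the key may move.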

3. Database Connection Issues

Cannot Connect to PostgreSQL

Symptoms:
  • “connection refused” errors
  • “dial tcp: connect: connection refused”
  • Services crash on startup
Diagnosis:
# Check if PostgreSQL is running
docker compose ps postgres

# Check PostgreSQL logs
docker compose logs postgres --tail=50

# Test connection
docker compose exec postgres pg_isready -U shannon
Solution 1: PostgreSQL not started
# Start PostgreSQL
docker compose up -d postgres

# Wait for ready
docker compose exec postgres pg_isready -U shannon

# Restart dependent services
docker compose restart gateway orchestrator
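
A single `pg_isready` call can run before PostgreSQL has finished starting. The "wait for ready" step can be sketched as a small polling loop; `wait_for` is a hypothetical helper and the attempt count and delay are illustrative.

```shell
# wait_for ATTEMPTS DELAY CMD... -> runs CMD until it succeeds, sleeping DELAY
# seconds between tries; returns 1 after ATTEMPTS unsuccessful tries.
wait_for() {
  _attempts=$1 _delay=$2
  shift 2
  _i=1
  while :; do
    "$@" >/dev/null 2>&1 && return 0
    [ "$_i" -ge "$_attempts" ] && return 1
    _i=$((_i + 1))
    sleep "$_delay"
  done
}
```

Usage: `wait_for 30 2 docker compose exec -T postgres pg_isready -U shannon && docker compose restart gateway orchestrator`.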
Solution 2: Wrong credentials
# Verify .env settings
grep POSTGRES .env

# Should match docker-compose.yml
docker compose exec postgres psql -U shannon -d shannon -c "SELECT 1;"

# If password wrong, update .env and restart
docker compose down
docker compose up -d
Solution 3: Port conflict
# Check if port 5432 is in use
lsof -i :5432

# If conflict, change port in .env
POSTGRES_PORT=5433

# If docker-compose.yml hard-codes port 5432, update its postgres port mapping to match
# Recreate the stack
docker compose down
docker compose up -d

Database Schema Not Initialized

Symptoms:
  • “table does not exist” errors
  • “column not found” errors
  • SQL errors in logs
Solution:
# Run migrations
docker compose exec orchestrator make migrate

# Or reset database (⚠️ DESTRUCTIVE)
docker compose down -v
docker compose up -d

4. Redis Connection Issues

Cannot Connect to Redis

Symptoms:
  • “connection refused” to Redis
  • Session state not persisting
  • Cache misses
Diagnosis:
# Check Redis status
docker compose ps redis

# Test connection
docker compose exec redis redis-cli ping

# Check logs
docker compose logs redis --tail=20
Solution:
# Start Redis
docker compose up -d redis

# Test connection
docker compose exec redis redis-cli ping
# Should return: PONG

# Restart dependent services
docker compose restart gateway orchestrator llm-service

Redis Authentication Failed

Symptoms:
  • “NOAUTH Authentication required”
  • Connection works but commands fail
Solution:
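# Check if password is set (add -a <password> if this returns NOAUTH)
docker compose exec redis redis-cli CONFIG GET requirepass

# If password required, add to .env
REDIS_PASSWORD=your-password

# Or disable auth (development only; this runtime change is lost when Redis restarts)
docker compose exec redis redis-cli CONFIG SET requirepass ""

# Recreate the dependent services so they pick up the change
docker compose up -d gateway orchestrator llm-service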

5. LLM Provider Issues

API Key Invalid or Expired

Symptoms:
  • “Invalid API key” errors
  • 401 from LLM provider
  • Tasks fail immediately
Diagnosis:
# Check which provider is configured
docker compose exec llm-service env | grep API_KEY

# Test OpenAI key
curl https://api.openai.com/v1/models \
  -H "Authorization: Bearer $OPENAI_API_KEY"

# Test Anthropic key (the API also requires an anthropic-version header)
curl https://api.anthropic.com/v1/messages \
  -H "X-API-Key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "Content-Type: application/json" \
  -d '{"model":"claude-3-haiku-20240307","max_tokens":10,"messages":[{"role":"user","content":"Hi"}]}'
Solution:
# Update key in .env
OPENAI_API_KEY=sk-...new-key...

# Recreate the LLM service so it picks up the new key
docker compose up -d llm-service

# Verify
docker compose logs llm-service | grep "API key"

Rate Limit Exceeded

Symptoms:
  • 429 errors from LLM provider
  • “Rate limit exceeded” in logs
  • Tasks timeout or fail
Solution 1: Wait for rate limit reset
# Check rate limit messages in the logs
docker compose logs llm-service | grep -i "rate"

# Per-minute limits usually reset within about 60 seconds; confirm with your provider's documentation
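
Some HTTP APIs state the wait time explicitly in a `Retry-After` response header on 429 responses. Whether a given provider sends it is an assumption to verify. The sketch below extracts the value from raw response headers, for example those captured with `curl -s -D - -o /dev/null`; `retry_after_seconds` is a hypothetical helper name.

```shell
# retry_after_seconds -> reads raw HTTP response headers on stdin and prints
# the Retry-After value (header names are case-insensitive). Prints nothing
# when the header is absent.
retry_after_seconds() {
  tr -d '\r' | awk -F': *' 'tolower($1) == "retry-after" { print $2; exit }'
}
```

The header may also carry an HTTP date instead of a number of seconds; this sketch prints whichever form it finds without converting it.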
Solution 2: Configure rate limiting
# Add to .env
RATE_LIMIT_REQUESTS=50  # Lower than provider limit
RATE_LIMIT_WINDOW=60

# Recreate the service so it picks up the new .env values
docker compose up -d llm-service
Solution 3: Use multiple providers
# Configure fallback providers in models.yaml
providers:
  - id: openai
    primary: true
  - id: anthropic
    fallback: true

Quota Exceeded

Symptoms:
  • “insufficient_quota” errors
  • “You exceeded your current quota”
  • All LLM calls fail
Solution:
# Check quota
# OpenAI: https://platform.openai.com/account/usage
# Anthropic: https://console.anthropic.com/settings/limits

# Add credits or upgrade plan
# Or use different provider
OPENAI_API_KEY=
ANTHROPIC_API_KEY=sk-ant-...

# Recreate the service so it picks up the new .env values
docker compose up -d llm-service

6. Model Configuration Issues

Model Not Found

Symptoms:
  • “model not found” errors
  • “invalid model” errors
  • Tasks fail with model errors
Diagnosis:
# Check configured models
docker compose exec llm-service cat ./config/models.yaml | grep "id:"

# Check environment variables
docker compose exec orchestrator env | grep MODEL
Solution:
# Use valid model IDs in .env
DEFAULT_MODEL_TIER=small
COMPLEXITY_MODEL_ID=gpt-5  # Verify this exists
DECOMPOSITION_MODEL_ID=claude-sonnet-4-5-20250929

# Or configure in models.yaml
docker compose exec orchestrator cat ./config/models.yaml

# Recreate the services so they pick up the new .env values
docker compose up -d orchestrator llm-service

7. Budget and Cost Issues

Tasks Exceed Budget

Symptoms:
  • “Budget exceeded” errors
  • Tasks fail with cost errors
  • MAX_COST_PER_REQUEST exceeded
Solution:
# Increase budget limits
# In .env:
MAX_COST_PER_REQUEST=1.00  # Increase from 0.50
MAX_TOKENS_PER_REQUEST=20000  # Increase from 10000

# Recreate the orchestrator so it picks up the new limits
docker compose up -d orchestrator

# Or use cheaper models
DEFAULT_MODEL_TIER=small  # Use GPT-5-nano instead of GPT-5
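
Before raising a limit, it can help to estimate what a request should cost. The arithmetic is tokens / 1000 × price per 1K tokens. The sketch below uses hypothetical helper names and a made-up price of $0.03 per 1K tokens; substitute your provider's actual pricing.

```shell
# estimate_cost TOKENS PRICE_PER_1K -> prints the estimated cost in dollars.
estimate_cost() {
  awk -v t="$1" -v p="$2" 'BEGIN { printf "%.4f\n", t / 1000 * p }'
}

# within_budget COST LIMIT -> succeeds when COST <= LIMIT.
within_budget() {
  awk -v c="$1" -v l="$2" 'BEGIN { exit !(c <= l) }'
}
```

With the illustrative price, `estimate_cost 20000 0.03` prints `0.6000`, which `within_budget 0.6000 0.50` rejects and `within_budget 0.6000 1.00` accepts.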

Budget Enforcement Not Working

Symptoms:
  • Costs exceed limits
  • No budget errors
Diagnosis:
# Check budget enforcement
docker compose exec orchestrator env | grep LLM_DISABLE_BUDGETS
Solution:
# Enable budget enforcement
LLM_DISABLE_BUDGETS=1  # Despite the name, 1 leaves enforcement to the orchestrator

# Set limits
MAX_COST_PER_REQUEST=0.50
MAX_TOKENS_PER_REQUEST=10000

# Recreate the orchestrator so it picks up the new .env values
docker compose up -d orchestrator

8. Performance Issues

Slow Task Execution

Symptoms:
  • Tasks take 2-3x expected time
  • High latency
  • Timeouts
Diagnosis:
# Check resource usage
docker stats

# Check worker concurrency
docker compose exec orchestrator env | grep WORKER

# Check tool parallelism
docker compose exec orchestrator env | grep TOOL_PARALLELISM
Solution 1: Increase parallelism
# In .env:
TOOL_PARALLELISM=10  # Increase from 5
WORKER_ACT_CRITICAL=20  # Increase from 10

# Recreate the orchestrator so it picks up the new .env values
docker compose up -d orchestrator
Solution 2: Enable caching
# In .env:
ENABLE_CACHE=true
CACHE_SIMILARITY_THRESHOLD=0.95

# Recreate the service so it picks up the new .env values
docker compose up -d llm-service
Solution 3: Optimize model selection
# Use faster models
DEFAULT_MODEL_TIER=small  # Smaller models such as GPT-5-nano respond faster than GPT-5

High Memory Usage

Symptoms:
  • OOM errors
  • Container restarts
  • High swap usage
Diagnosis:
docker stats
Solution:
# Reduce cache sizes
HISTORY_WINDOW_MESSAGES=25  # Reduce from 50
STREAMING_RING_CAPACITY=500  # Reduce from 1000

# Limit tool parallelism
TOOL_PARALLELISM=3  # Reduce from 5

# Recreate services so they pick up the new .env values
docker compose up -d

9. Streaming Issues

SSE Connection Drops

Symptoms:
  • SSE stream disconnects
  • Events stop mid-task
  • “Connection closed” errors
Solution 1: Increase timeouts
# In nginx/proxy config:
proxy_read_timeout 600s;
proxy_connect_timeout 600s;

# In docker-compose.yml for gateway:
GATEWAY_READ_TIMEOUT=600
Solution 2: Handle reconnection
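# Client-side reconnection (Python sketch; stream_events and process are
# placeholders for your own SSE client and event handler)
import time

while True:
    try:
        for event in stream_events(task_id):
            process(event)
        break  # Stream ended normally: task completed
    except ConnectionError:
        time.sleep(2)  # Wait briefly, then reconnect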
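
A fixed two-second wait can hammer a gateway that is down for longer. One common refinement, sketched here for shell-based clients, is exponential backoff with a cap. `backoff_delay` is a hypothetical helper, and the base of 2 seconds and cap of 60 seconds are illustrative defaults.

```shell
# backoff_delay ATTEMPT [BASE] [CAP] -> prints the delay in seconds for the
# given 1-based attempt: BASE * 2^(ATTEMPT-1), never more than CAP.
backoff_delay() {
  _attempt=$1 _d=${2:-2} _cap=${3:-60}
  _n=1
  while [ "$_n" -lt "$_attempt" ] && [ "$_d" -lt "$_cap" ]; do
    _d=$((_d * 2))
    _n=$((_n + 1))
  done
  if [ "$_d" -gt "$_cap" ]; then
    _d=$_cap
  fi
  echo "$_d"
}
```

A reconnect loop would call `sleep "$(backoff_delay "$attempt")"` after each dropped connection and increment `attempt`, resetting it to 1 once events arrive again.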

Events Not Received

Symptoms:
  • No events in stream
  • Empty SSE response
  • Stream connects but no data
Diagnosis:
# Check if events are being created
docker compose exec postgres psql -U shannon -d shannon -c "
SELECT COUNT(*) FROM event_logs WHERE workflow_id = 'task_abc123';
"

# Check Redis streams
docker compose exec redis redis-cli XLEN "stream:task_abc123"
Solution:
# Verify admin server is running
docker compose ps orchestrator

# Check admin server endpoint
curl http://localhost:8081/health

# Restart orchestrator
docker compose restart orchestrator

10. Tool Execution Issues

Python Code Execution Fails

Symptoms:
  • “WASI interpreter not found”
  • Python code tools fail
  • Sandbox errors
Solution:
# Download Python WASI interpreter
./scripts/setup_python_wasi.sh

# Or manual download
wget https://github.com/vmware-labs/webassembly-language-runtimes/releases/download/python%2F3.11.4%2B20230908-ba7c2cf/python-3.11.4.wasm
mkdir -p ./wasm-interpreters
mv python-3.11.4.wasm ./wasm-interpreters/

# Verify path in .env
PYTHON_WASI_WASM_PATH=./wasm-interpreters/python-3.11.4.wasm

# Recreate the services so they load the interpreter and any .env change
docker compose up -d --force-recreate agent-core llm-service

Tool Timeout

Symptoms:
  • “Tool execution timeout” errors
  • Tools hang indefinitely
  • WASI timeout errors
Solution:
# Increase timeouts
WASI_TIMEOUT_SECONDS=120  # Increase from 60
ENFORCE_TIMEOUT_SECONDS=180  # Increase from 90

# Recreate agent-core so it picks up the new timeouts
docker compose up -d agent-core

Configuration Validation

Validate All Settings

#!/bin/bash

echo "=== Shannon Configuration Validation ==="

# Check .env file
if [ ! -f .env ]; then
  echo "❌ .env file not found"
  exit 1
fi
echo "✓ .env file exists"

# Check required variables
required_vars=(
  "POSTGRES_HOST"
  "REDIS_HOST"
  "TEMPORAL_HOST"
)

for var in "${required_vars[@]}"; do
  if grep -q "^${var}=" .env; then
    echo "✓ $var is set"
  else
    echo "❌ $var is missing"
  fi
done

# Check at least one LLM provider
if grep -qE "^(OPENAI|ANTHROPIC|GOOGLE)_API_KEY=.+" .env; then
  echo "✓ LLM provider configured"
else
  echo "❌ No LLM provider API key set"
fi

# Check services are running
echo ""
echo "=== Service Health ==="
services=("postgres" "redis" "temporal" "qdrant" "orchestrator" "agent-core" "llm-service" "gateway")

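for service in "${services[@]}"; do
  # --status filters by container state; -q prints matching container IDs only
  if [ -n "$(docker compose ps --status running -q "$service" 2>/dev/null)" ]; then
    echo "✓ $service is running"
  else
    echo "❌ $service is not running"
  fi
done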

echo ""
echo "=== Endpoint Tests ==="

# Test Gateway
if curl -f -s http://localhost:8080/health > /dev/null; then
  echo "✓ Gateway health check passed"
else
  echo "❌ Gateway health check failed"
fi

# Test Orchestrator metrics
if curl -f -s http://localhost:2112/metrics > /dev/null; then
  echo "✓ Orchestrator metrics available"
else
  echo "❌ Orchestrator metrics failed"
fi

echo ""
echo "=== Configuration Validation Complete ==="

Best Practices

1. Use Environment-Specific Configs

# Development: .env.development
ENVIRONMENT=dev
DEBUG=true
GATEWAY_SKIP_AUTH=1

# Production: .env.production
ENVIRONMENT=prod
DEBUG=false
GATEWAY_SKIP_AUTH=0
JWT_SECRET=<secure-secret>

2. Document Custom Settings

# In .env, add comments
# Custom rate limit for high-volume API
RATE_LIMIT_REQUESTS=500  # Increased for enterprise tier

3. Version Control

# .gitignore: keep real secrets out of version control
.env
.env.local

# Commit the templates instead (do not list them in .gitignore)
git add .env.example .env.template

4. Regular Validation

# Add to CI/CD
./scripts/validate-config.sh

5. Monitor Configuration

# Track configuration changes
git diff .env.example

# Alert on critical changes
# Monitor environment variables in production
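
Configuration drift often shows up as a variable that was added to the template but never to the live file. The sketch below lists such keys; `env_keys` and `missing_keys` are hypothetical helpers, not Shannon scripts.

```shell
# env_keys FILE -> prints the sorted, unique variable names assigned in FILE.
env_keys() {
  sed -n 's/^\([A-Za-z_][A-Za-z0-9_]*\)=.*/\1/p' "$1" | sort -u
}

# missing_keys TEMPLATE ACTUAL -> prints keys assigned in TEMPLATE but not in ACTUAL.
missing_keys() {
  _a=$(mktemp)
  _b=$(mktemp)
  env_keys "$1" > "$_a"
  env_keys "$2" > "$_b"
  # comm -23 keeps lines that appear only in the first (sorted) file.
  comm -23 "$_a" "$_b"
  rm -f "$_a" "$_b"
}
```

Usage: `missing_keys .env.example .env`. Empty output means every templated key has an entry in `.env`; comment lines are ignored.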

Quick Fixes Checklist

When things go wrong, try these in order:
  • Restart all services: docker compose restart
  • Check logs: docker compose logs --tail=50
  • Verify .env file exists and has required variables
  • Test database connection: docker compose exec postgres pg_isready
  • Test Redis: docker compose exec redis redis-cli ping
  • Verify at least one LLM API key is set
  • Check disk space: df -h
  • Check memory: docker stats
  • Full reset (last resort): docker compose down -v && docker compose up -d

Getting Help

If issues persist:
  1. Collect logs:
    docker compose logs > shannon-logs.txt
    
  2. Export configuration:
    docker compose exec orchestrator env | grep -v API_KEY > config.txt
    
  3. Check GitHub issues: https://github.com/Kocoro-lab/Shannon/issues
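
Filtering with `grep -v API_KEY` still leaves values such as `JWT_SECRET` and `POSTGRES_PASSWORD` in the exported file. A minimal sketch of a broader redaction filter follows; `redact_env` is a hypothetical helper and the list of sensitive name fragments is an assumption you may want to extend.

```shell
# redact_env -> reads KEY=VALUE lines on stdin and masks the value of any key
# whose name contains KEY, SECRET, PASSWORD, or TOKEN.
redact_env() {
  sed -E 's/^([A-Za-z0-9_]*(KEY|SECRET|PASSWORD|TOKEN)[A-Za-z0-9_]*)=.*/\1=<redacted>/'
}
```

Usage: `docker compose exec -T orchestrator env | redact_env > config.txt`. The filter errs on the side of masking too much, so harmless settings such as `MAX_TOKENS_PER_REQUEST` are redacted as well.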

Related pages:
  • Environment Variables: complete variable reference
  • Docker Compose: Docker deployment guide
  • Troubleshooting: general troubleshooting
  • Performance Tuning: performance optimization