Production Deployment Guide
Deploy Shannon to your infrastructure with confidence. This section covers deployment patterns, cloud platform integrations, and operational best practices.
Deployment Options
Docker Compose
Production-ready Docker Compose configuration
Status: 🚧 Phase 3
Kubernetes
Kubernetes manifests and Helm charts
Status: 🚧 Phase 3
AWS
Deploy to Amazon Web Services (ECS, RDS, ElastiCache)
Status: 🚧 Phase 3
Azure
Deploy to Microsoft Azure (AKS, PostgreSQL, Redis)
Status: 🚧 Phase 3
Operations
Monitoring
Prometheus metrics, Grafana dashboards, and alerting
Status: 🚧 Phase 3
Performance Tuning
Optimize throughput, latency, and resource usage
Status: 🚧 Phase 3
Security
Production security hardening and best practices
Status: 🚧 Phase 3
Quick Start: Local Development
For development and testing, use Docker Compose.
Architecture Overview
Shannon consists of several services that are deployed together:
Core Services
| Service | Purpose | Scaling |
|---|---|---|
| Gateway | REST API, authentication | Horizontal (stateless) |
| Orchestrator | Task coordination, gRPC | Horizontal (stateful via Temporal) |
| Agent Core | Agent execution, Rust runtime | Horizontal |
| LLM Service | LLM provider gateway | Horizontal |
| Desktop App | Real-time monitoring UI (native client) | Client-side; backend scales independently |
Data Stores
| Store | Purpose | Scaling |
|---|---|---|
| PostgreSQL | Task metadata, events, sessions | Vertical + read replicas |
| Redis | Caching, pub/sub, sessions | Cluster mode |
| Qdrant | Vector embeddings, semantic memory | Horizontal |
| Temporal | Workflow state, durable execution | Cluster mode |
Production Checklist
Before deploying to production:
Security
- Enable authentication (GATEWAY_SKIP_AUTH=0)
- Configure TLS/SSL for all services
- Rotate API keys regularly
- Set up OPA policies for access control
- Enable audit logging
- Configure network policies/firewalls
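The first item in the security list can be checked automatically before a rollout. The following is a minimal sketch of such a preflight check, not part of Shannon itself. The function names `auth_enabled` and `preflight` are hypothetical, and treating an unset `GATEWAY_SKIP_AUTH` as a failure is a conservative assumption, since the default value is not stated here.

```python
def auth_enabled(env):
    """Return True only when GATEWAY_SKIP_AUTH is explicitly set to "0".

    Assumption: an unset or empty variable is treated as unsafe, because
    this guide does not state the default.
    """
    return env.get("GATEWAY_SKIP_AUTH", "").strip() == "0"


def preflight(env):
    """Collect human-readable problems found in the given environment mapping."""
    problems = []
    if not auth_enabled(env):
        problems.append("GATEWAY_SKIP_AUTH must be set to 0 in production")
    return problems
```

Passing `os.environ` to `preflight` in a deployment script and aborting when the returned list is non-empty is one way to use it. The remaining checklist items (TLS, OPA policies, audit logging) depend on infrastructure details and are not covered by this sketch.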
Reliability
- Set up health checks and readiness probes
- Configure auto-scaling policies
- Implement circuit breakers
- Set resource limits (CPU, memory)
- Configure backup and disaster recovery
- Test failover scenarios
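For the circuit breaker item above, the general pattern can be illustrated in a few lines. This is a generic sketch of the technique, not Shannon's implementation; the class name, thresholds, and the injectable `clock` parameter are all assumptions chosen for illustration.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when a call is skipped because the circuit is open."""


class CircuitBreaker:
    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures  # consecutive failures before opening
        self.reset_after = reset_after    # cool-down in seconds before a trial call
        self._clock = clock
        self._failures = 0
        self._opened_at = None            # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_after:
                raise CircuitOpenError("circuit open; call skipped")
            # Cool-down elapsed: fall through and allow one trial call (half-open).
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            if self._failures >= self.max_failures:
                # Open (or re-open) the circuit and restart the cool-down.
                self._opened_at = self._clock()
            raise
        # Success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

Wrapping calls to a downstream dependency, for example `breaker.call(fetch_completion, prompt)` with a hypothetical `fetch_completion`, lets callers fail fast while that dependency is unhealthy instead of queuing work behind timeouts.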
Observability
- Deploy Prometheus and Grafana
- Configure alerting rules
- Set up log aggregation (ELK/Loki)
- Enable distributed tracing (OpenTelemetry)
- Create runbooks for common issues
Performance
- Tune Temporal worker concurrency
- Optimize database connections
- Configure Redis caching
- Set appropriate resource limits
- Load test before production launch
Resource Requirements
Minimum (Development)
- CPU: 4 cores
- RAM: 8GB
- Storage: 20GB SSD
Recommended (Production - Small)
- CPU: 16 cores total (distributed across services)
- RAM: 32GB total
- Storage: 100GB SSD
- Network: 1Gbps
Recommended (Production - Large)
- CPU: 64+ cores
- RAM: 128GB+
- Storage: 500GB+ SSD
- Network: 10Gbps
- Load Balancer: Required
- Multi-AZ: Recommended
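The three tiers above can be encoded as a small lookup for use in provisioning scripts. The sketch below only compares CPU, RAM, and storage totals against the figures listed in this section; the tier names and the `highest_tier` function are hypothetical, and network bandwidth, load balancer, and multi-AZ requirements are not checked.

```python
# (tier name, CPU cores, RAM in GB, SSD storage in GB), largest tier first.
# Figures are taken from the Resource Requirements lists above.
TIERS = [
    ("production-large", 64, 128, 500),
    ("production-small", 16, 32, 100),
    ("development", 4, 8, 20),
]


def highest_tier(cpu_cores, ram_gb, storage_gb):
    """Return the largest tier whose minimums are all met, or None."""
    for name, cpu, ram, disk in TIERS:
        if cpu_cores >= cpu and ram_gb >= ram and storage_gb >= disk:
            return name
    return None
```

For example, `highest_tier(16, 32, 100)` returns `"production-small"`, while a host with 2 cores, 4 GB of RAM, and 10 GB of storage returns `None` because it falls below the development minimum.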
What’s Next?
Quick Start
Install Shannon locally first
Configuration
Understand environment variables
Architecture
Learn system architecture
Monitoring
Set up monitoring
Get Help
- Discord: Join our community for deployment help
- GitHub: File deployment issues or questions
- Docs: Check Troubleshooting for common problems