System Architecture

Shannon is built as a distributed microservices system designed for production AI agent orchestration.

(Diagram: Shannon system architecture)

Core Components

Gateway (Port 8080)

Technology: Go
Purpose: REST API layer for external clients

The Gateway provides:
  • HTTP/JSON API interface
  • Authentication and authorization (API keys)
  • Rate limiting per user
  • Idempotency support
  • SSE and WebSocket streaming
  • OpenAPI specification
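Idempotency support means a retried submission with the same key returns the original response instead of creating a duplicate task. A minimal sketch of that pattern (the class name and TTL are illustrative, not Shannon's actual implementation):

```python
import time

class IdempotencyCache:
    """Replays a stored response for a repeated idempotency key (sketch)."""

    def __init__(self, ttl_seconds=3600):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (expires_at, response)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, response = entry
        if time.time() > expires_at:
            del self._store[key]  # expired: allow the request through again
            return None
        return response

    def put(self, key, response):
        self._store[key] = (time.time() + self.ttl, response)

cache = IdempotencyCache()
cache.put("req-123", {"task_id": "t-1"})
assert cache.get("req-123") == {"task_id": "t-1"}  # retry replays the result
assert cache.get("req-999") is None                # unseen key falls through
```

In production the store would live in Redis rather than process memory so retries survive Gateway restarts.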

Key Feature

Authentication is disabled by default for easy adoption. Enable it in production by setting GATEWAY_SKIP_AUTH=0.

Orchestrator (Port 50052)

Technology: Go + Temporal
Purpose: Central workflow coordination

The Orchestrator handles:
  • Task routing and decomposition
  • Cognitive pattern selection (CoT, ToT, ReAct)
  • Budget and token usage enforcement
  • Session management
  • OPA policy evaluation
  • Multi-agent coordination
Key Technology: Temporal provides durable, deterministic workflows that can be replayed for debugging.
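To make pattern selection concrete, here is a toy heuristic in the spirit of what a selector might do. The keywords and thresholds are invented for illustration; Shannon's actual complexity analysis is more sophisticated:

```python
def select_pattern(task: str) -> str:
    """Toy cognitive-pattern selector (illustrative only)."""
    words = task.lower().split()
    if any(w in words for w in ("search", "browse", "fetch", "lookup")):
        return "ReAct"   # tool use needed: interleave reasoning and actions
    if len(words) > 30 or "compare" in words:
        return "ToT"     # broad/comparative task: explore multiple branches
    return "CoT"         # default: a single chain of step-by-step reasoning

assert select_pattern("What is 2 + 2?") == "CoT"
assert select_pattern("search the web for recent papers") == "ReAct"
```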

Agent Core (Port 50051)

Technology: Rust
Purpose: Secure execution layer

The Agent Core provides:
  • WASI (WebAssembly System Interface) sandboxing
  • Secure Python code execution (CPython 3.11 in WASI)
  • Tool registry and execution
  • Result caching (LRU with TTL)
  • Circuit breakers and rate limiting
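The "LRU with TTL" cache combines two eviction rules: entries expire after a time-to-live, and when the cache is full the least recently used entry is dropped. A minimal Python sketch of the idea (the Agent Core's real cache is in Rust; sizes and TTLs here are arbitrary):

```python
import time
from collections import OrderedDict

class LRUTTLCache:
    """LRU cache whose entries also expire after a TTL (sketch)."""

    def __init__(self, max_size=128, ttl_seconds=300):
        self.max_size, self.ttl = max_size, ttl_seconds
        self._data = OrderedDict()  # key -> (expires_at, value)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if time.time() > expires_at:
            del self._data[key]          # TTL eviction
            return None
        self._data.move_to_end(key)      # mark as most recently used
        return value

    def put(self, key, value):
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = (time.time() + self.ttl, value)
        if len(self._data) > self.max_size:
            self._data.popitem(last=False)  # LRU eviction

cache = LRUTTLCache(max_size=2, ttl_seconds=60)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")      # touch "a", so "b" becomes least recently used
cache.put("c", 3)   # over capacity: evicts "b"
assert cache.get("b") is None and cache.get("a") == 1
```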

Security First

All code execution happens in a WASI sandbox with no network access and read-only filesystem.

LLM Service (Port 8000)

Technology: Python + FastAPI
Purpose: Multi-provider LLM gateway

The LLM Service handles:
  • Multi-provider abstraction (OpenAI, Anthropic, Google, etc.)
  • Intelligent caching with SHA256-based deduplication
  • MCP (Model Context Protocol) tool integration
  • Web search integration (Exa, Perplexity, etc.)
  • Embeddings and document chunking
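SHA256-based deduplication works by hashing a normalized form of the request, so two semantically identical requests produce the same cache key. A sketch of that idea (the field names and normalization are assumptions, not the service's exact scheme):

```python
import hashlib
import json

def cache_key(provider, model, messages, temperature=0.0):
    """Derive a deterministic SHA256 cache key from a normalized request."""
    payload = json.dumps(
        {"provider": provider, "model": model,
         "messages": messages, "temperature": temperature},
        sort_keys=True,                # dict key order no longer matters
        separators=(",", ":"),         # canonical, whitespace-free encoding
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

k1 = cache_key("openai", "gpt-4o", [{"role": "user", "content": "hi"}])
k2 = cache_key("openai", "gpt-4o", [{"content": "hi", "role": "user"}])
assert k1 == k2       # same request, different key order: same cache entry
assert len(k1) == 64  # hex-encoded SHA256 digest
```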

Data Flow

Here’s how a task flows through Shannon:
  1. Task Submission - Client submits a task via the REST API to the Gateway
  2. Workflow Creation - Gateway forwards the request to the Orchestrator, which creates a Temporal workflow
  3. Pattern Selection - Orchestrator analyzes task complexity and selects a cognitive pattern
  4. Task Decomposition - Complex tasks are broken into subtasks organized as a DAG (Directed Acyclic Graph)
  5. Agent Execution - Orchestrator invokes the Agent Core for each subtask
  6. LLM Calls - Agent Core calls the LLM Service, which routes to the appropriate provider
  7. Tool Execution - If needed, tools run in the WASI sandbox or call external APIs
  8. Result Synthesis - Orchestrator combines results from all agents
  9. Response - The final result is returned to the client via the Gateway
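The DAG in the decomposition step exists so subtasks run only after their dependencies finish. Scheduling such a graph is a topological sort; here is a sketch using Kahn's algorithm (the subtask names are invented for illustration):

```python
from collections import deque

def execution_order(dag):
    """Kahn's algorithm: order subtasks so every dependency runs first.
    `dag` maps each subtask to the list of subtasks it depends on."""
    indegree = {task: len(deps) for task, deps in dag.items()}
    dependents = {task: [] for task in dag}
    for task, deps in dag.items():
        for dep in deps:
            dependents[dep].append(task)
    ready = deque(t for t, d in indegree.items() if d == 0)
    order = []
    while ready:
        task = ready.popleft()
        order.append(task)
        for nxt in dependents[task]:      # unblock tasks waiting on this one
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    if len(order) != len(dag):
        raise ValueError("cycle detected: not a DAG")
    return order

dag = {"fetch": [], "summarize": ["fetch"], "translate": ["fetch"],
       "synthesize": ["summarize", "translate"]}
order = execution_order(dag)
assert order.index("fetch") < order.index("summarize") < order.index("synthesize")
```

In practice an orchestrator can also run independent subtasks (here, `summarize` and `translate`) in parallel once their shared dependency completes.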

Persistence Layer

PostgreSQL

Stores:
  • Task metadata and execution history
  • Session state and context
  • User and API key data
  • Workflow history
Schema: Relational storage for tasks, sessions, and agent metadata

Redis

Provides:
  • Session caching (TTL: 3600s)
  • LLM response caching
  • Rate limiter state
  • Pub/sub for events

Qdrant

Vector database for:
  • Semantic memory retrieval
  • Session-scoped vector collections
  • MMR (Maximal Marginal Relevance) for diversity
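MMR re-ranks retrieved vectors by trading relevance to the query against redundancy with results already selected. A self-contained sketch with toy 2-D vectors (Qdrant handles this over real embeddings; the vectors and λ here are illustrative):

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def mmr(query, candidates, k=2, lam=0.5):
    """Greedy Maximal Marginal Relevance selection (sketch).
    Score = lam * relevance - (1 - lam) * similarity to already-picked items."""
    selected, remaining = [], list(range(len(candidates)))
    while remaining and len(selected) < k:
        best = max(
            remaining,
            key=lambda i: lam * cosine(query, candidates[i])
            - (1 - lam) * max((cosine(candidates[i], candidates[j])
                               for j in selected), default=0.0),
        )
        selected.append(best)
        remaining.remove(best)
    return selected

query = [1.0, 0.3]
docs = [
    [1.0, 0.0],    # 0: highly relevant, but a near-duplicate of doc 1
    [0.98, 0.02],  # 1: most relevant to the query
    [0.2, 1.0],    # 2: less relevant, but diverse
]
picks = mmr(query, docs, k=2, lam=0.5)
assert picks == [1, 2]  # the redundant doc 0 is skipped in favor of diversity
```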

Observability

Shannon includes comprehensive observability:

Metrics (Prometheus)

Each service exposes metrics:
  • Orchestrator: :2112/metrics
  • Agent Core: :2113/metrics
  • LLM Service: :8000/metrics
Metrics include:
  • Request rates and latency
  • Token usage and costs
  • Cache hit/miss rates
  • Error rates by type
  • Circuit breaker status
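Since each service exposes a standard `/metrics` endpoint, pointing Prometheus at them is a matter of scrape configuration. A hypothetical `prometheus.yml` fragment (job names and hostnames are illustrative and depend on your deployment):

```yaml
# Illustrative scrape config; adjust hostnames to your environment.
scrape_configs:
  - job_name: orchestrator
    static_configs:
      - targets: ["orchestrator:2112"]
  - job_name: agent-core
    static_configs:
      - targets: ["agent-core:2113"]
  - job_name: llm-service
    static_configs:
      - targets: ["llm-service:8000"]
```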

Tracing (OpenTelemetry)

Distributed tracing across all services with context propagation via traceparent headers.
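The `traceparent` header follows the W3C Trace Context format: `version-trace_id-span_id-flags`. A sketch of generating and parsing one (OpenTelemetry SDKs do this for you; this only shows the wire format):

```python
import secrets

def make_traceparent():
    """Build a W3C trace-context header: version-trace_id-span_id-flags."""
    trace_id = secrets.token_hex(16)      # 32 hex chars identify the trace
    span_id = secrets.token_hex(8)        # 16 hex chars identify this span
    return f"00-{trace_id}-{span_id}-01"  # flags 01 = sampled

def parse_traceparent(header):
    version, trace_id, span_id, flags = header.split("-")
    return {"version": version, "trace_id": trace_id,
            "span_id": span_id, "sampled": flags == "01"}

hdr = make_traceparent()
parsed = parse_traceparent(hdr)
assert len(parsed["trace_id"]) == 32 and len(parsed["span_id"]) == 16
assert parsed["sampled"] is True
```

Each hop propagates the same `trace_id` while minting a new `span_id`, which is what lets a trace backend stitch one request's path across the Gateway, Orchestrator, Agent Core, and LLM Service.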

Desktop Application

Native Tauri/Next.js desktop client providing:
  • Active tasks and workflows (Runs view)
  • Event streams (Run Details)
  • Basic system and task-level insights

Temporal UI (Port 8088)

Native Temporal interface for:
  • Workflow visualization
  • Execution history
  • Replay debugging
  • Worker status

Design Principles

1. Reliability

  • Temporal workflows ensure durability - workflows survive service restarts
  • Circuit breakers prevent cascading failures
  • Graceful degradation when services are unavailable

2. Security

  • WASI sandboxing isolates untrusted code execution
  • OPA policies enforce fine-grained access control
  • Multi-tenancy with tenant isolation

3. Cost Control

  • Token budgets prevent runaway costs
  • Intelligent routing to cheaper models when appropriate
  • Learning router improves cost efficiency over time (85-95% savings)
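One way to picture intelligent routing: keep a cheapest-first list of models with capability ceilings, and escalate only when a task demands it. The model names, prices, and thresholds below are made up for illustration:

```python
# Hypothetical model tiers, ordered cheapest-first (illustrative only).
MODELS = [
    {"name": "small-fast", "cost_per_1k_tokens": 0.0002, "max_complexity": 0.3},
    {"name": "mid-tier",   "cost_per_1k_tokens": 0.003,  "max_complexity": 0.7},
    {"name": "frontier",   "cost_per_1k_tokens": 0.03,   "max_complexity": 1.0},
]

def route(complexity: float) -> str:
    """Pick the cheapest model whose capability ceiling covers the task."""
    for model in MODELS:
        if complexity <= model["max_complexity"]:
            return model["name"]
    return MODELS[-1]["name"]  # fall back to the most capable tier

assert route(0.1) == "small-fast"  # simple tasks stay on the cheap tier
assert route(0.9) == "frontier"    # hard tasks escalate
```

A learning router goes further by adjusting these thresholds from observed outcomes, rather than keeping them static.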

4. Observability

  • Prometheus metrics for monitoring
  • OpenTelemetry tracing for debugging
  • Deterministic replay via Temporal

Scalability

Shannon scales horizontally:
  • Stateless services: Gateway, Orchestrator, Agent Core can scale independently
  • Temporal workers: Add more workers to increase throughput
  • Database: PostgreSQL with read replicas, Redis cluster, Qdrant distributed mode

Next Steps

  • Core Concepts - Deep dive into agents and workflows
  • API Reference - Explore the complete API
  • Cost Control - Manage and optimize costs
  • Python SDK - Get started with the SDK