Overview
Shannon’s memory system provides intelligent context retention and retrieval across user sessions, enabling agents to maintain conversational continuity and leverage historical interactions for improved responses.

Architecture
Storage Layers
PostgreSQL
- Session Context: Session-level state and metadata
- Execution Persistence: Agent and tool execution history
- Task Tracking: High-level task and workflow metadata
Redis
- Session Cache: Fast access to active session data (TTL: 3600s)
- Token Budgets: Real-time token usage tracking
- Compression State: Tracks context compression status
Qdrant (Vector Store)
- Semantic Memory: High-performance vector similarity search
- Collection Organization: task_embeddings, summaries, tool_results, document_chunks
- Hybrid Search: Combines recency and semantic relevance
Memory Types
Hierarchical Memory (Default)
Combines multiple retrieval strategies:
- Recent Memory: Last N interactions from current session
- Semantic Memory: Contextually relevant based on query similarity
- Compressed Summaries: Condensed representations of older conversations
Session Memory
Chronological retrieval of recent interactions within a session.

Agent Memory
Individual agent execution records, including:
- Input queries and generated responses
- Token usage and model information
- Tool executions and results
Supervisor Memory
Strategic memory for intelligent task decomposition:
- Decomposition Patterns: Successful task breakdowns for reuse
- Strategy Performance: Aggregated metrics per strategy type
- Failure Patterns: Known failures with mitigation strategies
Configuration
Environment Variables
| Variable | Default | Description |
|---|---|---|
| QDRANT_HOST | qdrant | Qdrant server hostname |
| QDRANT_PORT | 6333 | Qdrant server port |
| REDIS_TTL_SECONDS | 3600 | Session cache TTL |
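For example, to point Shannon at a non-default Qdrant host and shorten the session cache TTL (the hostname below is a placeholder):

```shell
# Override the defaults before starting Shannon.
export QDRANT_HOST=qdrant.internal
export QDRANT_PORT=6333
export REDIS_TTL_SECONDS=1800
```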
Embedding Requirements
- Default Model: text-embedding-3-small (1536 dimensions)
- Fallback Behavior: If no OpenAI API key is configured, memory operations silently degrade and workflows continue without historical context
Key Features
Intelligent Chunking
- Splits long answers (>2000 tokens) into manageable chunks
- 200-token overlap for context preservation
- Batch embeddings for efficiency
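The chunking rules above can be sketched as follows (`chunk_tokens` is an illustrative helper; real token counting would use the model tokenizer, and the resulting chunks would be embedded in a single batched call):

```python
def chunk_tokens(tokens: list[str], max_tokens: int = 2000,
                 overlap: int = 200) -> list[list[str]]:
    """Split a token sequence into chunks with a fixed token overlap.

    Answers at or under max_tokens pass through unchanged; longer ones
    are windowed so each chunk shares `overlap` tokens with its neighbor.
    """
    if len(tokens) <= max_tokens:
        return [tokens]
    step = max_tokens - overlap  # advance 1800 tokens per chunk
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + max_tokens])
        if start + max_tokens >= len(tokens):
            break  # last window already covers the tail
    return chunks

chunks = chunk_tokens([f"t{i}" for i in range(4500)])
```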
MMR (Maximal Marginal Relevance)
- Diversity-aware reranking balances relevance with information diversity
- Default lambda=0.7 optimizes for relevant yet diverse context
- Fetches 3x requested items, then reranks for diversity
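A minimal greedy MMR implementation over precomputed similarities, using the documented lambda=0.7 default (function and variable names are ours; in Shannon the candidate pool would be the 3x oversampled fetch):

```python
def mmr(query_sim: list[float], pairwise_sim: list[list[float]],
        k: int, lam: float = 0.7) -> list[int]:
    """Greedy Maximal Marginal Relevance.

    query_sim[i]       : similarity of candidate i to the query
    pairwise_sim[i][j] : similarity between candidates i and j
    Returns the indices of the k selected candidates, in selection order.
    """
    selected: list[int] = []
    remaining = set(range(len(query_sim)))
    while remaining and len(selected) < k:
        def score(i: int) -> float:
            # Penalize similarity to anything already selected.
            redundancy = max((pairwise_sim[i][j] for j in selected), default=0.0)
            return lam * query_sim[i] - (1 - lam) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

# Candidates 0 and 1 are near-duplicates; MMR picks 0, then prefers 2 over 1.
query_sim = [0.95, 0.94, 0.80]
pairwise = [[1.00, 0.99, 0.10],
            [0.99, 1.00, 0.12],
            [0.10, 0.12, 1.00]]
picked = mmr(query_sim, pairwise, k=2)
```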
Context Compression
- Automatic triggers based on message count and token estimates
- Rate limiting prevents excessive compression
- Model-aware thresholds for different tiers
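A hedged sketch of a compression trigger with rate limiting; the specific thresholds below are assumptions, since the document states the actual values vary by model tier:

```python
def should_compress(message_count: int, token_estimate: int,
                    context_window: int,
                    last_compress_ts: float, now: float,
                    msg_threshold: int = 50,
                    token_ratio: float = 0.75,
                    min_interval_s: float = 300.0) -> bool:
    """Trigger compression on message count or token pressure.

    Thresholds here are illustrative placeholders, not Shannon's values.
    """
    if now - last_compress_ts < min_interval_s:
        return False  # rate limit: never compress twice in quick succession
    return (message_count >= msg_threshold
            or token_estimate >= token_ratio * context_window)

ok = should_compress(60, 10_000, 128_000, last_compress_ts=0.0, now=1_000.0)
blocked = should_compress(60, 10_000, 128_000, last_compress_ts=900.0, now=1_000.0)
```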
Memory Retrieval Flow
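As a rough sketch, hierarchical retrieval merges the three memory sources described above roughly like this (all function names are stand-ins for Shannon’s Redis, Qdrant, and Postgres lookups; the stub data is purely illustrative):

```python
# Stub backends; in Shannon these would hit Redis, Qdrant, and Postgres.
def fetch_recent(session_id, limit):
    return [{"id": "m1", "text": "hi"}, {"id": "m2", "text": "ctx"}][:limit]

def semantic_search(query, k):
    return [{"id": "m2", "text": "ctx"}, {"id": "m7", "text": "old"}][:k]

def fetch_summaries(session_id):
    return [{"id": "s1", "text": "summary of earlier turns"}]

def retrieve_context(session_id, query, recent_n=10, semantic_k=5):
    """Hierarchical retrieval: recent turns, then semantic hits, then
    compressed summaries, deduplicated by id in priority order."""
    candidates = (fetch_recent(session_id, recent_n)
                  + semantic_search(query, semantic_k)
                  + fetch_summaries(session_id))
    seen, merged = set(), []
    for item in candidates:
        if item["id"] not in seen:
            seen.add(item["id"])
            merged.append(item)
    return merged

ctx = retrieve_context("sess-1", "what did we decide?")
```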
Privacy & Data Governance
PII Protection
- Data minimization: Store only essential fields
- Anonymization: UUIDs instead of real identities
- Automatic PII detection and redaction
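A minimal illustration of regex-based detection and redaction; the patterns below are simplified examples, not Shannon’s actual detector:

```python
import re

# Simplified illustrative patterns; a production detector covers many more types.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "phone": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}

def redact(text: str) -> str:
    """Replace detected PII spans with typed placeholders like [EMAIL]."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label.upper()}]", text)
    return text

clean = redact("Reach me at alice@example.com or +1 (555) 010-9999.")
```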
Data Retention
- Conversation History: 30-day default retention
- Decomposition Patterns: 90-day retention
- User Preferences: Session-based, 24-hour expiry
Performance Optimizations
- Batch Processing: Single API call for multiple chunks (5x faster)
- Smart Caching: LRU (2048 entries) + Redis
- Payload Indexes: 50-90% faster filtering on session_id, tenant_id, user_id
- Optimized HNSW: m=16, ef_construct=100 for fast similarity search
Limitations
- Memory retrieval adds latency (mitigated by caching)
- Vector similarity may miss exact keyword matches
- Compression is lossy (preserves key points only)
- Cross-session memory requires explicit session linking
Enabling Semantic Memory
Follow the steps below to enable Shannon’s semantic memory system backed by Qdrant.

Prerequisites
Before proceeding, ensure the following are in place:
- Qdrant is running (included by default in Shannon’s docker-compose.yaml)
- OPENAI_API_KEY is set in your environment (required for the text-embedding-3-small embedding model)
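Assuming the default host and port, you can confirm both prerequisites from a shell (the Qdrant root endpoint returns version info when the server is up):

```shell
# Qdrant reachable? Prints server name/version on success.
curl -s http://localhost:6333/

# OpenAI key exported?
test -n "$OPENAI_API_KEY" && echo "OPENAI_API_KEY is set"
```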
Step-by-Step Setup
Enable vector memory in shannon.yaml
Add or update the vector block in your shannon.yaml configuration:
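A minimal sketch of the block, assuming the key names below; compare against the vector section shipped in your shannon.yaml, as the exact schema may differ:

```yaml
# Illustrative shape only - key names are assumptions, not a confirmed schema.
vector:
  enabled: true
  host: qdrant   # matches QDRANT_HOST
  port: 6333     # matches QDRANT_PORT
```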
Verify Qdrant collections
Shannon automatically creates 5 collections, all using 1536-dimensional vectors from text-embedding-3-small:

| Collection | Purpose |
|---|---|
| task_embeddings | Task result embeddings for semantic search |
| tool_results | Tool execution result embeddings |
| cases | Case library for pattern matching |
| document_chunks | Document chunk embeddings for RAG |
| summaries | Summary embeddings |

You can verify the collections are created by querying the Qdrant REST API:
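For example, assuming the default endpoint:

```shell
# Lists all collections; the five names above should appear in the response.
curl -s http://localhost:6333/collections
```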
Configure MMR diversity reranking
MMR (Maximal Marginal Relevance) balances relevance and diversity in retrieval results. When enabled, Shannon fetches a larger candidate pool and reranks to reduce redundancy while preserving relevance.
shannon.yaml
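A sketch of the relevant settings, assuming the key names below (the exact schema may differ in your Shannon version):

```yaml
# Illustrative shape only - key names are assumptions, not a confirmed schema.
vector:
  mmr_enabled: true
  mmr_lambda: 0.7         # 1.0 = pure relevance, 0.0 = pure diversity
  mmr_pool_multiplier: 3  # fetch 3x requested items before reranking
```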
A mmr_lambda of 0.7 is a good default: it strongly favors relevance while still filtering out near-duplicate results.

Next Steps
- Architecture Overview: System architecture
- Sessions API: Session management