Overview
Shannon’s memory system provides intelligent context retention and retrieval across user sessions, enabling agents to maintain conversational continuity and leverage historical interactions for improved responses.
Architecture
Storage Layers
PostgreSQL
- Session Context: Session-level state and metadata
- Execution Persistence: Agent and tool execution history
- Task Tracking: High-level task and workflow metadata
Redis
- Session Cache: Fast access to active session data (TTL: 3600s)
- Token Budgets: Real-time token usage tracking
- Compression State: Tracks context compression status
Qdrant (Vector Store)
- Semantic Memory: High-performance vector similarity search
- Collection Organization: task_embeddings, summaries, tool_results, document_chunks
- Hybrid Search: Combines recency and semantic relevance
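Hybrid search as described above can be sketched as a weighted blend of semantic similarity and an exponential recency decay. The `alpha` weight and `half_life` below are illustrative assumptions, not Shannon's actual internal values:

```python
import math

def hybrid_score(semantic_sim: float, age_seconds: float,
                 alpha: float = 0.7, half_life: float = 86400.0) -> float:
    """Blend semantic similarity with recency.

    alpha and half_life are illustrative; a higher alpha favors
    semantic relevance, a lower one favors fresh memories.
    """
    recency = math.exp(-math.log(2) * age_seconds / half_life)
    return alpha * semantic_sim + (1 - alpha) * recency

# A fresh, moderately similar memory can outrank a week-old, highly similar one.
fresh = hybrid_score(semantic_sim=0.6, age_seconds=0)
old = hybrid_score(semantic_sim=0.9, age_seconds=7 * 86400)
```

This shape of scoring function is why recent session context is rarely crowded out by older but superficially similar material.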
Memory Types
Hierarchical Memory (Default)
Combines multiple retrieval strategies:
- Recent Memory: Last N interactions from current session
- Semantic Memory: Contextually relevant based on query similarity
- Compressed Summaries: Condensed representations of older conversations
Session Memory
Chronological retrieval of recent interactions within a session.
Agent Memory
Individual agent execution records including:
- Input queries and generated responses
- Token usage and model information
- Tool executions and results
Supervisor Memory
Strategic memory for intelligent task decomposition:
- Decomposition Patterns: Successful task breakdowns for reuse
- Strategy Performance: Aggregated metrics per strategy type
- Failure Patterns: Known failures with mitigation strategies
Configuration
Environment Variables
| Variable | Default | Description |
|---|---|---|
| QDRANT_HOST | qdrant | Qdrant server hostname |
| QDRANT_PORT | 6333 | Qdrant server port |
| REDIS_TTL_SECONDS | 3600 | Session cache TTL in seconds |
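A minimal loader for the variables in the table above might look like this (the loader itself is a sketch; only the variable names and defaults come from the table):

```python
import os

# Defaults mirror the documented values.
QDRANT_HOST = os.getenv("QDRANT_HOST", "qdrant")
QDRANT_PORT = int(os.getenv("QDRANT_PORT", "6333"))
REDIS_TTL_SECONDS = int(os.getenv("REDIS_TTL_SECONDS", "3600"))
```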
Embedding Requirements
Memory features require OpenAI API access for text embeddings.
- Default Model: text-embedding-3-small (1536 dimensions)
- Fallback Behavior: If the OpenAI key is not configured, memory operations silently degrade: workflows continue without historical context
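The degradation pattern can be sketched as follows. The function names are hypothetical and the embedding call is stubbed out; only the fallback shape is the point:

```python
import os
from typing import Optional

def embed_text(text: str) -> Optional[list]:
    """Return an embedding vector, or None when no OpenAI key is configured."""
    if not os.getenv("OPENAI_API_KEY"):
        return None  # silent degradation: caller skips memory operations
    # A real implementation would call the embeddings API here
    # (model: text-embedding-3-small, 1536 dimensions).
    raise NotImplementedError

def retrieve_memory(query: str) -> list:
    vector = embed_text(query)
    if vector is None:
        return []  # workflow continues without historical context
    return []  # vector search against Qdrant would happen here
```

Because the fallback is silent, a missing key shows up as "agent never remembers anything" rather than an error, which is worth knowing when debugging.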
Key Features
Intelligent Chunking
- Splits long answers (>2000 tokens) into manageable chunks
- 200-token overlap for context preservation
- Batch embeddings for efficiency
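The chunking rule above (split past 2000 tokens, 200-token overlap) can be sketched like this, with a plain list of tokens standing in for real tokenizer output:

```python
def chunk_tokens(tokens: list, max_tokens: int = 2000, overlap: int = 200) -> list:
    """Split a token sequence into overlapping chunks.

    Short inputs pass through untouched; long ones are windowed with
    a fixed overlap so context at chunk boundaries is preserved.
    """
    if len(tokens) <= max_tokens:
        return [tokens]
    chunks, start = [], 0
    step = max_tokens - overlap
    while start < len(tokens):
        chunks.append(tokens[start:start + max_tokens])
        start += step
    return chunks

tokens = list(range(4500))
chunks = chunk_tokens(tokens)  # 3 chunks; adjacent chunks share 200 tokens
```

Each chunk is then embedded, and batching those embedding calls is where the efficiency gain comes from.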
MMR (Maximal Marginal Relevance)
- Diversity-aware reranking balances relevance with information diversity
- Default lambda=0.7 optimizes for relevant yet diverse context
- Fetches 3x requested items, then reranks for diversity
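The MMR reranking step can be written out in a few lines of pure Python. The greedy loop below is the standard MMR formulation; in Shannon's flow the `candidates` would be the 3x over-fetched Qdrant hits, and `lam=0.7` matches the documented default:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def mmr(query_vec, candidates, k, lam=0.7):
    """Greedy Maximal Marginal Relevance over (id, vector) candidates.

    Each pick maximizes lam * relevance - (1 - lam) * redundancy,
    where redundancy is similarity to the closest already-selected item.
    """
    selected, remaining = [], list(candidates)
    while remaining and len(selected) < k:
        best, best_score = None, -float("inf")
        for cand in remaining:
            relevance = cosine(query_vec, cand[1])
            redundancy = max((cosine(cand[1], s[1]) for s in selected), default=0.0)
            score = lam * relevance - (1 - lam) * redundancy
            if score > best_score:
                best, best_score = cand, score
        selected.append(best)
        remaining.remove(best)
    return [c[0] for c in selected]
```

Lowering `lam` pushes the selection toward diversity; raising it toward pure relevance.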
Context Compression
- Automatic triggers based on message count and token estimates
- Rate limiting prevents excessive compression
- Model-aware thresholds for different tiers
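A sketch of the trigger logic, combining count/token thresholds with rate limiting. The specific thresholds and the 4-chars-per-token heuristic below are illustrative assumptions, not Shannon's model-aware values:

```python
import time

# Illustrative thresholds; real values vary by model tier.
MAX_MESSAGES = 50
MAX_EST_TOKENS = 8000
MIN_INTERVAL_S = 300  # rate limit between compressions

def estimate_tokens(messages):
    # Rough heuristic: ~4 characters per token.
    return sum(len(m) for m in messages) // 4

def should_compress(messages, last_compressed_at, now=None):
    now = now if now is not None else time.time()
    if now - last_compressed_at < MIN_INTERVAL_S:
        return False  # rate limited: compressed too recently
    return len(messages) > MAX_MESSAGES or estimate_tokens(messages) > MAX_EST_TOKENS
```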
Memory Retrieval Flow
Query Analysis
Incoming query is analyzed for semantic content
Recent Fetch
Retrieves last N messages from current session via Redis
Semantic Search
Performs vector similarity search in Qdrant
Merge & Deduplicate
Combines results and removes duplicates
Context Injection
Injects relevant memory into agent context
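The "Merge & Deduplicate" stage of the flow above can be sketched as an order-preserving merge keyed by item id. Here `recent` and `semantic_hits` are `(id, text)` tuples; in Shannon they would come from Redis and Qdrant respectively:

```python
def merge_context(recent, semantic_hits, limit=10):
    """Merge recent-session messages with semantic hits, deduplicating by id.

    Recent items come first, so session continuity wins ties with
    semantically retrieved duplicates.
    """
    merged, seen = [], set()
    for item_id, text in list(recent) + list(semantic_hits):
        if item_id in seen:
            continue
        seen.add(item_id)
        merged.append((item_id, text))
    return merged[:limit]
```

The merged list is what ultimately gets injected into the agent's context.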
Privacy & Data Governance
PII Protection
- Data minimization: Store only essential fields
- Anonymization: UUIDs instead of real identities
- Automatic PII detection and redaction
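As a minimal illustration of a redaction pass, the sketch below masks email addresses with a regex. Real PII detection covers far more categories (names, phone numbers, addresses) and the pattern here is deliberately simple:

```python
import re

# Illustrative only: matches common email shapes, not the full RFC grammar.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

def redact(text: str) -> str:
    """Replace email addresses with a placeholder before storage."""
    return EMAIL.sub("[REDACTED_EMAIL]", text)
```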
Data Retention
- Conversation History: 30-day default retention
- Decomposition Patterns: 90-day retention
- User Preferences: Session-based, 24-hour expiry
Performance Optimizations
- Batch Processing: Single API call for multiple chunks (5x faster)
- Smart Caching: LRU (2048 entries) + Redis
- Payload Indexes: 50-90% faster filtering on session_id, tenant_id, user_id
- Optimized HNSW: m=16, ef_construct=100 for fast similarity search
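The in-process LRU layer can be sketched with the standard library; the 2048-entry cap matches the figure above, while the embedding function itself is a hypothetical stub:

```python
from functools import lru_cache

@lru_cache(maxsize=2048)
def cached_embedding(text: str) -> tuple:
    """In-process LRU in front of Redis/API; repeat queries skip the round trip."""
    return _embed(text)

def _embed(text: str) -> tuple:
    # Placeholder for the real embedding call.
    return tuple(float(ord(c)) for c in text[:4])

cached_embedding("hello")  # miss: computes and stores
cached_embedding("hello")  # hit: served from the LRU
```

`cached_embedding.cache_info()` exposes hit/miss counts, which is handy for verifying the cache is actually doing work.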
Limitations
- Memory retrieval adds latency (mitigated by caching)
- Vector similarity may miss exact keyword matches
- Compression is lossy (preserves key points only)
- Cross-session memory requires explicit session linking
Next Steps