
Overview

Shannon’s memory system provides intelligent context retention and retrieval across user sessions, enabling agents to maintain conversational continuity and leverage historical interactions for improved responses.

Architecture

(Diagram: Memory Architecture)

Storage Layers

PostgreSQL

  • Session Context: Session-level state and metadata
  • Execution Persistence: Agent and tool execution history
  • Task Tracking: High-level task and workflow metadata

Redis

  • Session Cache: Fast access to active session data (TTL: 3600s)
  • Token Budgets: Real-time token usage tracking
  • Compression State: Tracks context compression status

Qdrant (Vector Store)

  • Semantic Memory: High-performance vector similarity search
  • Collection Organization: task_embeddings, summaries, tool_results, document_chunks
  • Hybrid Search: Combines recency and semantic relevance
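One way to picture hybrid search is a weighted blend of semantic similarity and an exponential recency decay. The function below is an illustrative sketch, not Shannon's actual scoring code; `alpha` and `half_life` are assumed parameters.

```python
import math

def hybrid_score(semantic_sim: float, age_seconds: float,
                 alpha: float = 0.7, half_life: float = 3600.0) -> float:
    """Blend semantic similarity with an exponential recency decay.

    alpha weights semantic relevance; (1 - alpha) weights recency.
    half_life controls how quickly old results lose their recency bonus.
    (Both parameters are illustrative, not Shannon's actual values.)
    """
    recency = math.exp(-age_seconds * math.log(2) / half_life)
    return alpha * semantic_sim + (1 - alpha) * recency

# A fresh, moderately similar hit can outrank an old, highly similar one.
fresh = hybrid_score(semantic_sim=0.6, age_seconds=60)
stale = hybrid_score(semantic_sim=0.8, age_seconds=86_400)
```

With these weights, a result from a minute ago scoring 0.6 on similarity beats a day-old result scoring 0.8, which is the behavior hybrid search is after.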

Memory Types

Hierarchical Memory (Default)

Combines multiple retrieval strategies:
  • Recent Memory: Last N interactions from current session
  • Semantic Memory: Contextually relevant based on query similarity
  • Compressed Summaries: Condensed representations of older conversations

Session Memory

Chronological retrieval of recent interactions within a session.

Agent Memory

Individual agent execution records including:
  • Input queries and generated responses
  • Token usage and model information
  • Tool executions and results

Supervisor Memory

Strategic memory for intelligent task decomposition:
  • Decomposition Patterns: Successful task breakdowns for reuse
  • Strategy Performance: Aggregated metrics per strategy type
  • Failure Patterns: Known failures with mitigation strategies

Configuration

Environment Variables

Variable            Default  Description
QDRANT_HOST         qdrant   Qdrant server hostname
QDRANT_PORT         6333     Qdrant server port
REDIS_TTL_SECONDS   3600     Session cache TTL
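These are typically set in the service environment, for example via a `.env` file (values shown are the defaults from the table above):

```shell
# Example .env overrides for the memory system
QDRANT_HOST=qdrant
QDRANT_PORT=6333
REDIS_TTL_SECONDS=3600
```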

Embedding Requirements

Memory features require OpenAI API access for text embeddings.
  • Default Model: text-embedding-3-small (1536 dimensions)
  • Fallback Behavior: If no OpenAI API key is configured, memory operations degrade silently; workflows continue without historical context

Key Features

Intelligent Chunking

  • Splits long answers (>2000 tokens) into manageable chunks
  • 200-token overlap for context preservation
  • Batch embeddings for efficiency
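The chunking scheme can be sketched as follows. This operates on a pre-tokenized list for clarity; a real implementation would use the embedding model's tokenizer, and the function name is illustrative.

```python
def chunk_tokens(tokens: list[str], max_tokens: int = 2000,
                 overlap: int = 200) -> list[list[str]]:
    """Split a token sequence into chunks with a fixed overlap.

    Answers at or under max_tokens are stored as a single chunk;
    longer ones are split so each chunk shares `overlap` tokens with
    its predecessor, preserving context across chunk boundaries.
    """
    if len(tokens) <= max_tokens:
        return [tokens]
    chunks = []
    step = max_tokens - overlap
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + max_tokens])
        if start + max_tokens >= len(tokens):
            break
    return chunks
```

A 5000-token answer yields three chunks, with the last 200 tokens of each chunk repeated at the start of the next; all chunks can then be embedded in one batch call.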

MMR (Maximal Marginal Relevance)

  • Diversity-aware reranking balances relevance with information diversity
  • Default lambda=0.7 optimizes for relevant yet diverse context
  • Fetches 3x requested items, then reranks for diversity
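The greedy MMR algorithm itself looks like this: fetch roughly 3x the requested items, then repeatedly pick the candidate that best trades query relevance against redundancy with what has already been selected. This is a generic sketch of MMR, not Shannon's implementation; the similarity inputs are precomputed here for clarity.

```python
def mmr_rerank(query_sim: list[float], pairwise_sim: list[list[float]],
               k: int, lam: float = 0.7) -> list[int]:
    """Greedy Maximal Marginal Relevance over a candidate pool.

    query_sim[i]  : similarity of candidate i to the query
    pairwise_sim  : similarity between candidates
    lam (lambda)  : 1.0 = pure relevance, 0.0 = pure diversity
    Returns indices of the k selected candidates, in selection order.
    """
    selected: list[int] = []
    remaining = set(range(len(query_sim)))
    while remaining and len(selected) < k:
        best, best_score = -1, float("-inf")
        for i in remaining:
            # Redundancy = similarity to the closest already-selected item.
            redundancy = max((pairwise_sim[i][j] for j in selected), default=0.0)
            score = lam * query_sim[i] - (1 - lam) * redundancy
            if score > best_score:
                best, best_score = i, score
        selected.append(best)
        remaining.remove(best)
    return selected
```

With lambda=0.7 and two near-duplicate top candidates, MMR keeps the first duplicate and skips to a more diverse third result; lambda=1.0 reduces to plain relevance ranking.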

Context Compression

  • Automatic triggers based on message count and token estimates
  • Rate limiting prevents excessive compression
  • Model-aware thresholds for different tiers
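The trigger logic can be sketched as a simple predicate: compress when either the message count or the token estimate crosses its threshold, but never more often than the rate limit allows. All thresholds below are illustrative placeholders; the real values are model-tier aware.

```python
def should_compress(message_count: int, token_estimate: int,
                    last_compress_ts: float, now: float,
                    max_messages: int = 50, max_tokens: int = 8000,
                    min_interval: float = 300.0) -> bool:
    """Decide whether to compress the context window.

    Triggers on message count OR estimated token usage, rate-limited
    to at most one compression per min_interval seconds.
    (Thresholds here are illustrative, not Shannon's actual values.)
    """
    if now - last_compress_ts < min_interval:
        return False  # rate limited: compressed too recently
    return message_count > max_messages or token_estimate > max_tokens
```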

Memory Retrieval Flow

1. Query Analysis: the incoming query is analyzed for semantic content.
2. Recent Fetch: the last N messages from the current session are retrieved via Redis.
3. Semantic Search: a vector similarity search is performed in Qdrant.
4. Merge & Deduplicate: results are combined and duplicates removed.
5. Context Injection: relevant memory is injected into the agent context.
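The merge-and-inject portion of the flow can be sketched as a single function. The Redis fetch and Qdrant search are stubbed as plain lists here, and the item shape (`id`/`text` fields) is an assumption for illustration.

```python
def retrieve_context(recent: list[dict], semantic: list[dict],
                     limit: int = 10) -> list[dict]:
    """Merge recent and semantic hits, dedupe by id, cap the result.

    `recent` stands in for the Redis session fetch and `semantic`
    for the Qdrant similarity search; recent messages take
    precedence when the same record appears in both sources.
    """
    merged, seen = [], set()
    for item in recent + semantic:
        if item["id"] not in seen:
            seen.add(item["id"])
            merged.append(item)
    return merged[:limit]

context = retrieve_context(
    recent=[{"id": 1, "text": "a"}, {"id": 2, "text": "b"}],
    semantic=[{"id": 2, "text": "b"}, {"id": 3, "text": "c"}],
)
```

In this example the record with id 2 surfaces from both sources but is injected only once, yielding ids 1, 2, 3.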

Privacy & Data Governance

PII Protection

  • Data minimization: Store only essential fields
  • Anonymization: UUIDs instead of real identities
  • Automatic PII detection and redaction

Data Retention

  • Conversation History: 30-day default retention
  • Decomposition Patterns: 90-day retention
  • User Preferences: Session-based, 24-hour expiry
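The retention policy above amounts to a mapping from record type to time-to-live; a minimal sketch (the dictionary keys and function name are illustrative, not Shannon's schema):

```python
from datetime import datetime, timedelta

# Retention windows from the policy above.
RETENTION = {
    "conversation_history": timedelta(days=30),
    "decomposition_patterns": timedelta(days=90),
    "user_preferences": timedelta(hours=24),
}

def is_expired(kind: str, created_at: datetime, now: datetime) -> bool:
    """True once a record has outlived its retention window."""
    return now - created_at > RETENTION[kind]
```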

Performance Optimizations

  • Batch Processing: Single API call for multiple chunks (5x faster)
  • Smart Caching: LRU (2048 entries) + Redis
  • Payload Indexes: 50-90% faster filtering on session_id, tenant_id, user_id
  • Optimized HNSW: m=16, ef_construct=100 for fast similarity search

Limitations

  • Memory retrieval adds latency (mitigated by caching)
  • Vector similarity may miss exact keyword matches
  • Compression is lossy (preserves key points only)
  • Cross-session memory requires explicit session linking

Next Steps