
Deep Research Agent

This tutorial shows how to use Shannon’s ResearchWorkflow for comprehensive, citation-backed research reports.

What You’ll Learn

  • Multi-stage workflow: Memory retrieval → Refine → Decompose → Execute → Citations → Gap Filling → Synthesis → Verification
  • Adaptive patterns: React, Parallel, and Hybrid execution based on complexity
  • Advanced features: Entity filtering, gap-filling, context compression, react-per-task mode
  • Quality controls: Citation requirements, coverage enforcement, claim verification
  • Language matching: Automatically responds in the user’s query language

Prerequisites

  • Shannon stack running (Docker Compose)
  • Gateway reachable at http://localhost:8080
  • Auth defaults:
    • Docker Compose: authentication is disabled by default (GATEWAY_SKIP_AUTH=1).
    • Local builds: authentication is enabled by default. Set GATEWAY_SKIP_AUTH=1 to disable auth, or include an API key header -H "X-API-Key: $API_KEY".

Workflow Architecture

The ResearchWorkflow combines multiple stages to produce high-quality, cited research.

Key Features:
  • Memory-aware: Uses conversation history for contextual research
  • Entity-focused: Filters results when entity is detected (e.g., specific company/product)
  • Self-correcting: Gap filling ensures comprehensive coverage
  • Quality-enforced: Per-area coverage validation (≥600 chars, ≥2 citations per section)

Quick Start (HTTP)

# Submit a research task (default “standard” strategy)
curl -X POST http://localhost:8080/api/v1/tasks \
  -H "Content-Type: application/json" \
  -d '{
    "query": "Latest breakthroughs in quantum error correction",
    "context": { "force_research": true }
  }'

# Submit and get a ready-to-use SSE stream URL (recommended)
# Tip: include force_research to route to ResearchWorkflow (otherwise router may select Supervisor)
curl -s -X POST http://localhost:8080/api/v1/tasks/stream \
  -H "Content-Type: application/json" \
  -d '{
        "query": "What are the main transformer architecture trends in 2025?",
        "context": { "force_research": true }
      }' | jq

Stream Events (SSE)

# After the /tasks/stream call, use "stream_url" from the response:
curl -N "http://localhost:8080/api/v1/stream/sse?workflow_id=task-..."
Watch for:
  • LLM_OUTPUT: final synthesis text
  • DATA_PROCESSING: progress/usage
  • WORKFLOW_COMPLETED: completion
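
For scripted consumption, here is a minimal Python sketch using the requests library; it submits via /tasks/stream and follows the returned stream_url. The per-event payload shape (a type field on each SSE data line) is an assumption based on the event names above.

import json
import requests

BASE = "http://localhost:8080"

# Submit the task and get a ready-to-use SSE stream URL back
resp = requests.post(
    f"{BASE}/api/v1/tasks/stream",
    json={
        "query": "What are the main transformer architecture trends in 2025?",
        "context": {"force_research": True},
    },
)
stream_url = resp.json()["stream_url"]  # may need BASE prefixed if relative

# Follow the SSE stream (the curl -N equivalent)
with requests.get(stream_url, stream=True) as sse:
    for line in sse.iter_lines(decode_unicode=True):
        if not line or not line.startswith("data:"):
            continue
        event = json.loads(line[len("data:"):])  # payload shape is an assumption
        print(event.get("type"), event.get("message", ""))
        if event.get("type") == "WORKFLOW_COMPLETED":
            break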

Execution Patterns

Shannon selects the optimal execution pattern based on task complexity and dependencies:

React Pattern (complexity < 0.5)

Iterative reason→act→observe loop for simpler research.
# Example: Basic conceptual query triggers React
curl -X POST http://localhost:8080/api/v1/tasks \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What is quantum computing?",
    "context": { "force_research": true }
  }'
Behavior: Single agent performs iterative reasoning, web searches, and observation cycles (default 5 iterations).

Parallel Pattern (no dependencies)

Concurrent subtask execution for multi-faceted research.
# Example: Multi-aspect comparison triggers Parallel
curl -X POST http://localhost:8080/api/v1/tasks \
  -H "Content-Type: application/json" \
  -d '{
    "query": "Compare LangChain and AutoGen frameworks: architecture, pricing, community, performance",
    "context": { "force_research": true }
  }'
Behavior: Decomposition creates independent subtasks (e.g., “Analyze LangChain architecture”, “Analyze AutoGen pricing”), executed concurrently with web_search injection for citations.

Hybrid Pattern (has dependencies)

Topological sort + fan-in/fan-out for sequential research with dependencies.
# Example: Query with logical dependencies triggers Hybrid
curl -X POST http://localhost:8080/api/v1/tasks \
  -H "Content-Type: application/json" \
  -d '{
    "query": "Analyze company X market strategy, then evaluate competitive response, then forecast next moves",
    "context": { "force_research": true }
  }'
Behavior: Executes subtasks in dependency order, passing previous results downstream.

React Per Task (deep research mode)

Mini ReAct loops per subtask for high-complexity research. Trigger conditions:
  1. Manual: context.react_per_task = true
  2. Auto-enable: complexity > 0.7 AND strategy in {deep, academic}
# Example: Enable react_per_task for iterative per-area research
curl -X POST http://localhost:8080/api/v1/tasks \
  -H "Content-Type: application/json" \
  -d '{
    "query": "Comprehensive analysis of transformer architecture evolution 2017-2025",
    "context": {
      "force_research": true,
      "react_per_task": true,
      "react_max_iterations": 6
    },
    "research_strategy": "deep"
  }'
Behavior: Each parallel subtask runs its own ReAct loop (reason→act→observe), enabling iterative exploration per research area. More thorough but higher token cost.
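
Taken together, the pattern triggers above reduce to a simple decision rule. A sketch in Python (illustrative only, not Shannon's internal code):

# Illustrative selection logic derived from the trigger conditions above
def select_pattern(complexity: float, has_dependencies: bool,
                   strategy: str, react_per_task: bool) -> str:
    # Manual flag, or auto-enable for complex deep/academic research
    if react_per_task or (complexity > 0.7 and strategy in ("deep", "academic")):
        return "react_per_task"
    if complexity < 0.5:
        return "react"
    return "hybrid" if has_dependencies else "parallel"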

Strategy Presets

Presets provide opinionated defaults for different research depths. The Gateway validates and maps these to workflow context.
# Quick strategy (minimal iterations)
curl -X POST http://localhost:8080/api/v1/tasks \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What is quantum error correction?",
    "context": { "force_research": true },
    "research_strategy": "quick"
  }'

# Deep strategy with overrides
curl -X POST http://localhost:8080/api/v1/tasks \
  -H "Content-Type: application/json" \
  -d '{
    "query": "Compare LangChain and AutoGen frameworks",
    "context": {
      "force_research": true,
      "react_per_task": true,
      "react_max_iterations": 6
    },
    "research_strategy": "deep",
    "enable_verification": true
  }'

# Academic strategy (auto-enables react_per_task for complexity > 0.7)
curl -X POST http://localhost:8080/api/v1/tasks \
  -H "Content-Type: application/json" \
  -d '{
    "query": "Survey of attention mechanisms in modern NLP architectures",
    "context": { "force_research": true },
    "research_strategy": "academic"
  }'
Available Presets: quick | standard | deep | academic
Strategy  | react_max_iterations | max_concurrent_agents | verification | gap_filling
quick     | 2                    | 3                     | –            | –
standard  | 3                    | 5                     | –            | ✓ (max_gaps: 3, max_iterations: 2)
deep      | 4                    | 6                     | –            | ✓ (max_gaps: 2, max_iterations: 2)
academic  | 5                    | 8                     | –            | ✓ (max_gaps: 3, max_iterations: 2)
Core Parameters:
  • max_concurrent_agents (1–20): Controls maximum concurrency for parallel subtasks (only applies when multiple subtasks are executed)
  • react_max_iterations (2–8, default 5): Controls ReAct loop depth per task (independent parameter, no prerequisites)
Advanced Flags:
  • react_per_task (bool): Enable mini ReAct loops per subtask (auto-enabled for complexity > 0.7 + deep/academic)
  • enable_verification (bool): Enable claim verification against citations
  • budget_agent_max (int): Optional per-agent token budget with enforcement
Gap Filling Configuration:
  • gap_filling_enabled (bool): Enable/disable gap filling
  • gap_filling_max_gaps (1–20): Maximum number of gaps to detect
  • gap_filling_max_iterations (1–5): Maximum retry iterations per gap
  • gap_filling_check_citations (bool): Check citation density as gap indicator
Notes:
  • Default preset: standard (if not specified).
  • Presets seed flags only if they are not already set in context.
  • The Gateway accepts top‑level research_strategy on both /api/v1/tasks and /api/v1/tasks/stream and maps it into context.
  • Preset defaults are loaded from config/research_strategies.yaml.
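
The seeding rule is easy to state in code. A hedged sketch (preset values mirror the table above; everything else is illustrative):

# Illustrative preset seeding: flags already present in context always win
PRESETS = {
    "quick":    {"react_max_iterations": 2, "max_concurrent_agents": 3},
    "standard": {"react_max_iterations": 3, "max_concurrent_agents": 5},
    "deep":     {"react_max_iterations": 4, "max_concurrent_agents": 6},
    "academic": {"react_max_iterations": 5, "max_concurrent_agents": 8},
}

def apply_preset(context: dict, strategy: str = "standard") -> dict:
    for key, value in PRESETS[strategy].items():
        context.setdefault(key, value)  # seed only when the flag is absent
    return context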

Python SDK

CLI

python -m shannon.cli --base-url http://localhost:8080 \
  submit "Latest quantum computing breakthroughs" \
  --force-research \
  --research-strategy deep --enable-verification

Programmatic

from shannon import ShannonClient

client = ShannonClient(base_url="http://localhost:8080")
handle = client.submit_task(
    "Compare LangChain and AutoGen frameworks",
    context={
        "force_research": True,
        "research_strategy": "deep",
        "react_max_iterations": 6,
        "enable_verification": True,
    },
)
final = client.wait(handle.task_id)
print(final.result)
client.close()

Response Format

Typical status payloads returned by GET /api/v1/tasks/{id} include the synthesized result, metadata (citations, verification), model/provider, and timestamps.
{
  "task_id": "task-00000000-0000-0000-0000-000000000000",
  "status": "TASK_STATUS_COMPLETED",
  "result": "... final synthesis text ...",
  "metadata": {
    "citations": [
      {
        "url": "https://example.com/article",
        "title": "Source Title",
        "source": "example.com",
        "quality_score": 0.92,
        "credibility_score": 0.85
      }
    ],
    "verification": {
      "total_claims": 10,
      "overall_confidence": 0.78,
      "supported_claims": 7,
      "unsupported_claims": [
        "Claim text lacking sufficient evidence ..."
      ],
      "claim_details": [
        {
          "claim": "Concrete claim text ...",
          "confidence": 0.81,
          "supporting_citations": [1, 3],
          "conflicting_citations": []
        }
      ]
    }
  },
  "created_at": "2025-11-07T12:00:00Z",
  "updated_at": "2025-11-07T12:02:00Z",
  "usage": {
    "input_tokens": 1234,
    "output_tokens": 5678,
    "total_tokens": 6912,
    "estimated_cost": 0.0234
  },
  "model_used": "gpt-5-2025-08-07",
  "provider": "openai"
}
Notes:
  • result is the final synthesized markdown/text.
  • metadata.citations lists collected sources (all are included in the Sources section of the result).
  • metadata.verification appears when verification is enabled and completed.
  • In Gateway status responses, created_at / updated_at reflect response time; authoritative run timing and totals are persisted in the database (events/timeline).
  • usage summarizes token usage and may include cost. Gateway status may expose estimated_cost; workflow metadata may include cost_usd.
  • model_used and provider reflect the model/provider chosen during synthesis.
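
For scripted post-processing, a minimal polling sketch against GET /api/v1/tasks/{id}; field names follow the payload above, and TASK_STATUS_FAILED is assumed by analogy with the completed status:

import time
import requests

def wait_for_result(task_id: str, base: str = "http://localhost:8080") -> dict:
    # Poll the status endpoint until the task reaches a terminal state
    while True:
        task = requests.get(f"{base}/api/v1/tasks/{task_id}").json()
        if task["status"] in ("TASK_STATUS_COMPLETED", "TASK_STATUS_FAILED"):
            return task
        time.sleep(2)

task = wait_for_result("task-00000000-0000-0000-0000-000000000000")
for i, c in enumerate(task["metadata"]["citations"], start=1):
    print(f"[{i}] {c['title']} ({c['source']}) quality={c['quality_score']}")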

Advanced Features

Memory Retrieval

Hierarchical Memory (priority):
  • Combines recent (last 5 messages) + semantic (top 5 relevant, similarity ≥ 0.75)
  • Enables coherent multi-turn research with conversation context
  • Fallback: Session memory (last 20 messages)
Context Compression:
  • Auto-triggers for sessions >20 messages
  • LLM summarizes conversation (target: 37.5% of window)
  • Prevents context window overflow in long conversations
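
The trigger arithmetic in a short sketch (the >20-message threshold and 37.5% target come from above; the window size is an illustrative assumption):

# Compression fires for sessions longer than 20 messages and asks the LLM
# to summarize the conversation down to 37.5% of the context window.
def compression_target(num_messages: int, window_tokens: int = 128_000):
    if num_messages <= 20:
        return None  # no compression needed
    return int(window_tokens * 0.375)  # e.g. 48,000 tokens for a 128k window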

Entity Filtering

When a specific entity is detected (e.g., company “Acme Analytics”), the following apply:

Query Refinement:
  • Detects canonical name, exact search queries, official domains, disambiguation terms
  • Example: canonical_name: "Acme Analytics", official_domains: ["acme.com"]
Citation Filtering:
  • Scoring: domain match +0.6, alias in URL +0.4, text match +0.4
  • Threshold: 0.3 (any single match passes)
  • Safety floor: minKeep=8 (backfilled by quality × credibility)
  • Official domains always preserved (bypass threshold)
Result Pruning:
  • Filters off-entity tool results (e.g., removes “Mind Inc” when searching for “Acme Mind”)
  • Keeps reasoning-only outputs; prunes off-entity tool-driven results
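
A hedged Python sketch of the citation-filtering rules above (function and field names are ours, not Shannon's internals):

# Illustrative entity scoring: domain +0.6, alias in URL +0.4, text match +0.4
def entity_score(citation: dict, entity: dict) -> float:
    score = 0.0
    if citation["domain"] in entity["official_domains"]:
        score += 0.6
    if any(a.lower() in citation["url"].lower() for a in entity["aliases"]):
        score += 0.4
    if entity["canonical_name"].lower() in citation.get("text", "").lower():
        score += 0.4
    return score

def filter_citations(citations: list, entity: dict, min_keep: int = 8) -> list:
    kept = [c for c in citations
            if entity_score(c, entity) >= 0.3                # any single match passes
            or c["domain"] in entity["official_domains"]]    # official domains bypass
    if len(kept) < min_keep:
        # Backfill to the safety floor, ranked by quality x credibility
        rest = sorted((c for c in citations if c not in kept),
                      key=lambda c: c["quality_score"] * c["credibility_score"],
                      reverse=True)
        kept += rest[:min_keep - len(kept)]
    return kept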

Gap Filling

Auto-detection (max 2 iterations):
  • Missing section headings (### Area Name)
  • Gap indicator phrases (“limited information”, “insufficient data”)
  • Low citation density (< 2 inline citations per section)
Resolution:
  1. Build targeted queries: "Find detailed information about: <area>"
  2. Execute focused ReAct loops (max 3 iterations per gap)
  3. Re-collect citations (global deduplication with original citations)
  4. Re-synthesize with combined evidence using large tier
Query: "Analyze company X market strategy, competitive landscape, and leadership"

Initial synthesis coverage:
- Market strategy: ✓ 800 chars, 3 citations
- Competitive landscape: ✓ 650 chars, 2 citations
- Leadership: ✗ 200 chars, 0 citations  ← Gap detected

Gap filling triggers:
1. Targeted query: "Find detailed information about: company X leadership"
2. ReAct loop executes focused search
3. Re-synthesize with combined evidence
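
A hedged sketch of the detection heuristics (the regexes, phrase list, and [n] inline-citation format are illustrative assumptions):

import re

GAP_PHRASES = ("limited information", "insufficient data")

def find_gaps(report: str, areas: list[str]) -> list[str]:
    gaps = []
    for area in areas:
        # 1. Missing "### <Area>" section heading
        match = re.search(rf"^### {re.escape(area)}\n(.*?)(?=^### |\Z)",
                          report, re.MULTILINE | re.DOTALL)
        if not match:
            gaps.append(area)
            continue
        section = match.group(1)
        # 2. Gap indicator phrases, or 3. low inline-citation density
        citations = len(re.findall(r"\[\d+\]", section))
        if any(p in section.lower() for p in GAP_PHRASES) or citations < 2:
            gaps.append(area)
    return gaps

# Each gap becomes a targeted query: "Find detailed information about: <area>"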

Citations

Collection:
  • Extracted from web_search and web_fetch tool outputs
  • URL/DOI normalization and deduplication
  • Scoring: quality (recency + completeness) × credibility (domain reputation)
  • Diversity enforcement (max 3 per domain)
Enforcement:
  • Minimum 6 inline citations per report (clamped by available, floor 3)
  • Per-area requirement: ≥2 inline citations per section
  • Sources section lists all collected citations:
    • “Used inline” (cited in text)
    • “Additional source” (collected but not cited)
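
A sketch of the deduplication, diversity, and minimum-count rules (URL normalization is simplified; field names follow the Response Format section):

from collections import Counter
from urllib.parse import urlsplit

def dedupe_and_diversify(citations: list[dict], per_domain: int = 3) -> list[dict]:
    seen, per_dom, kept = set(), Counter(), []
    # Rank by quality x credibility so diversity keeps the best per domain
    ranked = sorted(citations,
                    key=lambda c: c["quality_score"] * c["credibility_score"],
                    reverse=True)
    for c in ranked:
        key = c.get("doi") or urlsplit(c["url"])._replace(query="", fragment="").geturl()
        domain = urlsplit(c["url"]).netloc
        if key in seen or per_dom[domain] >= per_domain:
            continue  # duplicate, or domain already at the diversity cap
        seen.add(key)
        per_dom[domain] += 1
        kept.append(c)
    return kept

def min_inline_required(available: int) -> int:
    # Minimum 6 inline citations, clamped by what's available, floor 3
    return max(3, min(6, available))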

Synthesis Continuation

Trigger: Model stops early with incomplete output.

Detection (looksComplete validation):
  1. Must end with sentence punctuation (., !, ?, or a closing parenthesis)
  2. No dangling conjunctions (and, but, however, therefore...)
  3. Every research area has subsection with ≥600 chars and ≥2 citations
Continuation:
  • Adaptive margin: min(25% of effective_max_completion, 300 tokens)
  • Prompt: “Continue from last sentence; maintain headings and citation style”
  • Uses large tier for quality (gpt-4.1/opus)
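
Checks 1-2 and the margin formula fit in a few lines. A hedged sketch (conjunction list abbreviated; the per-area coverage check is covered under Gap Filling above):

# Completion heuristics from the Detection rules above (illustrative)
DANGLING = ("and", "but", "however", "therefore")

def looks_complete(text: str) -> bool:
    tail = text.rstrip()
    if not tail or tail[-1] not in ".!?)":
        return False  # must end with sentence punctuation
    words = tail.lower().rstrip(".!?)").split()
    return not (words and words[-1] in DANGLING)  # no dangling conjunction

def continuation_margin(effective_max_completion: int) -> int:
    # Adaptive margin: min(25% of the completion cap, 300 tokens)
    return min(effective_max_completion // 4, 300)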

Verification

Claim Extraction:
  • Identifies factual assertions from synthesis
  • Cross-references against collected citations
Confidence Scoring:
  • Weighted by citation credibility
  • Flags conflicts and unsupported claims
  • Per-claim details: supporting/conflicting citations
Example output:
{
  "verification": {
    "total_claims": 10,
    "overall_confidence": 0.78,
    "supported_claims": 7,
    "unsupported_claims": ["Claim lacking evidence..."],
    "claim_details": [
      {
        "claim": "Transformer architecture introduced in 2017",
        "confidence": 0.95,
        "supporting_citations": [1, 3],
        "conflicting_citations": []
      }
    ]
  }
}
Enable: Set enable_verification: true in context.

Language Matching

  • Detects user query language (heuristic)
  • Synthesis responds in the same language
  • Supports English, Chinese, Japanese, Korean, Arabic, Russian, Spanish, French, German (heuristic‑based)
  • Generic instructions ensure robustness across languages
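
A hedged sketch of a script-range heuristic (Unicode ranges only; Shannon's actual detector may differ, and Latin-script languages need more than script detection):

import re

# Kana is checked before Han so Japanese text (which mixes kanji and kana)
# is not misread as Chinese. Latin-script queries fall through to "en".
SCRIPTS = {
    "ja": re.compile(r"[\u3040-\u30ff]"),  # hiragana/katakana
    "ko": re.compile(r"[\uac00-\ud7af]"),  # hangul
    "ar": re.compile(r"[\u0600-\u06ff]"),  # arabic
    "ru": re.compile(r"[\u0400-\u04ff]"),  # cyrillic
    "zh": re.compile(r"[\u4e00-\u9fff]"),  # han
}

def detect_language(query: str) -> str:
    for lang, pattern in SCRIPTS.items():
        if pattern.search(query):
            return lang
    return "en"  # default when no non-Latin script is found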

Behavior Guarantees

  • Memory: Hierarchical memory (recent + semantic) is injected when available.
  • Compression: Automatic for sessions over 20 messages, preventing context overflow.
  • Coverage: Each research area gets a dedicated subsection with at least 600 chars and 2 citations.
  • Gap Filling: Auto-detects and re-searches undercovered areas (max 2 iterations).
  • Language: The response matches the user's query language.
  • Cost: Token usage and cost are aggregated and persisted; per-agent budgets are enforced when configured.
  • Continuation: Triggers only when synthesis is incomplete and capacity is nearly exhausted.
  • Entity Focus: When an entity is detected, citations are filtered and off-entity results pruned.

Tips & Best Practices

  • Set context.force_research=true to ensure routing to ResearchWorkflow
  • Start with standard preset, adjust based on results
  • Monitor SSE for progress and token usage

Troubleshooting

Common Issues:
  • Language defaults to English: Ensure the query clearly uses the target language
  • Missing citations: Check if the web_search tool is available; research workflows auto‑inject web_search
  • Incomplete coverage: Gap filling should auto‑trigger; ensure gap_filling_v1 is enabled
  • High token usage: Set budget_agent_max or use the quick preset; disable react_per_task
Validation:
  • Invalid web_search search_type: Sanitized (falls back to auto)
  • Duplicate citations: Handled via URL/DOI normalization
  • Entity detection: Performed during refinement when present in the query
Version Gates (internal feature flags):
  • Memory retrieval: memory_retrieval_v1
  • Session memory fallback: session_memory_v1
  • Context compression: context_compress_v1
  • Gap filling: gap_filling_v1
