Retrieval-Augmented Generation (RAG)

Overview

Retrieval-Augmented Generation (RAG) is a technique that enhances Large Language Models (LLMs) by retrieving relevant information from external data sources before generating a response. As of 2026, the field has moved from “Simple RAG” (a single retrieve-then-generate pass) toward Agentic RAG and Composable RAG, which use multi-step, iterative loops of retrieval and reasoning to improve accuracy.
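A minimal sketch of the basic retrieve-then-generate flow follows. It assumes a toy bag-of-words `embed` function as a stand-in for a real embedding model. It stops at building the augmented prompt, because the LLM call itself depends on the provider. `embed`, `cosine`, `retrieve`, and `build_prompt` are hypothetical names used only for illustration.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    # Toy stand-in for an embedding model (assumption): lowercase word counts.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    # "Retrieval" step: rank every document by similarity to the query, keep the top k.
    q = embed(query)
    return sorted(corpus, key=lambda doc: cosine(q, embed(doc)), reverse=True)[:k]


def build_prompt(query: str, passages: list[str]) -> str:
    # "Augmentation" step: place the retrieved text in front of the question.
    context = "\n".join(f"- {p}" for p in passages)
    return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {query}"


corpus = [
    "Reciprocal Rank Fusion merges several ranked lists into one.",
    "Cross-encoders score a query and a document together.",
    "pgvector adds vector similarity search to PostgreSQL.",
]
question = "How does pgvector search vectors?"
prompt = build_prompt(question, retrieve(question, corpus, k=1))
# "Generation" step: the prompt would now go to an LLM (provider-specific, omitted).
```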

Modern RAG Architecture (2026)

1. Advanced Ingestion & Indexing

  • Semantic/Contextual Chunking: Splitting text at logical breaks, detected through embedding similarity between sentences or by an LLM, rather than into fixed-size windows.
  • Knowledge Graphs (GraphRAG): Representing data as entities and relationships to improve reasoning across disparate data points.
  • Multi-modal Indexing: Support for text, images, video, and audio in a single retrieval system.
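One common form of semantic chunking compares neighboring sentences and starts a new chunk wherever their similarity drops. The sketch below makes two assumptions. It uses a toy word-count vector in place of a dense embedding model, and it applies a hypothetical fixed `threshold`. Some implementations instead choose breakpoints from a percentile of the distances rather than a fixed cutoff.

```python
import math
import re
from collections import Counter


def embed(sentence: str) -> Counter:
    # Toy stand-in for a dense embedding model (assumption): lowercase word counts.
    return Counter(re.findall(r"[a-z0-9]+", sentence.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def semantic_chunks(text: str, threshold: float = 0.1) -> list[str]:
    # Naive sentence split on ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    if not sentences:
        return []
    chunks = [[sentences[0]]]
    for prev, cur in zip(sentences, sentences[1:]):
        if cosine(embed(prev), embed(cur)) < threshold:
            chunks.append([cur])  # similarity dropped: treat as a topic break
        else:
            chunks[-1].append(cur)  # same topic: extend the current chunk
    return [" ".join(chunk) for chunk in chunks]


text = (
    "Cats are small pets. Cats like to sleep. "
    "Rust has a borrow checker. The borrow checker prevents data races."
)
chunks = semantic_chunks(text)
```

Here the break falls between the two topics, even though a fixed-size window could cut through either pair of sentences.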

2. Retrieval Layer

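  • Hybrid Search: Combining semantic (vector) search with lexical (keyword, e.g., BM25) search and merging the two result lists with Reciprocal Rank Fusion (RRF).
  • Query Rewriting/Expansion: Using a model to turn a vague user query into one or more optimized search queries (e.g., multi-query expansion, or HyDE, which retrieves with an LLM-generated hypothetical answer instead of the raw query).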
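Reciprocal Rank Fusion scores each document as the sum, over all result lists, of 1 / (k + rank). Ranks start at 1, and k is a smoothing constant, commonly set to 60. The sketch below is a minimal version that assumes each retriever returns an ordered list of document IDs. The IDs are hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    # Each list contributes 1 / (k + rank) for every document it contains;
    # documents ranked well by several lists accumulate the highest scores.
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


vector_hits = ["doc_a", "doc_b", "doc_c"]   # ranked by embedding similarity
keyword_hits = ["doc_c", "doc_a", "doc_d"]  # ranked by BM25 or similar
fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
```

`doc_a` comes first because it ranks high in both lists. `doc_b` and `doc_d` each appear in only one list. RRF works on ranks rather than raw scores, so cosine similarities and BM25 scores never need to be put on a common scale. The same function can also merge the results of several rewritten queries.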

3. Post-Retrieval (Refinement)

  • Reranking: Using cross-encoder models, which score each (query, document) pair jointly, to re-order the top retrieved results by relevance.
  • Context Compression: Pruning irrelevant snippets from retrieved documents to fit more high-signal information into the context window.
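The sketch below shows where reranking and compression sit in the pipeline. It uses a hypothetical `overlap_score` as a stand-in for a cross-encoder. That function returns the share of query terms that appear in the passage. Any function that scores a (query, passage) pair can be passed as `scorer`. Compression here is simple sentence-level filtering against a hypothetical `min_score` cutoff.

```python
import re
from typing import Callable

Scorer = Callable[[str, str], float]


def _terms(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap_score(query: str, passage: str) -> float:
    # Hypothetical stand-in for a cross-encoder: share of query terms in the passage.
    q = _terms(query)
    return len(q & _terms(passage)) / len(q) if q else 0.0


def rerank(query: str, passages: list[str], scorer: Scorer = overlap_score, top_n: int = 3) -> list[str]:
    # Score each (query, passage) pair and keep the best top_n.
    return sorted(passages, key=lambda p: scorer(query, p), reverse=True)[:top_n]


def compress(query: str, passage: str, scorer: Scorer = overlap_score, min_score: float = 0.2) -> str:
    # Drop sentences that score below min_score against the query.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", passage) if s.strip()]
    return " ".join(s for s in sentences if scorer(query, s) >= min_score)


query = "How does pgvector index vectors?"
passages = [
    "Kafka is a distributed log.",
    "pgvector can index vectors with HNSW or IVFFlat. The project is written in C.",
    "Vectors are stored in a vector column.",
]
top = rerank(query, passages, top_n=2)
context = [compress(query, p) for p in top]
```

With the `sentence-transformers` library, a model loaded as `CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")` could be wrapped as `lambda q, p: float(model.predict([(q, p)])[0])`. Scoring one pair per call like this is simpler but slower than passing all pairs to a single `predict` call.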

4. Generation & Verification

  • Grounded Generation: The LLM generates answers with mandatory citations.
  • Self-Correction/Grading: An “LLM-as-a-judge” step that checks whether the retrieved documents actually support the generated answer.
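Grounded generation is mostly a prompting contract, so the part that can be sketched without a model is the check on its output. The example below assumes a numbered-source prompt format. It flags any sentence that lacks a `[n]` citation or cites a source that does not exist. This is only a cheap structural pre-check, not the LLM-as-a-judge step itself. That step would additionally ask a model whether each cited source supports its sentence. All function names are hypothetical.

```python
import re


def grounded_prompt(question: str, sources: list[str]) -> str:
    # Number the sources so the model can cite them as [1], [2], ...
    numbered = "\n".join(f"[{i}] {s}" for i, s in enumerate(sources, start=1))
    return (
        "Answer using only the sources below. "
        "Put a citation such as [1] before the period of every sentence.\n\n"
        f"Sources:\n{numbered}\n\nQuestion: {question}"
    )


def citation_problems(answer: str, num_sources: int) -> list[str]:
    # Structural check only: every sentence needs a citation that points at
    # a real source. Whether that source supports the claim is a judge's job.
    problems = []
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", answer) if s.strip()]
    for sentence in sentences:
        cited = [int(n) for n in re.findall(r"\[(\d+)\]", sentence)]
        if not cited:
            problems.append(f"uncited: {sentence}")
        for n in cited:
            if not 1 <= n <= num_sources:
                problems.append(f"unknown source [{n}]: {sentence}")
    return problems


sources = ["pgvector adds vector similarity search to PostgreSQL.", "pgvector supports HNSW indexes."]
prompt = grounded_prompt("What does pgvector do?", sources)
answer = "pgvector adds vector search to PostgreSQL [1]. It supports HNSW indexes [3]. It is fast."
issues = citation_problems(answer, num_sources=len(sources))
```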

Key Frameworks & Repositories

| Use Case | Recommended Repo | Strength |
|---|---|---|
| Agentic Workflows | [[LangGraph]] (LangChain) | Stateful loops and reasoning. |
| LLMOps & Visual Dev | [[Dify]] | “No-code” to “low-code” deployment. |
| Complex Parsing | [[LlamaIndex]] | Deep hierarchical indexing. |
| Enterprise Docs | [[RAGFlow]] | Extracting messy PDFs/tables. |
| Real-time Data | [[Pathway]] | Streaming from Kafka/Slack/Drive. |
| Pipeline Tuning | [[DSPy]] | Programmatic prompt optimization. |
| Graph-based RAG | [[LightRAG]] | Knowledge graph integration. |
| Agent Memory | [[Mem0]] | Cross-session user memory. |

Self-Hosted Options

  • [[Open WebUI]]: Best for local/private RAG with locally served models (e.g., via Ollama).
  • [[Vector Databases]]
  • [[Embedding Models]]
  • [[Model Context Protocol (MCP)]]
  • [[AI Memory Systems]] (Long-term persistence and identity)

Project Ideas

  • Build a “Personal Memory RAG” using local files and [[pgvector]].
  • Implement a “GraphRAG” for complex legal or medical research papers.

Why These Systems Work (Technical Merits)

| Component | Problem Solved | Technical Solution |
|---|---|---|
| [[LangGraph]] / [[Dify]] | Brittle, static chains. | Stateful Loops: Agents can “cycle back” to search again if retrieval is poor. |
| [[RAGFlow]] / [[LlamaIndex]] | Poor “reading” of messy PDFs/tables. | Deep Document Understanding (DDU): Vision-based layout reconstruction. |
| [[DSPy]] | Manual, “vibes-based” prompting. | Programmatic Optimization: Automatically re-tunes prompts when models change. |
| [[Pathway]] | Out-of-date static indexes. | Streaming RAG: Ingests live Kafka/Slack data in real time. |
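The “cycle back” behavior can be sketched, independent of any framework, as a bounded loop: retrieve, grade the results, and rewrite the query if they look poor. The `retrieve`, `is_relevant`, and `rewrite` callables are hypothetical placeholders. In practice they would be a search backend, an LLM grader, and an LLM query rewriter. This is not LangGraph's or Dify's API. LangGraph, for example, would express the same cycle as graph nodes joined by a conditional edge.

```python
from typing import Callable


def agentic_retrieve(
    question: str,
    retrieve: Callable[[str], list[str]],
    is_relevant: Callable[[str, list[str]], bool],
    rewrite: Callable[[str, list[str]], str],
    max_steps: int = 3,
) -> tuple[list[str], list[str]]:
    """Retrieve, grade, and rewrite until the documents pass or steps run out.

    Returns (documents, queries_tried). documents is empty if every attempt
    failed, so the caller can abstain instead of answering from poor context.
    """
    query = question
    tried: list[str] = []
    for step in range(max_steps):
        docs = retrieve(query)
        tried.append(query)
        if is_relevant(question, docs):
            return docs, tried
        if step + 1 < max_steps:
            # Cycle back: ask for a new query, given the ones that already failed.
            query = rewrite(question, tried)
    return [], tried


# Toy placeholders (hypothetical) for a search backend, a grader, and a rewriter.
index = {"pgvector hnsw": ["pgvector supports HNSW indexes."]}
canned_rewrites = ["pgvector index types", "pgvector hnsw"]

docs, tried = agentic_retrieve(
    "Which ANN indexes does pgvector have?",
    retrieve=lambda q: index.get(q, []),
    is_relevant=lambda question, found: bool(found),
    rewrite=lambda question, attempts: canned_rewrites[len(attempts) - 1],
)
```

The `max_steps` bound is the main design choice. It caps latency and cost when retrieval keeps failing.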

Research Validation (2025–2026)

GraphRAG vs. Vector RAG

  • Benchmark: GraphRAG-Bench (June 2025) reports Vector RAG accuracy as low as 0–16% on multi-hop queries (e.g., “Summarize cross-departmental overlaps”).
  • Evidence: GraphRAG scores 80–90%+ by following explicit entity relationships.
  • Verdict: Production systems should use a Hybrid Router (Vector for simple lookups, Graph for complex reasoning).
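A Hybrid Router can be sketched as a classifier placed in front of two retrievers. The sketch makes several assumptions:

- It uses a hypothetical keyword heuristic (`MULTI_HOP_CUES`), where production systems would more likely use an LLM or a trained classifier.
- The query's starting entity has already been extracted.
- The graph is a tiny made-up example.

The graph branch walks explicit relationships breadth-first. This is what lets it connect facts that no single text chunk contains.

```python
from collections import deque
from typing import Callable

# Tiny made-up knowledge graph (hypothetical): entity -> [(relation, entity), ...]
GRAPH: dict[str, list[tuple[str, str]]] = {
    "Alice": [("works_in", "Legal")],
    "Legal": [("collaborates_with", "Finance")],
    "Finance": [("owns", "Budget 2026")],
}

# Hypothetical cues for multi-hop intent; an LLM classifier is a likelier production choice.
MULTI_HOP_CUES = ("across", "between", "overlap", "connected", "related to")


def is_multi_hop(query: str) -> bool:
    q = query.lower()
    return any(cue in q for cue in MULTI_HOP_CUES)


def graph_paths(start: str, max_hops: int = 3) -> list[list[str]]:
    # Breadth-first walk; each path alternates entity, relation, entity, ...
    paths: list[list[str]] = []
    queue = deque([(start, [start])])
    while queue:
        node, path = queue.popleft()
        if (len(path) - 1) // 2 >= max_hops:
            continue
        for relation, neighbor in GRAPH.get(node, []):
            if neighbor in path:  # avoid cycles
                continue
            new_path = path + [relation, neighbor]
            paths.append(new_path)
            queue.append((neighbor, new_path))
    return paths


def route(query: str, entity: str, vector_search: Callable[[str], list[str]]) -> tuple[str, list[str]]:
    # Graph branch for relationship-heavy questions, vector branch for simple lookups.
    if is_multi_hop(query):
        return "graph", [" -> ".join(p) for p in graph_paths(entity)]
    return "vector", vector_search(query)


kind, evidence = route("Summarize overlaps between Alice's team and budgets", "Alice", lambda q: [])
```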

Agentic RAG Efficacy

  • Benchmark: arXiv “Is Agentic RAG worth it?” (2026) shows a 15–20% accuracy gain when using agents that “reflect” and “rewrite” queries compared to static pipelines.

Cost & Feasibility

  • Trend: LazyGraphRAG and KET-RAG (late 2025) have reduced GraphRAG indexing costs by ~90%, removing the primary barrier to enterprise adoption.

Sources

  • [[rag_research_2026]] (Landscape)
  • [[rag_repos_2026]] (Top Repos)
  • [[rag_validation_2026]] (Technical Merits & Benchmarks)