From Plain LLM to Context Graph: The Evolution of AI Knowledge Retrieval

Table of Contents

  1. The Core Problem: What LLMs Don't Know
  2. Stage 1 — Plain LLM
  3. Stage 2 — RAG: Retrieval-Augmented Generation
  4. Stage 3 — GraphRAG: When Relationships Matter
  5. Stage 4 — Context Graph: Structured, Dynamic, Agent-Ready
  6. Side-by-Side Comparison
  7. Choosing the Right Stage for Your Problem

1. The Core Problem: What LLMs Don't Know

Every architecture in this article is an answer to the same fundamental limitation of Large Language Models:

LLMs have a knowledge cutoff. They know nothing about your private data. And they can hallucinate when asked about specifics they were never trained on.

The progression from plain LLM → RAG → GraphRAG → Context Graph is the story of the AI engineering community finding progressively more powerful ways to solve this problem.

┌────────────────────────────────────────────────────────────┐
│         The LLM Knowledge Problem                          │
│                                                            │
│  What the LLM knows:                                       │
│    ✓ General world knowledge (up to training cutoff)       │
│    ✓ Reasoning, summarization, code generation             │
│    ✓ Language patterns and inference                       │
│                                                            │
│  What the LLM does NOT know:                               │
│    ✗ Your private documents                                │
│    ✗ Events after training cutoff                          │
│    ✗ Real-time data                                        │
│    ✗ Relationships between your specific entities          │
│    ✗ Your company's internal rules, policies, cases        │
└────────────────────────────────────────────────────────────┘

Each stage in this article is a different answer to: how do we inject the right knowledge into the LLM at query time?


2. Stage 1 — Plain LLM

What it is

You send a prompt directly to a language model. No external knowledge, no retrieval, no databases. The model answers purely from its training weights.

┌──────────────────────────────────────┐
│           PLAIN LLM                  │
│                                      │
│  User prompt                         │
│      │                               │
│      ▼                               │
│  ┌────────────────────┐              │
│  │    LLM             │              │
│  │  (GPT-4o, Claude,  │              │
│  │   Gemini, Llama)   │              │
│  └────────────────────┘              │
│      │                               │
│      ▼                               │
│  Response                            │
└──────────────────────────────────────┘

When it's enough

  • General Q&A, summarization, translation
  • Code generation and explanation
  • Tasks where all needed context can fit in a single prompt
  • Creative writing, brainstorming

The hard ceiling

The moment you ask "What did our client sign last Tuesday?" or "Find the cases where we rejected invoices from FastTrack Logistics" — a plain LLM is useless. It has never seen your data.


3. Stage 2 — RAG: Retrieval-Augmented Generation

What it is

RAG solves the private-data problem by searching your documents at query time and injecting the most relevant chunks into the LLM's context window before it answers.

The key innovation: instead of storing all your knowledge in the prompt (which would overflow the context window), you store it in a vector database — a search index where each chunk is represented as a high-dimensional numerical vector (an embedding). At query time you embed the user's question and find the closest matching chunks.

The RAG pipeline

INDEXING (done once, offline)
─────────────────────────────
Documents (PDFs, wikis, emails, ...)
        │
        ▼
   Text chunker
   (split into ~500-token pieces)
        │
        ▼
   Embedding model
   (e.g. text-embedding-3-small,
    all-MiniLM-L6-v2)
        │
        ▼
   Vector database
   (each chunk stored as a float vector)


QUERYING (done on every user question)
──────────────────────────────────────
User question
        │
        ▼
   Embedding model
   (same model as indexing)
        │
        ▼
   Vector DB similarity search
   (cosine / dot-product distance)
        │
        ▼
   Top-K most similar chunks
        │
        ▼
   Assemble prompt:
   [System] + [Chunk 1] + [Chunk 2] + ... + [User question]
        │
        ▼
        LLM
        │
        ▼
   Answer (grounded in retrieved chunks)

Similarity search visualised

Vector search finds chunks that are "close" in meaning — not just in exact words:

                  Vector space (simplified to 2D)
                  ·
        "invoice  ·          × Query: "rejected invoice amount"
        rejected" ×          ·
                  ·   ×      ·
                  ·  "total  ·
                  ·  amount" ·
                  ·          ×  "payment approved"
                  ·          ·
                  ────────────────────────────────
                  Far from query          Close to query
                  (not retrieved)         (retrieved)

The embedding model maps semantically similar text to nearby points in vector space, regardless of exact wording. "Invoice rejected" and "bill not approved" would land close to each other.
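The nearest-neighbour ranking behind this can be sketched in a few lines of plain Python. The 2D vectors below are hand-made stand-ins for real embeddings; a production system would call an embedding model such as text-embedding-3-small instead:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy index: chunk text -> hand-made 2D "embedding"
chunks = {
    "invoice rejected": [0.9, 0.1],
    "total amount":     [0.6, 0.4],
    "payment approved": [0.1, 0.9],
}

# Pretend embedding of the query "rejected invoice amount"
query_vec = [0.8, 0.2]

# Rank all chunks by similarity to the query and keep the Top-K
ranked = sorted(chunks, key=lambda c: cosine_similarity(query_vec, chunks[c]),
                reverse=True)
top_k = ranked[:2]  # "invoice rejected" ranks first, "payment approved" last
```

Real vector databases do the same ranking, but over millions of high-dimensional vectors using approximate nearest-neighbour indexes (HNSW, IVF) instead of a linear scan.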

The tools and ecosystem

Layer | Popular choices | Notes
------|-----------------|------
Embedding models | text-embedding-3-small (OpenAI), all-MiniLM-L6-v2 (HuggingFace), embed-english-v3.0 (Cohere) | OpenAI embeddings are convenient; open-source models keep data local
Vector databases | Pinecone, Qdrant, Weaviate, Chroma, pgvector (Postgres extension) | Pinecone is fully managed; Qdrant is self-hosted and fast; pgvector lives in your existing Postgres
Chunking strategies | Fixed-size, sentence-aware, semantic chunking | LlamaIndex and LangChain both ship chunking utilities
RAG frameworks | LlamaIndex (RAG-first), LangChain (broader orchestration) | LlamaIndex has the richest RAG primitives
Rerankers | Cohere Rerank, BGE-Reranker, FlashRank | Run after vector search to re-score the Top-K chunks with a cross-encoder
Hybrid search | BM25 (keyword) + vector (semantic) | Often better than pure vector; supported in Qdrant, Weaviate, Elasticsearch
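As a rough sketch of the fixed-size chunking strategy, here is a word-window chunker with overlap. Real pipelines count tokens with the model's tokenizer rather than words, and the sizes here are illustrative:

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping fixed-size chunks.

    Word count is a rough stand-in for tokens; a real pipeline would
    measure chunk_size with the embedding model's tokenizer.
    """
    words = text.split()
    chunks = []
    step = chunk_size - overlap  # each window starts `overlap` words early
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks
```

The overlap matters: without it, a sentence split across a chunk boundary is retrievable from neither chunk.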

RAG with LlamaIndex — the shape of the code

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, StorageContext
from llama_index.vector_stores.qdrant import QdrantVectorStore
import qdrant_client

# 1. Load documents
docs = SimpleDirectoryReader("./documents").load_data()

# 2. Build index (chunks → embeddings → Qdrant)
client = qdrant_client.QdrantClient(url="http://localhost:6333")
vector_store = QdrantVectorStore(client=client, collection_name="my_docs")
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(docs, storage_context=storage_context)

# 3. Query
query_engine = index.as_query_engine()
response = query_engine.query("What invoices were rejected last month?")
print(response)

Where RAG struggles

RAG is excellent for "find me documents about X". It breaks down when the answer requires connecting dots across multiple documents:

❌ RAG fails at:

"Who are the top vendors by total billed amount across all invoices,
 and which of them had compliance violations in Q4?"

Why it fails:
  - No single chunk contains both the billing totals AND the violation data
  - The answer requires aggregating across many documents
  - The relationships between entities (vendor → invoices → violations)
    are implicit in the chunks, not explicit in the index

This is where GraphRAG enters.


4. Stage 3 — GraphRAG: When Relationships Matter

What it is

GraphRAG was introduced by Microsoft Research in 2024. Instead of storing document chunks as isolated vectors, it extracts entities and relationships from your documents and builds a knowledge graph. Retrieval then traverses graph structure rather than just finding nearby vectors.

The two-phase architecture

INDEXING PHASE (expensive, done offline)
─────────────────────────────────────────
Documents
    │
    ▼
LLM extraction pass
    │  "Find all entities (people, companies, dates,
    │   concepts) and the relationships between them"
    │
    ▼
┌─────────────────────────────────────────┐
│           Knowledge Graph               │
│                                         │
│   [FastTrack Logistics]──BILLED──▶[ACME Corp]
│          │                              │
│      ISSUED                         AUTHORIZED
│          │                              │
│   [Invoice INV-0042]◀──MATCHES──[WAF-76948]
│          │
│      AMOUNT
│          │
│        $733.70
│                                         │
│   [Emma Williams]──WORKED_ON──▶[Site 403]
└─────────────────────────────────────────┘
    │
    ▼
Graph summaries generated per community
(clusters of tightly connected nodes)


QUERYING PHASE
──────────────
User question
    │
    ├──▶ Local search: traverse graph from
    │    matched entities → follow edges
    │
    └──▶ Global search: use pre-built
         community summaries for broad questions
    │
    ▼
Relevant subgraph / summaries
    │
    ▼
Assemble context prompt
    │
    ▼
    LLM
    │
    ▼
Answer

GraphRAG introduces a distinction that plain RAG cannot make:

Search type | Best for | Mechanism
------------|----------|----------
Local | "Tell me about vendor FastTrack Logistics" | Start at the matched entity and traverse 1–3 hops in the graph
Global | "What are the main themes across all our rejected invoices?" | Use pre-built community summaries that cover the whole corpus

Plain vector RAG cannot do the "global" question well — it only finds locally relevant chunks, not corpus-wide patterns.
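Local search is essentially a bounded breadth-first traversal from the matched entity. A minimal sketch over a toy adjacency-list graph (the entities mirror the diagram above; in a real GraphRAG deployment these edges come from the LLM extraction pass):

```python
from collections import deque

# Toy extracted knowledge graph: entity -> [(relation, neighbor), ...]
graph = {
    "FastTrack Logistics": [("BILLED", "ACME Corp"), ("ISSUED", "INV-0042")],
    "INV-0042": [("MATCHES", "WAF-76948"), ("AMOUNT", "$733.70")],
    "ACME Corp": [("AUTHORIZED", "WAF-76948")],
}

def local_search(start, max_hops=2):
    """Collect every edge reachable within max_hops of the start entity (BFS)."""
    seen, edges = {start}, []
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue  # do not expand past the hop limit
        for rel, neighbor in graph.get(node, []):
            edges.append((node, rel, neighbor))
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return edges

# All edges within 2 hops of the vendor become the LLM's context
subgraph = local_search("FastTrack Logistics", max_hops=2)
```

Global search skips the traversal entirely and instead stuffs the pre-computed community summaries into the prompt.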

GraphRAG tools and ecosystem

Tool | Role | Notes
-----|------|------
Microsoft GraphRAG (graphrag Python package) | Full pipeline: extraction → graph → search | Open source, released 2024; uses OpenAI by default but configurable
Neo4j | Graph database storage | Industry standard; excellent Cypher query language
Amazon Neptune | Managed graph DB on AWS | Good for AWS-native stacks
ArangoDB | Multi-model: graph + document + key-value | Flexible, self-hostable
LangChain + Neo4j | RAG over knowledge graphs | Neo4jGraph, GraphCypherQAChain
LlamaIndex Knowledge Graph Index | Graph-based RAG | Extracts triplets and stores them; simpler than full GraphRAG
spaCy / GLiNER | Entity extraction | Can replace the LLM for NER during indexing (cheaper)
Ollama | Local LLM for extraction | Run the extraction pass locally to avoid sending private data to a cloud API

The GraphRAG pipeline with Microsoft's library

# Install
pip install graphrag

# Init a project
graphrag init --root ./my-project

# Configure: edit my-project/settings.yaml
# (set LLM provider, entity extraction prompts, etc.)

# Run the indexing pipeline
graphrag index --root ./my-project

# Query
graphrag query \
  --root ./my-project \
  --method local \
  --query "Which vendors had the most compliance issues?"

The indexing step is expensive — it calls the LLM once per document chunk to extract entities and relationships. For large corpora this can take hours and cost significant API tokens. The result is cached; re-indexing only happens when documents change.

GraphRAG limitations

GraphRAG trade-offs:

  ✓ Excellent for relationship queries
  ✓ Global/thematic queries across a large corpus
  ✓ Handles multi-hop reasoning ("who connected to whom via what?")

  ✗ Expensive indexing (many LLM calls)
  ✗ Static — the graph must be re-indexed when documents change
  ✗ Graph structure is extracted from unstructured text,
    so entity resolution is imperfect (same entity, different names)
  ✗ Complex to set up and tune
  ✗ The graph is implicit — you cannot directly query it with
    your own Cypher/SPARQL unless you export it

The deeper problem: the knowledge graph is built by having an LLM read unstructured text and guess the structure. What if the structure already exists in your data? What if you want precise, real-time, application-defined relationships rather than statistically extracted ones?

That leads to Context Graph.


5. Stage 4 — Context Graph: Structured, Dynamic, Agent-Ready

What it is

A Context Graph is not a single open-source library or a published paper — it is an architectural pattern that has emerged from AI engineering practice. The core idea:

Instead of extracting a graph from unstructured documents, build an explicit, typed, versioned graph of your domain entities and their relationships — and use that graph as the LLM's context source.

Where RAG retrieves chunks of text and GraphRAG retrieves statistically extracted entity subgraphs, a Context Graph retrieves structured, trusted, application-controlled context.

The structural difference

┌────────────────┬────────────────────┬───────────────────────────┐
│  RAG           │  GraphRAG          │  Context Graph            │
├────────────────┼────────────────────┼───────────────────────────┤
│  Text chunks   │  Entity graph      │  Domain object graph      │
│  from docs     │  extracted by LLM  │  managed by application   │
│                │                    │                           │
│  "Invoice INV  │  [Invoice]──▶      │  Invoice {                │
│   rejected for │  [Vendor]──▶       │    id: "INV-42"           │
│   phase3 fail" │  [Violation]       │    vendor: Vendor{...}    │
│                │                    │    decision: REJECT       │
│  (text blob)   │  (implicit edges)  │    phase_failed: "phase3" │
│                │                    │    line_items: [...]      │
│                │                    │  }                        │
└────────────────┴────────────────────┴───────────────────────────┘

How a Context Graph is built

Your application data
(databases, APIs, event streams, processed pipeline outputs)
        │
        ▼
┌──────────────────────────────────────────────────────────┐
│              CONTEXT GRAPH                                │
│                                                           │
│  Nodes:   typed domain entities                           │
│    • Vendor, Invoice, Case, Rule, ValidationResult, ...   │
│                                                           │
│  Edges:   explicit, typed relationships                   │
│    • Invoice ──SUBMITTED_BY──▶ Vendor                    │
│    • Case    ──CONTAINS──────▶ Invoice                   │
│    • Case    ──TRIGGERED──────▶ ALFRule                  │
│    • ALFRule ──OVERRIDES──────▶ Decision                 │
│                                                           │
│  Properties on nodes and edges:                          │
│    • Timestamps, confidence scores, audit trail          │
│                                                           │
│  Updated in real time as your application runs           │
└──────────────────────────────────────────────────────────┘
        │
        ▼
Context assembly at query time:
  1. Identify anchor entities from user query
  2. Traverse graph to collect relevant subgraph
  3. Serialize subgraph as structured JSON/text
  4. Inject into LLM context window
        │
        ▼
        LLM answers with full structured context
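The four assembly steps above can be sketched with a toy in-memory graph. The node IDs, fields, and JSON layout here are illustrative assumptions, not a real library's API; a production system would back this with a graph database:

```python
import json

# Toy Context Graph: typed nodes keyed by primary key, plus explicit
# typed edges. Entity names and fields are illustrative only.
nodes = {
    "INV-42": {"type": "Invoice", "decision": "REJECT", "phase_failed": "phase3"},
    "V-7":    {"type": "Vendor", "name": "FastTrack Logistics"},
}
edges = [("INV-42", "SUBMITTED_BY", "V-7")]

def assemble_context(anchor_id: str) -> str:
    """Steps 1-4: anchor entity -> traverse -> subgraph -> JSON for the LLM."""
    subgraph = {"anchor": anchor_id,
                "nodes": {anchor_id: nodes[anchor_id]},
                "edges": []}
    for src, rel, dst in edges:          # one-hop traversal from the anchor
        if anchor_id in (src, dst):
            subgraph["edges"].append({"from": src, "rel": rel, "to": dst})
            subgraph["nodes"][src] = nodes[src]
            subgraph["nodes"][dst] = nodes[dst]
    return json.dumps(subgraph, indent=2)

context = assemble_context("INV-42")
prompt = f"Here is the structured context:\n{context}\n\nAnswer the user's question."
```

Because the nodes carry exact primary keys and typed fields, the LLM receives unambiguous facts rather than free-text chunks it must re-interpret.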

Context Graph vs GraphRAG

┌───────────────────────────────────────────────────────────┐
│                                                           │
│  GraphRAG                   Context Graph                 │
│  ──────────                 ─────────────                 │
│                                                           │
│  Graph extracted from       Graph built intentionally     │
│  unstructured documents     from structured app data      │
│                                                           │
│  Schema discovered by LLM   Schema defined by you         │
│                                                           │
│  Static (re-index to       Dynamic (updated             │
│  update)                   in real time)                │
│                                                           │
│  Probabilistic entity       Exact entity identity         │
│  resolution                 (your primary keys)           │
│                                                           │
│  Built on text corpus       Built on your domain model    │
│                                                           │
│  Good for: research,        Good for: operational AI,     │
│  document understanding     AI agents, production systems  │
└───────────────────────────────────────────────────────────┘

Context Graph in an agentic system

Context Graphs are particularly powerful for AI agents that need to reason about a living, changing domain — not just a static document corpus:

┌─────────────────────────────────────────────────────────────┐
│              AGENTIC SYSTEM WITH CONTEXT GRAPH              │
│                                                             │
│  User: "Why was case_116dd8e8 rejected, and are there       │
│          similar recent cases from the same vendor?"        │
│                                                             │
│  Agent step 1: identify anchor entities                     │
│    → Case: case_116dd8e8                                    │
│    → Vendor: FastTrack Logistics                            │
│                                                             │
│  Agent step 2: traverse Context Graph                       │
│    case_116dd8e8                                            │
│      └──SUBMITTED_BY──▶ FastTrack Logistics                │
│           └──SUBMITTED──▶ [case_a3f8, case_bb12, ...]      │
│                └──DECISION──▶ REJECT (phase3)              │
│                                                             │
│  Agent step 3: collect subgraph                             │
│    {vendor profile, all cases, decisions, rejection phases} │
│                                                             │
│  Agent step 4: inject into LLM prompt                       │
│    "Here is the structured context: {...}"                  │
│    "Answer the user's question."                            │
│                                                             │
│  Agent step 5: LLM answers with full context               │
│    "case_116dd8e8 was rejected in phase3 due to a          │
│     tax-ID format issue. FastTrack Logistics has 3         │
│     similar rejections in the last 30 days, all for        │
│     the same reason. Consider adding an ALF rule."         │
└─────────────────────────────────────────────────────────────┘

A plain RAG system could not answer this — the aggregated multi-case pattern across one vendor requires graph traversal, not chunk similarity.

Tools and technologies for Context Graphs

Category | Tools | Notes
---------|-------|------
Graph databases | Neo4j, Memgraph, FalkorDB, Amazon Neptune, TigerGraph | Neo4j is the most mature; FalkorDB is Redis-native and very fast
Graph query languages | Cypher (Neo4j), Gremlin, SPARQL, openCypher | Cypher is the most readable for AI context assembly
Property graph ORMs | neomodel (Python), py2neo, neo4j-ogm | Map domain classes to graph nodes
LLM + graph connectors | LangChain Neo4j integration, LlamaIndex KnowledgeGraphIndex, NebulaGraph | Let the LLM query the graph via natural-language → Cypher translation
Context serialization | JSON-LD, plain JSON, custom text templates | How you turn a subgraph into LLM-readable context
Real-time graph updates | Kafka → graph consumer, direct writes on pipeline completion | Keep the graph in sync with application state
Vector + graph hybrid | Weaviate (native vector + graph), Neo4j vector index | Combine semantic search with graph traversal in one query

Text2Cypher: letting the LLM query the graph

One of the most powerful patterns is having the LLM generate graph queries from natural language, then execute them to retrieve precise context:

User: "Show me all vendors with more than 3 rejections this month"

        │
        ▼
   LLM generates Cypher:

   MATCH (v:Vendor)-[:SUBMITTED]->(c:Case)
   WHERE c.final_decision = 'REJECT'
     AND c.processed_at >= date() - duration({days: 30})
   WITH v, count(c) AS rejections
   WHERE rejections > 3
   RETURN v.name, rejections
   ORDER BY rejections DESC

        │
        ▼
   Execute against Neo4j

        │
        ▼
   Results injected into LLM context

        │
        ▼
   LLM answers in natural language

This is the pattern used by tools like LangChain's GraphCypherQAChain and LlamaIndex's Neo4jGraphStore.
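Stripped to its core, Text2Cypher is a prompt plus some defensive output cleanup. This sketch accepts any chat-completion callable as `call_llm`; the schema string and prompt wording are assumptions, and a production system would also validate the generated query against the schema before executing it:

```python
def text_to_cypher(question: str, schema: str, call_llm) -> str:
    """Ask an LLM to translate a question into Cypher, then clean the reply."""
    prompt = (
        "You translate questions into Cypher queries.\n"
        f"Graph schema:\n{schema}\n\n"
        f"Question: {question}\n"
        "Return only the Cypher query, nothing else."
    )
    raw = call_llm(prompt).strip()
    # Models often wrap queries in markdown fences; strip them defensively.
    raw = raw.removeprefix("```cypher").removeprefix("```")
    raw = raw.removesuffix("```")
    return raw.strip()

# Usage with a stubbed LLM. A real system would call a chat-completion API,
# run the result against Neo4j, and feed the rows back to the LLM.
stub_llm = lambda p: "```cypher\nMATCH (v:Vendor) RETURN v.name\n```"
cypher = text_to_cypher(
    "List all vendors", "(:Vendor)-[:SUBMITTED]->(:Case)", stub_llm)
```

The win over vector retrieval is precision: the database computes the aggregation exactly, and the LLM only has to phrase the result.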


6. Side-by-Side Comparison

┌──────────────────┬──────────────┬──────────────┬──────────────┬──────────────┐
│                  │  Plain LLM   │     RAG      │  GraphRAG    │Context Graph │
├──────────────────┼──────────────┼──────────────┼──────────────┼──────────────┤
│ Knowledge source │ Training     │ Text chunks  │ Extracted    │ App-managed  │
│                  │ weights only │ (vector DB)  │ entity graph │ domain graph │
├──────────────────┼──────────────┼──────────────┼──────────────┼──────────────┤
│ Private data     │ ✗            │ ✓            │ ✓            │ ✓            │
├──────────────────┼──────────────┼──────────────┼──────────────┼──────────────┤
│ Real-time data   │ ✗            │ ✗ (stale)    │ ✗ (stale)    │ ✓            │
├──────────────────┼──────────────┼──────────────┼──────────────┼──────────────┤
│ Relationship     │ ✗            │ weak         │ ✓            │ ✓✓           │
│ reasoning        │              │              │              │              │
├──────────────────┼──────────────┼──────────────┼──────────────┼──────────────┤
│ Multi-hop        │ ✗            │ ✗            │ ✓            │ ✓            │
│ queries          │              │              │              │              │
├──────────────────┼──────────────┼──────────────┼──────────────┼──────────────┤
│ Corpus-wide      │ ✗            │ ✗            │ ✓            │ ✓            │
│ pattern queries  │              │              │              │              │
├──────────────────┼──────────────┼──────────────┼──────────────┼──────────────┤
│ Schema control   │ N/A          │ N/A          │ LLM-defined  │ You define   │
├──────────────────┼──────────────┼──────────────┼──────────────┼──────────────┤
│ Setup complexity │ Low          │ Medium       │ High         │ High         │
├──────────────────┼──────────────┼──────────────┼──────────────┼──────────────┤
│ Indexing cost    │ None         │ Low          │ Very high    │ App-write    │
│                  │              │              │ (LLM calls)  │ overhead     │
├──────────────────┼──────────────┼──────────────┼──────────────┼──────────────┤
│ Best for         │ General Q&A  │ Doc search   │ Research,    │ Operational  │
│                  │ and codegen  │ grounded     │ document     │ AI agents,   │
│                  │              │ answers      │ corpora      │ live systems │
└──────────────────┴──────────────┴──────────────┴──────────────┴──────────────┘

The evolution arc

  Plain LLM ──▶ RAG ──▶ GraphRAG ──▶ Context Graph
      │            │         │               │
  General       Private   Relational     Operational
  knowledge     data      reasoning       AI agents
  only          access                    real-time

7. Choosing the Right Stage for Your Problem

START HERE
    │
    ▼
Does the LLM need access
to private or recent data?
    │
   No ──────────────────────────▶  Plain LLM
    │                               (GPT-4o, Claude, etc.)
   Yes
    │
    ▼
Is the data primarily
unstructured text documents?
    │
   Yes
    │
    ▼
Do you need to answer
relationship questions across
many documents?
    │
   No ──────────────────────────▶  RAG
    │                               (LlamaIndex + Qdrant,
   Yes                               LangChain + Pinecone)
    │
    ▼
Is the document corpus
relatively static, and is
structure extraction acceptable?
    │
   Yes ─────────────────────────▶  GraphRAG
    │                               (Microsoft GraphRAG +
    │                                Neo4j)
   No
    │
    ▼
Is your data already structured
(DB records, pipeline outputs,
API responses) and does it
change in real time?
    │
   Yes ─────────────────────────▶  Context Graph
                                    (Neo4j / FalkorDB +
                                     custom graph builder +
                                     Text2Cypher or
                                     programmatic traversal)

Quick reference by use case

Use case | Best approach
---------|---------------
"Summarise this document for me" | Plain LLM (paste the doc into context)
"Search our wiki and answer questions" | RAG (LlamaIndex + Qdrant)
"What are the recurring themes across 10,000 support tickets?" | GraphRAG (Microsoft GraphRAG)
"Why was this specific order flagged, and are there similar patterns?" | Context Graph (Neo4j + custom graph)
"Which customers are connected to this fraud case through shared vendors?" | Context Graph (graph traversal)
"Chat with our PDF documentation" | RAG (LlamaIndex or LangChain)
"Find all compliance violations related to vendor X across all systems" | Context Graph
"Summarise the main topics in our research paper corpus" | GraphRAG (global search)

A note on combining approaches

These stages are not mutually exclusive. Production systems often combine them:

Context Graph (structured operational data)
        +
RAG (unstructured document search)
        +
Plain LLM (reasoning and generation)
        =
Rich, grounded, relationship-aware answers

For example: an AI assistant for a legal firm might use RAG to search case law documents, a Context Graph to represent client–case–lawyer relationships, and pass both as context to the LLM for generation. The LLM sees: "Here are the relevant legal precedents [from RAG] and here is the client's case history [from Context Graph]. Answer the lawyer's question."
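The assembly step of such a hybrid system reduces to plain prompt construction. A minimal sketch; the section labels and inputs are illustrative:

```python
def build_combined_prompt(question, rag_chunks, graph_context):
    """Merge RAG text chunks and Context Graph output into one grounded prompt."""
    chunk_block = "\n---\n".join(rag_chunks)
    return (
        "Here are the relevant documents (from RAG):\n"
        f"{chunk_block}\n\n"
        "Here is the structured case history (from the Context Graph):\n"
        f"{graph_context}\n\n"
        f"Answer the question: {question}"
    )

prompt = build_combined_prompt(
    "Why was the invoice rejected?",
    ["Precedent A: late filing...", "Precedent B: tax-ID mismatch..."],
    '{"vendor": "FastTrack Logistics", "rejections_30d": 3}',
)
```

Each retrieval layer stays independent; only the final prompt assembly knows about both, which keeps the layers easy to test and swap.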


The invoice processing system described in Building an AI Agent for Invoice Processing is an example of a system that deliberately chose not to use RAG or GraphRAG: the rulebook fits in a single LLM context, and the structured domain data lives in pipeline JSON artifacts. As the system grows and the case corpus scales to thousands of entries, a Context Graph layer over the case artifacts would be the natural next evolution.