From Plain LLM to Context Graph: The Evolution of AI Knowledge Retrieval
Table of Contents
- The Core Problem: What LLMs Don't Know
- Stage 1 — Plain LLM
- Stage 2 — RAG: Retrieval-Augmented Generation
- Stage 3 — GraphRAG: When Relationships Matter
- Stage 4 — Context Graph: Structured, Dynamic, Agent-Ready
- Side-by-Side Comparison
- Choosing the Right Stage for Your Problem
1. The Core Problem: What LLMs Don't Know
Every architecture in this article is an answer to the same fundamental limitation of Large Language Models:
LLMs have a knowledge cutoff. They know nothing about your private data. And they can hallucinate when asked about specifics they were never trained on.
The progression from plain LLM → RAG → GraphRAG → Context Graph is the story of the AI engineering community finding progressively more powerful ways to solve this problem.
┌────────────────────────────────────────────────────────────┐
│ The LLM Knowledge Problem │
│ │
│ What the LLM knows: │
│ ✓ General world knowledge (up to training cutoff) │
│ ✓ Reasoning, summarization, code generation │
│ ✓ Language patterns and inference │
│ │
│ What the LLM does NOT know: │
│ ✗ Your private documents │
│ ✗ Events after training cutoff │
│ ✗ Real-time data │
│ ✗ Relationships between your specific entities │
│ ✗ Your company's internal rules, policies, cases │
└────────────────────────────────────────────────────────────┘
Each stage in this article is a different answer to: how do we inject the right knowledge into the LLM at query time?
2. Stage 1 — Plain LLM
What it is
You send a prompt directly to a language model. No external knowledge, no retrieval, no databases. The model answers purely from its training weights.
┌──────────────────────────────────────┐
│ PLAIN LLM │
│ │
│ User prompt │
│ │ │
│ ▼ │
│ ┌────────────────────┐ │
│ │ LLM │ │
│ │ (GPT-4o, Claude, │ │
│ │ Gemini, Llama) │ │
│ └────────────────────┘ │
│ │ │
│ ▼ │
│ Response │
└──────────────────────────────────────┘
When it's enough
- General Q&A, summarization, translation
- Code generation and explanation
- Tasks where all needed context can fit in a single prompt
- Creative writing, brainstorming
The hard ceiling
The moment you ask "What did our client sign last Tuesday?" or "Find the cases where we rejected invoices from FastTrack Logistics" — a plain LLM is useless. It has never seen your data.
3. Stage 2 — RAG: Retrieval-Augmented Generation
What it is
RAG solves the private-data problem by searching your documents at query time and injecting the most relevant chunks into the LLM's context window before it answers.
The key innovation: instead of storing all your knowledge in the prompt (which would overflow the context window), you store it in a vector database — a search index where each chunk is represented as a high-dimensional numerical vector (an embedding). At query time you embed the user's question and find the closest matching chunks.
The RAG pipeline
INDEXING (done once, offline)
─────────────────────────────
Documents (PDFs, wikis, emails, ...)
│
▼
Text chunker
(split into ~500-token pieces)
│
▼
Embedding model
(e.g. text-embedding-3-small,
all-MiniLM-L6-v2)
│
▼
Vector database
(each chunk stored as a float vector)
QUERYING (done on every user question)
──────────────────────────────────────
User question
│
▼
Embedding model
(same model as indexing)
│
▼
Vector DB similarity search
(cosine / dot-product similarity)
│
▼
Top-K most similar chunks
│
▼
Assemble prompt:
[System] + [Chunk 1] + [Chunk 2] + ... + [User question]
│
▼
LLM
│
▼
Answer (grounded in retrieved chunks)
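The querying half of the pipeline above can be sketched end-to-end in a few lines. This is a toy sketch: a bag-of-words counter stands in for the real embedding model, and a plain list stands in for the vector database; a production system would swap in e.g. text-embedding-3-small and Qdrant.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    # Toy stand-in for a real embedding model: bag-of-words token counts.
    return Counter(re.findall(r"[\w-]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# "Vector database": chunks stored alongside their embeddings
chunks = [
    "invoice INV-0042 was rejected for a tax id format issue",
    "payment for invoice INV-0017 was approved on Tuesday",
    "the office coffee machine needs descaling",
]
index = [(chunk, embed(chunk)) for chunk in chunks]

def retrieve(question: str, k: int = 2) -> list[str]:
    # Embed the question with the SAME model used at indexing time,
    # then return the Top-K most similar chunks.
    q = embed(question)
    ranked = sorted(index, key=lambda pair: cosine(q, pair[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]

question = "why was the invoice rejected?"
top = retrieve(question)
prompt = "Context:\n" + "\n".join(top) + f"\n\nQuestion: {question}"
```

The rejection chunk ranks first because it shares the most meaning-bearing tokens with the query; the coffee-machine chunk never makes the Top-K.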
Similarity search visualised
Vector search finds chunks that are "close" in meaning — not just in exact words:
Vector space (simplified to 2D)
·
"invoice · × Query: "rejected invoice amount"
rejected" × ·
· × ·
· "total ·
· amount" ·
· × "payment approved"
· ·
────────────────────────────────
Far from query Close to query
(not retrieved) (retrieved)
The embedding model maps semantically similar text to nearby points in vector space, regardless of exact wording. "Invoice rejected" and "bill not approved" would land close to each other.
The tools and ecosystem
| Layer | Popular choices | Notes |
|---|---|---|
| Embedding models | text-embedding-3-small (OpenAI), all-MiniLM-L6-v2 (HuggingFace), embed-english-v3.0 (Cohere) | OpenAI embeds are convenient; open-source models keep data local |
| Vector databases | Pinecone, Qdrant, Weaviate, Chroma, pgvector (Postgres extension) | Pinecone = fully managed; Qdrant = self-hosted, fast; pgvector = already in your Postgres |
| Chunking strategies | Fixed-size, sentence-aware, semantic chunking | LlamaIndex and LangChain both have chunking utilities |
| RAG frameworks | LlamaIndex (RAG-first), LangChain (broader orchestration) | LlamaIndex has the richest RAG primitives |
| Rerankers | Cohere Rerank, BGE-Reranker, FlashRank | Run after vector search to re-score Top-K chunks with a cross-encoder |
| Hybrid search | BM25 (keyword) + vector (semantic) | Often better than pure vector; supported in Qdrant, Weaviate, Elasticsearch |
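The hybrid-search row deserves a sketch. A common way to combine a BM25 ranking with a vector ranking is reciprocal rank fusion (RRF), which needs only the two rank orderings, not comparable scores. A minimal version (the constant k = 60 is the value conventionally used in the literature):

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document ids into one.

    Each document scores sum(1 / (k + rank)) over the lists it appears in;
    k = 60 dampens the impact of any single list's top position.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_ranking = ["doc_7", "doc_2", "doc_9"]    # keyword hits
vector_ranking = ["doc_2", "doc_5", "doc_7"]  # semantic hits
fused = reciprocal_rank_fusion([bm25_ranking, vector_ranking])
```

Documents that appear high in both lists (here doc_2 and doc_7) float to the top, which is exactly why hybrid search often beats pure vector search.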
RAG with LlamaIndex — the shape of the code
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, StorageContext
from llama_index.vector_stores.qdrant import QdrantVectorStore
import qdrant_client
# 1. Load documents
docs = SimpleDirectoryReader("./documents").load_data()
# 2. Build index (chunks → embeddings → Qdrant)
client = qdrant_client.QdrantClient(url="http://localhost:6333")
vector_store = QdrantVectorStore(client=client, collection_name="my_docs")
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(docs, storage_context=storage_context)
# 3. Query
query_engine = index.as_query_engine()
response = query_engine.query("What invoices were rejected last month?")
print(response)
Where RAG struggles
RAG is excellent for "find me documents about X". It breaks down when the answer requires connecting dots across multiple documents:
❌ RAG fails at:
"Who are the top vendors by total billed amount across all invoices,
and which of them had compliance violations in Q4?"
Why it fails:
- No single chunk contains both the billing totals AND the violation data
- The answer requires aggregating across many documents
- The relationships between entities (vendor → invoices → violations)
are implicit in the chunks, not explicit in the index
This is where GraphRAG enters.
4. Stage 3 — GraphRAG: When Relationships Matter
What it is
GraphRAG was introduced by Microsoft Research in 2024. Instead of storing document chunks as isolated vectors, it extracts entities and relationships from your documents and builds a knowledge graph. Retrieval then traverses graph structure rather than just finding nearby vectors.
The two-phase architecture
INDEXING PHASE (expensive, done offline)
─────────────────────────────────────────
Documents
│
▼
LLM extraction pass
│ "Find all entities (people, companies, dates,
│ concepts) and the relationships between them"
│
▼
┌─────────────────────────────────────────┐
│ Knowledge Graph │
│ │
│ [FastTrack Logistics]──BILLED──▶[ACME Corp]
│ │ │
│ ISSUED AUTHORIZED
│ │ │
│ [Invoice INV-0042]◀──MATCHES──[WAF-76948]
│ │
│ AMOUNT
│ │
│ $733.70
│ │
│ [Emma Williams]──WORKED_ON──▶[Site 403]
└─────────────────────────────────────────┘
│
▼
Graph summaries generated per community
(clusters of tightly connected nodes)
QUERYING PHASE
──────────────
User question
│
├──▶ Local search: traverse graph from
│ matched entities → follow edges
│
└──▶ Global search: use pre-built
community summaries for broad questions
│
▼
Relevant subgraph / summaries
│
▼
Assemble context prompt
│
▼
LLM
│
▼
Answer
Local vs global search
GraphRAG introduces a distinction that plain RAG cannot make:
| Search type | Best for | Mechanism |
|---|---|---|
| Local | "Tell me about vendor FastTrack Logistics" | Start at the matched entity, traverse 1–3 hops in the graph |
| Global | "What are the main themes across all our rejected invoices?" | Use pre-built community summaries that cover the whole corpus |
Plain vector RAG cannot do the "global" question well — it only finds locally relevant chunks, not corpus-wide patterns.
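Local search in the table above amounts to a bounded breadth-first traversal from the matched entity. A minimal sketch over an adjacency-list graph (the entity and relation names are illustrative, taken from the diagram earlier in this section):

```python
from collections import deque

# Toy knowledge graph: node -> list of (relation, neighbour) edges
graph = {
    "FastTrack Logistics": [("ISSUED", "INV-0042"), ("BILLED", "ACME Corp")],
    "INV-0042": [("MATCHES", "WAF-76948"), ("HAS_STATUS", "REJECTED")],
    "ACME Corp": [("AUTHORIZED", "WAF-76948")],
}

def local_search(start: str, max_hops: int = 2) -> list[tuple[str, str, str]]:
    """Collect all (subject, relation, object) triples within max_hops of start."""
    triples, seen, frontier = [], {start}, deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue  # hop budget exhausted: do not expand further
        for relation, neighbour in graph.get(node, []):
            triples.append((node, relation, neighbour))
            if neighbour not in seen:
                seen.add(neighbour)
                frontier.append((neighbour, depth + 1))
    return triples

subgraph = local_search("FastTrack Logistics", max_hops=2)
```

Global search cannot be sketched this cheaply: it relies on the community summaries built at indexing time, which is precisely why GraphRAG pays that indexing cost up front.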
GraphRAG tools and ecosystem
| Tool | Role | Notes |
|---|---|---|
| Microsoft GraphRAG (graphrag Python package) | Full pipeline: extraction → graph → search | Open source, released 2024; uses OpenAI by default but configurable |
| Neo4j | Graph database storage | Industry standard; excellent Cypher query language |
| Amazon Neptune | Managed graph DB on AWS | Good for AWS-native stacks |
| ArangoDB | Multi-model: graph + document + key-value | Flexible, self-hostable |
| LangChain + Neo4j | RAG over knowledge graphs | Neo4jGraph, GraphCypherQAChain |
| LlamaIndex Knowledge Graph Index | Graph-based RAG | Extracts triplets and stores them; simpler than full GraphRAG |
| spaCy / GLiNER | Entity extraction | Can be used instead of LLM for NER during indexing (cheaper) |
| Ollama | Local LLM for extraction | Run the extraction pass locally to avoid sending private data to a cloud API |
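As the spaCy / GLiNER row notes, the extraction pass need not be an LLM at all. Here is a deliberately simplified sketch of the cheaper alternative, reduced to hand-written pattern rules over invoice-like sentences; a real pipeline would use spaCy's NER models or an LLM prompt instead, and the patterns below are assumptions for illustration only:

```python
import re

def extract_triples(text: str) -> list[tuple[str, str, str]]:
    """Rule-based stand-in for the LLM extraction pass: find
    '<Vendor> billed <Company>' and '<Vendor> issued invoice <ID>' patterns."""
    triples = []
    for vendor, company in re.findall(r"([A-Z][\w ]+?) billed ([A-Z][\w ]+?)[.,]", text):
        triples.append((vendor.strip(), "BILLED", company.strip()))
    for vendor, inv in re.findall(r"([A-Z][\w ]+?) issued invoice (INV-\d+)", text):
        triples.append((vendor.strip(), "ISSUED", inv))
    return triples

doc = "FastTrack Logistics billed ACME Corp. FastTrack Logistics issued invoice INV-0042."
triples = extract_triples(doc)
```

The trade-off is the one the table hints at: rules and NER models are cheap and deterministic but brittle; LLM extraction is expensive but generalises to relationships no rule anticipated.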
The GraphRAG pipeline with Microsoft's library
# Install
pip install graphrag
# Init a project
graphrag init --root ./my-project
# Configure: edit my-project/settings.yaml
# (set LLM provider, entity extraction prompts, etc.)
# Run the indexing pipeline
graphrag index --root ./my-project
# Query
graphrag query \
--root ./my-project \
--method local \
--query "Which vendors had the most compliance issues?"
The indexing step is expensive — it calls the LLM once per document chunk to extract entities and relationships. For large corpora this can take hours and consume a significant token budget. The result is cached; re-indexing happens only when documents change.
GraphRAG limitations
GraphRAG trade-offs:
✓ Excellent for relationship queries
✓ Global/thematic queries across a large corpus
✓ Handles multi-hop reasoning ("who connected to whom via what?")
✗ Expensive indexing (many LLM calls)
✗ Static — the graph must be re-indexed when documents change
✗ Graph structure is extracted from unstructured text,
so entity resolution is imperfect (same entity, different names)
✗ Complex to set up and tune
✗ The graph is implicit — you cannot directly query it with
your own Cypher/SPARQL unless you export it
The deeper problem: the knowledge graph is built by having an LLM read unstructured text and guess the structure. What if the structure already exists in your data? What if you want precise, real-time, application-defined relationships — not statistically-extracted ones?
That leads to Context Graph.
5. Stage 4 — Context Graph: Structured, Dynamic, Agent-Ready
What it is
A Context Graph is not a single open-source library or a published paper — it is an architectural pattern that has emerged from AI engineering practice. The core idea:
Instead of extracting a graph from unstructured documents, build an explicit, typed, versioned graph of your domain entities and their relationships — and use that graph as the LLM's context source.
Where RAG retrieves chunks of text and GraphRAG retrieves statistically-extracted entity subgraphs, a Context Graph retrieves structured, trusted, application-controlled context.
The structural difference
┌──────────────────────────────────────────────────────────────┐
│ RAG │ GraphRAG │ Context Graph │
│────────────────┼────────────────────┼─────────────────────────│
│ Text chunks │ Entity graph │ Domain object graph │
│ from docs │ extracted by LLM │ managed by application │
│ │ │ │
│ "Invoice INV │ [Invoice]──▶ │ Invoice { │
│ rejected for │ [Vendor]──▶ │ id: "INV-42" │
│ phase3 fail" │ [Violation] │ vendor: Vendor{...} │
│ │ │ decision: REJECT │
│ (text blob) │ (implicit edges) │ phase_failed: "phase3"│
│ │ │ line_items: [...] │
│ │ │ } │
└──────────────────────────────────────────────────────────────┘
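The right-hand column of the comparison, typed domain objects rather than text blobs, can be made concrete with a few dataclasses. The field names follow the diagram above; in a real system they would mirror your application's own schema:

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional

class Decision(Enum):
    APPROVE = "APPROVE"
    REJECT = "REJECT"

@dataclass
class Vendor:
    id: str
    name: str

@dataclass
class Invoice:
    id: str
    vendor: Vendor                 # explicit, typed edge: Invoice ──SUBMITTED_BY──▶ Vendor
    decision: Decision
    phase_failed: Optional[str] = None
    line_items: list[dict] = field(default_factory=list)

vendor = Vendor(id="V-17", name="FastTrack Logistics")
inv = Invoice(id="INV-42", vendor=vendor, decision=Decision.REJECT, phase_failed="phase3")
```

Nothing here is probabilistic: the edge from invoice to vendor is a reference your application set, carrying your primary keys, not an entity match an LLM guessed from text.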
How a Context Graph is built
Your application data
(databases, APIs, event streams, processed pipeline outputs)
│
▼
┌──────────────────────────────────────────────────────────┐
│ CONTEXT GRAPH │
│ │
│ Nodes: typed domain entities │
│ • Vendor, Invoice, Case, Rule, ValidationResult, ... │
│ │
│ Edges: explicit, typed relationships │
│ • Invoice ──SUBMITTED_BY──▶ Vendor │
│ • Case ──CONTAINS──────▶ Invoice │
│ • Case ──TRIGGERED──────▶ ALFRule │
│ • ALFRule ──OVERRIDES──────▶ Decision │
│ │
│ Properties on nodes and edges: │
│ • Timestamps, confidence scores, audit trail │
│ │
│ Updated in real time as your application runs │
└──────────────────────────────────────────────────────────┘
│
▼
Context assembly at query time:
1. Identify anchor entities from user query
2. Traverse graph to collect relevant subgraph
3. Serialize subgraph as structured JSON/text
4. Inject into LLM context window
│
▼
LLM answers with full structured context
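Steps 1–4 of the context-assembly sequence can be sketched directly: anchor on an entity, collect its neighbourhood, and serialize the resulting subgraph as JSON for the prompt. The node ids and properties here are illustrative:

```python
import json

# Toy context graph: node id -> properties, plus typed edges
nodes = {
    "case_116dd8e8": {"type": "Case", "decision": "REJECT", "phase_failed": "phase3"},
    "vendor_fasttrack": {"type": "Vendor", "name": "FastTrack Logistics"},
}
edges = [("case_116dd8e8", "SUBMITTED_BY", "vendor_fasttrack")]

def assemble_context(anchor: str) -> str:
    """Collect the anchor node, its outgoing edges, and their targets,
    then serialize the subgraph as JSON for the LLM context window."""
    sub_nodes = {anchor: nodes[anchor]}
    sub_edges = []
    for src, rel, dst in edges:
        if src == anchor:
            sub_edges.append({"from": src, "rel": rel, "to": dst})
            sub_nodes[dst] = nodes[dst]
    return json.dumps({"nodes": sub_nodes, "edges": sub_edges}, indent=2)

context = assemble_context("case_116dd8e8")
prompt = f"Here is the structured context:\n{context}\n\nAnswer the user's question."
```

Because the graph is updated as the application runs, this assembly step always sees current state; there is no stale index to rebuild.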
Context Graph vs GraphRAG
┌───────────────────────────────────────────────────────────┐
│ │
│ GraphRAG Context Graph │
│ ────────── ───────────── │
│ │
│ Graph extracted from Graph built intentionally │
│ unstructured documents from structured app data │
│ │
│ Schema discovered by LLM Schema defined by you │
│ │
│ Static (re-index to update) Dynamic (updated in real time)│
│ │
│ Probabilistic entity Exact entity identity │
│ resolution (your primary keys) │
│ │
│ Built on text corpus Built on your domain model │
│ │
│ Good for: research, Good for: operational AI, │
│ document understanding AI agents, production systems │
└───────────────────────────────────────────────────────────┘
Context Graph in an agentic system
Context Graphs are particularly powerful for AI agents that need to reason about a living, changing domain — not just a static document corpus:
┌─────────────────────────────────────────────────────────────┐
│ AGENTIC SYSTEM WITH CONTEXT GRAPH │
│ │
│ User: "Why was case_116dd8e8 rejected, and are there │
│ similar recent cases from the same vendor?" │
│ │
│ Agent step 1: identify anchor entities │
│ → Case: case_116dd8e8 │
│ → Vendor: FastTrack Logistics │
│ │
│ Agent step 2: traverse Context Graph │
│ case_116dd8e8 │
│ └──SUBMITTED_BY──▶ FastTrack Logistics │
│ └──SUBMITTED──▶ [case_a3f8, case_bb12, ...] │
│ └──DECISION──▶ REJECT (phase3) │
│ │
│ Agent step 3: collect subgraph │
│ {vendor profile, all cases, decisions, rejection phases} │
│ │
│ Agent step 4: inject into LLM prompt │
│ "Here is the structured context: {...}" │
│ "Answer the user's question." │
│ │
│ Agent step 5: LLM answers with full context │
│ "case_116dd8e8 was rejected in phase3 due to a │
│ tax-ID format issue. FastTrack Logistics has 3 │
│ similar rejections in the last 30 days, all for │
│ the same reason. Consider adding an ALF rule." │
└─────────────────────────────────────────────────────────────┘
A plain RAG system could not answer this — the aggregated multi-case pattern across one vendor requires graph traversal, not chunk similarity.
Tools and technologies for Context Graphs
| Category | Tools | Notes |
|---|---|---|
| Graph databases | Neo4j, Memgraph, FalkorDB, Amazon Neptune, TigerGraph | Neo4j is the most mature; FalkorDB is Redis-native and very fast |
| Graph query languages | Cypher (Neo4j), Gremlin, SPARQL, openCypher | Cypher is the most readable for AI context assembly |
| Property graph ORM | neomodel (Python), py2neo, neo4j-ogm | Map domain classes to graph nodes |
| LLM + graph connectors | LangChain Neo4j integration, LlamaIndex KnowledgeGraphIndex, NebulaGraph | Allow LLM to query graph using natural language → Cypher translation |
| Context serialization | JSON-LD, plain JSON, custom text templates | How you turn a subgraph into LLM-readable context |
| Real-time graph updates | Kafka → graph consumer, direct writes on pipeline completion | Keep the graph in sync with application state |
| Vector + graph hybrid | Weaviate (native vector+graph), Neo4j vector index | Combine semantic search with graph traversal in one query |
Text2Cypher: letting the LLM query the graph
One of the most powerful patterns is having the LLM generate graph queries from natural language, then execute them to retrieve precise context:
User: "Show me all vendors with more than 3 rejections this month"
│
▼
LLM generates Cypher:
MATCH (v:Vendor)-[:SUBMITTED]->(c:Case)
WHERE c.final_decision = 'REJECT'
AND c.processed_at >= date() - duration({days: 30})
WITH v, count(c) AS rejections
WHERE rejections > 3
RETURN v.name, rejections
ORDER BY rejections DESC
│
▼
Execute against Neo4j
│
▼
Results injected into LLM context
│
▼
LLM answers in natural language
This is the pattern used by tools like LangChain's GraphCypherQAChain and LlamaIndex's Neo4jGraphStore.
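One production detail worth showing: before executing model-generated Cypher you normally validate it, at minimum rejecting write clauses so the LLM can only read the graph. Below is a minimal, admittedly coarse guard; the generation step itself is left out, since in practice it would be GraphCypherQAChain or your own prompt:

```python
import re

# Cypher clauses that mutate the graph; a generated query containing any
# of these is refused. Coarse filter: it will also flag these words used
# as identifiers, which is an acceptable false positive for a safety gate.
WRITE_CLAUSES = ("CREATE", "MERGE", "DELETE", "SET", "REMOVE", "DROP", "DETACH")

def is_read_only(cypher: str) -> bool:
    """Return True only if the generated Cypher contains no write clauses."""
    tokens = re.findall(r"[A-Za-z]+", cypher.upper())
    return not any(tok in WRITE_CLAUSES for tok in tokens)

generated = """
MATCH (v:Vendor)-[:SUBMITTED]->(c:Case)
WHERE c.final_decision = 'REJECT'
RETURN v.name, count(c) AS rejections
"""
safe = is_read_only(generated)                         # read-only query passes
blocked = is_read_only("MATCH (n) DETACH DELETE n")    # destructive query is refused
```

A stricter alternative is to run the LLM's queries through a read-only database user, so even a query that slips past the filter cannot write.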
6. Side-by-Side Comparison
┌──────────────────┬──────────────┬──────────────┬──────────────┬──────────────┐
│ │ Plain LLM │ RAG │ GraphRAG │Context Graph │
├──────────────────┼──────────────┼──────────────┼──────────────┼──────────────┤
│ Knowledge source │ Training │ Text chunks │ Extracted │ App-managed │
│ │ weights only │ (vector DB) │ entity graph │ domain graph │
├──────────────────┼──────────────┼──────────────┼──────────────┼──────────────┤
│ Private data │ ✗ │ ✓ │ ✓ │ ✓ │
├──────────────────┼──────────────┼──────────────┼──────────────┼──────────────┤
│ Real-time data │ ✗ │ ✗ (stale) │ ✗ (stale) │ ✓ │
├──────────────────┼──────────────┼──────────────┼──────────────┼──────────────┤
│ Relationship │ ✗ │ weak │ ✓ │ ✓✓ │
│ reasoning │ │ │ │ │
├──────────────────┼──────────────┼──────────────┼──────────────┼──────────────┤
│ Multi-hop │ ✗ │ ✗ │ ✓ │ ✓ │
│ queries │ │ │ │ │
├──────────────────┼──────────────┼──────────────┼──────────────┼──────────────┤
│ Corpus-wide │ ✗ │ ✗ │ ✓ │ ✓ │
│ pattern queries │ │ │ │ │
├──────────────────┼──────────────┼──────────────┼──────────────┼──────────────┤
│ Schema control │ N/A │ N/A │ LLM-defined │ You define │
├──────────────────┼──────────────┼──────────────┼──────────────┼──────────────┤
│ Setup complexity │ Low │ Medium │ High │ High │
├──────────────────┼──────────────┼──────────────┼──────────────┼──────────────┤
│ Indexing cost │ None │ Low │ Very high │ App-write │
│ │ │ │ (LLM calls) │ overhead │
├──────────────────┼──────────────┼──────────────┼──────────────┼──────────────┤
│ Best for │ General Q&A │ Doc search │ Research, │ Operational │
│ │ and codegen │ grounded │ document │ AI agents, │
│ │ │ answers │ corpora │ live systems │
└──────────────────┴──────────────┴──────────────┴──────────────┴──────────────┘
The evolution arc
Plain LLM ──▶ RAG ──▶ GraphRAG ──▶ Context Graph
│ │ │ │
General Private Relational Operational
knowledge data reasoning AI agents
only access real-time
7. Choosing the Right Stage for Your Problem
START HERE
│
▼
Does the LLM need access
to private or recent data?
│
No ──────────────────────────▶ Plain LLM
│ (GPT-4o, Claude, etc.)
Yes
│
▼
Is the data primarily
unstructured text documents?
│
Yes
│
▼
Do you need to answer
relationship questions across
many documents?
│
No ──────────────────────────▶ RAG
│ (LlamaIndex + Qdrant,
Yes LangChain + Pinecone)
│
▼
Is the document corpus
relatively static, and is
structure extraction acceptable?
│
Yes ─────────────────────────▶ GraphRAG
│ (Microsoft GraphRAG +
│ Neo4j)
No
│
▼
Is your data already structured
(DB records, pipeline outputs,
API responses) and does it
change in real time?
│
Yes ─────────────────────────▶ Context Graph
(Neo4j / FalkorDB +
custom graph builder +
Text2Cypher or
programmatic traversal)
Quick reference by use case
| Use case | Best approach |
|---|---|
| "Summarise this document for me" | Plain LLM (paste doc in context) |
| "Search our wiki and answer questions" | RAG — LlamaIndex + Qdrant |
| "What are the recurring themes across 10,000 support tickets?" | GraphRAG — Microsoft GraphRAG |
| "Why was this specific order flagged, and are there similar patterns?" | Context Graph — Neo4j + custom graph |
| "Which customers are connected to this fraud case through shared vendors?" | Context Graph — graph traversal |
| "Chat with our PDF documentation" | RAG — LlamaIndex or LangChain |
| "Find all compliance violations related to vendor X across all systems" | Context Graph |
| "Summarise the main topics in our research paper corpus" | GraphRAG (global search) |
A note on combining approaches
These stages are not mutually exclusive. Production systems often combine them:
Context Graph (structured operational data)
+
RAG (unstructured document search)
+
Plain LLM (reasoning and generation)
=
Rich, grounded, relationship-aware answers
For example: an AI assistant for a legal firm might use RAG to search case law documents, a Context Graph to represent client–case–lawyer relationships, and pass both as context to the LLM for generation. The LLM sees: "Here are the relevant legal precedents [from RAG] and here is the client's case history [from Context Graph]. Answer the lawyer's question."
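The legal-firm example reduces to simple prompt assembly once each subsystem has produced its piece of context. A sketch, where the inputs stand in for a real RAG query engine's chunks and a real graph traversal's output (the case name and fields are invented for illustration):

```python
def build_combined_prompt(question: str, rag_chunks: list[str], graph_context: str) -> str:
    """Merge unstructured RAG hits and structured graph context into one prompt."""
    precedents = "\n".join(f"- {chunk}" for chunk in rag_chunks)
    return (
        "Here are the relevant legal precedents [from RAG]:\n"
        f"{precedents}\n\n"
        "Here is the client's case history [from Context Graph]:\n"
        f"{graph_context}\n\n"
        f"Answer the lawyer's question: {question}"
    )

prompt = build_combined_prompt(
    "Can we cite Smith v. Jones here?",
    rag_chunks=["Smith v. Jones (2019): contract ambiguity resolved against drafter"],
    graph_context='{"client": "ACME Corp", "open_cases": 2}',
)
```

Each source keeps its strengths: RAG contributes semantically retrieved text, the Context Graph contributes exact, current relationships, and the LLM does the reasoning over both.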
The invoice processing system described in Building an AI Agent for Invoice Processing is an example of a system that deliberately chose not to use RAG or GraphRAG — because the rules book fits in a single LLM context and the structured domain data lives in pipeline JSON artifacts. As the system grows and the case corpus scales to thousands of entries, a Context Graph layer over the case artifacts would be the natural next evolution.