What it’s good at
VectorCypher shines when:
- Multi-hop queries matter: “Who works on deals with companies that Alex mentioned?”
- Graph traversal is essential: Navigate organizational hierarchies, deal chains, team structures
- Relationship discovery is key: Find implicit connections across data sources
- Mixed query complexity: Automatic routing adapts retrieval per query, so simple lookups stay fast while complex questions get full graph expansion
sqlite_lance graph store for local development. See Storage backends.
For comprehensive extraction over 100% of chunks, pass engine_kwargs={"skeleton_core_ratio": 1.0}. VectorCypher’s KET-RAG selectivity defaults to the top 70% of chunks but accepts a 1.0 override that extracts from every chunk.
Architecture Overview
Core Design Principles
- Dual-Node Architecture: Inspired by HippoRAG 2, maintains both Chunk and Entity nodes in Neo4j, linked via MENTIONED_IN relationships
- Query Routing: Intelligent classification routes queries to optimal search paths (vector-only for simple queries, full VectorCypher for complex)
- Skeleton-Based Extraction: Only core chunks (identified via PageRank, default 70%) get full LLM entity extraction, balancing cost and quality
- RRF Fusion: Reciprocal Rank Fusion combines vector and graph results with configurable weights
- Bi-Temporal Support: Tracks occurred_at (when an event happened) vs ingested_at (when Khora learned about it)
Key Components
VectorCypherEngine (src/khora/engines/vectorcypher/engine.py)
The main engine class implementing MemoryEngineProtocol:
| Method | Description |
|---|---|
| remember() | Store content with deduplication and skeleton-based extraction |
| recall() | Retrieve memories with VectorCypher hybrid search |
| forget() | Remove a memory (cleans both pgvector and Neo4j) |
| remember_batch() | Batch ingestion with parallel processing |
| find_related_entities() | Graph traversal to find related entities |
| stats() | Get document/chunk/entity counts |
RecallResult Context
recall() returns RecallResult objects whose typed projections expose:
- chunks: Matching text passages as RecallChunk entries (chunk.content)
- entities: Entities mentioned in matching chunks
- relationships: Connections between entities in the result set
- documents: Full DocumentProjection rows for every document referenced by a chunk, entity, or relationship (always populated; see Source Document Population)
khora.context_text(result, max_chunks=...) helper:
Source Document Population
recall() always returns a RecallResult whose documents list holds full DocumentProjection rows for every document referenced by a chunk, entity, or relationship in the result. This is a producer-enforced invariant. Khora batch-fetches DocumentSource metadata after the engine returns (chunked at 1,000 IDs) and replaces the engine’s lightweight stubs in place. The engine itself uses the namespace-scoped coordinator facade for that lookup, so cross-namespace ids never leak through.
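The 1,000-ID chunking mentioned above can be illustrated with a minimal sketch. This is not Khora's implementation: batched_ids, fetch_documents, and the fetch_batch callback are hypothetical names, and the callback stands in for whatever storage lookup actually runs.

```python
from collections.abc import Callable, Iterator, Sequence

BATCH_SIZE = 1_000  # chunk size stated in the text above


def batched_ids(ids: Sequence[str], size: int = BATCH_SIZE) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `size` IDs."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def fetch_documents(
    ids: Sequence[str],
    fetch_batch: Callable[[Sequence[str]], dict[str, dict]],
) -> dict[str, dict]:
    """Fetch metadata for all IDs, one bounded batch at a time.

    `fetch_batch` is a hypothetical callback that returns {id: metadata}
    for the IDs it is given.
    """
    unique_ids = list(dict.fromkeys(ids))  # de-duplicate, keep first-seen order
    documents: dict[str, dict] = {}
    for batch in batched_ids(unique_ids):
        documents.update(fetch_batch(batch))
    return documents
```

Bounding each lookup keeps query parameter lists small no matter how many documents a single recall() touches.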
The four entity methods (get_entity(), list_entities(), find_related_entities(), search_entities()) accept include_sources: bool = False to opt in to per-entity source_documents population. All four require namespace_id= (kwarg-only) on every call. The IDOR close-out enforces this at the Protocol level on every storage backend.
VectorCypherRetriever (src/khora/engines/vectorcypher/retriever.py)
Implements the hybrid retrieval pipeline:
- Route Query: Classify as SIMPLE, MODERATE, or COMPLEX
- Embed Query: Generate query embedding via LiteLLM
- Vector Search: Find entry entities via pgvector similarity (with hnsw.ef_search = 200)
- Cypher Expand: Traverse graph to find related entities (if complex)
- Fetch Chunks: Get chunks via MENTIONED_IN relationships, with optional temporal sort
- RRF Fusion: Combine vector and graph results
- Recency Boost: Apply recency decay so newer chunks rank higher (configurable, tune down for evergreen corpora)
QueryComplexityRouter (src/khora/engines/vectorcypher/router.py)
Routes queries to optimal search paths:
| Pattern | Complexity | Examples |
|---|---|---|
| Simple questions | SIMPLE | “What is X?”, “Who is Y?” |
| Relationship keywords | MODERATE+ | “related to”, “connected with” |
| Comparison keywords | COMPLEX | “compare”, “difference between” |
| Multi-hop keywords | COMPLEX | “through”, “chain”, “path” |
| Multiple entities | COMPLEX | Queries mentioning 2+ entities |
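The routing table can be read as a small rule cascade. The sketch below is an illustration only: classify_query is a hypothetical function, the keyword lists contain just the examples from the table, and the entity count is passed in by the caller because the source does not say how entities are detected.

```python
import re
from enum import Enum


class Complexity(Enum):
    SIMPLE = "simple"
    MODERATE = "moderate"
    COMPLEX = "complex"


# Keyword lists copied from the table's examples; a real router likely has more.
RELATIONSHIP_PATTERNS = [r"\brelated to\b", r"\bconnected with\b"]
COMPLEX_PATTERNS = [
    r"\bcompare\b", r"\bdifference between\b",  # comparison keywords
    r"\bthrough\b", r"\bchain\b", r"\bpath\b",  # multi-hop keywords
]


def classify_query(query: str, entity_count: int) -> Complexity:
    """Classify a query; `entity_count` is supplied by an upstream entity detector."""
    text = query.lower()
    # Strongest signals first: two or more entities, comparison or multi-hop wording.
    if entity_count >= 2 or any(re.search(p, text) for p in COMPLEX_PATTERNS):
        return Complexity.COMPLEX
    # Relationship wording lifts the query to at least MODERATE.
    if any(re.search(p, text) for p in RELATIONSHIP_PATTERNS):
        return Complexity.MODERATE
    return Complexity.SIMPLE
```

Word-boundary matching keeps “throughout” from triggering the “through” rule.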
DualNodeManager (src/khora/engines/vectorcypher/dual_nodes.py)
Manages HippoRAG 2 dual-node structure in Neo4j:
The temporal_sort parameter controls Cypher ordering:
| temporal_sort | Cypher ORDER BY | When Used |
|---|---|---|
| False (default) | total_mentions DESC | Non-temporal queries, rank by relevance |
| True | c.occurred_at DESC, total_mentions DESC | Temporal queries, most recent chunks first, tiebreak by relevance |
Chunk.occurred_at, so the temporal sort adds negligible overhead.
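As a sketch of how the temporal_sort flag could translate into a query, the helper below switches only the ORDER BY clause, using the two orderings from the table. The surrounding query is hypothetical: the relationship direction, the id property, and the $entity_ids and $limit parameters are assumptions made for illustration, not the engine's actual Cypher.

```python
def chunk_order_by(temporal_sort: bool = False) -> str:
    """Return the ORDER BY clause listed in the table for each mode."""
    if temporal_sort:
        # Most recent chunks first, ties broken by mention count.
        return "ORDER BY c.occurred_at DESC, total_mentions DESC"
    return "ORDER BY total_mentions DESC"


def build_chunk_query(temporal_sort: bool = False) -> str:
    """Assemble a hypothetical chunk-fetch query around the ORDER BY clause."""
    return "\n".join([
        # Assumed shape: entities point at the chunks that mention them.
        "MATCH (e:Entity)-[:MENTIONED_IN]->(c:Chunk)",
        "WHERE e.id IN $entity_ids",
        "WITH c, count(e) AS total_mentions",
        "RETURN c.id AS chunk_id, c.occurred_at AS occurred_at, total_mentions",
        chunk_order_by(temporal_sort),
        "LIMIT $limit",
    ])
```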
RRF Fusion (src/khora/engines/vectorcypher/fusion.py)
Combines vector and graph results using Reciprocal Rank Fusion:
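A minimal sketch of weighted Reciprocal Rank Fusion follows. It is not the contents of fusion.py: the function name and signature are hypothetical, k = 60 is the conventional RRF constant rather than a value confirmed by the source, and the default weights are the 0.6 / 0.4 split listed under Fusion Weights.

```python
from collections import defaultdict


def weighted_rrf(
    vector_ranked: list[str],
    graph_ranked: list[str],
    vector_weight: float = 0.6,
    graph_weight: float = 0.4,
    k: int = 60,
) -> list[tuple[str, float]]:
    """Fuse two ranked ID lists; each list contributes weight / (k + rank)."""
    scores: dict[str, float] = defaultdict(float)
    for weight, ranked in ((vector_weight, vector_ranked), (graph_weight, graph_ranked)):
        for rank, item_id in enumerate(ranked, start=1):  # ranks are 1-based
            scores[item_id] += weight / (k + rank)
    # Highest fused score first.
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)
```

An item found by both channels accumulates two contributions, which is why it can outrank an item that sits at the top of only one list.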
Query Routing
VectorCypher uses intelligent query routing to balance performance and quality:
SIMPLE Queries (Vector-Only)
Characteristics:
- Simple factual questions
- Single entity mentions
- Direct lookups
MODERATE Queries (Shallow Graph)
Characteristics:
- Single relationship exploration
- Moderate entity complexity
- One-hop connections
COMPLEX Queries (Full VectorCypher)
Characteristics:
- Multi-hop relationships
- Comparisons across entities
- Aggregations over graph structure
Recency scoring
After RRF fusion, VectorCypher applies a configurable recency boost so newer chunks rank higher.
_calculate_recency_scores() uses max(occurred_at) from the result set as the reference point instead of datetime.now(UTC), so historical or benchmark data still produces meaningful recency discrimination regardless of when the query runs. A result “2 days before the newest result” always gets the same score, whether the data is from 2024 or 2026. Tune it with temporal_recency_weight and temporal_recency_decay_days (see Tuning), or set the weight to 0 for evergreen corpora.
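The reference-point idea can be sketched as follows. The source does not give the decay formula, so the exponential shape and the 30-day default are assumptions, and recency_scores is a hypothetical stand-in for _calculate_recency_scores().

```python
import math
from datetime import datetime


def recency_scores(
    occurred_at: dict[str, datetime],
    decay_days: float = 30.0,  # assumed default, not taken from the source
) -> dict[str, float]:
    """Score each chunk by its age relative to the newest chunk in the result set."""
    if not occurred_at:
        return {}
    reference = max(occurred_at.values())  # newest result, not the wall clock
    scores = {}
    for chunk_id, timestamp in occurred_at.items():
        age_days = (reference - timestamp).total_seconds() / 86_400
        scores[chunk_id] = math.exp(-age_days / decay_days)  # assumed exponential decay
    return scores
```

Because ages are measured from the newest result, the newest chunk always scores 1.0 and shifting every timestamp by the same amount leaves all scores unchanged.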
Configuration
VectorCypherConfig
Via engine_kwargs (Khora Constructor)
The recommended way to pass VectorCypherConfig is through the engine_kwargs parameter on Khora:
The engine_kwargs dict is forwarded directly to the VectorCypherEngine constructor, which accepts vectorcypher_config as a keyword argument.
Via Environment Variables
Pass engine="vectorcypher" to Khora(...) (it is also the default).
Via YAML
Requirements
Required:
- PostgreSQL with pgvector extension
- Neo4j (required, not optional)
- Neo4j GDS library (for efficient entity vector search)
- Neo4j 5.x+ for best performance
Performance Characteristics
| Metric | SIMPLE | MODERATE | COMPLEX |
|---|---|---|---|
| P95 Latency | <200ms | <400ms | <800ms |
| Graph Depth | 0 | 1 | 2-3 |
| Entry Entities | 5 | 10 | 15 |
| Use Graph | No | Yes | Yes |
Tuning Guide
core_ratio
Controls what percentage of chunks get full knowledge graph extraction:
| Value | LLM Calls | Graph Density | Use When |
|---|---|---|---|
| 0.90 | Most | Very dense graph | Maximum recall, cost not a concern |
| 0.70 | Default | Dense graph | Most cases (good quality/cost balance) |
| 0.50 | Moderate | Moderate graph | Cost-conscious with decent coverage |
| 0.25 | Fewer | Sparse graph | Cost-sensitive, simple queries |
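To make the ratio concrete, here is a minimal sketch of selecting the core chunks from precomputed PageRank scores. The function name is hypothetical and rounding up is an assumption; the source only states that the top share of chunks by PageRank gets full extraction.

```python
import math


def select_core_chunks(pagerank: dict[str, float], core_ratio: float = 0.70) -> list[str]:
    """Return the IDs of the top `core_ratio` share of chunks by PageRank score."""
    if not 0.0 < core_ratio <= 1.0:
        raise ValueError("core_ratio must be in (0, 1]")
    ranked = sorted(pagerank, key=pagerank.get, reverse=True)
    # Round away float noise before taking the ceiling (assumed rounding rule).
    keep = math.ceil(round(len(ranked) * core_ratio, 9))
    return ranked[:keep]
```

With 10 chunks, the default ratio sends 7 to the LLM extractor, and a ratio of 1.0 sends all 10.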
graph_depth
Controls Cypher traversal depth for complex queries:
| Depth | Hops | Latency | Use When |
|---|---|---|---|
| 1 | Direct connections | Fast | Simple lookups |
| 2 | Friends-of-friends | Default | Most queries |
| 3 | 3-hop paths | Slower | Complex relationships |
| 4 | Maximum | Slowest | Deep exploration |
Fusion Weights
Controls blending of vector and graph results:
| vector_weight | graph_weight | Behavior |
|---|---|---|
| 0.8 | 0.2 | Mostly semantic similarity |
| 0.6 | 0.4 | Balanced (default) |
| 0.4 | 0.6 | Graph-heavy (relationship queries) |
| 0.2 | 0.8 | Mostly graph traversal |
Adaptive Depth
When adaptive_depth_enabled=True (the default), the retriever dynamically adjusts graph traversal depth based on how many entry entities the vector search returns:
| Entry Entities | Depth Adjustment | Reason |
|---|---|---|
| ≥ 10 (high threshold) | Reduce to depth 1 | Many entities → deep traversal explodes candidates without adding signal |
| 3–9 | Use configured depth | Normal range, default behavior |
| ≤ 2 (low threshold) | Increase depth by 1 | Few entities → deeper traversal compensates for sparse entry points |
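The table above maps to a three-way branch. In this sketch the function and parameter names are hypothetical, the thresholds and the default depth of 2 come from the tables on this page, and capping the increased depth at 4 is an assumption based on the graph_depth table.

```python
def adaptive_depth(
    entry_entities: int,
    configured_depth: int = 2,
    high_threshold: int = 10,
    low_threshold: int = 2,
    max_depth: int = 4,  # assumed cap, taken from the graph_depth table
) -> int:
    """Pick a traversal depth from the number of entry entities."""
    if entry_entities >= high_threshold:
        return 1  # many entry points: keep traversal shallow
    if entry_entities <= low_threshold:
        return min(configured_depth + 1, max_depth)  # few entry points: go one hop deeper
    return configured_depth  # normal range: use the configured depth
```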
Score Normalization
The fusion function weighted_rrf_normalized normalizes vector and graph scores to [0, 1] via min-max normalization before computing Reciprocal Rank Fusion. This matters when the two sources produce scores on very different scales (for example, cosine similarity scores in [0.3, 0.9] vs graph proximity scores in [0.01, 0.5]). Without normalization, the source with larger absolute scores dominates the fusion.
Both the SIMPLE and COMPLEX retrieval paths normalize final scores to [0,1] using min-max normalization.
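The normalization step on its own can be sketched like this. It covers only the min-max rescaling, not the full weighted_rrf_normalized function, and mapping an all-equal score set to 1.0 is an assumption about the degenerate case.

```python
def min_max_normalize(scores: dict[str, float]) -> dict[str, float]:
    """Rescale scores linearly so the minimum maps to 0.0 and the maximum to 1.0."""
    if not scores:
        return {}
    low, high = min(scores.values()), max(scores.values())
    if high == low:
        # Degenerate case: no spread to rescale (assumed behavior).
        return {key: 1.0 for key in scores}
    return {key: (value - low) / (high - low) for key, value in scores.items()}
```

Applied to the ranges in the example above, cosine scores in [0.3, 0.9] and graph scores in [0.01, 0.5] both end up spanning [0, 1], so neither source dominates purely because of its scale.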
Search Index Improvements
Three PostgreSQL indexes improve query-time performance:
| Index | Type | Target | Purpose |
|---|---|---|---|
| ix_khora_chunks_tags_gin | GIN | khora_chunks.tags | Fast array-containment queries (tags @> ARRAY['topic']) |
| ix_khora_chunks_ns_occurred | B-tree (composite) | (namespace_id, occurred_at) | Temporal filtering within a namespace |
| ix_khora_chunks_embedding_hnsw | HNSW | khora_chunks.embedding | Vector similarity with ef_construction=128 (up from 64) |
Raising ef_construction improves recall, with the extra work done at index-build time: more candidates are considered during graph construction, producing a higher-quality approximate nearest neighbor index. Query-time ef_search can be tuned separately via PostgreSQL’s SET hnsw.ef_search = N.
Run the migration with:
Recent Improvements
Cross-encoder reranking. After the initial vector + Cypher retrieval, an optional cross-encoder model rescores the top candidates for precision. The model is cached across queries to avoid reload overhead, and inference runs in asyncio.to_thread to keep the event loop free. Enable/disable via KHORA_QUERY_ENABLE_RERANKING.
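The caching and threading pattern described above can be sketched without a real model. OverlapScorer is a toy stand-in that scores by shared tokens, and get_model and rerank are hypothetical names. Only the structure mirrors the text: load the model once, then run blocking inference in a worker thread.

```python
import asyncio
from functools import lru_cache


class OverlapScorer:
    """Toy stand-in for a cross-encoder: scores a pair by shared lowercase tokens."""

    def predict(self, pairs: list[tuple[str, str]]) -> list[float]:
        return [
            float(len(set(query.lower().split()) & set(passage.lower().split())))
            for query, passage in pairs
        ]


@lru_cache(maxsize=1)
def get_model() -> OverlapScorer:
    """Build the model on first use and reuse the same instance afterwards."""
    return OverlapScorer()


async def rerank(query: str, passages: list[str], top_k: int = 3) -> list[str]:
    """Rescore passages off the event loop and return the best `top_k`."""
    model = get_model()
    pairs = [(query, passage) for passage in passages]
    # predict() is synchronous, so run it in a worker thread.
    scores = await asyncio.to_thread(model.predict, pairs)
    ranked = sorted(zip(passages, scores), key=lambda pair: pair[1], reverse=True)
    return [passage for passage, _ in ranked[:top_k]]
```

Swapping the stand-in for a real cross-encoder would leave rerank unchanged, provided the model exposes a synchronous predict over (query, passage) pairs.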
Independent BM25 channel. VectorCypher now runs BM25 full-text search as a separate retrieval channel alongside vector and Cypher graph traversal. Results are fused via RRF, giving keyword-exact matches a dedicated signal path rather than relying solely on embedding similarity.
VectorCypher is the default engine when creating a Khora without an explicit engine= argument.
Related Documentation
- Tuning: Every VectorCypher knob, when to adjust it, and the tradeoff
- Retrieval pipeline: How recall() flows through the engine
- Storage backends: The PostgreSQL + pgvector + Neo4j stack and the embedded sqlite_lance alternative
- Hybrid Search: Vector + BM25 fusion details