StorageCoordinator routes every read and write to the right backend, always scoped
to a namespace.
| Role | Answers | Default backend |
|---|---|---|
| Relational | “What exactly is stored? What happened, when?” | PostgreSQL |
| Vector | “What’s semantically similar to this?” | pgvector |
| Graph | “Who relates to whom? What’s connected?” | Neo4j |
- PostgreSQL + pgvector + Neo4j: the production default. Each role on a backend built for it.
- sqlite_lance: a zero-infrastructure embedded alternative (SQLite + LanceDB, in-process). The simplest way to run Khora locally.
The production stack: PostgreSQL + pgvector + Neo4j
This is what the Quickstart sets up and what every engine is production-ready against.
PostgreSQL: the record keeper
Your source of truth. Stores documents, namespaces, and the immutable event log with full ACID guarantees. When you need to know exactly what is stored or what happened when, PostgreSQL answers. Connection pooling is shared across the relational, vector, and event-store roles when they point at the same database: one pool, not three.
pool_pre_ping issues a lightweight SELECT 1 before handing out a connection, so stale connections (from DB
restarts or idle timeouts) are replaced transparently.
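The pre-ping behaviour described above can be sketched with a toy pool. This is a minimal illustration of the idea, not Khora's pooling code: `PrePingPool` is a hypothetical class, and the standard-library `sqlite3` module stands in for PostgreSQL so the example runs without a server.

```python
import sqlite3


class PrePingPool:
    """Toy pool that checks a connection with SELECT 1 before handing it out."""

    def __init__(self, dsn):
        self._dsn = dsn
        self._idle = []      # connections waiting to be reused
        self.replaced = 0    # count of stale connections that were swapped out

    def _connect(self):
        return sqlite3.connect(self._dsn)

    def _is_alive(self, conn):
        try:
            conn.execute("SELECT 1").fetchone()
            return True
        except sqlite3.Error:
            return False

    def acquire(self):
        conn = self._idle.pop() if self._idle else self._connect()
        if not self._is_alive(conn):
            # The ping failed: discard the stale connection and open a new one.
            self.replaced += 1
            conn = self._connect()
        return conn

    def release(self, conn):
        self._idle.append(conn)


pool = PrePingPool(":memory:")
conn = pool.acquire()
pool.release(conn)
conn.close()             # simulate a connection that went stale while idle
fresh = pool.acquire()   # the ping fails, so the pool hands out a new connection
result = fresh.execute("SELECT 1").fetchone()[0]
```

The caller never sees the dead connection; the cost is one extra round trip per checkout.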
pgvector: the meaning finder
The pgvector extension turns PostgreSQL into a vector store. Chunk and entity embeddings (1536-dim by default, or halfvec/float16 to halve storage) live here,
indexed with HNSW for fast approximate nearest-neighbour search.
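Two of the claims above are easy to check by hand: float16 halves the raw size of an embedding, and nearest-neighbour search ranks stored vectors by distance to a query. The sketch below uses only the standard library. The brute-force `exact_top_k` helper is hypothetical and shows the exact answer that an HNSW index approximates; it is not how pgvector searches. The byte counts cover the raw vector payload only, not per-row overhead, and cosine distance is an assumed metric.

```python
import math
import struct

DIM = 1536  # default embedding width quoted above

# Raw payload per embedding: 4 bytes per float32 component, 2 per float16.
float32_bytes = struct.calcsize(f"<{DIM}f")
float16_bytes = struct.calcsize(f"<{DIM}e")


def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def exact_top_k(query, vectors, k):
    """Brute-force nearest neighbours: scan everything, sort by distance."""
    ranked = sorted(vectors, key=lambda name: cosine_distance(query, vectors[name]))
    return ranked[:k]


# Tiny 3-dimensional stand-ins for real embeddings (illustrative data only).
vectors = {
    "invoice": [0.9, 0.1, 0.0],
    "receipt": [0.8, 0.2, 0.1],
    "holiday": [0.0, 0.1, 0.9],
}
nearest = exact_top_k([1.0, 0.0, 0.0], vectors, k=2)
```

The exact scan touches every stored vector on every query, which is the cost an approximate index avoids at the price of occasionally missing a true neighbour.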
Neo4j: the connector
Stores entities as nodes and relationships as edges, traversed with Cypher. This is what answers multi-hop questions (“who works with Alice?”, “what’s two hops from this customer?”) that similarity search can’t.
Dual entity storage
Entities live in both pgvector and Neo4j: pgvector to find entities similar to a description, Neo4j to traverse from one entity to its neighbours. The coordinator writes both in parallel (asyncio.gather), so the redundancy costs no extra latency.
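A minimal sketch of the parallel dual write, assuming two in-memory stand-ins for the real stores. `FakeVectorStore`, `FakeGraphStore`, `upsert_entity`, and `write_entity` are hypothetical names for illustration; only `asyncio.gather` comes from the text above.

```python
import asyncio


class FakeVectorStore:
    def __init__(self):
        self.embeddings = {}

    async def upsert_entity(self, entity_id, embedding):
        await asyncio.sleep(0.01)  # stand-in for a network round trip
        self.embeddings[entity_id] = embedding


class FakeGraphStore:
    def __init__(self):
        self.nodes = {}

    async def upsert_entity(self, entity_id, properties):
        await asyncio.sleep(0.01)  # stand-in for a network round trip
        self.nodes[entity_id] = properties


async def write_entity(vector_store, graph_store, entity_id, embedding, properties):
    # Both writes start together; the call returns once the slower one finishes.
    await asyncio.gather(
        vector_store.upsert_entity(entity_id, embedding),
        graph_store.upsert_entity(entity_id, properties),
    )


vector_store = FakeVectorStore()
graph_store = FakeGraphStore()
asyncio.run(
    write_entity(vector_store, graph_store, "alice", [0.1, 0.2], {"name": "Alice"})
)
```

Because the two awaits overlap, the caller waits roughly as long as the slower store takes rather than the sum of both.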
The simpler path: sqlite_lance
For local development, evaluation, notebooks, and CI, the embedded backend fills all
three roles in-process with zero external services: SQLite for relational/graph,
LanceDB for vectors, all in one local directory.
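To see how a single SQLite database can cover both the relational and the graph role, here is a sketch using the standard-library `sqlite3` module. The `entities` and `edges` tables are a hypothetical schema, not the tables the sqlite_lance backend actually creates, and the LanceDB vector side is left out. A recursive CTE plays the part of a multi-hop traversal.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # a file path would persist to local disk instead
conn.executescript("""
    CREATE TABLE entities (id TEXT PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE edges (src TEXT NOT NULL, dst TEXT NOT NULL, kind TEXT NOT NULL);
""")
conn.executemany(
    "INSERT INTO entities VALUES (?, ?)",
    [("a", "Alice"), ("b", "Bob"), ("c", "Carol"), ("d", "Dave")],
)
conn.executemany(
    "INSERT INTO edges VALUES (?, ?, ?)",
    [("a", "b", "WORKS_WITH"), ("b", "c", "WORKS_WITH"), ("c", "d", "WORKS_WITH")],
)

# Recursive CTE: follow outgoing edges from a start node, up to a hop limit.
REACHABLE = """
    WITH RECURSIVE reach(id, hops) AS (
        SELECT ?, 0
        UNION
        SELECT edges.dst, reach.hops + 1
        FROM reach JOIN edges ON edges.src = reach.id
        WHERE reach.hops < ?
    )
    SELECT entities.name, MIN(reach.hops) AS hops
    FROM reach JOIN entities ON entities.id = reach.id
    WHERE reach.hops > 0
    GROUP BY entities.id
    ORDER BY hops, entities.name
"""
within_two_hops = conn.execute(REACHABLE, ("a", 2)).fetchall()
```

Starting from Alice with a limit of two hops, the query reaches Bob and Carol but stops before Dave.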
Production vs. embedded
VectorCypher needs all three roles: PostgreSQL (relational), pgvector (vector), and a graph store. It is production-ready on the PostgreSQL + pgvector + Neo4j stack.
sqlite_lance fills all three roles in-process and is the right choice for local
development, evaluation, and CI, but it carries the scale ceiling and gaps noted above.
Treat it as a convenience, not a production deployment.
See Configuration for the install extras behind
each stack.
Coordinator, transactions, and bulk load
- StorageCoordinator: the single entry point. Every method takes the caller’s namespace_id and filters at the query layer. The per-role attributes (coordinator.graph, etc.) are deprecated in favour of the facade. See Namespaces & isolation for the full contract.
- Transactions: async with coordinator.transaction() as txn: shares one database session across multi-backend writes, with txn.savepoint() for partial rollback.
- Bulk mode: StorageSettings(bulk_mode=True) defers HNSW index creation and relaxes Neo4j validation for initial loads. Call ensure_hnsw_indexes(...) afterward to rebuild. It trades consistency for throughput, so use it for initial loading, not steady-state operation.
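The savepoint and namespace-filter ideas from the list above can be sketched at the SQL level. This uses the standard-library `sqlite3` module as a stand-in and does not call Khora's API: the `documents` table, the `titles_for` helper, and the tenant names are hypothetical.

```python
import sqlite3

# isolation_level=None puts sqlite3 in autocommit mode, so BEGIN/SAVEPOINT are explicit.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE documents (namespace_id TEXT NOT NULL, title TEXT NOT NULL)")

conn.execute("BEGIN")
conn.execute("INSERT INTO documents VALUES (?, ?)", ("tenant-a", "kept"))

conn.execute("SAVEPOINT optional_step")
conn.execute("INSERT INTO documents VALUES (?, ?)", ("tenant-a", "discarded"))
# Undo only the work done after the savepoint; the earlier insert survives.
conn.execute("ROLLBACK TO SAVEPOINT optional_step")
conn.execute("RELEASE SAVEPOINT optional_step")

conn.execute("INSERT INTO documents VALUES (?, ?)", ("tenant-b", "other tenant"))
conn.execute("COMMIT")


def titles_for(namespace_id):
    """Every read carries the caller's namespace and filters on it in the query."""
    rows = conn.execute(
        "SELECT title FROM documents WHERE namespace_id = ? ORDER BY title",
        (namespace_id,),
    )
    return [title for (title,) in rows]


tenant_a_titles = titles_for("tenant-a")
tenant_b_titles = titles_for("tenant-b")
```

Filtering inside the query, rather than after fetching, means rows from another namespace never leave the database.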
Data model
What’s stored in those backends: documents, chunks, entities, relationships,
events.
Configuration
Every KHORA_* storage knob and the full install-extras table.