Khora splits storage into three roles (relational, vector, and graph) because each answers a different kind of question well. You don’t talk to backends directly; a StorageCoordinator routes every read and write to the right one, always scoped to a namespace.
Role | Answers | Default backend
Relational | “What exactly is stored? What happened, when?” | PostgreSQL
Vector | “What’s semantically similar to this?” | pgvector
Graph | “Who relates to whom? What’s connected?” | Neo4j
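To make the routing idea concrete, here is a minimal sketch. It is not Khora's actual API: MiniCoordinator, InMemoryBackend, and their methods are hypothetical names, and in-memory dictionaries stand in for the real backends. It only shows the two properties described above: callers name a role rather than a backend, and every call carries a namespace.

```python
from dataclasses import dataclass, field


@dataclass
class InMemoryBackend:
    """Hypothetical stand-in for a real backend; records are grouped by namespace."""
    name: str
    rows: dict = field(default_factory=dict)  # namespace_id -> list of records

    def write(self, namespace_id: str, record: dict) -> None:
        self.rows.setdefault(namespace_id, []).append(record)

    def read(self, namespace_id: str) -> list:
        # Only records from the caller's namespace are ever returned.
        return list(self.rows.get(namespace_id, []))


class MiniCoordinator:
    """Illustrative facade (an assumption, not Khora's StorageCoordinator)."""

    def __init__(self) -> None:
        self._backends = {
            "relational": InMemoryBackend("postgresql"),
            "vector": InMemoryBackend("pgvector"),
            "graph": InMemoryBackend("neo4j"),
        }

    def write(self, role: str, namespace_id: str, record: dict) -> str:
        backend = self._backends[role]
        backend.write(namespace_id, record)
        return backend.name  # returned only so the routing is observable

    def read(self, role: str, namespace_id: str) -> list:
        return self._backends[role].read(namespace_id)
```

Writing to the "graph" role under one namespace and reading it back under another returns an empty list, which is the isolation behaviour the coordinator is meant to guarantee.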
Two stacks fill those roles for almost everyone:
  • PostgreSQL + pgvector + Neo4j: the production default. Each role on a backend built for it.
  • sqlite_lance: a zero-infrastructure embedded alternative (SQLite + LanceDB, in-process). The simplest way to run Khora locally.

The production stack: PostgreSQL + pgvector + Neo4j

This is what the Quickstart sets up and what every engine is production-ready against.

PostgreSQL: the record keeper

Your source of truth. PostgreSQL stores documents, namespaces, and the immutable event log with full ACID guarantees. When you need to know exactly what is stored or what happened when, PostgreSQL answers. Connection pooling is shared across the relational, vector, and event-store roles when they point at the same database: one pool, not three. With pool_pre_ping enabled, a lightweight SELECT 1 runs before a connection is handed out, so stale connections (from DB restarts or idle timeouts) are replaced transparently.
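The pre-ping behaviour can be sketched with nothing but the standard library. This is an illustration of the idea, not Khora's or its database driver's implementation: PrePingPool is a hypothetical single-connection pool, and SQLite stands in for PostgreSQL so the example runs anywhere.

```python
import sqlite3


class PrePingPool:
    """Toy single-connection pool that validates the connection before checkout."""

    def __init__(self, dsn: str = ":memory:") -> None:
        self._dsn = dsn
        self._conn = sqlite3.connect(dsn)
        self.reconnects = 0  # counts how often a stale connection was replaced

    def _is_alive(self, conn: sqlite3.Connection) -> bool:
        # The "ping": a trivial query that fails fast on a dead connection.
        try:
            conn.execute("SELECT 1").fetchone()
            return True
        except sqlite3.Error:
            return False

    def checkout(self) -> sqlite3.Connection:
        # Replace the connection transparently if the ping fails.
        if not self._is_alive(self._conn):
            self._conn = sqlite3.connect(self._dsn)
            self.reconnects += 1
        return self._conn
```

Closing the connection behind the pool's back simulates a database restart: the next checkout() notices the failed ping and hands out a fresh connection instead of raising in the caller's query.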

pgvector: the meaning finder

The pgvector extension turns PostgreSQL into a vector store. Chunk and entity embeddings (1536-dim by default, or halfvec/float16 to halve storage) live here, indexed with HNSW for fast approximate nearest-neighbour search:
SELECT id, content, 1 - (embedding <=> $query) AS similarity
FROM chunks
WHERE namespace_id = $ns
ORDER BY embedding <=> $query
LIMIT 10;
Because vectors are colocated with the relational data, there’s no second datastore to operate for semantic search.
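In pgvector, <=> is the cosine distance operator, so 1 - (embedding <=> $query) is cosine similarity. The sketch below reproduces the ranking of the query above in plain Python, as an exact brute-force scan rather than the approximate HNSW search the index performs. The function names and the (id, namespace_id, embedding) tuple layout are assumptions made for the example.

```python
import math


def cosine_distance(a, b):
    """Cosine distance between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def top_k(query, chunks, namespace_id, k=10):
    """Rank chunks in one namespace by similarity to the query vector.

    chunks: iterable of (chunk_id, namespace_id, embedding) tuples.
    Returns up to k (chunk_id, similarity) pairs, most similar first.
    """
    scored = [
        (chunk_id, 1.0 - cosine_distance(embedding, query))
        for chunk_id, ns, embedding in chunks
        if ns == namespace_id  # mirrors WHERE namespace_id = $ns
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]
```

Sorting by descending similarity is equivalent to the query's ORDER BY on ascending distance. An HNSW index returns approximately the same top results without scanning every row.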

Neo4j: the connector

Stores entities as nodes and relationships as edges, traversed with Cypher. This is what answers multi-hop questions (“who works with Alice?”, “what’s two hops from this customer?”) that similarity search can’t:
MATCH (e:Entity {id: $entity_id})-[r*1..2]-(related:Entity)
WHERE related.namespace_id = $ns
RETURN DISTINCT related, r LIMIT 50
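What the variable-length pattern [r*1..2] computes can be approximated with a breadth-first search over an undirected edge list. This is a simplified sketch under stated assumptions: within_hops and its arguments are hypothetical, it returns only node ids (not the relationships), and it never returns the start node, which Cypher can do in some multi-edge cases.

```python
from collections import deque


def within_hops(edges, node_namespaces, start, namespace_id, max_hops=2):
    """Return ids of nodes 1..max_hops away from start that belong to namespace_id.

    edges: iterable of (a, b) pairs, treated as undirected.
    node_namespaces: dict mapping node id -> namespace id.
    """
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    seen = {start}
    found = set()
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in adjacency.get(node, ()):
            if neighbour in seen:
                continue
            seen.add(neighbour)
            queue.append((neighbour, depth + 1))
            # Like the Cypher WHERE clause, only the end node is filtered.
            if node_namespaces.get(neighbour) == namespace_id:
                found.add(neighbour)
    return found
```

The set of visited nodes plays the role of RETURN DISTINCT: a node reachable by several paths is reported once.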

Dual entity storage

Entities live in both pgvector and Neo4j: pgvector to find entities similar to a description, Neo4j to traverse from one entity to its neighbours. The coordinator writes both in parallel (asyncio.gather), so a dual write takes about as long as the slower of the two backends rather than the sum of both.
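The parallel write can be sketched as follows. The two writer coroutines are hypothetical placeholders that sleep instead of talking to pgvector or Neo4j; only asyncio.gather itself is taken from the text above.

```python
import asyncio
import time


async def write_vector(entity: dict) -> str:
    await asyncio.sleep(0.05)  # stands in for the pgvector round trip
    return f"vector:{entity['id']}"


async def write_graph(entity: dict) -> str:
    await asyncio.sleep(0.05)  # stands in for the Neo4j round trip
    return f"graph:{entity['id']}"


async def store_entity(entity: dict) -> list:
    # gather runs both coroutines concurrently and waits for both;
    # results come back in the order the awaitables were passed in.
    return await asyncio.gather(write_vector(entity), write_graph(entity))


def timed_store(entity: dict):
    """Run the dual write and report (results, elapsed seconds)."""
    start = time.perf_counter()
    results = asyncio.run(store_entity(entity))
    return results, time.perf_counter() - start
```

Because both sleeps overlap, the elapsed time is close to one round trip rather than two. Note that if one write fails, asyncio.gather raises the first exception by default while the other awaitable keeps running, so a real coordinator needs a way to reconcile the two stores. How Khora handles that case is not covered here.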

The simpler path: sqlite_lance

For local development, evaluation, notebooks, and CI, the embedded backend fills all three roles in-process with zero external services: SQLite for relational/graph, LanceDB for vectors, all in one local directory.
pip install "khora[sqlite-lance]"
It’s the default for the examples and the simplest way to try Khora without standing up Postgres and Neo4j.
sqlite_lance is an embedded convenience, not a production deployment. Known limits vs. the Postgres + Neo4j stack:
  • Scale ceiling: roughly 1M chunks / 100k entities.
  • Entity vectors aren’t indexed, so search_entities and the inline recall().entities / .relationships lists come back empty. list_entities / get_entity / find_related_entities still work (they read the graph store directly).
  • dream_history returns empty and stats().entities under-reports, even though the underlying operations ran.

Production vs. embedded

VectorCypher needs all three roles: PostgreSQL (relational), pgvector (vector), and a graph store. It is production-ready on the PostgreSQL + pgvector + Neo4j stack. sqlite_lance fills all three roles in-process and is the right choice for local development, evaluation, and CI, but it carries the scale ceiling and gaps noted above. Treat it as a convenience, not a production deployment. See Configuration for the install extras behind each stack.

Coordinator, transactions, and bulk load

  • StorageCoordinator: the single entry point. Every method takes the caller’s namespace_id and filters at the query layer. The per-role attributes (coordinator.graph, etc.) are deprecated in favour of the facade. See Namespaces & isolation for the full contract.
  • Transactions: wrapping writes in async with coordinator.transaction() as txn shares one database session across multi-backend writes, and txn.savepoint() allows partial rollback.
  • Bulk mode: StorageSettings(bulk_mode=True) defers HNSW index creation and relaxes Neo4j validation for initial loads. Call ensure_hnsw_indexes(...) afterward to build the deferred indexes. Bulk mode trades consistency for throughput, so use it for initial loading, not steady-state operation.
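Partial rollback with a savepoint works the same way in most SQL databases. The sketch below demonstrates the underlying mechanism with SQLite from the standard library. It does not use Khora's txn.savepoint(), whose exact signature is not shown here, and the table and savepoint names are made up for the example.

```python
import sqlite3


def load_with_savepoint() -> list:
    """Commit a required write while rolling back a failed optional step."""
    # isolation_level=None gives manual control over BEGIN/COMMIT.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE entities (id TEXT PRIMARY KEY)")

    conn.execute("BEGIN")
    conn.execute("INSERT INTO entities VALUES ('alice')")  # required write

    conn.execute("SAVEPOINT optional_step")
    try:
        conn.execute("INSERT INTO entities VALUES ('bob')")
        conn.execute("INSERT INTO entities VALUES ('alice')")  # duplicate key fails
        conn.execute("RELEASE SAVEPOINT optional_step")
    except sqlite3.IntegrityError:
        # Undo everything since the savepoint ('bob'), keep what came before.
        conn.execute("ROLLBACK TO SAVEPOINT optional_step")
        conn.execute("RELEASE SAVEPOINT optional_step")

    conn.execute("COMMIT")
    rows = [row[0] for row in conn.execute("SELECT id FROM entities ORDER BY id")]
    conn.close()
    return rows
```

After the commit only 'alice' remains: the failed optional step is undone without discarding the outer transaction.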

Data model

What’s stored in those backends: documents, chunks, entities, relationships, events.

Configuration

Every KHORA_* storage knob and the full install-extras table.