Storage Backends

Khora splits storage into three roles (relational, vector, and graph) because each answers a different kind of question well. You don’t talk to backends directly; a StorageCoordinator routes every read and write to the right one, always scoped to a namespace.

Role	Answers	Default backend
Relational	“What exactly is stored? What happened, when?”	PostgreSQL
Vector	“What’s semantically similar to this?”	pgvector
Graph	“Who relates to whom? What’s connected?”	Neo4j

Two stacks fill those roles for almost everyone:

PostgreSQL + pgvector + Neo4j: the production default. Each role on a backend built for it.
sqlite_lance: a zero-infrastructure embedded alternative (SQLite + LanceDB, in-process). The simplest way to run Khora locally.

The production stack: PostgreSQL + pgvector + Neo4j

This is the stack the Quickstart sets up, and the one every engine is production-ready on.

PostgreSQL: the record keeper

Your source of truth. Stores documents, namespaces, and the immutable event log with full ACID guarantees. When you need to know exactly what is stored or what happened when, PostgreSQL answers. Connection pooling is shared across the relational, vector, and event-store roles when they point at the same database, so there is one pool, not three. pool_pre_ping issues a lightweight SELECT 1 before handing out a connection, so stale connections (from DB restarts or idle timeouts) are replaced transparently.
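
To make the pre-ping idea concrete, here is a minimal sketch of a pool that validates a connection before handing it out. It uses the standard-library sqlite3 module as a stand-in for PostgreSQL, and the PrePingPool class and its methods are hypothetical names for illustration, not Khora's or any driver's actual pool.

```python
import sqlite3
from collections import deque


class PrePingPool:
    """Toy pool (hypothetical) that checks a connection before handing it out."""

    def __init__(self, database):
        self._database = database
        self._idle = deque()
        self.replaced = 0  # how many stale connections were swapped out

    def _connect(self):
        return sqlite3.connect(self._database)

    def acquire(self):
        conn = self._idle.popleft() if self._idle else self._connect()
        try:
            conn.execute("SELECT 1")  # the pre-ping: a cheap liveness check
        except sqlite3.Error:
            # The connection is stale: discard it and open a fresh one.
            self.replaced += 1
            conn = self._connect()
        return conn

    def release(self, conn):
        self._idle.append(conn)


pool = PrePingPool(":memory:")
conn = pool.acquire()
pool.release(conn)
conn.close()  # simulate the server dropping an idle connection
fresh = pool.acquire()  # the pre-ping fails, so the pool replaces the connection
```

The caller never sees the dead connection; the cost is one extra round trip per checkout.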

pgvector: the meaning finder

The pgvector extension turns PostgreSQL into a vector store. Chunk and entity embeddings (1536-dim by default, or halfvec/float16 to halve storage) live here, indexed with HNSW for fast approximate nearest-neighbour search:

SELECT id, content, 1 - (embedding <=> $query) AS similarity
FROM chunks
WHERE namespace_id = $ns
ORDER BY embedding <=> $query
LIMIT 10;

Because vectors are colocated with the relational data, there’s no second datastore to operate for semantic search.
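
As a rough illustration of what that query computes, the sketch below ranks rows by cosine distance (the `<=>` operator) and reports `1 - distance` as the similarity. It is an exact brute-force scan in plain Python, whereas an HNSW index returns approximate results; the row layout and function names are assumptions made for the example.

```python
import math


def cosine_distance(a, b):
    """Cosine distance between two equal-length vectors: 1 - cos(angle)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def top_k(query, rows, namespace_id, k=10):
    """Filter by namespace, order by distance, return (id, similarity) pairs."""
    scoped = [r for r in rows if r["namespace_id"] == namespace_id]
    scoped.sort(key=lambda r: cosine_distance(r["embedding"], query))
    return [
        (r["id"], 1.0 - cosine_distance(r["embedding"], query))
        for r in scoped[:k]
    ]


# Hypothetical two-dimensional embeddings, small enough to check by hand.
rows = [
    {"id": "a", "namespace_id": "ns1", "embedding": [1.0, 0.0]},
    {"id": "b", "namespace_id": "ns1", "embedding": [0.0, 1.0]},
    {"id": "c", "namespace_id": "ns1", "embedding": [1.0, 1.0]},
    {"id": "d", "namespace_id": "ns2", "embedding": [1.0, 0.0]},
]
hits = top_k([1.0, 0.0], rows, "ns1", k=2)
```

Row "d" matches the query exactly but is excluded because it belongs to another namespace, which mirrors the WHERE clause in the SQL.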

Neo4j: the connector

Stores entities as nodes and relationships as edges, traversed with Cypher. This is what answers multi-hop questions (“who works with Alice?”, “what’s two hops from this customer?”) that similarity search can’t:

MATCH (e:Entity {id: $entity_id})-[r*1..2]-(related:Entity)
WHERE related.namespace_id = $ns
RETURN DISTINCT related, r LIMIT 50
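
For readers less familiar with variable-length patterns, the following sketch approximates what that traversal returns: every distinct entity reachable within two undirected hops, with the namespace filter applied to the result rather than to intermediate nodes. It is a plain breadth-first search over an in-memory edge list, not how Neo4j executes the query, and the data and function name are hypothetical.

```python
from collections import deque


def neighbours_within(edges, namespaces, start, ns, max_hops=2, limit=50):
    """Return entities within max_hops of start that belong to namespace ns."""
    # Build an undirected adjacency map, since the pattern ignores direction.
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    seen = {start}
    found = []
    queue = deque([(start, 0)])
    while queue and len(found) < limit:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in sorted(adjacency.get(node, ())):
            if nxt in seen:
                continue
            seen.add(nxt)
            queue.append((nxt, depth + 1))
            if namespaces.get(nxt) == ns:  # filter only the returned node
                found.append(nxt)
    return found[:limit]


edges = [("alice", "bob"), ("bob", "carol"), ("carol", "dave"), ("alice", "eve")]
namespaces = {"alice": "ns1", "bob": "ns1", "carol": "ns1", "dave": "ns1", "eve": "ns2"}
related = neighbours_within(edges, namespaces, "alice", "ns1")
```

Here "dave" is three hops away and "eve" is in a different namespace, so neither appears in the result.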

Dual entity storage

Entities live in both pgvector and Neo4j: pgvector to find entities similar to a description, Neo4j to traverse from one entity to its neighbours. The coordinator writes both in parallel (asyncio.gather), so the two writes overlap instead of running back to back and the redundancy adds little or no latency.
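
The parallel dual write can be sketched as follows. FakeStore and write_entity are hypothetical stand-ins written for this example, not Khora classes; only asyncio.gather comes from the text above. The class-level counter records how many writes were in flight at once, which shows that the two calls overlap.

```python
import asyncio


class FakeStore:
    """Stand-in for a backend client (hypothetical); tracks overlapping writes."""

    in_flight = 0
    max_in_flight = 0

    def __init__(self, name, delay):
        self.name = name
        self.delay = delay
        self.rows = {}

    async def upsert_entity(self, entity_id, payload):
        FakeStore.in_flight += 1
        FakeStore.max_in_flight = max(FakeStore.max_in_flight, FakeStore.in_flight)
        await asyncio.sleep(self.delay)  # simulated network round trip
        self.rows[entity_id] = payload
        FakeStore.in_flight -= 1
        return self.name


async def write_entity(vector_store, graph_store, entity_id, payload):
    # Both coroutines are scheduled together; gather returns results in call order.
    return await asyncio.gather(
        vector_store.upsert_entity(entity_id, payload),
        graph_store.upsert_entity(entity_id, payload),
    )


vector_store = FakeStore("vector", 0.02)
graph_store = FakeStore("graph", 0.03)
results = asyncio.run(
    write_entity(vector_store, graph_store, "e1", {"name": "Alice"})
)
```

With this pattern the total wait is roughly that of the slower backend rather than the sum of the two.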

The simpler path: `sqlite_lance`

For local development, evaluation, notebooks, and CI, the embedded backend fills all three roles in-process with zero external services: SQLite for relational/graph, LanceDB for vectors, all in one local directory.

pip install "khora[sqlite-lance]"

It’s the default for the examples and the simplest way to try Khora without standing up Postgres and Neo4j.

sqlite_lance is an embedded convenience, not a production deployment. Known limits vs. the Postgres + Neo4j stack:

Scale ceiling ~1M chunks / ~100k entities.
Entity vector search is brute-force (no ANN index), so entity-heavy recall is slower than on pgvector. Results are still correct, and recall().entities / .relationships, search_entities, list_entities, get_entity, and find_related_entities all work.
dream_history returns empty: the khora_dream_runs checkpoint table is Postgres-only, so the embedded path tracks run state through the dream report sink instead.

Production vs. embedded

VectorCypher needs all three roles: PostgreSQL (relational), pgvector (vector), and a graph store. It is production-ready on the PostgreSQL + pgvector + Neo4j stack. sqlite_lance fills all three roles in-process and is the right choice for local development, evaluation, and CI, but it carries the scale ceiling and gaps noted above. Treat it as a convenience, not a production deployment. See Configuration for the install extras behind each stack.

Coordinator, transactions, and bulk load

StorageCoordinator: the single entry point. Every method takes the caller’s namespace_id and filters at the query layer. The per-role attributes (coordinator.graph, etc.) are deprecated in favour of the facade. See Namespaces & isolation for the full contract.
Transactions: async with coordinator.transaction() as txn: shares one database session across multi-backend writes, with txn.savepoint() for partial rollback.
Bulk mode: StorageSettings(bulk_mode=True) defers HNSW index creation and relaxes Neo4j validation for initial loads. Call ensure_hnsw_indexes(...) afterward to rebuild. It trades consistency for throughput (for initial loading, not steady-state).
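
The partial-rollback behaviour that a savepoint provides can be illustrated with plain SQL savepoints. The sketch below uses the standard-library sqlite3 module rather than Khora's coordinator.transaction() or txn.savepoint(), whose exact signatures are not shown here; the table and savepoint names are made up for the example.

```python
import sqlite3

# isolation_level=None puts sqlite3 in autocommit mode so BEGIN/COMMIT are explicit.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE entities (id TEXT PRIMARY KEY, namespace_id TEXT NOT NULL)"
)

conn.execute("BEGIN")
conn.execute("INSERT INTO entities VALUES ('e1', 'ns1')")  # must survive

conn.execute("SAVEPOINT optional_step")
try:
    conn.execute("INSERT INTO entities VALUES ('e2', 'ns1')")
    conn.execute("INSERT INTO entities VALUES ('e1', 'ns1')")  # duplicate key
except sqlite3.IntegrityError:
    # Undo only the work done since the savepoint; 'e1' from before is kept.
    conn.execute("ROLLBACK TO SAVEPOINT optional_step")
conn.execute("RELEASE SAVEPOINT optional_step")
conn.execute("COMMIT")

stored = [
    row[0]
    for row in conn.execute(
        "SELECT id FROM entities WHERE namespace_id = ? ORDER BY id", ("ns1",)
    )
]
```

The outer transaction still commits, so the first insert is kept while everything after the savepoint is discarded. The final query also filters by namespace_id, in the spirit of the query-layer scoping described above.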

Data model

What’s stored in those backends: documents, chunks, entities, relationships, events.

Configuration

Every KHORA_* storage knob and the full install-extras table.
