Core components
| Component | Role | Where |
|---|---|---|
| Khora | The facade: remember / recall / forget and the rest of the public API | API reference |
| StorageCoordinator | Routes every read/write to the right backend, scoped to a namespace. Runs dual writes in parallel. Offers transaction() | Storage backends |
| Ingestion pipeline | Staged batch write path: chunk → embed ∥ extract → store | Ingestion |
| Query engine | Multi-source read path: understand → search → fuse → rerank | Retrieval |
| VectorCypher | The retrieval engine: hybrid vector + graph + keyword search | VectorCypher |
How data flows
Writing (remember) runs the three-phase pipeline:
staging (dedup + document record), enrichment (chunk, then embed and extract
concurrently, then batch-store), and optional expansion (entity unification +
relationship inference). The same content is stored in multiple forms (text in
PostgreSQL, vectors in pgvector, the entity graph in Neo4j) because each backend
answers a different kind of question.
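The enrichment phase described above can be sketched with `asyncio`. This is only an illustration of the "chunk, then embed and extract concurrently, then batch-store" shape, not Khora's actual code: `chunk`, `embed`, `extract`, and `enrich` are hypothetical stand-ins, and the fixed-width chunker and fake vectors exist purely to keep the example self-contained.

```python
import asyncio


def chunk(text, size=40):
    # Naive fixed-width chunking, for illustration only.
    return [text[i:i + size] for i in range(0, len(text), size)]


async def embed(chunks):
    # Hypothetical stand-in for an embedding call: one fake 1-d vector per chunk.
    await asyncio.sleep(0)
    return [[float(len(c))] for c in chunks]


async def extract(chunks):
    # Hypothetical stand-in for entity extraction: keep capitalized words.
    await asyncio.sleep(0)
    return [[w for w in c.split() if w[:1].isupper()] for c in chunks]


async def enrich(text):
    chunks = chunk(text)
    # Embedding and extraction are independent of each other,
    # so they can run concurrently.
    vectors, entities = await asyncio.gather(embed(chunks), extract(chunks))
    # Batch-store step: assemble the rows a real store would receive in one call.
    return [
        {"text": c, "vector": v, "entities": e}
        for c, v, e in zip(chunks, vectors, entities)
    ]


text = "Khora stores text in PostgreSQL and the graph in Neo4j."
rows = asyncio.run(enrich(text))
```

Each row carries the same chunk in several forms (text, vector, entities), which mirrors the reason the content is stored in more than one backend.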
Reading (recall) runs the query pipeline: one LLM
call understands the query, vector/graph/keyword channels search in parallel, and
Reciprocal Rank Fusion combines the rankings before optional reranking.
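As a minimal sketch of the fusion step, Reciprocal Rank Fusion scores each item by summing 1 / (k + rank) over every ranking it appears in. The function name, the sample chunk IDs, and the constant `k=60` (a commonly used default) are assumptions for illustration; the value Khora uses is not stated here.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    # rankings: a list of ranked ID lists, best hit first in each list.
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical results from the three search channels.
vector_hits = ["c1", "c2", "c3"]
graph_hits = ["c2", "c4"]
keyword_hits = ["c2", "c1"]

fused = reciprocal_rank_fusion([vector_hits, graph_hits, keyword_hits])
# "c2" ranks first because all three channels return it.
```

Because RRF uses only ranks, the channels do not need comparable score scales, which is what makes it a convenient way to combine vector, graph, and keyword results.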
Cross-cutting concerns
- Namespaces & isolation: the sole tenancy boundary, enforced at the query layer on every backend method (namespace_id is required, kwarg-only).
- Event sourcing: every change is an immutable event, so nothing is silently lost. Audit, time-travel, and CDC fall out of it.
- Observability: Khora emits OpenTelemetry spans/metrics unconditionally, and the host app chooses where they go. Khora never sets service.name or installs a provider at import time.
- Configuration layers: env vars → KhoraConfig → per-namespace overrides, general to specific. See Configuration.
- Protocol-based design: each storage role (relational, vector, graph) sits behind a Protocol, so the embedded sqlite_lance stack and the production PostgreSQL + pgvector + Neo4j stack share one code path.
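Two of the points above, the keyword-only `namespace_id` and the Protocol-based storage roles, can be illustrated together. This is a sketch under assumptions: `VectorStore`, `InMemoryVectorStore`, `add`, `search`, and `recall` are hypothetical names, not Khora's real interfaces.

```python
from typing import Protocol


class VectorStore(Protocol):
    # Hypothetical role interface. The bare `*` makes namespace_id
    # keyword-only, so it cannot be passed (or forgotten) positionally.
    def search(self, query: list[float], *, namespace_id: str,
               limit: int = 10) -> list[str]: ...


class InMemoryVectorStore:
    # Toy backend that satisfies the protocol structurally (no inheritance).
    def __init__(self):
        self._rows = []  # tuples of (namespace_id, chunk_id, vector)

    def add(self, chunk_id, vector, *, namespace_id):
        self._rows.append((namespace_id, chunk_id, vector))

    def search(self, query, *, namespace_id, limit=10):
        # Isolation is applied inside the query itself: filter by namespace first.
        candidates = [(cid, vec) for ns, cid, vec in self._rows
                      if ns == namespace_id]
        # Rank the remaining rows by squared Euclidean distance to the query.
        candidates.sort(
            key=lambda row: sum((a - b) ** 2 for a, b in zip(query, row[1]))
        )
        return [cid for cid, _ in candidates[:limit]]


def recall(store: VectorStore, query, *, namespace_id):
    # Caller code depends only on the protocol, not on a concrete backend.
    return store.search(query, namespace_id=namespace_id)


store = InMemoryVectorStore()
store.add("a1", [0.0, 1.0], namespace_id="tenant-a")
store.add("b1", [0.0, 1.0], namespace_id="tenant-b")
hits = recall(store, [0.0, 1.0], namespace_id="tenant-a")
```

Swapping in a different backend would mean providing another class with the same `search` signature; `recall` would not change. Calling `store.search(query, "tenant-a")` with a positional namespace raises `TypeError`.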
Going deeper
Performance & scaling
Bulk loading, connection pooling, batch operations, and the knobs that matter at scale.
Rust acceleration
The optional native layer for CPU-bound work, with automatic NumPy/Python fallback.