This guide walks through the production path: PostgreSQL + pgvector + Neo4j with the default VectorCypher engine. For embedded options (SQLite + LanceDB), see Configuration.
1

Install Khora

uv init khora-quickstart 
cd khora-quickstart
uv add khora
Khora requires Python 3.13 or later.
2

Set up databases

Khora is meant to be used as a library: it has no built-in server and works with your existing databases. For local development, the Khora repo ships a compose.yaml that gets you up and running with the recommended stack: Postgres (with pgvector) and Neo4j.
git clone https://github.com/DeytaHQ/khora
cd khora
make dev
Then set the connection URLs as environment variables in .env:
KHORA_DATABASE_URL=postgresql://khora:khora@localhost:5434/khora
KHORA_NEO4J_URL=bolt://neo4j:pleaseletmein@localhost:7688
OPENAI_API_KEY=sk-...
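Before moving on, it can help to confirm that the three variables above are actually visible to your process. The following is a minimal sketch using only the standard library. The helper names (`parse_env_text`, `missing_keys`, `load_env_file`) are hypothetical and not part of Khora, and whether Khora reads .env on its own is not stated here, so this sketch loads the file explicitly.

```python
import os
from pathlib import Path

# The variable names listed in the .env example above.
REQUIRED_KEYS = ("KHORA_DATABASE_URL", "KHORA_NEO4J_URL", "OPENAI_API_KEY")


def parse_env_text(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines; skip blanks, comments, and malformed lines."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, so URLs containing "=" stay intact.
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(env: dict[str, str]) -> list[str]:
    """Return the required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not env.get(key)]


def load_env_file(path: str = ".env") -> list[str]:
    """Copy file values into os.environ (without overriding); return missing keys."""
    env_path = Path(path)
    if env_path.exists():
        for key, value in parse_env_text(env_path.read_text()).items():
            os.environ.setdefault(key, value)
    return missing_keys(dict(os.environ))
```

Calling `load_env_file()` at startup and failing fast when it returns a non-empty list surfaces a missing URL before the first connection attempt.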
3

Run migrations

Khora ships its schema as Alembic migrations. Run them once per database:
uv run alembic upgrade head
Or instantiate Khora(..., run_migrations=True) to apply them on connect under an advisory lock, which is useful for single-process apps and tests.
4

Store a memory

import asyncio
from khora import Khora

async def main() -> None:
    async with Khora() as kb:  # reads KHORA_DATABASE_URL / KHORA_NEO4J_URL
        ns = await kb.create_namespace()
        await kb.remember(
            "Marie Curie won the Nobel Prize in Physics in 1903.",
            namespace=ns.namespace_id,
            entity_types=["PERSON", "ORGANIZATION", "CONCEPT", "LOCATION", "EVENT"],
            relationship_types=["RELATES_TO", "PART_OF", "MENTIONS"],
        )
        result = await kb.recall(
            "What did Curie win?",
            namespace=ns.namespace_id,
        )
        for chunk in result.chunks:
            print(chunk.content, chunk.score)

asyncio.run(main())
remember() runs the 3-phase ingestion pipeline (stage → enrich → expand). recall() returns a RecallResult projection with chunks (typed RecallChunk), entities, relationships, and documents (deduplicated source documents). Build prompt context by iterating result.chunks. Each chunk carries .content, .score, .id, and .document_id.
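Building prompt context from the chunks could look like the sketch below. It relies only on the .content, .score, and .document_id attributes named above. The `build_context` helper, the `ChunkLike` protocol, and the `FakeChunk` stand-in are hypothetical, the attribute types are assumed, and the sketch assumes a higher score means a more relevant chunk.

```python
from dataclasses import dataclass
from typing import Iterable, Protocol


class ChunkLike(Protocol):
    """Attributes this sketch reads from a chunk (types are assumed)."""
    content: str
    score: float
    document_id: str


@dataclass
class FakeChunk:
    """Stand-in for a recalled chunk so the helper can be tried offline."""
    content: str
    score: float
    document_id: str


def build_context(
    chunks: Iterable[ChunkLike],
    min_score: float = 0.0,
    max_chars: int = 4000,
) -> str:
    """Join the best-scoring chunks into one context string within a size budget."""
    parts: list[str] = []
    used = 0
    # Assumption: higher score means more relevant, so sort descending.
    for chunk in sorted(chunks, key=lambda c: c.score, reverse=True):
        if chunk.score < min_score:
            continue
        entry = f"[{chunk.document_id}] {chunk.content}"
        if used + len(entry) > max_chars:
            break  # stop once the next entry would exceed the budget
        parts.append(entry)
        used += len(entry)
    return "\n\n".join(parts)
```

In the example above, `build_context(result.chunks)` would produce a string you can place in an LLM prompt. Prefixing each entry with its document ID keeps the source traceable.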
5

Batch ingestion (optional)

For higher throughput, stage documents and let a background processor pick them up:
# Runs inside an async function; ns is the namespace created in step 4.
async with Khora() as kb:
    kb.start_pending_processor()   # opt-in; write-path services only
    handle = await kb.submit_batch(
        [{"content": "doc 1"}, {"content": "doc 2"}],
        on_result=lambda completed, total, result: print(result),
        namespace=ns.namespace_id,
        entity_types=["PERSON", "ORGANIZATION", "CONCEPT", "LOCATION", "EVENT"],
        relationship_types=["RELATES_TO", "PART_OF", "MENTIONS"],
    )
    await handle.wait()
The processor is opt-in. Read-only services don’t need it.
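The on_result lambda above only prints each result. If you want to track progress, a small callable object can stand in for it. This is a sketch under the assumption that the callback is invoked synchronously with the same three positional arguments as the lambda (completed, total, result). `BatchProgress` is a hypothetical helper, not part of Khora.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class BatchProgress:
    """Callable that records batch progress and collects per-document results."""
    completed: int = 0
    total: int = 0
    results: list[Any] = field(default_factory=list)

    def __call__(self, completed: int, total: int, result: Any) -> None:
        # Mirror the (completed, total, result) arguments of the lambda above.
        self.completed = completed
        self.total = total
        self.results.append(result)

    @property
    def fraction(self) -> float:
        """Share of documents finished so far; 0.0 before the first callback."""
        return self.completed / self.total if self.total else 0.0
```

You would pass an instance as `on_result=progress` and inspect `progress.fraction` or `progress.results` after `await handle.wait()`.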
Pre-fetch the reranker model. Reranking is on by default, so the first recall() that runs it downloads the cross-encoder BAAI/bge-reranker-v2-m3 from Hugging Face (a couple of gigabytes). Pre-fetch it with the hf CLI so the first query doesn’t pay the download cost:
pip install -U "huggingface_hub"   # provides the hf CLI
hf auth login                      # optional: authenticated, rate-limit-free downloads
hf download BAAI/bge-reranker-v2-m3
Pin a different model with KHORA_QUERY_RERANKING_MODEL, or turn reranking off with KHORA_QUERY_ENABLE_RERANKING=false.

Next steps

Configuration

Every KHORA_* knob: storage, LLM, pipeline, query, telemetry.

VectorCypher

How the retrieval engine fuses vector, graph, and keyword search with query routing and RRF.