Ingestion

When you call remember(), Khora runs a three-phase pipeline: it checks whether the content is new, splits and analyzes it, and optionally connects it to what’s already stored. This page is the conceptual tour of that write path. For exact signatures see the API reference.

content ─▶ ① STAGING ─────▶ ② ENRICHMENT ──────────▶ ③ EXPANSION (optional)
           dedup +           chunk → embed ∥ extract     unify entities +
           doc record        → store                     infer relationships

Phase 1: Staging

Before any expensive work, Khora asks “have we seen this?” It computes a SHA-256 checksum of the content and skips the document if that checksum already exists in the namespace, so re-uploading the same content (even under a new title) is a no-op.

It also resolves the source timestamp (when the content originated, not when it was ingested), which powers temporal recall and becomes each chunk’s occurred_at.

Set it explicitly (recommended). Your connector knows its source, so resolve the one meaningful instant and pass it as source_timestamp. An explicit value is unambiguous and takes precedence over the metadata fallback below; ISO-8601 strings are coerced for you:

await kb.remember(content, namespace=ns, source_timestamp="2026-05-13T14:00:00Z", ...)

Or rely on the metadata fallback. If you omit source_timestamp, Khora scans the document’s metadata for a set of recognized timestamp fields and takes the first one present: sent_at → created_at → timestamp → date → occurred_at → started_at → updated_at. This is a heuristic shortcut. Khora doesn’t know whether the data came from Slack or a calendar; it only maps the field names the connector wrote, so set the one that fits your source (sent_at for Slack/Gmail, occurred_at for calendar/events, created_at for issues). Matching is case- and separator-insensitive: occurredAt, occurred-at, and OCCURRED_AT all resolve to occurred_at, and an exact snake_case key wins on a tie. The one special case: setting source_type to calendar / meeting / event in metadata flips the order to prefer event time (occurred_at) over dispatch time (sent_at).
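
To make the two staging decisions concrete, here is a minimal sketch of checksum dedup and the metadata timestamp fallback. It is not Khora's implementation: is_new_content, normalize_key, and resolve_source_timestamp are hypothetical names, the in-memory set stands in for one namespace's stored checksums, and the "flipped" order for event sources is assumed to mean only that occurred_at moves to the front.

```python
import hashlib
import re
from typing import Any, Optional

# Fallback order as listed in the text above.
DEFAULT_ORDER = ["sent_at", "created_at", "timestamp", "date",
                 "occurred_at", "started_at", "updated_at"]
EVENT_SOURCE_TYPES = {"calendar", "meeting", "event"}


def is_new_content(seen: set, content: str) -> bool:
    """Return True the first time this exact content is staged, False afterwards."""
    checksum = hashlib.sha256(content.encode("utf-8")).hexdigest()
    if checksum in seen:
        return False  # duplicate: skip the document
    seen.add(checksum)
    return True


def normalize_key(key: str) -> str:
    """Map occurredAt / occurred-at / OCCURRED_AT to occurred_at."""
    key = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", key)  # split camelCase words
    return re.sub(r"[^A-Za-z0-9]+", "_", key).lower()


def resolve_source_timestamp(metadata: dict, explicit: Optional[Any] = None) -> Optional[Any]:
    """Pick the source timestamp: explicit value first, then the metadata fallback."""
    if explicit is not None:
        return explicit  # an explicit value always beats the fallback
    order = list(DEFAULT_ORDER)
    if str(metadata.get("source_type", "")).lower() in EVENT_SOURCE_TYPES:
        # Assumption: the flipped order is modeled as occurred_at moving to the front.
        order.remove("occurred_at")
        order.insert(0, "occurred_at")
    resolved = {}
    for key, value in metadata.items():
        canonical = normalize_key(key)
        # An exact snake_case key overrides a variant spelling of the same field.
        if canonical not in resolved or key == canonical:
            resolved[canonical] = value
    for field in order:
        if resolved.get(field) is not None:
            return resolved[field]
    return None
```

Because the checksum covers only the content, a changed title does not change the digest, which is why a re-upload under a new title is still skipped.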

Custom metadata and provenance

metadata is a free-form per-document dict for anything your application needs to carry alongside the content. Khora stores it on the document and denormalizes it onto every chunk, so you can gate recall on it later with a recall filter, down to a nested field by dotted path (filter={"metadata.team": "ingest"}). The provenance kwargs you set at ingest map to the filterable system keys: source_name, source_type, source_url, source, title, external_id, and source_timestamp.

await kb.remember(
    content,
    namespace=ns,
    source_name="linear",                 # later: filter={"source_name": "linear"}
    source_type="ticket",
    metadata={"team": "ingest", "tier": "gold", "priority": 1},
    entity_types=["EVENT"],
    relationship_types=["RELATES_TO"],
)
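
To show what a dotted-path filter means, the sketch below matches a filter dict against a chunk record. It is only an illustration: get_path and matches are hypothetical helpers, the record layout (system keys at the top level, custom metadata nested under metadata) is an assumption, and only equality is modeled.

```python
from typing import Any

_MISSING = object()  # sentinel: distinguishes "no such key" from a stored None


def get_path(record: dict, dotted: str) -> Any:
    """Walk a dotted path such as "metadata.team" through nested dicts."""
    current: Any = record
    for part in dotted.split("."):
        if not isinstance(current, dict) or part not in current:
            return _MISSING
        current = current[part]
    return current


def matches(record: dict, flt: dict) -> bool:
    """True when every filter key resolves to a value equal to the expected one."""
    return all(get_path(record, key) == expected for key, expected in flt.items())


# A chunk as it might look after the metadata was denormalized onto it.
chunk = {
    "source_name": "linear",
    "source_type": "ticket",
    "metadata": {"team": "ingest", "tier": "gold", "priority": 1},
}
```

With this record, matches(chunk, {"metadata.team": "ingest"}) holds, while a filter on a path that does not exist simply fails to match.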

Phase 2: Enrichment

This is where content becomes knowledge. Khora uses a staged batch architecture: every document is chunked first, then embedding and extraction run concurrently (asyncio.gather: extraction doesn’t need embeddings and vice versa), then results are written in batches.
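
The chunk-then-fan-out shape can be sketched with asyncio.gather. The embed and extract coroutines below are stubs invented for illustration (a real pipeline would call an embedding model and an LLM); the point is only that neither task consumes the other's output, so both can be awaited together.

```python
import asyncio


async def embed(chunks: list) -> list:
    """Stub embedder: stands in for a network call to an embedding model."""
    await asyncio.sleep(0.01)
    return [[float(len(c))] for c in chunks]


async def extract(chunks: list) -> list:
    """Stub extractor: treats capitalized words as entities."""
    await asyncio.sleep(0.01)
    return [[w for w in c.split() if w[:1].isupper()] for c in chunks]


async def enrich(chunks: list) -> list:
    # Neither task needs the other's result, so both run concurrently.
    vectors, entities = await asyncio.gather(embed(chunks), extract(chunks))
    # gather returns results in argument order; merge them per chunk
    # so they can be handed to one batched write.
    return [
        {"text": c, "vector": v, "entities": e}
        for c, v, e in zip(chunks, vectors, entities)
    ]


rows = asyncio.run(enrich(["Alice joined Acme", "no names here"]))
```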

Chunking

Documents are split into focused, embeddable pieces. Pick a strategy with the chunk_strategy kwarg. Size and overlap are set globally via KhoraConfig.pipelines.chunk_size / chunk_overlap (defaults 512 / 50 tokens):

Strategy	How it splits	Best for
`fixed`	By token count	Predictable sizing
`semantic` (default)	At sentence boundaries (optional spaCy via `khora[nlp]`)	Natural language
`recursive`	Paragraphs → lines → sentences	Structured docs
`conversation`	Groups related messages, preserves speaker/thread	Chat logs (Slack)

Each chunk records which chunker produced it in chunker_info.
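
As a rough picture of how size and overlap interact, here is a sketch of fixed-size chunking with the documented defaults (512 / 50). It is not Khora's chunker: chunk_fixed is a hypothetical name, and a plain list of tokens stands in for whatever tokenizer the library uses.

```python
def chunk_fixed(tokens: list, chunk_size: int = 512, chunk_overlap: int = 50) -> list:
    """Split tokens into windows of chunk_size that share chunk_overlap tokens."""
    if chunk_size <= 0 or not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reaches the end of the document
    return chunks
```

With the defaults, each window starts 462 tokens after the previous one, so the last 50 tokens of one chunk reappear as the first 50 of the next.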

Embedding

Each chunk is converted to a vector (default text-embedding-3-small, 1536-dim) via LiteLLM, so any provider works. Embedding is batched (up to ~200 texts per call, sub-batches running concurrently), not one API call per chunk.
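
Batching can be pictured as slicing the chunk texts into groups of at most 200 and awaiting the groups concurrently. The sketch below uses a stub in place of a provider call; fake_embed_call, split_batches, and embed_all are hypothetical names, and the fixed cap of 200 simplifies the "up to ~200" described above.

```python
import asyncio

MAX_BATCH = 200  # upper bound per call, simplified from the text above


def split_batches(texts: list, size: int = MAX_BATCH) -> list:
    """Slice texts into consecutive sub-batches of at most `size` items."""
    return [texts[i:i + size] for i in range(0, len(texts), size)]


async def fake_embed_call(batch: list) -> list:
    """Hypothetical stand-in for one embedding request covering a whole batch."""
    await asyncio.sleep(0)
    return [[float(len(t))] for t in batch]


async def embed_all(texts: list) -> tuple:
    batches = split_batches(texts)
    # One request per sub-batch, all in flight at once; order is preserved.
    results = await asyncio.gather(*(fake_embed_call(b) for b in batches))
    vectors = [vec for batch in results for vec in batch]
    return vectors, len(batches)


vectors, calls = asyncio.run(embed_all([f"chunk {i}" for i in range(450)]))
```

Here 450 chunks cost three calls (200 + 200 + 50) instead of 450.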

Extraction

An LLM reads each chunk and extracts entities and relationships. Two kwargs are required on every remember(). They tell the extractor which types to look for:

await kb.remember(
    content,
    namespace=ns,
    entity_types=["PERSON", "ORG"],          # required
    relationship_types=["WORKS_AT"],         # required
)

The type lists are guidance, not a hard schema, and empty lists don’t disable extraction. The extractor takes them as a strong hint but may still emit types outside your list, and passing entity_types=[] / relationship_types=[] removes the guidance entirely: Khora falls back to unbounded extraction and infers its own taxonomy from the content (model-chosen, loosely-cased labels, e.g. Person and EVENT side by side, plus self-invented relationship types).

For a domain-specific ontology (a custom system prompt, typed entities with cross-source dedup, and inference rules), pass an ExpertiseConfig via expertise=. See Expertise & ontologies for how to build and apply one, or the resume-search workload for a runnable example. An expertise config restricts what the LLM is free to extract, so the ontology behaves more like a hard schema than guidance.

Selective extraction (cost control). By default Khora doesn’t send every chunk to the LLM. On the default VectorCypher engine, skeleton PageRank ranks chunks by keyword-graph importance and sends the top skeleton_core_ratio fraction (default 0.50) to full LLM extraction. The chunks it skips get no graph edges, but stay retrievable through the vector (and, when enabled, keyword) channels. The generic ingest_documents() path uses a different selector: a ChunkImportanceScorer scoring entity density (35%), information density (25%), position (20%), and length (20%), which keeps the top extraction_importance_ratio (default 0.7) and gives the rest lightweight CO_OCCURS_WITH edges. Tune the VectorCypher selectivity via skeleton_core_ratio (see VectorCypher).
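
The weighted selector on the generic path can be sketched as a weighted sum followed by a top-ratio cut. The weights and the 0.7 ratio come from the text above; everything else is assumed for illustration: ChunkFeatures, importance, and select_for_extraction are hypothetical names, each feature is taken to be pre-normalized to 0..1, and the cutoff rounds up.

```python
import math
from dataclasses import dataclass

# Weights as stated above: entity density, information density, position, length.
WEIGHTS = {
    "entity_density": 0.35,
    "information_density": 0.25,
    "position": 0.20,
    "length": 0.20,
}


@dataclass
class ChunkFeatures:
    chunk_id: str
    entity_density: float
    information_density: float
    position: float
    length: float


def importance(features: ChunkFeatures) -> float:
    """Weighted sum of the four feature scores."""
    return sum(weight * getattr(features, name) for name, weight in WEIGHTS.items())


def select_for_extraction(chunks: list, ratio: float = 0.7) -> tuple:
    """Return (chunks for full LLM extraction, chunks left for lightweight edges)."""
    keep = math.ceil(len(chunks) * ratio)  # assumption: round the cutoff up
    ranked = sorted(chunks, key=importance, reverse=True)
    return ranked[:keep], ranked[keep:]
```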

Co-occurrence edges

When the LLM extracts several entities from a chunk but doesn’t state an explicit relationship between two of them, Khora still links them with a weak co-occurrence edge, on the assumption that things mentioned together are usually related. The point is connectivity: VectorCypher answers questions by traversing relationships, so an entity with no edges is invisible to graph recall. Co-occurrence makes sure nothing is left stranded. It also captures the implicit “these were discussed together” links the model didn’t bother to name. There are two kinds, depending on where they come from:

Edge	Added between	Why
`ASSOCIATED_WITH`	Extracted entities that share a chunk (confidence `0.4`, ≤15 per chunk)	Densify the graph so no entity ends up isolated
`CO_OCCURS_WITH`	Entities in skipped chunks on the generic `ingest_documents()` path only (on the default VectorCypher engine, skipped chunks get no edges)	Keep those entities connected without paying for an LLM call

Both are deliberately low-confidence, so they never outrank real, LLM-extracted relationships. The dream phase down-weights them (to 0.2) and can prune them. Keep it on (the default) when you want forgiving recall over messy, real-world data: better to over-connect and let related-but-unstated entities surface than to miss a link. Turn it down when you want a graph that mirrors your ontology cleanly. On VectorCypher the ASSOCIATED_WITH densification is always on (there is no config switch), but you can:

prune low-confidence / co-occurrence edges in the dream phase (KHORA_DREAM_OPS_PRUNE_EDGES) when “edge soup” starts to hurt retrieval, and
drop the separate event edges (EVENT entities + PARTICIPATED_IN) with VectorCypherConfig(store_events=False). See the ontology example for a worked run.
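
A minimal sketch of the ASSOCIATED_WITH densification, assuming it works pairwise over the entities of one chunk: every pair without an explicit relationship gets a weak edge, capped at 15 per chunk. cooccurrence_edges is a hypothetical helper, and which pairs survive the cap (here: the first 15 in sorted order) is an assumption.

```python
from itertools import combinations

ASSOCIATED_CONFIDENCE = 0.4  # confidence from the table above
MAX_EDGES_PER_CHUNK = 15     # per-chunk cap from the table above


def cooccurrence_edges(entities: list, explicit: set) -> list:
    """Link entity pairs that share a chunk but have no extracted relationship.

    `explicit` holds (source, target) pairs the extractor already connected.
    """
    linked = {frozenset(pair) for pair in explicit}  # direction does not matter
    edges = []
    for a, b in combinations(sorted(set(entities)), 2):
        if frozenset((a, b)) in linked:
            continue  # a real relationship exists; no weak edge needed
        edges.append((a, "ASSOCIATED_WITH", b, ASSOCIATED_CONFIDENCE))
        if len(edges) == MAX_EDGES_PER_CHUNK:
            break
    return edges
```

The cap matters because pair counts grow quadratically: ten entities in one chunk would otherwise produce 45 weak edges.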

Chunks land in pgvector, entities in Neo4j and pgvector (parallel writes), relationships in Neo4j, all batched. See Storage backends for where each lives.

Phase 3: Expansion (optional)

After enrichment, Khora can connect the new content to the existing graph:

Entity unification: the same entity written different ways (“Microsoft Corporation”, “Microsoft”, “MSFT”) is merged via exact, fuzzy (edit-distance), and embedding matching.
Relationship inference: new edges derived from existing ones (Alice and Bob both WORKS_FOR Acme → Alice COLLEAGUE_OF Bob), driven by the expertise config.
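
The colleague example can be written as a tiny rule over (source, relation, target) triples. This is a sketch of the idea only: in Khora the rules come from the expertise config, whereas infer_colleagues below is a hypothetical hard-coded function.

```python
from collections import defaultdict
from itertools import combinations


def infer_colleagues(edges: list) -> list:
    """Derive COLLEAGUE_OF edges for people who share a WORKS_FOR target."""
    by_employer = defaultdict(set)
    for source, relation, target in edges:
        if relation == "WORKS_FOR":
            by_employer[target].add(source)

    inferred = set()
    for people in by_employer.values():
        # Every pair of people at the same employer becomes one inferred edge.
        for a, b in combinations(sorted(people), 2):
            inferred.add((a, "COLLEAGUE_OF", b))
    return sorted(inferred)
```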

Four inference modes trade thoroughness for cost:

Mode	When it runs	Use case
`smart` (default)	Per-doc dedup during ingest, one full resolution pass after	Large imports, production
`incremental`	Per document against the existing graph	Small graphs, trickle feeds
`batch`	Once over the full graph after all docs	Legacy bulk imports
`none`	Never (unification only)	Fastest

Three ways to ingest

Method	Shape	Use it when
`remember()`	One document, awaits completion	Single writes
`remember_batch()`	Many documents, concurrent, cross-document dedup, blocks until done	Bulk imports where you wait for the result
`submit_batch()`	Stages docs as `PENDING`, returns a `BatchHandle` immediately; a background processor fires `on_result` per doc	Fire-and-forget / write-path services

Calling remember() in a loop misses cross-document entity dedup. remember_batch() shares one EntityIndex across the whole set, so “Microsoft” in doc 1 and “MSFT” in doc 50 collapse to one entity. Prefer it for bulk work.

submit_batch() requires the pending processor to be running. Call kb.start_pending_processor() after connect(); otherwise submit_batch() raises (after it has already staged the rows).

Error handling

A failure in one document never fails the batch. Errors are captured per document and surfaced in the result counts:

result = await kb.remember_batch(documents, namespace=ns, entity_types=[...], relationship_types=[...])
print(f"{result.processed} processed, {result.skipped} skipped (dupes), {result.failed} failed")
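
One way to get "a failure in one document never fails the batch" is asyncio.gather with return_exceptions=True, which hands back exceptions as values instead of raising. The sketch below illustrates that pattern and does not claim to be how Khora does it; BatchResult, ingest_one, and ingest_batch are hypothetical, and only the three counter names come from the snippet above.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class BatchResult:
    processed: int = 0
    skipped: int = 0
    failed: int = 0


async def ingest_one(doc: str, seen: set) -> str:
    """Stub per-document ingest: fails on empty input, skips duplicates."""
    if not doc.strip():
        raise ValueError("empty document")
    if doc in seen:
        return "skipped"
    seen.add(doc)
    return "processed"


async def ingest_batch(docs: list) -> BatchResult:
    seen = set()
    # return_exceptions=True turns a raised error into a returned value,
    # so one bad document cannot cancel or fail the others.
    outcomes = await asyncio.gather(
        *(ingest_one(d, seen) for d in docs), return_exceptions=True
    )
    result = BatchResult()
    for outcome in outcomes:
        if isinstance(outcome, Exception):
            result.failed += 1
        elif outcome == "skipped":
            result.skipped += 1
        else:
            result.processed += 1
    return result


result = asyncio.run(ingest_batch(["a", "b", "a", "  "]))
```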

embedding_model and extraction_model are not per-call kwargs. Set them at construction time via KhoraConfig (KHORA_LLM_EMBEDDING_MODEL, KHORA_LLM_MODEL). The per-call kwargs are chunk_size, chunk_strategy, entity_types, relationship_types, expertise, source_timestamp, metadata, session_id, and external_id. Passing chunk_size=None falls back to KHORA_PIPELINES_CHUNK_SIZE.

Retrieval

The read path: how recall() finds and ranks what you ingested.

Core APIs example

Runnable remember_batch, ontology config, and entity reads.
