When you call remember(), Khora runs a three-phase pipeline: it checks whether the content is new, splits and analyzes it, and optionally connects it to what’s already stored. This page is the conceptual tour of that write path. For exact signatures see the API reference.
content ─▶ ① STAGING ─────▶ ② ENRICHMENT ──────────▶ ③ EXPANSION (optional)
           dedup +           chunk → embed ∥ extract     unify entities +
           doc record        → store                     infer relationships

Phase 1: Staging

Before any expensive work, Khora asks “have we seen this?” It computes a SHA-256 checksum of the content and skips the document if that checksum already exists in the namespace, so re-uploading the same content (even under a new title) is a no-op. It also resolves the source timestamp (when the content originated, not when it was ingested), which powers temporal recall and becomes each chunk’s occurred_at. Set it explicitly (recommended). Your connector knows its source, so resolve the one meaningful instant and pass it. It’s unambiguous, and it takes precedence over the fallback below (ISO-8601 strings are coerced for you):
await kb.remember(content, namespace=ns, source_timestamp="2026-05-13T14:00:00Z", ...)
Or rely on the metadata fallback. If you omit source_timestamp, Khora scans the document’s metadata for a set of recognized timestamp fields and takes the first one present, in this order: sent_at, created_at, timestamp, date, occurred_at, started_at, updated_at. This is a heuristic shortcut. Khora doesn’t know the data came from Slack or a calendar; it only maps the field names the connector wrote, so set the one that fits your source (sent_at for Slack/Gmail, occurred_at for calendar/events, created_at for issues). Matching is case- and separator-insensitive (occurredAt / occurred-at / OCCURRED_AT all resolve to occurred_at, and an exact snake_case key wins on a tie). The one special case: setting source_type to calendar / meeting / event in metadata flips the order to prefer event time (occurred_at) over dispatch time (sent_at).
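The two staging decisions can be sketched in plain Python. This is an illustration under stated assumptions, not Khora's code: content_checksum, is_new, and resolve_timestamp are hypothetical names, and moving occurred_at to the front for event-like sources is one guess at how the order "flips".

```python
import hashlib
import re
from typing import Optional

# Fallback order as listed in the text above.
FALLBACK_FIELDS = ["sent_at", "created_at", "timestamp", "date",
                   "occurred_at", "started_at", "updated_at"]
EVENT_SOURCE_TYPES = {"calendar", "meeting", "event"}


def content_checksum(content: str) -> str:
    """SHA-256 hex digest of the content only; the title is not hashed."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def is_new(content: str, seen_checksums: set) -> bool:
    """Return False (skip) when this exact content was already staged."""
    digest = content_checksum(content)
    if digest in seen_checksums:
        return False
    seen_checksums.add(digest)
    return True


def _normalize(key: str) -> str:
    """Lowercase and drop separators: occurredAt, occurred-at, OCCURRED_AT all match."""
    return re.sub(r"[^a-z0-9]", "", key.lower())


def resolve_timestamp(metadata: dict, explicit: Optional[str] = None) -> Optional[str]:
    """An explicit value wins; otherwise take the first recognized metadata field."""
    if explicit is not None:
        return explicit
    order = list(FALLBACK_FIELDS)
    # Assumption: event-like sources simply move occurred_at to the front.
    if str(metadata.get("source_type", "")).lower() in EVENT_SOURCE_TYPES:
        order.remove("occurred_at")
        order.insert(0, "occurred_at")
    for field in order:
        if field in metadata:  # an exact snake_case key wins on a tie
            return metadata[field]
        for key, value in metadata.items():
            if _normalize(key) == _normalize(field):
                return value
    return None
```

With this sketch, metadata containing both sentAt and occurred_at resolves to the sentAt value, unless source_type is meeting, in which case occurred_at is preferred.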

Phase 2: Enrichment

This is where content becomes knowledge. Khora uses a staged batch architecture: every document is chunked first, then embedding and extraction run concurrently (asyncio.gather: extraction doesn’t need embeddings and vice versa), then results are written in batches.
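The concurrency pattern named above can be sketched with asyncio.gather. The two coroutines are stand-ins invented for illustration (embed_chunks and extract_chunks are hypothetical, and the "extraction" is just a capitalized-word filter); only the shape of the pipeline step is the point.

```python
import asyncio


async def embed_chunks(chunks):
    """Stand-in for the embedding call: one fake 1-dim vector per chunk."""
    await asyncio.sleep(0.01)  # simulate network latency
    return [[float(len(c))] for c in chunks]


async def extract_chunks(chunks):
    """Stand-in for LLM extraction: capitalized words play the role of entities."""
    await asyncio.sleep(0.01)
    return [[w for w in c.split() if w[:1].isupper()] for c in chunks]


async def enrich(chunks):
    # Neither task needs the other's output, so both run concurrently.
    vectors, entities = await asyncio.gather(
        embed_chunks(chunks), extract_chunks(chunks)
    )
    # Pair the results up afterwards, ready for a batched write.
    return [
        {"text": c, "vector": v, "entities": e}
        for c, v, e in zip(chunks, vectors, entities)
    ]
```

Because asyncio.gather returns results in argument order, the vectors and entities can be zipped back onto their chunks without extra bookkeeping.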

Chunking

Documents are split into focused, embeddable pieces. Pick a strategy with the chunk_strategy kwarg. Size and overlap are set globally via KhoraConfig.pipelines.chunk_size / chunk_overlap (defaults 512 / 50 tokens):
Strategy | How it splits | Best for
fixed | By token count | Predictable sizing
semantic (default) | At sentence boundaries (optional spaCy via khora[nlp]) | Natural language
recursive | Paragraphs → lines → sentences | Structured docs
conversation | Groups related messages, preserves speaker/thread | Chat logs (Slack)
Each chunk records which chunker produced it in chunker_info.
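To make size and overlap concrete, here is a minimal sketch of fixed-size chunking with the default 512 / 50 values. It is not Khora's chunker: chunk_fixed is a hypothetical helper, and it works on any pre-tokenized list, so a real tokenizer would have to supply the tokens.

```python
def chunk_fixed(tokens, chunk_size=512, chunk_overlap=50):
    """Split a token list into windows of chunk_size that share chunk_overlap tokens."""
    if chunk_size <= 0 or not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reached the end of the document
    return chunks
```

For a 1,000-token document the windows start at tokens 0, 462, and 924, so the last 50 tokens of one chunk reappear at the start of the next.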

Embedding

Each chunk is converted to a vector (default text-embedding-3-small, 1536-dim) via LiteLLM, so any provider works. Embedding is batched (up to ~200 texts per call, sub-batches running concurrently), not one API call per chunk.
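The batching idea can be sketched as follows. This is a sketch under assumptions: fake_embed_call stands in for a real provider request, and the max_concurrency limit is a hypothetical knob, not a documented Khora setting.

```python
import asyncio


def batched(items, size):
    """Slice a list into consecutive sub-batches of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


async def fake_embed_call(texts):
    """Hypothetical provider call: one request embeds a whole sub-batch."""
    await asyncio.sleep(0)
    return [[float(len(t))] for t in texts]


async def embed_all(texts, batch_size=200, max_concurrency=4):
    semaphore = asyncio.Semaphore(max_concurrency)  # cap in-flight requests

    async def run(batch):
        async with semaphore:
            return await fake_embed_call(batch)

    # gather preserves order, so vectors line up with the input texts.
    results = await asyncio.gather(*(run(b) for b in batched(texts, batch_size)))
    return [vector for batch in results for vector in batch]
```

Embedding 450 texts this way takes three calls (200, 200, and 50 texts) rather than 450.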

Extraction

An LLM reads each chunk and extracts entities and relationships. Two kwargs are required on every remember(). They tell the extractor which types to look for:
await kb.remember(
    content,
    namespace=ns,
    entity_types=["PERSON", "ORG"],          # required
    relationship_types=["WORKS_AT"],         # required
)
The type lists are guidance, not a hard schema, and empty lists don’t disable extraction. The extractor takes them as a strong hint but may still emit types outside your list, and passing entity_types=[] / relationship_types=[] removes the guidance entirely: Khora falls back to unbounded extraction and infers its own taxonomy from the content (model-chosen, loosely-cased labels, e.g. Person and EVENT side by side, plus self-invented relationship types).
For a domain-specific ontology (a custom system prompt, typed entities with cross-source dedup, and inference rules), pass an ExpertiseConfig via expertise=. See Expertise & ontologies for how to build and apply one, or the resume-search workload for a runnable example. An expertise config lets you restrict what the LLM is free to extract and treat the ontology more like a hard schema than guidance.
Selective extraction (cost control). By default Khora doesn’t send every chunk to the LLM. A ChunkImportanceScorer ranks chunks on entity density (35%), information density (25%), position (20%), and length (20%); only the top extraction_importance_ratio (default 0.7) get full LLM extraction, while the rest get lightweight CO_OCCURS_WITH co-occurrence edges. That trims LLM cost ~30–50% with minimal recall loss. VectorCypher tunes this further via skeleton_core_ratio (see VectorCypher).
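The weighting above amounts to a weighted sum followed by a top-ratio cut. The sketch below is illustrative only: it assumes each feature is already normalized to the range 0 to 1, rounds the cut up, and uses hypothetical helper names rather than the real ChunkImportanceScorer interface.

```python
import math

# Weights as stated in the text above.
WEIGHTS = {
    "entity_density": 0.35,
    "information_density": 0.25,
    "position": 0.20,
    "length": 0.20,
}


def importance(features):
    """Weighted sum of the four features (assumed normalized to 0..1)."""
    return sum(WEIGHTS[name] * features[name] for name in WEIGHTS)


def split_by_importance(chunks, ratio=0.7):
    """Return (chunks for full LLM extraction, chunks for co-occurrence only)."""
    ranked = sorted(chunks, key=lambda c: importance(c["features"]), reverse=True)
    keep = math.ceil(len(ranked) * ratio)  # assumption: the cut rounds up
    return ranked[:keep], ranked[keep:]
```

A chunk with entity density 0.8, information density 0.6, position 1.0, and length 0.5 scores 0.28 + 0.15 + 0.20 + 0.10 = 0.73.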

Co-occurrence edges

When the LLM extracts several entities from a chunk but doesn’t state an explicit relationship between two of them, Khora still links them with a weak co-occurrence edge, on the assumption that things mentioned together are usually related. The point is connectivity: VectorCypher answers questions by traversing relationships, so an entity with no edges is invisible to graph recall. Co-occurrence makes sure nothing is left stranded. It also captures the implicit “these were discussed together” links the model didn’t bother to name. There are two kinds, depending on where they come from:
Edge | Added between | Why
ASSOCIATED_WITH | Extracted entities that share a chunk (confidence 0.4, ≤15 per chunk) | Densify the graph so no entity ends up isolated
CO_OCCURS_WITH | Entities in chunks that selective extraction skipped (never sent to the LLM) | Keep those entities connected without paying for an LLM call
Both are deliberately low-confidence, so they never outrank real, LLM-extracted relationships. The dream phase down-weights them (to 0.2) and can prune them. Keep it on (the default) when you want forgiving recall over messy, real-world data: better to over-connect and let related-but-unstated entities surface than to miss a link. Turn it down when you want a graph that mirrors your ontology cleanly. On VectorCypher the ASSOCIATED_WITH densification is always on (there is no config switch), but you can:
  • prune low-confidence / co-occurrence edges in the dream phase (KHORA_DREAM_OPS_PRUNE_EDGES) when “edge soup” starts to hurt retrieval, and
  • drop the separate event edges (EVENT entities + PARTICIPATED_IN) with VectorCypherConfig(store_events=False). See the ontology example for a worked run.
Chunks land in pgvector, entities in Neo4j and pgvector (parallel writes), relationships in Neo4j, all batched. See Storage backends for where each lives.
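The ASSOCIATED_WITH densification described in this section can be sketched as pairing up a chunk's entities wherever no explicit relationship exists. This is a rough illustration: association_edges is a hypothetical helper, and which pairs survive the 15-edge cap (here, simply the first in sorted order) is an assumption.

```python
from itertools import combinations


def association_edges(entities, explicit_pairs, confidence=0.4, max_edges=15):
    """Link entities from one chunk that have no explicit relationship between them."""
    explicit = {frozenset(pair) for pair in explicit_pairs}
    edges = []
    for a, b in combinations(sorted(set(entities)), 2):
        if frozenset((a, b)) in explicit:
            continue  # the LLM already named a relationship for this pair
        edges.append((a, "ASSOCIATED_WITH", b, confidence))
        if len(edges) == max_edges:
            break  # per-chunk cap keeps dense chunks from flooding the graph
    return edges
```

For a chunk mentioning Alice, Bob, and Acme where only Alice and Acme are explicitly related, the sketch adds weak edges Acme–Bob and Alice–Bob, so Bob is reachable by graph traversal.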

Phase 3: Expansion (optional)

After enrichment, Khora can connect the new content to the existing graph:
  • Entity unification: the same entity written different ways (“Microsoft Corporation”, “Microsoft”, “MSFT”) is merged via exact, fuzzy (edit-distance), and embedding matching.
  • Relationship inference: new edges derived from existing ones (Alice and Bob both WORKS_FOR Acme → Alice COLLEAGUE_OF Bob), driven by the expertise config.
Four inference modes trade thoroughness for cost:
Mode | When it runs | Use case
smart (default) | Per-doc dedup during ingest, one full resolution pass after | Large imports, production
incremental | Per document against the existing graph | Small graphs, trickle feeds
batch | Once over the full graph after all docs | Legacy bulk imports
none | Never (unification only) | Fastest
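Both expansion steps can be illustrated in a few lines. The sketch below is not Khora's implementation: same_entity covers only the exact and edit-distance tiers (a pair like "Microsoft" / "MSFT" would need the embedding tier, which is omitted), the distance threshold of 2 is an assumption, and infer_colleagues hard-codes the single WORKS_FOR rule from the example above rather than reading rules from an expertise config.

```python
from collections import defaultdict
from itertools import combinations


def edit_distance(a, b):
    """Levenshtein distance via the classic row-by-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def same_entity(a, b, max_distance=2):
    """Exact match after normalization, else a small edit distance."""
    a, b = a.strip().lower(), b.strip().lower()
    return a == b or edit_distance(a, b) <= max_distance


def infer_colleagues(edges):
    """Derive COLLEAGUE_OF edges between people who WORKS_FOR the same target."""
    by_employer = defaultdict(set)
    for source, relation, target in edges:
        if relation == "WORKS_FOR":
            by_employer[target].add(source)
    inferred = []
    for people in by_employer.values():
        for a, b in combinations(sorted(people), 2):
            inferred.append((a, "COLLEAGUE_OF", b))
    return inferred
```

The cost difference between the modes follows from where a pass like infer_colleagues runs: once per document against the whole graph, or once at the end over everything.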

Three ways to ingest

Method | Shape | Use it when
remember() | One document, awaits completion | Single writes
remember_batch() | Many documents, concurrent, cross-document dedup, blocks until done | Bulk imports where you wait for the result
submit_batch() | Stages docs as PENDING, returns a BatchHandle immediately; a background processor fires on_result per doc | Fire-and-forget / write-path services
Calling remember() in a loop misses cross-document entity dedup. remember_batch() shares one EntityIndex across the whole set, so “Microsoft” in doc 1 and “MSFT” in doc 50 collapse to one entity. Prefer it for bulk work.
submit_batch() requires the pending processor to be running. Call kb.start_pending_processor() after connect(); otherwise submit_batch() raises, and it does so only after the rows have already been staged.

Error handling

A failure in one document never fails the batch. Errors are captured per document and surfaced in the result counts:
result = await kb.remember_batch(documents, namespace=ns, entity_types=[...], relationship_types=[...])
print(f"{result.processed} processed, {result.skipped} skipped (dupes), {result.failed} failed")
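The per-document isolation can be sketched with asyncio.gather(return_exceptions=True). Everything here is a stand-in: SketchResult mirrors only the three counters shown above (its errors field is hypothetical), and ingest_one fakes the pipeline with an in-memory duplicate check.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class SketchResult:
    processed: int = 0
    skipped: int = 0
    failed: int = 0
    errors: dict = field(default_factory=dict)  # hypothetical: doc id -> message


async def ingest_one(doc, seen):
    """Fake single-document pipeline: fail on empty content, skip duplicates."""
    if not doc["content"]:
        raise ValueError("empty content")
    if doc["content"] in seen:
        return "skipped"
    seen.add(doc["content"])
    return "processed"


async def ingest_batch(docs):
    seen = set()
    # return_exceptions=True turns a raised error into a result value,
    # so one bad document cannot cancel the rest of the batch.
    outcomes = await asyncio.gather(
        *(ingest_one(d, seen) for d in docs), return_exceptions=True
    )
    result = SketchResult()
    for doc, outcome in zip(docs, outcomes):
        if isinstance(outcome, Exception):
            result.failed += 1
            result.errors[doc["id"]] = str(outcome)
        elif outcome == "skipped":
            result.skipped += 1
        else:
            result.processed += 1
    return result
```

Without return_exceptions=True, the first exception would propagate out of gather and the caller would lose the outcomes of the other documents.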
chunk_size, embedding_model, and extraction_model are not per-call kwargs. Set them at construction time via KhoraConfig (KHORA_PIPELINES_CHUNK_SIZE, KHORA_LLM_EMBEDDING_MODEL, KHORA_LLM_MODEL). The per-call kwargs are chunk_strategy, entity_types, relationship_types, expertise, source_timestamp, metadata, session_id, and external_id.

Retrieval

The read path: how recall() finds and ranks what you ingested.

Core APIs example

Runnable remember_batch, ontology config, and entity reads.