When you call remember(), Khora runs a three-phase pipeline: it checks whether the
content is new, splits and analyzes it, and optionally connects it to what’s already
stored. This page is the conceptual tour of that write path. For exact signatures see
the API reference.
Phase 1: Staging
Before any expensive work, Khora asks “have we seen this?” It computes a SHA-256 checksum of the content and skips the document if that checksum already exists in the namespace, so re-uploading the same content (even under a new title) is a no-op. It also resolves the source timestamp (when the content originated, not when it was ingested), which powers temporal recall and becomes each chunk’s occurred_at.
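The checksum check can be sketched in a few lines. This is a minimal illustration, not Khora’s implementation: `StagingIndex` and `content_checksum` are hypothetical names, and an in-memory dict stands in for whatever store actually holds the per-namespace checksums.

```python
import hashlib


def content_checksum(content: str) -> str:
    """SHA-256 hex digest of the UTF-8 encoded content (the title is not hashed)."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


class StagingIndex:
    """Hypothetical in-memory stand-in for the per-namespace checksum lookup."""

    def __init__(self) -> None:
        self._seen: dict[str, set[str]] = {}

    def stage(self, namespace: str, content: str) -> bool:
        """Return True if the content is new in this namespace, False if it is a duplicate."""
        checksum = content_checksum(content)
        seen = self._seen.setdefault(namespace, set())
        if checksum in seen:
            return False  # already ingested: skip all later phases
        seen.add(checksum)
        return True


index = StagingIndex()
first = index.stage("team-a", "Quarterly planning notes")
again = index.stage("team-a", "Quarterly planning notes")  # same content, skipped
other = index.stage("team-b", "Quarterly planning notes")  # different namespace, new
```

Because only the content is hashed, changing a title or other metadata does not defeat the check, which matches the no-op behavior described above.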
Set it explicitly (recommended). Your connector knows its source, so resolve the
one meaningful instant and pass it. It’s unambiguous, and it takes precedence over the
fallback below (ISO-8601 strings are coerced for you).
If you don’t pass source_timestamp, Khora scans the
document’s metadata for a set of recognized timestamp fields and takes the first present:
sent_at → created_at → timestamp → date → occurred_at → started_at → updated_at.
This is a heuristic shortcut. Khora doesn’t know the data came from Slack or a
calendar; it only maps the field names the connector wrote, so set the one that fits
your source (sent_at for Slack/Gmail, occurred_at for calendar/events, created_at
for issues). Matching is case/separator-insensitive (occurredAt / occurred-at /
OCCURRED_AT all resolve to occurred_at, and an exact snake_case key wins on a tie).
The one special case: setting source_type to calendar / meeting / event in
metadata flips the order to prefer event time (occurred_at) over dispatch time
(sent_at).
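The resolution rules above can be sketched as a small pure function. This is an illustration under assumptions, not Khora’s code: `resolve_source_timestamp`, `normalize_key`, and `coerce` are hypothetical names, and the event-first ordering is modeled simply as moving occurred_at to the front of the list.

```python
import re
from datetime import datetime
from typing import Any, Optional

# Field order as listed in the text above.
DEFAULT_ORDER = ["sent_at", "created_at", "timestamp", "date",
                 "occurred_at", "started_at", "updated_at"]
EVENT_SOURCE_TYPES = {"calendar", "meeting", "event"}


def normalize_key(key: str) -> str:
    """Map occurredAt / occurred-at / OCCURRED_AT to occurred_at."""
    key = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", key)  # split camelCase
    return re.sub(r"[\s\-]+", "_", key).lower()


def coerce(value: Any) -> datetime:
    """Accept a datetime or an ISO-8601 string."""
    if isinstance(value, datetime):
        return value
    return datetime.fromisoformat(str(value))


def resolve_source_timestamp(metadata: dict, source_timestamp: Any = None) -> Optional[datetime]:
    if source_timestamp is not None:  # an explicit value always wins
        return coerce(source_timestamp)
    normalized: dict = {}
    for key, value in metadata.items():
        norm = normalize_key(key)
        # An exact snake_case key overrides a variant spelling on a tie.
        if norm not in normalized or key == norm:
            normalized[norm] = value
    order = list(DEFAULT_ORDER)
    if str(metadata.get("source_type", "")).lower() in EVENT_SOURCE_TYPES:
        # Assumption: "prefer event time" means occurred_at is checked first.
        order.remove("occurred_at")
        order.insert(0, "occurred_at")
    for field in order:
        if normalized.get(field) is not None:
            return coerce(normalized[field])
    return None


meta = {"occurredAt": "2024-03-02T10:00:00", "sent_at": "2024-03-01T09:00:00"}
as_message = resolve_source_timestamp(meta)
as_meeting = resolve_source_timestamp({**meta, "source_type": "meeting"})
```

With the same two fields present, the plain document resolves to the sent_at value, while the one tagged as a meeting resolves to the occurred_at value.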
Phase 2: Enrichment
This is where content becomes knowledge. Khora uses a staged batch architecture: every document is chunked first, then embedding and extraction run concurrently (via asyncio.gather, since neither needs the other’s output), then results
are written in batches.
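The concurrency pattern looks roughly like the sketch below. The embedder and extractor here are placeholders (hypothetical `embed_chunks` and `extract_entities` that fake their output), so only the asyncio.gather structure should be read as illustrative of the staged design.

```python
import asyncio


async def embed_chunks(chunks: list[str]) -> list[list[float]]:
    """Placeholder embedder: one fake 2-dim vector per chunk."""
    await asyncio.sleep(0)  # stands in for a network call
    return [[float(len(c)), 0.0] for c in chunks]


async def extract_entities(chunks: list[str]) -> list[list[str]]:
    """Placeholder extractor: capitalized words stand in for LLM output."""
    await asyncio.sleep(0)
    return [[w for w in c.split() if w[:1].isupper()] for c in chunks]


async def enrich(chunks: list[str]) -> list[tuple[str, list[float], list[str]]]:
    # Neither task consumes the other's output, so they can run concurrently.
    embeddings, entities = await asyncio.gather(
        embed_chunks(chunks), extract_entities(chunks)
    )
    # A batched write of the combined rows would follow here.
    return list(zip(chunks, embeddings, entities))


rows = asyncio.run(enrich(["Alice joined Acme", "the meeting ran long"]))
```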
Chunking
Documents are split into focused, embeddable pieces. Pick a strategy with the chunk_strategy kwarg. Size and overlap are set globally via
KhoraConfig.pipelines.chunk_size / chunk_overlap (defaults 512 / 50 tokens):
| Strategy | How it splits | Best for |
|---|---|---|
| fixed | By token count | Predictable sizing |
| semantic (default) | At sentence boundaries (optional spaCy via khora[nlp]) | Natural language |
| recursive | Paragraphs → lines → sentences | Structured docs |
| conversation | Groups related messages, preserves speaker/thread | Chat logs (Slack) |
chunker_info.
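To make size and overlap concrete, here is a minimal sketch of the fixed strategy using the default 512 / 50 values. `chunk_fixed` is a hypothetical helper, and a plain list of strings stands in for real tokenizer output.

```python
def chunk_fixed(tokens: list[str], chunk_size: int = 512, chunk_overlap: int = 50) -> list[list[str]]:
    """Split tokens into windows of chunk_size that share chunk_overlap tokens."""
    if chunk_size <= 0 or not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks: list[list[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reached the end of the document
    return chunks


tokens = [f"t{i}" for i in range(1000)]  # stand-in for tokenizer output
chunks = chunk_fixed(tokens)
```

For 1000 tokens this yields three chunks, and the last 50 tokens of each chunk reappear as the first 50 of the next, so a sentence that straddles a boundary is still seen whole by one chunk.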
Embedding
Each chunk is converted to a vector (default text-embedding-3-small, 1536-dim) via
LiteLLM, so any provider works. Embedding is batched (up to ~200 texts per call,
sub-batches running concurrently), not one API call per chunk.
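A rough sketch of that batching shape follows. The embedding call is faked (`fake_embed_call` is hypothetical and returns a one-element vector), so this shows only how texts might be split into sub-batches of up to 200 and embedded concurrently.

```python
import asyncio

MAX_BATCH = 200  # taken from the "up to ~200 texts per call" figure above

call_sizes: list[int] = []  # records how many texts each fake call received


def sub_batches(texts: list[str], size: int = MAX_BATCH) -> list[list[str]]:
    """Split texts into consecutive sub-batches of at most `size` items."""
    return [texts[i:i + size] for i in range(0, len(texts), size)]


async def fake_embed_call(batch: list[str]) -> list[list[float]]:
    """Placeholder for one provider call that embeds a whole sub-batch."""
    call_sizes.append(len(batch))
    await asyncio.sleep(0)
    return [[float(len(text))] for text in batch]


async def embed_all(texts: list[str]) -> list[list[float]]:
    results = await asyncio.gather(*(fake_embed_call(b) for b in sub_batches(texts)))
    # gather preserves argument order, so flattening keeps vectors aligned with texts.
    return [vector for batch in results for vector in batch]


vectors = asyncio.run(embed_all([f"chunk {i}" for i in range(450)]))
```

Here 450 chunks cost three calls (200, 200, and 50 texts) rather than 450.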
Extraction
An LLM reads each chunk and extracts entities and relationships. Two kwargs are required on every remember(): entity_types and relationship_types. They tell the extractor which types to look for.
You can also pass an ExpertiseConfig via expertise=.
See Expertise & ontologies for how to build and apply one,
or the resume-search workload for a runnable example. This restricts what the LLM is free to extract, so the ontology acts more like a hard schema than guidance.
Selective extraction (cost control). By default Khora doesn’t send every chunk to
the LLM. A
ChunkImportanceScorer ranks chunks on entity density (35%), information
density (25%), position (20%), and length (20%); only the top
extraction_importance_ratio (default 0.7) get full LLM extraction, while the rest
get lightweight CO_OCCURS_WITH co-occurrence edges. That trims LLM cost ~30–50% with
minimal recall loss. VectorCypher tunes this further via skeleton_core_ratio (see
VectorCypher).
Co-occurrence edges
When the LLM extracts several entities from a chunk but doesn’t state an explicit relationship between two of them, Khora still links them with a weak co-occurrence edge, on the assumption that things mentioned together are usually related. The point is connectivity: VectorCypher answers questions by traversing relationships, so an entity with no edges is invisible to graph recall. Co-occurrence makes sure nothing is left stranded. It also captures the implicit “these were discussed together” links the model didn’t bother to name. There are two kinds, depending on where they come from:
| Edge | Added between | Why |
|---|---|---|
| ASSOCIATED_WITH | Extracted entities that share a chunk (confidence 0.4, ≤15 per chunk) | Densify the graph so no entity ends up isolated |
| CO_OCCURS_WITH | Entities in chunks that selective extraction skipped (never sent to the LLM) | Keep those entities connected without paying for an LLM call |
0.2) and can
prune them.
Keep it on (the default) when you want forgiving recall over messy, real-world data:
better to over-connect and let related-but-unstated entities surface than to miss a link.
Turn it down when you want a graph that mirrors your ontology cleanly. On VectorCypher
the ASSOCIATED_WITH densification is always on (there is no config switch), but you can:
- prune low-confidence / co-occurrence edges in the dream phase (KHORA_DREAM_OPS_PRUNE_EDGES) when “edge soup” starts to hurt retrieval, and
- drop the separate event edges (EVENT entities + PARTICIPATED_IN) with VectorCypherConfig(store_events=False).
See the ontology example for a worked run.
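To tie selective extraction and the two edge kinds together, here is a hedged sketch of the scoring and linking steps. The weights, the 0.7 ratio, the 0.4 confidence, and the cap of 15 come from the text above; everything else is assumed for illustration, including the function names, the pre-normalized feature values, the rounding rule for the cutoff, and the choice to link every entity pair in a chunk.

```python
from itertools import combinations
from typing import Optional

WEIGHTS = {"entity_density": 0.35, "information_density": 0.25,
           "position": 0.20, "length": 0.20}


def importance(features: dict[str, float]) -> float:
    """Weighted sum of four features, each assumed already normalized to [0, 1]."""
    return sum(WEIGHTS[name] * features[name] for name in WEIGHTS)


def select_for_extraction(scores: dict[str, float], ratio: float = 0.7) -> set[str]:
    """IDs of the top `ratio` share of chunks; the rest skip the LLM."""
    keep = round(len(scores) * ratio)  # rounding rule is an assumption
    ranked = sorted(scores, key=lambda chunk_id: scores[chunk_id], reverse=True)
    return set(ranked[:keep])


def pair_edges(entities: list[str], edge_type: str,
               confidence: Optional[float] = None,
               limit: Optional[int] = None) -> list[tuple]:
    """Link entities that share a chunk. A real pipeline would presumably
    skip pairs that already have an explicit relationship."""
    pairs = list(combinations(sorted(set(entities)), 2))
    if limit is not None:
        pairs = pairs[:limit]
    return [(a, edge_type, b, confidence) for a, b in pairs]


scores = {f"c{i}": i / 10 for i in range(10)}
selected = select_for_extraction(scores)             # these chunks go to the LLM
skipped = set(scores) - selected                     # these get CO_OCCURS_WITH only
dense = pair_edges(["Alice", "Acme", "Bob"], "ASSOCIATED_WITH", 0.4, 15)
cheap = pair_edges(["Acme", "Bob"], "CO_OCCURS_WITH")
```

With ten chunks, seven are selected for full extraction and three fall back to co-occurrence edges, which is where the cost saving comes from.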
Phase 3: Expansion (optional)
After enrichment, Khora can connect the new content to the existing graph:
- Entity unification: the same entity written different ways (“Microsoft Corporation”, “Microsoft”, “MSFT”) is merged via exact, fuzzy (edit-distance), and embedding matching.
- Relationship inference: new edges derived from existing ones (Alice and Bob both WORKS_FOR Acme → Alice COLLEAGUE_OF Bob), driven by the expertise config.
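Both expansion steps can be illustrated with toy versions. This is a sketch under assumptions: `same_entity` covers only the exact and edit-distance tiers with a hypothetical threshold of 2, and `infer_colleagues` hard-codes the one rule from the example, whereas the text says real rules are driven by the expertise config.

```python
from itertools import combinations


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the classic row-by-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def same_entity(a: str, b: str, max_distance: int = 2) -> bool:
    """Exact match after case-folding, else a small edit distance."""
    a, b = a.casefold().strip(), b.casefold().strip()
    return a == b or edit_distance(a, b) <= max_distance


def infer_colleagues(edges: list[tuple[str, str, str]]) -> set[tuple[str, str, str]]:
    """People who share a WORKS_FOR target become COLLEAGUE_OF each other."""
    by_org: dict[str, set[str]] = {}
    for subject, relation, target in edges:
        if relation == "WORKS_FOR":
            by_org.setdefault(target, set()).add(subject)
    return {(a, "COLLEAGUE_OF", b)
            for people in by_org.values()
            for a, b in combinations(sorted(people), 2)}


typo_match = same_entity("Microsoft", "Microsft")   # caught by the fuzzy tier
ticker_match = same_entity("Microsoft", "MSFT")     # too far apart for edit distance
inferred = infer_colleagues([
    ("Alice", "WORKS_FOR", "Acme"),
    ("Bob", "WORKS_FOR", "Acme"),
    ("Carol", "WORKS_FOR", "Initech"),
])
```

The “MSFT” case fails the string tiers, which is the kind of alias the embedding tier exists to catch.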
| Mode | When it runs | Use case |
|---|---|---|
| smart (default) | Per-doc dedup during ingest, one full resolution pass after | Large imports, production |
| incremental | Per document against the existing graph | Small graphs, trickle feeds |
| batch | Once over the full graph after all docs | Legacy bulk imports |
| none | Never (unification only) | Fastest |
Three ways to ingest
| Method | Shape | Use it when |
|---|---|---|
| remember() | One document, awaits completion | Single writes |
| remember_batch() | Many documents, concurrent, cross-document dedup, blocks until done | Bulk imports where you wait for the result |
| submit_batch() | Stages docs as PENDING, returns a BatchHandle immediately; a background processor fires on_result per doc | Fire-and-forget / write-path services |
Error handling
A failure in one document never fails the batch. Errors are captured per document and surfaced in the result counts.
chunk_size, embedding_model, and extraction_model are not per-call kwargs.
Set them at construction time via KhoraConfig (KHORA_PIPELINES_CHUNK_SIZE,
KHORA_LLM_EMBEDDING_MODEL, KHORA_LLM_MODEL). The per-call kwargs are chunk_strategy,
entity_types, relationship_types, expertise, source_timestamp, metadata,
session_id, and external_id.
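The per-document error capture described under Error handling follows a common asyncio pattern, sketched below. `BatchResult`, `ingest_one`, and `ingest_batch` are hypothetical names and the result shape is an assumption; Khora’s actual result object may expose different fields.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class BatchResult:
    """Hypothetical result shape: counts plus one error message per failed doc."""
    succeeded: int = 0
    failed: int = 0
    errors: dict[str, str] = field(default_factory=dict)


async def ingest_one(doc_id: str, content: str) -> str:
    """Placeholder for the per-document pipeline; rejects empty content."""
    if not content.strip():
        raise ValueError("empty document")
    await asyncio.sleep(0)
    return doc_id


async def ingest_batch(docs: dict[str, str]) -> BatchResult:
    ids = list(docs)
    # return_exceptions=True turns each failure into a value instead of
    # cancelling the whole gather, so one bad document cannot sink the batch.
    outcomes = await asyncio.gather(
        *(ingest_one(doc_id, docs[doc_id]) for doc_id in ids),
        return_exceptions=True,
    )
    result = BatchResult()
    for doc_id, outcome in zip(ids, outcomes):
        if isinstance(outcome, Exception):
            result.failed += 1
            result.errors[doc_id] = str(outcome)
        else:
            result.succeeded += 1
    return result


result = asyncio.run(ingest_batch({"a": "hello", "b": "   ", "c": "world"}))
```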
Retrieval
The read path: how recall() finds and ranks what you ingested.
Core APIs example
Runnable remember_batch, ontology config, and entity reads.