Everything you store in Khora fits into three layers: a tenancy layer that isolates datasets, a content layer that holds what you ingested and what was extracted from it, and an event layer that records every change.
| Layer | What it holds | Models |
| --- | --- | --- |
| Tenancy | The sole isolation boundary | Namespace: a stable namespace_id plus a per-version row id |
| Content | What you ingested, and what was extracted from it | Document, Chunk, Entity, plus Relationship (entity-to-entity edges) and Episode |
| Event | An immutable log of everything that happens | MemoryEvent |
The tenancy layer is covered in Namespaces & isolation; where the rows physically live is covered in Storage backends. This page covers the content and event models.

Content models

Document

The raw content you store, the starting point for everything. A remember() call creates one document, then chunks, embeds, and extracts from it.
| Field | Meaning |
| --- | --- |
| content | The actual text |
| checksum | SHA-256 of the content, for dedup within a namespace |
| status | PENDING → PROCESSING → COMPLETED / FAILED |
| title, source, source_url, author, language | Provenance metadata |
| external_id | Caller-supplied id for upserts / idempotency |
| session_id | Optional conversation handle (powers forget_session and session GC) |
| source_timestamp | Event time: populates occurred_at, which feeds recency scoring |
| chunk_count, entity_count, relationship_count | Summary stats after processing |
Documents move through a lifecycle: PENDING (created, queued) → PROCESSING (chunking/embedding/extraction) → COMPLETED or FAILED.
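The lifecycle and the checksum-based dedup described above can be sketched in a few lines of plain Python. This is an illustration only, not Khora's implementation: the names `DocumentStatus`, `can_transition`, `content_checksum`, and `is_duplicate` are hypothetical, and the sketch assumes the checksum is taken over the UTF-8 bytes of the content with no normalization and that COMPLETED and FAILED are terminal.

```python
import hashlib
from enum import Enum


class DocumentStatus(Enum):
    PENDING = "PENDING"
    PROCESSING = "PROCESSING"
    COMPLETED = "COMPLETED"
    FAILED = "FAILED"


# Forward transitions taken from the lifecycle text.
# Assumption: COMPLETED and FAILED are terminal (no retry path is modeled).
ALLOWED_TRANSITIONS = {
    DocumentStatus.PENDING: {DocumentStatus.PROCESSING},
    DocumentStatus.PROCESSING: {DocumentStatus.COMPLETED, DocumentStatus.FAILED},
    DocumentStatus.COMPLETED: set(),
    DocumentStatus.FAILED: set(),
}


def can_transition(current: DocumentStatus, target: DocumentStatus) -> bool:
    """Return True if `target` is a legal next status after `current`."""
    return target in ALLOWED_TRANSITIONS[current]


def content_checksum(content: str) -> str:
    """SHA-256 hex digest of the content (assumed UTF-8, no normalization)."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def is_duplicate(seen: set, namespace_id: str, content: str) -> bool:
    """Check and record a (namespace_id, checksum) pair.

    Dedup is scoped to the namespace, so the same text in two
    namespaces is not treated as a duplicate.
    """
    key = (namespace_id, content_checksum(content))
    if key in seen:
        return True
    seen.add(key)
    return False
```

Keying on the pair rather than on the checksum alone is what keeps dedup inside the namespace boundary.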

Chunk

Document pieces optimized for embedding and retrieval. Each chunk carries the vector that semantic search runs against.
| Field | Meaning |
| --- | --- |
| content | The chunk text |
| embedding | Vector (1536-dim by default; halfvec/float16 optional) |
| embedding_model | Which model produced the vector |
| chunk_index, start_char, end_char | Position in the parent document |
| token_count | Chunk size in tokens |
| occurred_at | Event time, propagated from the document’s source_timestamp |
chunk.score on a recall result is a normalized rank within that result, not a raw similarity. For confidence, read result.engine_info["max_raw_vector_score"] (see Core APIs).
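To make the position fields (chunk_index, start_char, end_char) concrete, here is a minimal sketch of a fixed-size character splitter. It is not Khora's chunker: `ChunkSpan` and `split_fixed` are hypothetical names, the sketch assumes end_char is exclusive, and it splits on characters rather than tokens, so it does not compute token_count.

```python
from dataclasses import dataclass


@dataclass
class ChunkSpan:
    chunk_index: int  # 0-based position of the chunk in the document
    start_char: int   # inclusive offset into the parent document
    end_char: int     # exclusive offset (an assumption of this sketch)
    content: str


def split_fixed(text: str, size: int, overlap: int = 0) -> list:
    """Split `text` into windows of `size` characters with `overlap` shared characters."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("size must be positive and 0 <= overlap < size")
    spans = []
    step = size - overlap
    start = 0
    while start < len(text):
        end = min(start + size, len(text))
        spans.append(ChunkSpan(len(spans), start, end, text[start:end]))
        if end == len(text):
            break  # the last window reached the end of the document
        start += step
    return spans
```

With offsets recorded this way, `text[span.start_char:span.end_char]` always reproduces the chunk content, which is what makes a chunk traceable back to its exact place in the parent document.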

Entity

A named concept extracted from your content.
| Field | Meaning |
| --- | --- |
| name, entity_type | The entity and its type (a free string, not a fixed enum) |
| description, attributes, aliases | Extracted detail |
| embedding | Vector for entity similarity search |
| confidence | Extraction confidence (0–1) |
| mention_count | How often it was seen |
| valid_from, valid_until | Real-world validity window |
| source_document_ids, source_chunk_ids | Provenance: where it was learned |
Khora doesn’t enforce a taxonomy. entity_type and relationship_type are free strings you supply via the required entity_types / relationship_types arguments on every remember(), or through a richer ExpertiseConfig. Common conventions are PERSON, ORGANIZATION, LOCATION, PRODUCT, CONCEPT, EVENT, TECHNOLOGY, but the names are yours. See the ontology example.

Relationship

A typed, directed edge between two entities.
| Field | Meaning |
| --- | --- |
| source_entity_id → target_entity_id | The edge direction |
| relationship_type | A free string (e.g. WORKS_FOR, KNOWS, PART_OF, LOCATED_IN, DEPENDS_ON) |
| properties | Arbitrary edge context |
| weight, confidence | Strength and extraction confidence |
| valid_from, valid_until | Real-world validity window |
| source_document_ids, source_chunk_ids | Provenance |

Episode

An event with temporal extent. It connects multiple entities to a point or span in time (occurred_at, duration_seconds, entity_ids). Useful for “what happened, when, and who was involved.”

The source chain

Every entity and relationship remembers where it came from, via source_document_ids and source_chunk_ids:
  • Document “Meeting Notes” is split into Chunk #1, #2, #3.
  • Entity “Alice” is extracted from chunks 1–3. Its source_chunk_ids point back to all three.
  • Relationship “Alice WORKS_FOR Acme” records the same source chunks.
This is what makes provenance (“where did we learn this?”), citation (“here’s the source”), and cascading cleanup work: forget(document_id) removes the document and updates the entities and relationships that referenced it.
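The cascade can be pictured with a small in-memory sketch. It is not Khora's forget() implementation: `EntityRecord` and `forget_document` are hypothetical names, and the rule that an entity with no remaining source documents is dropped is an assumption of this sketch (the text above only says referencing entities and relationships are updated).

```python
from dataclasses import dataclass, field


@dataclass
class EntityRecord:
    name: str
    source_document_ids: set = field(default_factory=set)
    source_chunk_ids: set = field(default_factory=set)


def forget_document(document_id: str, chunk_ids: set, entities: list) -> list:
    """Strip one document's provenance from every entity.

    Returns the entities that still have at least one source document.
    Assumption: an entity left with no sources is dropped.
    """
    surviving = []
    for entity in entities:
        entity.source_document_ids.discard(document_id)
        entity.source_chunk_ids -= chunk_ids
        if entity.source_document_ids:
            surviving.append(entity)
    return surviving
```

For example, if “Alice” was learned from two documents and “Bob” from only one, forgetting the shared document keeps Alice (with a shorter provenance list) and, under this sketch's assumption, removes Bob. The same pattern would apply to relationships.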

Bi-temporal time

Khora separates two notions of time, which is what lets it answer “what did we believe then?” as well as “what changed?”:
  • Event/validity time: source_timestamp / occurred_at on chunks, and valid_from / valid_until on entities and relationships (when something was true in the real world).
  • System time: created_at (never changes) and updated_at (last modified).
The dream phase adds an invalidation layer on top: relationships and facts carry invalidated_at / invalidated_by, so a superseded row is soft-deleted (kept for audit) rather than destroyed.
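The two time axes answer different questions, which a short sketch makes explicit. This is an illustration, not Khora's query API: `FactRow`, `was_valid_at`, and `was_believed_at` are hypothetical names, and the sketch assumes half-open intervals and treats a missing bound as open-ended.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class FactRow:
    label: str
    valid_from: Optional[datetime]   # validity time: when it became true
    valid_until: Optional[datetime]  # validity time: when it stopped being true
    created_at: datetime             # system time: when the row was recorded
    invalidated_at: Optional[datetime] = None  # system time: soft delete


def was_valid_at(fact: FactRow, when: datetime) -> bool:
    """Validity time: was this true in the real world at `when`?"""
    starts_ok = fact.valid_from is None or fact.valid_from <= when
    ends_ok = fact.valid_until is None or when < fact.valid_until
    return starts_ok and ends_ok


def was_believed_at(fact: FactRow, when: datetime) -> bool:
    """System time: was the row recorded and not yet invalidated at `when`?"""
    recorded = fact.created_at <= when
    live = fact.invalidated_at is None or when < fact.invalidated_at
    return recorded and live
```

Because invalidation only sets a timestamp, `was_believed_at` can still return True for a superseded row when asked about a moment before it was invalidated. That is the audit property the soft delete preserves.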

Event layer

MemoryEvent

Every change is recorded as an immutable event, an append-only audit trail.
| Field | Meaning |
| --- | --- |
| event_type | e.g. document.created, entity.merged, relationship.inferred |
| resource_type, resource_id | What the event is about |
| data, previous_data | New (and prior) state |
| actor_id, actor_type | Who triggered it (user / system / api / pipeline) |
| correlation_id | Ties together every event from one operation |
| timestamp | When it happened |
Event types span the lifecycle of each resource:
| Resource | Event types |
| --- | --- |
| Document | created, updated, deleted, processing_started/completed/failed |
| Chunk | created, deleted, embedding_generated |
| Entity | created, updated, deleted, merged |
| Relationship | created, updated, deleted, inferred |
| Namespace | created, activated, archived |
Correlation IDs make the log queryable as a causal chain: one remember() call emits a document event, several chunk events, and many entity/relationship events, all sharing one correlation_id, so “what happened as a result of X?” is one query.
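As a rough sketch of that causal-chain query, the following in-memory log stamps every event from one operation with the same correlation_id and filters on it. `Event` and `EventLog` are hypothetical stand-ins, not Khora's MemoryEvent model or its storage API, and event type strings such as chunk.created are assumed to follow the resource.action pattern shown in the table above.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from uuid import uuid4


@dataclass(frozen=True)  # frozen: an event cannot be mutated after creation
class Event:
    event_type: str
    resource_type: str
    resource_id: str
    correlation_id: str
    timestamp: datetime


class EventLog:
    """Append-only list of events; there is deliberately no update or delete."""

    def __init__(self):
        self._events = []

    def append(self, event_type, resource_type, resource_id, correlation_id):
        event = Event(event_type, resource_type, resource_id,
                      correlation_id, datetime.now(timezone.utc))
        self._events.append(event)
        return event

    def by_correlation(self, correlation_id):
        """All events emitted by one operation, in insertion order."""
        return [e for e in self._events if e.correlation_id == correlation_id]


# One simulated ingest: every event it emits shares a single correlation id.
log = EventLog()
op_id = str(uuid4())
log.append("document.created", "document", "doc-1", op_id)
log.append("chunk.created", "chunk", "chunk-1", op_id)
log.append("entity.created", "entity", "ent-1", op_id)
log.append("document.created", "document", "doc-2", str(uuid4()))  # unrelated operation

chain = log.by_correlation(op_id)
```

Here `chain` holds the three events from the first operation and excludes the unrelated one, which is the single-filter lookup the paragraph above describes.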

How it fits together

  • Namespace: the container for everything below.
    • Document → many Chunks. Extraction over its chunks yields Entities.
      • Entity → Relationship → Entity: typed, directed edges between entities.
      • Entity → participates in → Episode: an event with temporal extent.
    • MemoryEvent → records every change to all of the above.

Storage backends

Where these rows physically live: PostgreSQL + pgvector + Neo4j, or the embedded sqlite_lance stack.

Namespaces & isolation

The tenancy layer: the dual-ID scheme, isolation contract, and versioning.