Data model

Everything you store in Khora fits into three layers: a tenancy layer that isolates datasets, a content layer that holds what you ingested and what was extracted from it, and an event layer that records every change.

Layer	What it holds	Models
Tenancy	The sole isolation boundary	`Namespace`: a stable `namespace_id` plus a per-version row `id`
Content	What you ingested, and what was extracted from it	`Document` → `Chunk` → `Entity`, plus `Relationship` (entity-to-entity edges) and `Episode`
Event	An immutable log of everything that happens	`MemoryEvent`

The tenancy layer is covered in Namespaces & isolation; where the rows physically live is covered in Storage backends. This page covers the content and event models.

Content models

Document

The raw content you store, and the starting point for everything else. A remember() call creates one document, then chunks it, embeds the chunks, and runs extraction over them.

Field	Meaning
`content`	The actual text
`checksum`	SHA-256 of the content, for dedup within a namespace
`status`	`PENDING → PROCESSING → COMPLETED` / `FAILED`, and `ARCHIVED`
`title`, `source`, `source_url`, `author`, `language`	Provenance metadata
`external_id`	Caller-supplied id for upserts / idempotency
`session_id`	Optional conversation handle (powers `forget_session` and session GC)
`source_timestamp`	Event time: populates `occurred_at`, which feeds recency scoring
`chunk_count`, `entity_count`, `relationship_count`	Summary stats after processing
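
The checksum row can be illustrated with a small sketch. It assumes the digest is taken over the UTF-8 bytes of `content` with no normalization, which may differ from what Khora actually hashes; `content_checksum` and `ChecksumIndex` are hypothetical names used only for illustration.

```python
import hashlib


def content_checksum(content: str) -> str:
    """SHA-256 hex digest of the text (assumption: UTF-8, no normalization)."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


class ChecksumIndex:
    """Hypothetical in-memory stand-in for a per-namespace dedup lookup."""

    def __init__(self) -> None:
        # (namespace_id, checksum) -> id of the first document with that content
        self._seen: dict = {}

    def register(self, namespace_id: str, document_id: str, content: str) -> str:
        """Return the id of the document that owns this content in the namespace."""
        key = (namespace_id, content_checksum(content))
        return self._seen.setdefault(key, document_id)


index = ChecksumIndex()
first = index.register("ns-a", "doc-1", "Alice joined Acme.")
duplicate = index.register("ns-a", "doc-2", "Alice joined Acme.")
other_namespace = index.register("ns-b", "doc-3", "Alice joined Acme.")
```

Because the key includes the namespace, identical text in two namespaces is not treated as a duplicate, which matches the "within a namespace" scope stated in the table.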

Documents move through a lifecycle: PENDING (created, queued) → PROCESSING (chunking/embedding/extraction) → COMPLETED or FAILED. A document can also be ARCHIVED: retained but excluded from active processing (re-ingest under the same external_id skips it unless you pass reprocess_archived=True).
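
The lifecycle above can be sketched as a small state table. This is an illustration, not Khora's implementation: `DocumentStatus`, `can_transition`, and `should_reingest` are hypothetical names, only the forward transitions named in the text are modelled, and transitions into ARCHIVED are left out because the text does not say which states allow them.

```python
from enum import Enum
from typing import Optional


class DocumentStatus(str, Enum):
    PENDING = "PENDING"
    PROCESSING = "PROCESSING"
    COMPLETED = "COMPLETED"
    FAILED = "FAILED"
    ARCHIVED = "ARCHIVED"


# Forward transitions named in the text; anything else is rejected here.
FORWARD_TRANSITIONS = {
    DocumentStatus.PENDING: {DocumentStatus.PROCESSING},
    DocumentStatus.PROCESSING: {DocumentStatus.COMPLETED, DocumentStatus.FAILED},
}


def can_transition(current: DocumentStatus, target: DocumentStatus) -> bool:
    """True if the text describes a direct move from current to target."""
    return target in FORWARD_TRANSITIONS.get(current, set())


def should_reingest(existing: Optional[DocumentStatus],
                    reprocess_archived: bool = False) -> bool:
    """Decide whether a re-ingest under the same external_id is processed.

    Archived documents are skipped unless reprocess_archived is set.
    Treating every other case as "process it" is an assumption.
    """
    if existing is DocumentStatus.ARCHIVED:
        return reprocess_archived
    return True
```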

Chunk

Document pieces optimized for embedding and retrieval. Each chunk carries the vector that semantic search runs against.

Field	Meaning
`content`	The chunk text
`embedding`	Vector (1536-dim by default; `halfvec`/float16 optional)
`embedding_model`	Which model produced the vector
`chunk_index`, `start_char`, `end_char`	Position in the parent document
`token_count`	Chunk size in tokens
`occurred_at`	Event time, propagated from the document’s `source_timestamp`

chunk.score on a recall result is a normalized rank within that result, not a raw similarity. For confidence, read result.engine_info["max_raw_vector_score"] (see Core APIs).
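
One way to act on this is to gate on the raw score rather than on chunk.score. The sketch below uses a stand-in result object so it runs on its own; only the `engine_info["max_raw_vector_score"]` key comes from the text above, while `FakeRecallResult`, `is_confident`, and the 0.35 threshold are hypothetical and should be tuned for your embedding model.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class FakeRecallResult:
    """Stand-in for a recall result; only engine_info is modelled."""
    engine_info: Dict[str, Any] = field(default_factory=dict)


def is_confident(result: FakeRecallResult, min_raw_score: float = 0.35) -> bool:
    """True if the best raw vector similarity clears the threshold.

    A missing key is treated as "not confident" (assumption).
    """
    raw = result.engine_info.get("max_raw_vector_score")
    return raw is not None and raw >= min_raw_score


strong = FakeRecallResult({"max_raw_vector_score": 0.82})
weak = FakeRecallResult({"max_raw_vector_score": 0.11})
unknown = FakeRecallResult()
```

A normalized rank only orders the chunks inside one result, so the top chunk can rank first even when nothing in the namespace is a close match; the raw score is what distinguishes those cases.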

Entity

A named concept extracted from your content.

Field	Meaning
`name`, `entity_type`	The entity and its type (a free string, not a fixed enum)
`description`, `attributes`	Extracted detail
`embedding`	Vector for entity similarity search
`confidence`	Extraction confidence (0–1)
`mention_count`	How often it was seen
`valid_from`, `valid_until`	Real-world validity window
`source_document_ids`, `source_chunk_ids`	Provenance: where it was learned
`metadata`	Free-form extras. Entity aliases collected during resolution live here, under `metadata["aliases"]`.

Khora doesn’t enforce a taxonomy. entity_type and relationship_type are free strings you supply via the required entity_types / relationship_types arguments on every remember(), or through a richer ExpertiseConfig. Common conventions are PERSON, ORGANIZATION, LOCATION, PRODUCT, CONCEPT, EVENT, TECHNOLOGY, but the names are yours. See the ontology example.

Relationship

A typed, directed edge between two entities.

Field	Meaning
`source_entity_id` → `target_entity_id`	The edge direction
`relationship_type`	A free string (e.g. `WORKS_FOR`, `KNOWS`, `PART_OF`, `LOCATED_IN`, `DEPENDS_ON`)
`properties`	Arbitrary edge context
`weight`, `confidence`	Strength and extraction confidence
`valid_from`, `valid_until`	Real-world validity window
`source_document_ids`, `source_chunk_ids`	Provenance

Episode

An event with temporal extent. It connects multiple entities to a point or span in time (occurred_at, duration_seconds, entity_ids). Useful for “what happened, when, and who was involved.”
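
As a rough sketch of the "what happened, when, and who was involved" question, the three fields named above are enough to filter episodes by participant and time window. `EpisodeSketch` and `episodes_involving` are hypothetical names, and treating a zero duration as a point in time is an assumption.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import List


@dataclass
class EpisodeSketch:
    occurred_at: datetime
    duration_seconds: float = 0.0  # 0 models a point in time (assumption)
    entity_ids: List[str] = field(default_factory=list)

    @property
    def ended_at(self) -> datetime:
        return self.occurred_at + timedelta(seconds=self.duration_seconds)


def episodes_involving(episodes: List[EpisodeSketch], entity_id: str,
                       start: datetime, end: datetime) -> List[EpisodeSketch]:
    """Episodes that involve entity_id and overlap the window [start, end]."""
    return [
        e for e in episodes
        if entity_id in e.entity_ids and e.occurred_at <= end and e.ended_at >= start
    ]


utc = timezone.utc
standup = EpisodeSketch(datetime(2024, 5, 1, 9, 0, tzinfo=utc), 1800, ["alice", "bob"])
launch = EpisodeSketch(datetime(2024, 5, 3, 12, 0, tzinfo=utc), 0, ["alice"])

hits = episodes_involving(
    [standup, launch], "alice",
    start=datetime(2024, 5, 1, 9, 15, tzinfo=utc),
    end=datetime(2024, 5, 2, 0, 0, tzinfo=utc),
)
```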

The source chain

Every entity and relationship remembers where it came from, via source_document_ids and source_chunk_ids:

Document “Meeting Notes” is split into Chunk #1, #2, #3.
Entity “Alice” is extracted from chunks 1–3. Its source_chunk_ids point back to all three.
Relationship “Alice WORKS_FOR Acme” records the same source chunks.

This is what makes provenance (“where did we learn this?”), citation (“here’s the source”), and cascading cleanup work: forget(document_id) removes the document and updates the entities and relationships that referenced it.
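
The cascade can be pictured with a minimal sketch. It is not Khora's forget() implementation: `Sourced` and `forget_document` are hypothetical, provenance is held in sets for brevity, and dropping a record once it has no remaining source document is an assumption, since the text only says referencing rows are updated.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Sourced:
    """An entity or relationship, reduced to its provenance fields."""
    name: str
    source_document_ids: Set[str] = field(default_factory=set)
    source_chunk_ids: Set[str] = field(default_factory=set)


def forget_document(document_id: str, chunk_ids: Set[str],
                    records: List[Sourced]) -> List[Sourced]:
    """Strip one document's provenance and return the records that survive."""
    kept = []
    for record in records:
        record.source_document_ids.discard(document_id)
        record.source_chunk_ids -= chunk_ids
        if record.source_document_ids:  # assumption: orphans are dropped
            kept.append(record)
    return kept


alice = Sourced("Alice", {"doc-1", "doc-2"}, {"c1", "c2", "c9"})
works_for = Sourced("Alice WORKS_FOR Acme", {"doc-1"}, {"c1"})

remaining = forget_document("doc-1", {"c1", "c2", "c3"}, [alice, works_for])
```

Here "Alice" survives because a second document still supports it, while the relationship learned only from the forgotten document does not.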

Bi-temporal time

Khora separates two notions of time, which is what lets it answer “what did we believe then?” as well as “what changed?”:

Event/validity time: source_timestamp / occurred_at on chunks, and valid_from / valid_until on entities and relationships (when something was true in the real world).
System time: created_at (never changes) and updated_at (last modified).

The dream phase adds an invalidation layer on top: relationships and facts carry invalidated_at / invalidated_by, so a superseded row is soft-deleted (kept for audit) rather than destroyed.
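
The two time axes can be queried independently, as the sketch below tries to show. The field names come from this page; `FactSketch`, `true_at`, and `believed_at` are hypothetical, and the half-open intervals and the "None means open-ended" convention are assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class FactSketch:
    label: str
    created_at: datetime                       # system time: when the row was written
    valid_from: Optional[datetime] = None      # validity time: None = open-ended
    valid_until: Optional[datetime] = None
    invalidated_at: Optional[datetime] = None  # set when the row is superseded


def true_at(fact: FactSketch, when: datetime) -> bool:
    """Validity time: was this true in the real world at `when`?"""
    started = fact.valid_from is None or fact.valid_from <= when
    not_ended = fact.valid_until is None or when < fact.valid_until
    return started and not_ended


def believed_at(fact: FactSketch, as_of: datetime) -> bool:
    """System time: was this row recorded and not yet invalidated at `as_of`?"""
    recorded = fact.created_at <= as_of
    live = fact.invalidated_at is None or as_of < fact.invalidated_at
    return recorded and live


def day(year: int, month: int, dom: int) -> datetime:
    return datetime(year, month, dom, tzinfo=timezone.utc)


job = FactSketch(
    "Alice WORKS_FOR Acme",
    created_at=day(2021, 6, 1),
    valid_from=day(2020, 1, 1),
    valid_until=day(2023, 1, 1),
    invalidated_at=day(2024, 2, 1),
)
```

In this example the fact stopped being true at the start of 2023, but the store kept believing it until the row was invalidated in February 2024; the row itself is retained either way.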

Event layer

MemoryEvent

Every change is recorded as an immutable event, an append-only audit trail.

Field	Meaning
`event_type`	e.g. `document.created`, `entity.merged`, `relationship.created`
`resource_type`, `resource_id`	What the event is about
`data`, `previous_data`	New (and prior) state
`actor_id`, `actor_type`	Who triggered it (`user` / `system` / `api` / `pipeline`)
`correlation_id`	Ties together every event from one operation
`timestamp`	When it happened

Event types span the lifecycle of each resource:

Resource	Event types
Document	`created`, `updated`, `deleted`, `processed`, `failed`
Chunk	`created`, `embedded`, `deleted` (plus `entities_resolved`)
Entity	`created`, `updated`, `merged`, `deleted`
Relationship	`created`, `updated`, `deleted`
Namespace	`created`, `updated`, `deleted`

Correlation IDs make the log queryable as a causal chain: one remember() call emits a document event, several chunk events, and many entity/relationship events, all sharing one correlation_id, so “what happened as a result of X?” is one query.
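
The "one query" idea can be sketched over an in-memory list of events. `EventSketch` and `causal_chain` are hypothetical; a real deployment would filter in the storage backend, and the full event type strings for chunk events are assumed to follow the same `resource.action` pattern as `document.created`.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List


@dataclass(frozen=True)  # frozen mirrors the immutable, append-only log
class EventSketch:
    event_type: str
    resource_id: str
    correlation_id: str
    timestamp: datetime


def causal_chain(events: List[EventSketch], correlation_id: str) -> List[EventSketch]:
    """All events from one operation, oldest first."""
    matching = (e for e in events if e.correlation_id == correlation_id)
    return sorted(matching, key=lambda e: e.timestamp)


def at(second: int) -> datetime:
    return datetime(2024, 5, 1, 9, 0, second, tzinfo=timezone.utc)


log = [
    EventSketch("entity.created", "ent-1", "op-42", at(3)),
    EventSketch("document.created", "doc-1", "op-42", at(0)),
    EventSketch("document.created", "doc-2", "op-43", at(1)),
    EventSketch("chunk.created", "chk-1", "op-42", at(2)),
]

chain = [e.event_type for e in causal_chain(log, "op-42")]
```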

How it fits together

Namespace: the container for everything below.
- Document → many Chunks. Extraction over its chunks yields Entities.
  - Entity → Relationship → Entity: typed, directed edges between entities.
  - Entity → participates in → Episode: an event with temporal extent.
- MemoryEvent → records every change to all of the above.

Storage backends

Where these rows physically live: PostgreSQL + pgvector + Neo4j, or the embedded sqlite_lance stack.

Namespaces & isolation

The tenancy layer: the dual-ID scheme, isolation contract, and versioning.
