| Layer | What it holds | Models |
|---|---|---|
| Tenancy | The sole isolation boundary | Namespace: a stable namespace_id plus a per-version row id |
| Content | What you ingested, and what was extracted from it | Document → Chunk → Entity, plus Relationship (entity-to-entity edges) and Episode |
| Event | An immutable log of everything that happens | MemoryEvent |
Content models
Document
The raw content you store, and the starting point for everything else. A remember() call
creates one document, then chunks, embeds, and extracts from it.
| Field | Meaning |
|---|---|
| content | The actual text |
| checksum | SHA-256 of the content, for dedup within a namespace |
| status | PENDING → PROCESSING → COMPLETED / FAILED |
| title, source, source_url, author, language | Provenance metadata |
| external_id | Caller-supplied id for upserts / idempotency |
| session_id | Optional conversation handle (powers forget_session and session GC) |
| source_timestamp | Event time: populates occurred_at, which feeds recency scoring |
| chunk_count, entity_count, relationship_count | Summary stats after processing |
The status lifecycle is: PENDING (created, queued) → PROCESSING
(chunking/embedding/extraction) → COMPLETED or FAILED.
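The checksum and status fields can be pictured with a small in-memory sketch. It is not Khora's implementation: the names DocumentStatus, can_transition, content_checksum, and ingest are hypothetical. Two assumptions are baked in: the text is hashed as raw UTF-8 bytes with no normalization, and COMPLETED and FAILED are treated as terminal states.

```python
import hashlib
from enum import Enum


class DocumentStatus(Enum):
    """Hypothetical mirror of the status values listed above."""
    PENDING = "PENDING"
    PROCESSING = "PROCESSING"
    COMPLETED = "COMPLETED"
    FAILED = "FAILED"


# Allowed transitions (assumption: COMPLETED and FAILED are terminal).
ALLOWED = {
    DocumentStatus.PENDING: {DocumentStatus.PROCESSING},
    DocumentStatus.PROCESSING: {DocumentStatus.COMPLETED, DocumentStatus.FAILED},
    DocumentStatus.COMPLETED: set(),
    DocumentStatus.FAILED: set(),
}


def can_transition(current, new):
    """True if the lifecycle allows moving from current to new."""
    return new in ALLOWED[current]


def content_checksum(content: str) -> str:
    """SHA-256 hex digest of the raw UTF-8 bytes (assumption: no normalization)."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def ingest(seen, namespace_id, content):
    """Return True if the content is new within the namespace, False if it is a duplicate."""
    key = (namespace_id, content_checksum(content))
    if key in seen:
        return False
    seen.add(key)
    return True
```

Because the key includes namespace_id, the same text stored in two namespaces counts as two distinct documents, which matches "dedup within a namespace".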
Chunk
Document pieces optimized for embedding and retrieval. Each chunk carries the vector that semantic search runs against.
| Field | Meaning |
|---|---|
| content | The chunk text |
| embedding | Vector (1536-dim by default; halfvec/float16 optional) |
| embedding_model | Which model produced the vector |
| chunk_index, start_char, end_char | Position in the parent document |
| token_count | Chunk size in tokens |
| occurred_at | Event time, propagated from the document’s source_timestamp |
chunk.score on a recall result is a normalized rank within that result, not a
raw similarity. For confidence, read result.engine_info["max_raw_vector_score"]
(see Core APIs).
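To make the distinction concrete, here is a minimal sketch of gating on the raw score rather than on chunk.score. The result object is a stand-in built with SimpleNamespace. Only chunk.score and engine_info["max_raw_vector_score"] come from the text above. The chunks attribute name, the is_confident helper, the 0.35 threshold, and the assumption that the top-ranked chunk normalizes to 1.0 are all hypothetical.

```python
from types import SimpleNamespace


def is_confident(result, min_raw_score=0.35):
    """Gate on the raw vector score; chunk.score only ranks chunks within one result."""
    raw = result.engine_info.get("max_raw_vector_score")
    return raw is not None and raw >= min_raw_score


# Stand-in for a recall result (shape assumed for illustration only).
weak_result = SimpleNamespace(
    chunks=[SimpleNamespace(content="unrelated text", score=1.0)],
    engine_info={"max_raw_vector_score": 0.12},
)

# The top chunk ranks first within this result, yet the raw similarity is low.
top = max(weak_result.chunks, key=lambda c: c.score)
```

A suitable threshold depends on the embedding model and the data, so treat 0.35 as a placeholder.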
Entity
A named concept extracted from your content.
| Field | Meaning |
|---|---|
| name, entity_type | The entity and its type (a free string, not a fixed enum) |
| description, attributes, aliases | Extracted detail |
| embedding | Vector for entity similarity search |
| confidence | Extraction confidence (0–1) |
| mention_count | How often it was seen |
| valid_from, valid_until | Real-world validity window |
| source_document_ids, source_chunk_ids | Provenance: where it was learned |
Khora doesn’t enforce a taxonomy. entity_type and relationship_type are free
strings you supply via the required entity_types / relationship_types
arguments on every remember(), or through a richer ExpertiseConfig. Common
conventions are PERSON, ORGANIZATION, LOCATION, PRODUCT, CONCEPT,
EVENT, and TECHNOLOGY, but the names are yours. See the
ontology example.
Relationship
A typed, directed edge between two entities.
| Field | Meaning |
|---|---|
| source_entity_id → target_entity_id | The edge direction |
| relationship_type | A free string (e.g. WORKS_FOR, KNOWS, PART_OF, LOCATED_IN, DEPENDS_ON) |
| properties | Arbitrary edge context |
| weight, confidence | Strength and extraction confidence |
| valid_from, valid_until | Real-world validity window |
| source_document_ids, source_chunk_ids | Provenance |
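As an illustration of what "typed, directed" means in practice, the sketch below models a few edges in memory and looks up outgoing edges by type. The Relationship dataclass is a hypothetical stand-in that reuses some field names from the table. The outgoing helper and the sample ids are invented for the example and are not part of Khora's API.

```python
from dataclasses import dataclass, field


@dataclass
class Relationship:
    """Hypothetical in-memory stand-in using field names from the table above."""
    source_entity_id: str
    target_entity_id: str
    relationship_type: str          # free string, e.g. "WORKS_FOR"
    properties: dict = field(default_factory=dict)
    weight: float = 1.0
    confidence: float = 1.0


def outgoing(edges, entity_id, relationship_type=None):
    """Targets reachable from entity_id, optionally filtered by relationship type."""
    return [
        e.target_entity_id
        for e in edges
        if e.source_entity_id == entity_id
        and (relationship_type is None or e.relationship_type == relationship_type)
    ]


edges = [
    Relationship("alice", "acme", "WORKS_FOR"),
    Relationship("alice", "bob", "KNOWS"),
    Relationship("acme", "berlin", "LOCATED_IN"),
]
```

Direction matters here: outgoing(edges, "alice", "WORKS_FOR") finds acme, while the same query from acme finds nothing, because the edge runs from alice to acme.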
Episode
An event with temporal extent. It connects multiple entities to a point or span in time (occurred_at, duration_seconds, entity_ids). Useful for “what happened,
when, and who was involved.”
The source chain
Every entity and relationship remembers where it came from, via source_document_ids and source_chunk_ids:
- Document “Meeting Notes” is split into Chunk #1, #2, #3.
- Entity “Alice” is extracted from chunks 1–3. Its source_chunk_ids point back to all three.
- Relationship “Alice WORKS_FOR Acme” records the same source chunks.
forget(document_id) removes the
document and updates the entities and relationships that referenced it.
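One way to picture the effect of forget(document_id) on the source chain is the in-memory sketch below. It is an illustration under stated assumptions, not Khora's behavior: the Entity stand-in, the forget_document helper, and the policy of dropping an entity once it has no remaining source documents are all hypothetical. The text above only says that referencing entities and relationships are updated.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """Hypothetical stand-in carrying only the provenance fields."""
    name: str
    source_document_ids: set = field(default_factory=set)
    source_chunk_ids: set = field(default_factory=set)


def forget_document(entities, document_id, chunk_ids):
    """Strip one document's provenance; keep entities still backed by other sources."""
    kept = []
    for e in entities:
        e.source_document_ids.discard(document_id)
        e.source_chunk_ids -= set(chunk_ids)
        if e.source_document_ids:   # assumed policy: drop entities with no sources left
            kept.append(e)
    return kept


alice = Entity("Alice", {"doc-1", "doc-2"}, {"c1", "c2", "c3", "c9"})
acme = Entity("Acme", {"doc-1"}, {"c1"})
remaining = forget_document([alice, acme], "doc-1", ["c1", "c2", "c3"])
```

In this sketch Alice survives because doc-2 still supports her, while Acme, known only from doc-1, is dropped.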
Bi-temporal time
Khora separates two notions of time, which is what lets it answer “what did we believe then?” as well as “what changed?”:
- Event/validity time: source_timestamp / occurred_at on chunks, and valid_from / valid_until on entities and relationships (when something was true in the real world).
- System time: created_at (never changes) and updated_at (last modified).
Rows can also carry invalidated_at / invalidated_by, so a superseded
row is soft-deleted (kept for audit) rather than destroyed.
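The two time axes can be combined into an "as of" lookup. The sketch below is a minimal illustration, not Khora's query API: the Fact dataclass and the believed_at, true_at, and as_of helpers are hypothetical. It assumes that a missing bound means open-ended and that windows are half-open (start inclusive, end exclusive).

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Fact:
    """Hypothetical row carrying both time axes named above."""
    label: str
    valid_from: Optional[datetime]      # event/validity time
    valid_until: Optional[datetime]
    created_at: datetime                # system time
    invalidated_at: Optional[datetime] = None


def believed_at(fact, system_time):
    """Was the row live in the store at system_time?"""
    if fact.created_at > system_time:
        return False
    return fact.invalidated_at is None or system_time < fact.invalidated_at


def true_at(fact, event_time):
    """Does the validity window cover event_time? None means open-ended (assumption)."""
    starts = fact.valid_from is None or fact.valid_from <= event_time
    ends = fact.valid_until is None or event_time < fact.valid_until
    return starts and ends


def as_of(facts, event_time, system_time):
    """Labels of facts true at event_time, as known at system_time."""
    return [f.label for f in facts if believed_at(f, system_time) and true_at(f, event_time)]


d = datetime
facts = [
    # Recorded in March, then superseded in June by a corrected row.
    Fact("Alice WORKS_FOR Acme", d(2024, 1, 1), None, d(2024, 3, 1), invalidated_at=d(2024, 6, 1)),
    Fact("Alice WORKS_FOR Initech", d(2024, 1, 1), None, d(2024, 6, 1)),
]
```

Asking about February 2024 as known in April returns the Acme row; asking the same question as known in July returns the Initech row, because the first row has been soft-deleted but is still available for the earlier view.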
Event layer
MemoryEvent
Every change is recorded as an immutable event, an append-only audit trail.
| Field | Meaning |
|---|---|
| event_type | e.g. document.created, entity.merged, relationship.inferred |
| resource_type, resource_id | What the event is about |
| data, previous_data | New (and prior) state |
| actor_id, actor_type | Who triggered it (user / system / api / pipeline) |
| correlation_id | Ties together every event from one operation |
| timestamp | When it happened |
| Resource | Event types |
|---|---|
| Document | created, updated, deleted, processing_started/completed/failed |
| Chunk | created, deleted, embedding_generated |
| Entity | created, updated, deleted, merged |
| Relationship | created, updated, deleted, inferred |
| Namespace | created, activated, archived |
A single remember() call
emits a document event, several chunk events, and many entity/relationship events,
all sharing one correlation_id, so “what happened as a result of X?” is one query.
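The correlation_id idea can be sketched with a tiny append-only log. This is an illustration only: the EventLog class and its methods are hypothetical, the MemoryEvent stand-in carries a subset of the fields from the table, and event type strings such as chunk.created are inferred from the resource.event pattern shown above.

```python
import uuid
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)          # frozen: events are never mutated after creation
class MemoryEvent:
    event_type: str
    resource_type: str
    resource_id: str
    correlation_id: str
    timestamp: datetime


class EventLog:
    """Hypothetical append-only log; it deliberately has no update or delete."""

    def __init__(self):
        self._events = []

    def append(self, event_type, resource_type, resource_id, correlation_id):
        event = MemoryEvent(event_type, resource_type, resource_id,
                            correlation_id, datetime.now(timezone.utc))
        self._events.append(event)
        return event

    def by_correlation(self, correlation_id):
        """Every event emitted by one operation, in insertion order."""
        return [e for e in self._events if e.correlation_id == correlation_id]


log = EventLog()
op = str(uuid.uuid4())           # one id shared by everything a single call emits
log.append("document.created", "document", "doc-1", op)
log.append("chunk.created", "chunk", "c1", op)
log.append("entity.created", "entity", "alice", op)
log.append("document.created", "document", "doc-2", str(uuid.uuid4()))

related = [e.event_type for e in log.by_correlation(op)]
```

Filtering on op returns the three events from the first operation and leaves out the unrelated doc-2 event.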
How it fits together
- Namespace: the container for everything below.
- Document → many Chunks. Extraction over its chunks yields Entities.
- Entity → Relationship → Entity: typed, directed edges between entities.
- Entity → participates in → Episode: an event with temporal extent.
- MemoryEvent → records every change to all of the above.
Storage backends
Where these rows physically live: PostgreSQL + pgvector + Neo4j, or the
embedded sqlite_lance stack.
Namespaces & isolation
The tenancy layer: the dual-ID scheme, isolation contract, and versioning.