Event sourcing

Khora doesn’t just store current state. It records every change as an immutable event in an append-only log. Create a document, merge an entity, delete a chunk: each becomes a MemoryEvent. The log is the source of truth; current state is a view you could reconstruct by replaying it. This page is about using the event log. For the MemoryEvent fields themselves, see the data model; to react to events as they happen in real time, see semantic hooks.

What gets recorded

Every meaningful action across the resource lifecycle: document.created/updated/deleted/processed/failed, chunk.created/embedded/deleted, entity.created/updated/merged/deleted, relationship.created/inferred/deleted, namespace.*, and sync.started/completed/failed/checkpoint. Each event captures what happened (event_type, resource_type, resource_id), the details (data, and previous_data for updates), and who/when (actor_id, actor_type, timestamp). A correlation_id ties together every event from one operation: one remember() call emits a document event, several chunk events, and many entity/relationship events, all sharing it.

Querying the log

Events are namespace-scoped, like everything else. Query through the storage layer:

from datetime import UTC, datetime, timedelta  # datetime.UTC needs Python 3.11+

# Recent events in a namespace
recent = await kb.storage.get_events(ns_id, limit=50)

# Full history of one resource: "who changed this entity, and when?"
history = await kb.storage.get_events(ns_id, resource_type="entity", resource_id=entity_id)

# All entity merges in the last week
merges = await kb.storage.get_events(
    ns_id,
    event_types=["entity.merged"],
    after=datetime.now(UTC) - timedelta(days=7),
)

The full filter surface is get_events(namespace_id, *, event_types=None, resource_type=None, resource_id=None, after=None, before=None, limit=100, offset=0).

correlation_id is not among the filters, so to trace a single operation, filter by time or resource and group the results by correlation_id client-side. Khora records its own events automatically; if you are recording your own, write them explicitly with kb.storage.append_event(...) or append_events_batch(...).
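Client-side grouping by correlation_id can be sketched as below. This is a minimal illustration, not part of Khora: group_by_correlation is a hypothetical helper, and it assumes each event exposes correlation_id and timestamp as attributes.

```python
from collections import defaultdict
from typing import Any, Iterable


def group_by_correlation(events: Iterable[Any]) -> dict[str, list[Any]]:
    """Bucket events by correlation_id, each bucket sorted oldest-first."""
    groups: dict[str, list[Any]] = defaultdict(list)
    for event in events:
        # Assumption: events expose correlation_id as an attribute.
        # Events without one are skipped rather than lumped together.
        cid = getattr(event, "correlation_id", None)
        if cid is not None:
            groups[str(cid)].append(event)
    for bucket in groups.values():
        # Assumption: timestamp values are mutually comparable.
        bucket.sort(key=lambda e: e.timestamp)
    return dict(groups)
```

Feeding it the result of a time- or resource-filtered get_events call gives one bucket per operation, so the document, chunk, and entity events from a single remember() call end up in the same list.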

What it’s for

Audit trails: replay a resource’s history to answer “who changed this, when, and from what?” (previous_data holds the prior state).
Time travel: query before=<date>, then derive what existed then from the created-minus-deleted set.
Change data capture: poll after=<last_sync> and stream new events to a warehouse or downstream system.
Disaster recovery: replay events in timestamp order to rebuild a namespace.
Analytics: aggregate query.executed or lifecycle events for usage patterns.
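The created-minus-deleted derivation behind time travel might look like the sketch below. resources_alive is a hypothetical helper, not a Khora API; it assumes events expose event_type, resource_type, resource_id, and timestamp as attributes, and it only understands *.created and *.deleted. Other event types that may retire or introduce a resource (entity.merged, for instance) would need their own handling.

```python
from typing import Any, Iterable


def resources_alive(events: Iterable[Any], resource_type: str) -> set[Any]:
    """Replay events oldest-first; return ids created and not deleted since."""
    alive: set[Any] = set()
    for event in sorted(events, key=lambda e: e.timestamp):
        if event.resource_type != resource_type:
            continue
        if event.event_type.endswith(".created"):
            alive.add(event.resource_id)
        elif event.event_type.endswith(".deleted"):
            # discard() tolerates a delete whose create fell outside the window.
            alive.discard(event.resource_id)
    return alive
```

Pass it every event returned for before=<date>, not only the first page, since a missing page can hide a delete and make a resource look alive.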

Practical notes

The events table is indexed by (namespace_id, event_type), (resource_type, resource_id), timestamp, and correlation_id, so the queries above stay fast. A few habits keep the log useful:

Always set correlation_id for related operations: it’s what makes “what happened as a result of X?” answerable.
Batch event writes during ingestion (append_events_batch): one transaction for many events beats many transactions.
Keep data payloads small (ids and changed fields, not whole objects) and always paginate, because event logs grow large.
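Pagination with limit and offset can be wrapped in a small async generator, sketched below. iter_events, InMemoryStorage, and collect are hypothetical names introduced for illustration; the sketch assumes get_events returns a list and that a page shorter than limit means the end of the results.

```python
import asyncio
from typing import Any, AsyncIterator


async def iter_events(
    storage: Any, namespace_id: Any, *, page_size: int = 100, **filters: Any
) -> AsyncIterator[Any]:
    """Yield every matching event, fetching one page at a time."""
    offset = 0
    while True:
        page = await storage.get_events(
            namespace_id, limit=page_size, offset=offset, **filters
        )
        for event in page:
            yield event
        if len(page) < page_size:  # a short (or empty) page ends the scan
            break
        offset += page_size


class InMemoryStorage:
    """Hypothetical stand-in for kb.storage, only to exercise the helper."""

    def __init__(self, events: Any) -> None:
        self._events = list(events)

    async def get_events(self, namespace_id, *, limit=100, offset=0, **filters):
        return self._events[offset : offset + limit]


async def collect(storage: Any, namespace_id: Any, **kwargs: Any) -> list[Any]:
    """Drain iter_events into a list (convenient for small result sets)."""
    return [event async for event in iter_events(storage, namespace_id, **kwargs)]
```

For change data capture, the same loop works with after=<last_sync> passed as a filter. Because offsets can shift while new events are being appended, it may be safer to also pin the window with before= set to the time the scan started.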

Data model

The MemoryEvent fields and the content models events reference.

Semantic hooks

React to these events in real time instead of polling the log.
