khora.integrations.hermes plugs Khora into Hermes
as a long-term memory plane. Hermes owns the agent loop, model call, tool router, and
context-compression policy. Khora owns storage: vector recall, the entity graph,
temporal retrieval, and abstention signals. The integration is a single primitive:
KhoraMemoryProvider.
Install
There is no khora[hermes] extra. hermes-agent pins requests==2.33.0, which
conflicts with Khora’s requests>=2.33.1 CVE floor, so the extra was removed. Install
hermes-agent yourself. The adapter is still registered in the
khora.integrations entry-point group, so
khora.integrations.discover() resolves it whenever hermes-agent is importable. If
your project enforces the CVE floor, vendor or fork hermes-agent to relax its pin.
Wiring it in
Copy examples/integrations/hermes/plugin/ into
$HERMES_HOME/plugins/khora/. By default, its register(ctx) hook builds
KhoraMemoryProvider(kb=Khora.shared()); set the KHORA_HERMES_KB_FACTORY
environment variable to override the knowledge-base factory.
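This page does not say what value KHORA_HERMES_KB_FACTORY expects. The sketch below assumes a "package.module:callable" import path, the common convention for factory overrides. The helper name resolve_kb_factory is hypothetical, and the real plugin may parse the variable differently.

```python
import importlib
import os
from typing import Any, Callable


def resolve_kb_factory(
    default: Callable[[], Any],
    env_var: str = "KHORA_HERMES_KB_FACTORY",
) -> Callable[[], Any]:
    """Hypothetical: return the factory named in env_var, else `default`."""
    spec = os.environ.get(env_var)
    if not spec:
        return default  # e.g. a callable wrapping Khora.shared()
    # Assumed format: "package.module:callable"
    module_name, _, attr = spec.partition(":")
    if not module_name or not attr:
        raise ValueError(f"{env_var} must look like 'pkg.module:callable', got {spec!r}")
    factory = getattr(importlib.import_module(module_name), attr)
    if not callable(factory):
        raise TypeError(f"{spec!r} does not name a callable")
    return factory
```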
Namespace mapping
Each (agent_identity, session_id) pair maps to a deterministic Khora namespace
(UUID5). Two providers for the same agent and session therefore share memory across processes
without a shared registry. agent_identity is the tenancy key: different agents stay
isolated even when they share a session_id. The session_id is also stamped on every
stored document, so kb.forget_session(namespace, session_id) cleanly drops a whole
conversation.
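UUID5 hashes a name under a root namespace UUID, so the same inputs always yield the same UUID. This page does not say which root UUID or name encoding the adapter uses. The sketch below assumes a hypothetical root (HERMES_ROOT) and a NUL-joined name, so its values will not match Khora's real namespaces. It only illustrates why no shared registry is needed.

```python
import uuid

# Hypothetical root namespace; Khora's actual seed is not documented here.
HERMES_ROOT = uuid.uuid5(uuid.NAMESPACE_URL, "khora:hermes")


def namespace_for(agent_identity: str, session_id: str) -> uuid.UUID:
    """Deterministically derive a namespace UUID from (agent_identity, session_id)."""
    # The NUL separator keeps ("a:b", "c") and ("a", "b:c") distinct,
    # assuming neither part contains a NUL character itself.
    return uuid.uuid5(HERMES_ROOT, f"{agent_identity}\x00{session_id}")
```

Any process that computes namespace_for("support-bot", "s-42") gets the same UUID. A different agent_identity on the same session_id gets a different one.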
Tools
Hermes registers two LLM-callable tools via provider.get_tool_schemas():
| Tool | Use for | Returns |
|---|---|---|
| memory_search | Semantic recall, e.g. “What did Alice say about Phoenix?” | A <memory-context> block of the top-K chunks plus entity hits |
| memory_recall | Time-bounded recall, e.g. “What did we discuss last week?”; adds before / after ISO-8601 bounds | The same block, filtered to that time window |
Both tools accept query (required), top_k (default 10, capped at 50), and min_similarity
(default 0.1). An empty result returns the string "No prior memories found." This is an explicit
abstention, so the model doesn’t confabulate.
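The argument rules and the abstention contract can be sketched as a small parser and renderer. Everything here is hypothetical: the Hit and ToolArgs shapes, the exact <memory-context> layout, treating offset-less bounds as UTC, and the floor of 1 on top_k. Entity hits are omitted. Only the defaults, the cap of 50 and the abstention string come from this page.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional

ABSTENTION = "No prior memories found."
DEFAULT_TOP_K, MAX_TOP_K, DEFAULT_MIN_SIMILARITY = 10, 50, 0.1


@dataclass(frozen=True)
class Hit:
    """One recalled chunk (hypothetical shape, not Khora's result type)."""
    text: str
    score: float          # similarity score
    created_at: datetime  # timezone-aware time the chunk was stored


@dataclass(frozen=True)
class ToolArgs:
    query: str
    top_k: int = DEFAULT_TOP_K
    min_similarity: float = DEFAULT_MIN_SIMILARITY
    after: Optional[datetime] = None   # memory_recall only
    before: Optional[datetime] = None  # memory_recall only


def _parse_iso(value: Optional[str]) -> Optional[datetime]:
    if not value:
        return None
    dt = datetime.fromisoformat(value)
    # Assumption: bounds without an offset are treated as UTC.
    return dt if dt.tzinfo else dt.replace(tzinfo=timezone.utc)


def parse_args(raw: dict) -> ToolArgs:
    query = raw.get("query")
    if not isinstance(query, str) or not query.strip():
        raise ValueError("query is required")
    # Cap at 50 as documented; the floor of 1 is this sketch's own choice.
    top_k = max(1, min(int(raw.get("top_k", DEFAULT_TOP_K)), MAX_TOP_K))
    min_similarity = float(raw.get("min_similarity", DEFAULT_MIN_SIMILARITY))
    return ToolArgs(query, top_k, min_similarity,
                    _parse_iso(raw.get("after")), _parse_iso(raw.get("before")))


def render(hits: List[Hit], args: ToolArgs) -> str:
    kept = [
        h for h in hits
        if h.score >= args.min_similarity
        and (args.after is None or h.created_at >= args.after)
        and (args.before is None or h.created_at < args.before)
    ]
    kept.sort(key=lambda h: h.score, reverse=True)  # best matches first
    kept = kept[: args.top_k]
    if not kept:
        return ABSTENTION  # explicit abstention instead of an empty block
    body = "\n".join(f"- ({h.score:.2f}) {h.text}" for h in kept)
    return f"<memory-context>\n{body}\n</memory-context>"
```

An explicit sentinel string gives the model something unambiguous to condition on. An empty block is easier to misread as "nothing relevant was asked".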
Threading model
This is the only adapter that bridges a sync caller (Hermes drives the provider from one thread per session) onto Khora’s async write path:
- One ThreadPoolExecutor(max_workers=1) per provider. Strict FIFO, so ingestion order matches turn order. Async work routes through the shared run_sync bridge.
- A TTL-bounded prefetch cache keyed on (namespace, session, query-hash) absorbs the “prefetch every turn” pattern. Concurrent readers wait on the same in-flight future instead of firing duplicate recalls.
- Shed-oldest backpressure: at queue_max_size, the oldest pending write is cancelled (counter khora.hermes.queue.shed_total).
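The single-worker FIFO and shed-oldest rules can be sketched with the standard library alone. FifoWriter, shed_total and close() are hypothetical names. The real provider also routes async work through run_sync and emits the actual counter, which this sketch does not do.

```python
import threading
from collections import deque
from concurrent.futures import Future, ThreadPoolExecutor, wait
from typing import Callable, Deque


class FifoWriter:
    """Hypothetical single-worker write queue with shed-oldest backpressure."""

    def __init__(self, queue_max_size: int = 256) -> None:
        if queue_max_size < 1:
            raise ValueError("queue_max_size must be >= 1")
        # One worker thread: writes execute strictly in submission (turn) order.
        self._pool = ThreadPoolExecutor(max_workers=1)
        self._pending: Deque[Future] = deque()
        self._lock = threading.Lock()
        self._max = queue_max_size
        self.shed_total = 0  # stand-in for the khora.hermes.queue.shed_total counter

    def submit(self, write: Callable[[], None]) -> Future:
        with self._lock:
            # With FIFO, started or finished writes sit at the front; drop them.
            while self._pending and (self._pending[0].running() or self._pending[0].done()):
                self._pending.popleft()
            # At capacity, cancel the oldest write that has not started yet.
            while len(self._pending) >= self._max:
                oldest = self._pending.popleft()
                if oldest.cancel():  # False if the worker picked it up meanwhile
                    self.shed_total += 1
            future = self._pool.submit(write)
            self._pending.append(future)
            return future

    def close(self, drain_timeout_s: float = 5.0) -> bool:
        """Stop accepting writes; wait up to drain_timeout_s. True if fully drained."""
        with self._lock:
            pending = list(self._pending)
            self._pool.shutdown(wait=False)  # already-queued writes still run
        _, not_done = wait(pending, timeout=drain_timeout_s)
        return not not_done
```

Shedding the oldest write rather than blocking keeps the Hermes turn thread from stalling. Dropping the oldest instead of the newest keeps the most recent turns, which are usually the most relevant.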
Chat memory is best-effort. A hard crash mid-drain loses whatever is still queued;
a clean SIGTERM drains the queue for up to drain_timeout_s. prefetch() may return the abstention
payload when writes haven’t drained yet (an empty context is better than a stalled turn). Use
the tool-call path for guaranteed retrieval. Don’t call os.fork() after constructing a
provider (fork safety is a follow-up).
Key knobs
KhoraMemoryProvider constructor kwargs, with defaults in parentheses (values ending in _s are seconds):
kb (required), prefetch_timeout_s (0.8), prefetch_cache_ttl_s (30.0), queue_max_size (256),
drain_timeout_s (5.0), failure_threshold_pct (1.0). Telemetry spans and counters are emitted under
khora.integrations.hermes.* and khora.hermes.*. They carry no namespace_id labels, and any free
text is hashed.
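The prefetch knobs can be illustrated with a simplified cache. It is keyed on (namespace, session, query-hash), and each entry holds the in-flight future, so concurrent readers share one recall. A per-call timeout falls back to the abstention payload. PrefetchCache and the recall callable are hypothetical. The defaults mirror prefetch_cache_ttl_s (30.0) and prefetch_timeout_s (0.8), and SHA-256 is this sketch's choice of query hash.

```python
import hashlib
import threading
import time
from concurrent.futures import Future, ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout
from typing import Callable, Dict, Tuple

ABSTENTION = "No prior memories found."
Key = Tuple[str, str, str]


class PrefetchCache:
    """Hypothetical TTL cache whose entries are shared in-flight recall futures."""

    def __init__(self, recall: Callable[[str, str, str], str],
                 ttl_s: float = 30.0, timeout_s: float = 0.8) -> None:
        self._recall = recall
        self._ttl = ttl_s
        self._timeout = timeout_s
        self._pool = ThreadPoolExecutor(max_workers=4)
        self._lock = threading.Lock()
        self._entries: Dict[Key, Tuple[float, Future]] = {}

    @staticmethod
    def _key(namespace: str, session: str, query: str) -> Key:
        # Hash the free text so raw queries never become cache keys.
        return (namespace, session, hashlib.sha256(query.encode("utf-8")).hexdigest())

    def prefetch(self, namespace: str, session: str, query: str) -> str:
        key = self._key(namespace, session, query)
        now = time.monotonic()
        with self._lock:
            entry = self._entries.get(key)
            if entry is None or now - entry[0] > self._ttl:
                # Miss or expired: start exactly one recall for this key.
                future = self._pool.submit(self._recall, namespace, session, query)
                self._entries[key] = (now, future)
            else:
                future = entry[1]  # hit: share the cached or in-flight future
        try:
            return future.result(timeout=self._timeout)
        except FutureTimeout:
            # Slow recall: keep the entry for later readers, abstain this turn.
            return ABSTENTION
        except Exception:
            # Failed recall: evict so the next turn retries.
            with self._lock:
                if key in self._entries and self._entries[key][1] is future:
                    del self._entries[key]
            return ABSTENTION
```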
API reference
The stable public surface: construction, remember/recall/forget, and result types.
Integrations overview
The full adapter lineup and the shared registry.