khora.integrations.llamaindex wires khora behind three LlamaIndex
surfaces in one extra:
KhoraRetriever:BaseRetrieverfor anyQueryEngine/ agent that takes a retriever. Async-only, see “Sync is not implemented” below.KhoraMemoryBlock:BaseMemoryBlock[str]factory for long-term semantic memory insidellama_index.core.memory.Memory.KhoraChatStore: deprecated legacyBaseChatStoreforChatMemoryBufferusers. New code should useKhoraMemoryBlock.
Install
llama-index-core>=0.14,<0.15. The pin is intentionally
narrow because LlamaIndex has shipped breaking changes on minor bumps
before (BaseMemoryBlock reshape across 0.11 → 0.12 → 0.14,
BaseMemory.put → aput). Plan one maintenance PR per LlamaIndex
minor release. The nightly skew job in CI catches breaks against the
latest tagged minor.
The adapter is also registered under the khora.integrations
entry-point group (factory: KhoraRetriever), so
khora.integrations.discover() returns it without explicit
registration.
Quickstart
example.py
examples/integrations/llamaindex/example.py by
tools/check_examples_drift.py (CI gate).
KhoraRetriever
| arg | default | notes |
|---|---|---|
kb | - | A connected Khora instance. Adapter does NOT own the lifecycle. |
namespace_id | - | Required khora namespace UUID this retriever reads from. |
similarity_top_k | 10 | Max chunks (and optionally entities) returned per aretrieve call. |
include_entities | False | When True, entity hits are returned alongside chunk hits as additional NodeWithScores. Default off. |
recall_kwargs | None | Optional dict of extra kwargs forwarded to Khora.recall (e.g. {"mode": SearchMode.HYBRID, "min_similarity": 0.2}). |
NodeWithScore’s node.metadata carries:
| key | description |
|---|---|
khora_kind | "chunk" or "entity". |
chunk_id / entity_id | Source object’s khora UUID. |
document_id | Parent document UUID (chunk nodes only). |
namespace_id | Khora namespace this node came from. |
khora_should_abstain | Boolean: True when khora’s abstention signals say the recall is low-confidence. Surfaced on every returned node so a downstream postprocessor / response synthesizer can short-circuit answer generation. |
Sync is not implemented
KhoraRetriever._retrieve raises NotImplementedError. The reason is
specific to this adapter: khora’s recall is async-native and the
deadlock surface for bridging it through a thread inside a running event
loop dominates the failure modes for this kind of plumbing. The fix is
straightforward. Every LlamaIndex QueryEngine exposes
aquery(...) / aretrieve(...). Use those.
If you genuinely need a sync path (e.g. a notebook outside any event
loop), wrap the call yourself:
nest_asyncio workaround. That’s a
hidden reentrancy bomb under any real agent loop.
KhoraMemoryBlock
Long-term memory block for llama_index.core.memory.Memory. The factory
returns a BaseMemoryBlock[str] instance:
_aget(messages)picks the last user-role message, callsKhora.recall(query, namespace=…, limit=similarity_top_k), and returns the rendered context wrapped in<khora_memory>…</khora_memory>so the prompt template can spot it._aput(messages)callsKhora.remember(content, namespace=…)once per message (skipping empty ones). The returneddocument_idis stamped ontomessage.additional_kwargs["khora_event_id"]so callers can round-trip a delete handle.atruncate(content, tokens_to_truncate)returnsNone- khora is the persistent store, so dropping the in-flight payload loses nothing.
| arg | default | notes |
|---|---|---|
kb | - | A connected Khora instance. |
namespace_id | - | Required khora namespace UUID. |
name | "khora_memory" | LlamaIndex memory block name. |
description | None | Optional human-readable description. |
priority | 1 | LlamaIndex truncation priority (lower = kept longer). |
similarity_top_k | 5 | Recall limit per _aget call. |
session_id | None | Optional khora session UUID stamped on every remember. Enables Khora.forget_session(...) cleanup. |
skill_name | "general_entities" | khora extraction skill name. |
entity_types | None (→ []) | Extraction whitelist. Empty disables extraction entirely (cheap chat-history writes). |
relationship_types | None (→ []) | Same: empty disables. |
KhoraChatStore (deprecated)
Legacy BaseChatStore for ChatMemoryBuffer. Instantiation emits a
DeprecationWarning. Provided only for compatibility with existing
code. New agents should use KhoraMemoryBlock instead.
BaseChatStore abstract sync methods are implemented and
bridged through khora.integrations._sync.run_sync (which rejects
calls made from inside a running event loop, see “Sync is not
implemented” above for the same reasoning).
get_keys() and the per-key list-by-index operations scan documents in
the bound namespace and filter on metadata client-side
(llamaindex_chat_key, llamaindex_chat_index). This is fine for
bounded chat workloads (one key per conversation, dozens of messages
each). Multi-tenant deployments with many active conversations in one
namespace should partition by namespace_id instead.
Limits and future work
- Filter pushdown to SQL for
KhoraChatStore.get_messages: the current O(N_docs) scan is acceptable for bounded chat workloads but not for hot multi-tenant deployments. - No support for image / audio / tool-call blocks inside
ChatMessage. Only the rendered text is persisted (viaChatMessage.content). The originaladditional_kwargsround-trips so the consumer can reconstruct non-text payloads from its own side channel. KhoraRetrieverreturns chunks and (optionally) entities. It does not return relationships. LlamaIndex has no first-class relationship node type and forcing them intoTextNodewould pollute the response synthesizer. UseKhora.recall(...)directly if you need relationship data.