recall() is not a single similarity lookup. Khora understands the query, searches
multiple backends in parallel, fuses the rankings, and optionally reranks, then
returns a typed RecallResult.
Search modes
The mode kwarg picks which channels run:
| Mode | Channels | Best for |
|---|---|---|
| VECTOR | Semantic similarity only | “What’s similar to X?” |
| GRAPH | Entity-relationship traversal only | “Who works with X?” |
| KEYWORD | BM25 / full-text only | Exact terms, names, acronyms |
| HYBRID (default) | Vector + graph + keyword, fused via RRF | Balanced, the usual choice |
| ALL | Every channel | Effectively the same as HYBRID today |
How a query flows
- Understand: one LLM call classifies intent, extracts entity mentions and temporal references (resolved to ISO-8601), proposes per-query fusion weights, and scores complexity. This shapes everything downstream.
- Link entities: query mentions are matched to stored entities by exact, fuzzy (edit-distance), and embedding similarity. Matches seed graph traversal.
- Search: vector (pgvector), graph (Neo4j), and keyword (BM25) channels run in parallel.
- Fuse with RRF: Reciprocal Rank Fusion combines the rankings by rank, not score (scores aren’t comparable across channels): score = Σ weight / (k + rank), summed over the channels that returned the chunk, with k = 60 and default weights vector 0.5 / graph 0.3 / keyword 0.2. Chunks surfaced by multiple channels rise to the top.
- Filter: apply temporal windows and (optionally) MMR diversity.
- Rerank: an optional cross-encoder reorders the top candidates (skipped under 5 results, where RRF order is already fine).
- Limit: return the top-k with full provenance.
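The fusion step above can be sketched in a few lines of plain Python. This is an illustration of weighted RRF using the k and default weights quoted in the text, not Khora's actual implementation. The function name `rrf_fuse`, the input shape, and the choice of 1-based ranks are assumptions made for the example.

```python
from collections import defaultdict

# Defaults quoted in the text above.
RRF_K = 60
DEFAULT_WEIGHTS = {"vector": 0.5, "graph": 0.3, "keyword": 0.2}


def rrf_fuse(rankings, weights=DEFAULT_WEIGHTS, k=RRF_K):
    """Fuse per-channel rankings into one list of (chunk_id, score).

    `rankings` maps a channel name to chunk ids ordered best-first.
    Ranks are assumed to be 1-based, so a channel's top hit adds
    weight / (k + 1) to that chunk's fused score.
    """
    scores = defaultdict(float)
    for channel, ranked_ids in rankings.items():
        weight = weights.get(channel, 0.0)
        for rank, chunk_id in enumerate(ranked_ids, start=1):
            scores[chunk_id] += weight / (k + rank)
    # Highest fused score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical channel outputs: "c2" appears in all three channels.
rankings = {
    "vector": ["c1", "c2", "c3"],
    "graph": ["c2", "c4"],
    "keyword": ["c2", "c1"],
}
fused = rrf_fuse(rankings)
print([chunk_id for chunk_id, _ in fused])  # → ['c2', 'c1', 'c3', 'c4']
```

Note that "c2" is only second in the vector ranking, yet it wins overall because every channel contributes to its score. That is the "surfaced by multiple channels" effect described above.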
Recall filters
These are the public knobs on recall(). Everything else is global config (KhoraConfig.query):
- limit: cap the response at the engine level (cheaper than over-fetching).
- min_similarity: raw cosine cutoff on the semantic channel, applied before normalization (a real quality gate, unlike thresholding chunk.score).
- mode: the channel selection above.
- start_time / end_time: explicit temporal window. Bypasses NLP date detection and is honored on all three engines (both bounds naive or both aware).
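The "both bounds naive or both aware" rule for start_time / end_time is easy to trip over, so a small pre-flight check can help. The helper below is a hypothetical caller-side sketch, not part of Khora. It assumes both bounds are supplied, and the ordering check is an extra assumption rather than something the text states.

```python
from datetime import datetime, timezone


def is_aware(dt):
    """True if dt carries a usable UTC offset (the datetime docs' definition)."""
    return dt.tzinfo is not None and dt.tzinfo.utcoffset(dt) is not None


def check_window(start_time, end_time):
    """Validate a (start_time, end_time) pair before passing it to recall().

    Hypothetical helper: rejects mixed naive/aware bounds, which cannot
    even be compared in Python, and (as an assumption) inverted windows.
    """
    if is_aware(start_time) != is_aware(end_time):
        raise ValueError("start_time and end_time must both be naive or both be aware")
    if start_time > end_time:
        raise ValueError("start_time must not be after end_time")
    return start_time, end_time


# Both aware: accepted.
window = check_window(
    datetime(2024, 1, 1, tzinfo=timezone.utc),
    datetime(2024, 2, 1, tzinfo=timezone.utc),
)
```

Checking awareness first matters: comparing a naive datetime with an aware one raises TypeError, which is a less helpful error than an explicit message.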
Threshold philosophy: cast a wide net, rank carefully. recall()’s min_similarity default is 0.0 on purpose. Khora’s strength is the ranking pipeline (RRF + entity boosting + reranking), which works better with more candidates. A 0.35-similarity chunk that’s the right answer beats zero results. Raise min_similarity only when you want strict, high-confidence-only matches. (An earlier 0.5 default caused ~25% of queries to return nothing. Lowering it was the fix.)
Diversity, reranking, and HyDE
These are global toggles on KhoraConfig.query (env KHORA_QUERY_*), not per-call:
- MMR diversity (enable_diversity, default on): Maximal Marginal Relevance removes near-duplicate chunks after fusion (diversity_lambda=0.7 balances relevance vs. diversity, Rust-accelerated).
- Cross-encoder reranking (enable_reranking): neural reorder of the top candidates for precision.
- LLM reranking (enable_llm_reranking): an LLM pass on temporal queries.
- HyDE (enable_hyde: auto / always / never): generates a hypothetical answer doc and searches its embedding. In auto it fires on complex/temporal queries. An opt-in HyDE-Cypher channel (KHORA_QUERY_ENABLE_HYDE_CYPHER) runs parameterized graph templates for structured “latest X” queries.
To turn off LLM reranking and HyDE, set enable_llm_reranking=False and enable_hyde="never".
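To make the role of diversity_lambda concrete, here is a minimal sketch of textbook greedy MMR selection. It is not Khora's Rust-accelerated implementation. The function names, the (chunk_id, relevance, embedding) tuple shape, and the use of cosine similarity as the redundancy measure are all assumptions for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_select(candidates, limit, diversity_lambda=0.7):
    """Greedy MMR over (chunk_id, relevance, embedding) tuples.

    Each step picks the candidate maximizing
    lambda * relevance - (1 - lambda) * max similarity to anything already picked.
    """
    remaining = list(candidates)
    selected = []

    def mmr_score(item):
        _, relevance, embedding = item
        redundancy = max((cosine(embedding, chosen[2]) for chosen in selected), default=0.0)
        return diversity_lambda * relevance - (1 - diversity_lambda) * redundancy

    while remaining and len(selected) < limit:
        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return [chunk_id for chunk_id, _, _ in selected]


# "b" is nearly identical to "a"; "c" is less relevant but covers new ground.
candidates = [
    ("a", 0.90, [1.0, 0.0]),
    ("b", 0.88, [1.0, 0.01]),
    ("c", 0.60, [0.0, 1.0]),
]
picked = mmr_select(candidates, limit=2)
print(picked)  # → ['a', 'c']
```

With diversity_lambda=1.0 the redundancy term vanishes and the same call returns the two most relevant chunks, "a" and "b", duplicates included.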
The engine also adapts how many chunks it retrieves to query complexity, from
very_focused (≤3 chunks for simple lookups) to broad (15 for multi-hop questions).
Reading the result
recall() returns a RecallResult with chunks,
entities, relationships, a deduplicated documents list (every chunk/entity/relationship
document_id is guaranteed to appear there), and engine_info.
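The document_id guarantee means a chunk's source can be resolved with a plain dictionary lookup. The sketch below uses SimpleNamespace stand-ins rather than real Khora objects. The chunks, entities, relationships, documents, and document_id names come from the text above, while the `id` attribute on documents and both helper functions are assumptions.

```python
from types import SimpleNamespace


def documents_by_id(result):
    """Index the deduplicated documents list (assumes each document exposes .id)."""
    return {doc.id: doc for doc in result.documents}


def missing_document_ids(result):
    """Return document ids referenced by results but absent from result.documents.

    Given the guarantee described above, this should always be empty.
    """
    known = set(documents_by_id(result))
    referenced = {
        item.document_id
        for group in (result.chunks, result.entities, result.relationships)
        for item in group
    }
    return referenced - known


# Stand-in for a RecallResult, for illustration only.
result = SimpleNamespace(
    chunks=[SimpleNamespace(document_id="d1"), SimpleNamespace(document_id="d2")],
    entities=[SimpleNamespace(document_id="d1")],
    relationships=[SimpleNamespace(document_id="d2")],
    documents=[SimpleNamespace(id="d1"), SimpleNamespace(id="d2")],
    engine_info={},
)
sources = documents_by_id(result)
```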
Abstention
Sometimes a search turns up nothing solid because the corpus simply doesn’t cover the question. Rather than let your app answer confidently from weak matches, Khora pre-computes a set of “should we decline to answer?” flags. Abstaining just means choosing to say “I don’t know” instead of guessing. The flags live in result.engine_info["abstention_signals"]:
- chunks_empty: no matching text was found.
- entities_empty: no matching entities were found.
- chunks_below_min: fewer chunks matched than the minimum worth answering from.
- top_score_low: even the best match scored low.
- combined_score: a single 0–1 blend of the signals above.
- should_abstain: the overall verdict. The results are too thin to answer from.
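A caller might gate answer generation on these flags roughly as follows. This is a sketch under assumptions: `answer_or_decline` and `generate_answer` are hypothetical names, the sample dictionary is made up for illustration, and only the key names are taken from the list above.

```python
def answer_or_decline(engine_info, generate_answer):
    """Return a refusal when Khora's verdict says the results are too thin.

    `engine_info` is the dict found on result.engine_info.
    `generate_answer` is a hypothetical zero-argument callable that
    produces the real answer (for example, an LLM call).
    """
    signals = engine_info.get("abstention_signals", {})
    if signals.get("should_abstain", False):
        return "I don't know."
    return generate_answer()


# Made-up signal values, shaped after the flags listed above.
thin_results = {
    "abstention_signals": {
        "chunks_empty": False,
        "entities_empty": True,
        "chunks_below_min": True,
        "top_score_low": True,
        "combined_score": 0.2,
        "should_abstain": True,
    }
}
reply = answer_or_decline(thin_results, lambda: "An answer built from the chunks")
print(reply)  # → I don't know.
```

Treating a missing abstention_signals key as "do not abstain" is a design choice of this sketch. A stricter caller could decline in that case instead.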
top_score_low is computed from the raw, pre-rerank similarity
(max_raw_vector_score), not the final reranked score. Reranking squeezes every
result into a narrow high band (even off-topic ones), so the raw score is the honest
signal of whether anything actually matched. (A graph-only recall, where nothing
matched by text, therefore reads top_score_low = true.)
khora.context_text(result, max_chunks=...).
For multi-step exploration, use khora.query.agentic.AgenticSearchAgent directly.
Agentic search isn’t exposed on recall().
Ingestion
The write path that builds what retrieval searches over.
Engine tuning
Per-engine retrieval knobs and how to tune fusion, decay, and reranking.