recall() ranks by relevance. A recall filter runs next to that ranking as a hard
gate: a deterministic predicate that a chunk either satisfies or doesn’t. Reach for it
when “relevant” isn’t enough and you need “relevant and from this source”, “and tagged
tier: gold”, or “and dated this quarter”.
Pass it as the filter= kwarg. It takes a RecallFilter or a plain dict (the dict is
validated for you):
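As a minimal sketch of the dict form, the filter below expresses "relevant, from this source, and tagged tier: gold". The dotted metadata.tier spelling, the query text, and the memory handle in the commented call are assumptions made for illustration, not confirmed Khora API.

```python
# Relevant AND from the "linear" source AND tagged tier: gold.
recall_filter = {
    "$and": [
        {"source_name": {"$eq": "linear"}},
        {"metadata.tier": {"$eq": "gold"}},  # assumed dotted-key spelling
    ]
}

# Hypothetical call site: `memory` stands in for whatever object exposes recall().
# results = memory.recall("open incidents", filter=recall_filter)
```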
The grammar borrows MongoDB-style $-prefixed operators ($eq, $in, $and, …) but is closed and typed. An unknown key, a malformed operator,
or an out-of-place predicate raises RecallFilterValidationError (with a structured
errors list) instead of quietly matching nothing.
What you can filter on
Two groups of keys, plus your own metadata. You set all of these at ingest with remember();
see Ingestion.
System keys are first-class fields Khora denormalizes onto every chunk:
| Key | Type | Operators |
|---|---|---|
| occurred_at, created_at, source_timestamp | datetime | $eq $ne $gt $gte $lt $lte $in $nin $not |
| source_type, source_name, source_url, external_id, content_type, source, title | string | $eq $ne $in $nin $exists $not |
Datetime keys do not take $exists (the axis is always present) and string keys have no range
operators. Both restrictions are enforced when the filter is built.
Metadata is your free-form per-document dict. Match the whole blob for exact equality
({"metadata": {...}}), or address one field with dot-notation:
A metadata field accepts $eq $ne $gt $gte $lt $lte $in $nin $exists $not.
You cannot mix operator keys with plain keys in one object, and an operator nested inside an
equality value is rejected. Reach a nested field with a dotted key, not a nested dict.
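The metadata forms and the mixing rule above can be sketched as plain dicts. The dotted-key spelling (metadata.<field>), the field names, and the helper mixes_operators_and_plain_keys are all hypothetical; the helper only illustrates the rule and is not Khora's validator.

```python
# Whole-blob match: the stored metadata dict must equal this dict exactly.
whole_blob = {"metadata": {"tier": "gold", "team": "infra"}}

# One field, addressed with a dotted key (assumed spelling: "metadata.<field>").
one_field = {"metadata.tier": {"$in": ["gold", "silver"]}}

# A nested field: use a longer dotted key rather than a nested dict.
nested_field = {"metadata.owner.team": "infra"}


def mixes_operators_and_plain_keys(obj: dict) -> bool:
    """Return True when one object holds both $-prefixed and plain keys."""
    has_operator = any(key.startswith("$") for key in obj)
    has_plain = any(not key.startswith("$") for key in obj)
    return has_operator and has_plain


# The kind of object the text says is rejected: an operator next to a plain key.
mixed = {"$gte": 3, "priority": 5}
```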
Operators
The available operators are $eq $ne $gt $gte $lt $lte $in $nin $exists, the logical operators $and
$or $nor $not, and the $date typed literal (below). Logical operators combine whole
sub-filters and nest freely:
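To make the nesting concrete, here is a small in-memory evaluator for a subset of the grammar, run against flat records. It assumes MongoDB-like semantics for $and, $or, $nor, and $not; the functions matches and _leaf are illustrative names, and this is not how Khora evaluates filters.

```python
def _leaf(value, op, arg) -> bool:
    """Apply one leaf operator to a stored value."""
    if op == "$eq":
        return value == arg
    if op == "$ne":
        return value != arg
    if op == "$in":
        return value in arg
    if op == "$nin":
        return value not in arg
    raise ValueError(f"operator not covered by this sketch: {op}")


def matches(record: dict, flt: dict) -> bool:
    """Evaluate a filter dict against a flat record; all entries must hold."""
    for key, cond in flt.items():
        if key == "$and":
            ok = all(matches(record, sub) for sub in cond)
        elif key == "$or":
            ok = any(matches(record, sub) for sub in cond)
        elif key == "$nor":
            ok = not any(matches(record, sub) for sub in cond)
        elif key == "$not":
            ok = not matches(record, cond)
        elif isinstance(cond, dict):
            ok = all(_leaf(record.get(key), op, arg) for op, arg in cond.items())
        else:
            ok = record.get(key) == cond  # bare scalar means $eq
        if not ok:
            return False
    return True


nested = {
    "$and": [
        {"source_type": {"$in": ["issue", "doc"]}},
        {"$or": [
            {"source_name": "linear"},
            {"$not": {"content_type": "text/html"}},
        ]},
    ]
}

issue_from_linear = {"source_type": "issue", "source_name": "linear", "content_type": "text/html"}
html_doc_from_slack = {"source_type": "doc", "source_name": "slack", "content_type": "text/html"}
```

The first record passes both branches of the $and; the second passes the $in but fails every arm of the $or.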
Bare-value shorthand
- A scalar is $eq: {"source_name": "linear"} means {"source_name": {"$eq": "linear"}}.
- A list is exact-array equality, not membership: {"source_type": ["a", "b"]} matches the value ["a", "b"]. For “is one of”, use $in: {"source_type": {"$in": ["a", "b"]}}.
- null is an active null-or-missing match. To not filter on a key, omit it.
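The three shorthand rules above can be written as one small predicate. This is a sketch of the described semantics over a plain dict record; bare_value_matches is a hypothetical helper, not a Khora function.

```python
_MISSING = object()  # sentinel so a stored None differs from an absent key


def bare_value_matches(record: dict, key: str, bare) -> bool:
    """Apply the bare-value shorthand for `key` to one record."""
    stored = record.get(key, _MISSING)
    if bare is None:
        # null is an active match: the key is absent or holds null.
        return stored is _MISSING or stored is None
    # A scalar is $eq; a list is compared as a whole array, not as membership.
    return stored is not _MISSING and stored == bare
```

Note the list case: a record whose source_type is "a" does not match the bare list ["a", "b"], which is why membership needs an explicit $in.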
$date in dict form
JSON has no datetime type. In the dict form, wrap a date operand as {"$date": "<ISO-8601>"}
so it compares as a date rather than a string. System date keys take datetime objects
directly, so $date matters mainly for date-valued metadata:
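A short sketch of why the wrapper matters, using only the standard library. The metadata.due field and the parse_date_literal helper are hypothetical; the point is that two ISO-8601 strings with different UTC offsets can order one way as strings and the other way as dates.

```python
from datetime import datetime


def parse_date_literal(operand: dict) -> datetime:
    """Turn {"$date": "<ISO-8601>"} into a timezone-aware datetime."""
    return datetime.fromisoformat(operand["$date"])


# Dict-form filter on a date-valued metadata field (field name is illustrative).
date_filter = {"metadata.due": {"$lt": {"$date": "2025-03-01T00:00:00+00:00"}}}

stored = "2025-03-01T00:30:00+02:00"  # the same instant as 2025-02-28T22:30 UTC
bound = date_filter["metadata.due"]["$lt"]

# Compared as strings, the stored value sorts after the bound.
before_as_strings = stored < bound["$date"]
# Compared as dates, the stored instant is earlier than the bound.
before_as_dates = datetime.fromisoformat(stored) < parse_date_literal(bound)
```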
Filtering by time
filter={"occurred_at": {...}} is the supported way to bound recall by time. It enforces
the event-time axis exactly, with no fallback to ingest time.
The older start_time / end_time kwargs are deprecated. They are a recency window
over COALESCE(source_timestamp, created_at), a different axis, and they cannot be combined
with filter= (doing so raises ValueError). Prefer:
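A sketch of an event-time window for "this quarter", built from datetime objects since system date keys take them directly. The quarter_bounds helper and the memory handle in the commented call are assumptions; the two bounds are combined with $and, which the grammar above explicitly allows.

```python
from datetime import datetime, timezone


def quarter_bounds(now: datetime) -> tuple[datetime, datetime]:
    """Return the [start, end) bounds of the calendar quarter containing `now`, in UTC."""
    first_month = 3 * ((now.month - 1) // 3) + 1
    start = datetime(now.year, first_month, 1, tzinfo=timezone.utc)
    if first_month == 10:
        end = datetime(now.year + 1, 1, 1, tzinfo=timezone.utc)
    else:
        end = datetime(now.year, first_month + 3, 1, tzinfo=timezone.utc)
    return start, end


start, end = quarter_bounds(datetime(2025, 5, 17, tzinfo=timezone.utc))

# Half-open window on the event-time axis.
time_filter = {
    "$and": [
        {"occurred_at": {"$gte": start}},
        {"occurred_at": {"$lt": end}},
    ]
}

# Hypothetical call site; pass the filter alone, without start_time / end_time.
# results = memory.recall("release notes", filter=time_filter)
```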
What gets pushed down
Khora compiles each filter to a native backend query where it can, and re-checks the rest in memory. The result is identical either way. The difference is cost: a pushed-down predicate narrows the index scan instead of filtering after the fact. On the VectorCypher engine:
- Postgres + Neo4j: the vector and keyword (BM25) channels push the whole filter into the SQL and index scan. The graph channel pushes system keys into the Cypher WHERE and re-checks any residual metadata predicate in memory.
- Embedded (sqlite_lance): the vector and keyword channels compile to a SQLite WHERE with JSON metadata pushdown. The embedded graph channel re-checks in memory.
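The difference between pushing a predicate down and re-checking it in memory can be illustrated with the standard-library sqlite3 module. The chunks table and its columns are invented for this sketch and say nothing about Khora's actual schema or generated SQL; it also assumes the SQLite build includes the JSON functions, which is the common case.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE chunks (id INTEGER PRIMARY KEY, source_name TEXT, metadata TEXT)")
conn.executemany(
    "INSERT INTO chunks VALUES (?, ?, ?)",
    [
        (1, "linear", json.dumps({"tier": "gold"})),
        (2, "linear", json.dumps({"tier": "silver"})),
        (3, "slack", json.dumps({"tier": "gold"})),
    ],
)

# Pushed down: the database applies both predicates while scanning.
pushed = [
    row[0]
    for row in conn.execute(
        "SELECT id FROM chunks "
        "WHERE source_name = ? AND json_extract(metadata, '$.tier') = ? "
        "ORDER BY id",
        ("linear", "gold"),
    )
]

# Post-filtered: fetch every row, then re-check the same predicates in memory.
post_filtered = [
    chunk_id
    for chunk_id, source_name, metadata in conn.execute(
        "SELECT id, source_name, metadata FROM chunks ORDER BY id"
    )
    if source_name == "linear" and json.loads(metadata).get("tier") == "gold"
]
```

Both paths select the same rows; the pushed-down query simply touches fewer of them on the way.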
The pushdown report
Every filtered recall reports what actually happened on engine_info["filter"], a FilterPushdownReport:
pushed_keys, post_filtered_keys, and unenforced_keys partition every constraint leaf
into exactly one bucket. unenforced_keys is the one to watch: a non-empty list means a
result-producing path returned candidates without enforcing those keys. On a correct recall
it is always empty.
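One way to act on the report is to fail fast when unenforced_keys is non-empty. The sketch below assumes the three buckets are readable as attributes, which this page does not confirm, and uses a SimpleNamespace stand-in with made-up values in place of a real FilterPushdownReport; check_report is a hypothetical helper.

```python
from types import SimpleNamespace


def check_report(report) -> None:
    """Raise if any constraint leaf was returned without being enforced."""
    leftover = list(report.unenforced_keys)
    if leftover:
        raise RuntimeError(f"filter not enforced for keys: {leftover}")


# Stand-in report with illustrative values; in real code this would come
# from engine_info["filter"] on a filtered recall.
healthy = SimpleNamespace(
    pushed_keys=["source_name", "occurred_at"],
    post_filtered_keys=["metadata.tier"],
    unenforced_keys=[],
)
suspect = SimpleNamespace(
    pushed_keys=["source_name"],
    post_filtered_keys=[],
    unenforced_keys=["metadata.tier"],
)

check_report(healthy)  # passes silently
```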
Errors
- RecallFilterValidationError: the filter is malformed (unknown key, bad operator, wrong operand shape, nesting too deep). Raised at the recall() call before any search, with a structured errors list an SDK can map to an HTTP 400.
- RecallFilterUnsupportedError: a backend cannot honor a predicate on a key it doesn’t back with a real column. It fails loud rather than silently dropping every row.
Both exceptions are part of the khora package.
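A sketch of mapping a validation failure to an HTTP 400 payload. So that it runs without Khora installed, it uses a local stand-in exception; the only property taken from this page is that the error carries an errors list. The stand-in's constructor, the shape of each error entry, and the response body are assumptions.

```python
class StandInValidationError(Exception):
    """Local stand-in; real code would catch Khora's RecallFilterValidationError."""

    def __init__(self, errors: list):
        super().__init__("invalid recall filter")
        self.errors = errors


def to_http_error(exc) -> tuple[int, dict]:
    """Map a validation failure to a (status, body) pair for an API response."""
    return 400, {"error": "invalid_filter", "details": list(exc.errors)}


try:
    # Simulates recall() rejecting a filter before any search runs.
    raise StandInValidationError([{"key": "sorce_name", "problem": "unknown key"}])
except StandInValidationError as exc:
    status, body = to_http_error(exc)
```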