The examples/10_core_apis/ tier picks up where the Basics leave off. Each example isolates one API surface and shows the knobs that matter, so when you reach for remember_batch, recall(...) filters, or the entity-graph reads, you already know which argument does what.
Most examples here run on the zero-infra embedded backend (pip install "khora[sqlite-lance]" + OPENAI_API_KEY), exactly like the basics. Two of them, Reading entities & relationships back and Exploring the graph, default to PostgreSQL + Neo4j instead, so bring the stack up with make dev from the Khora repo first. Both read the entity graph back, and on the embedded backend entity vectors aren’t written to LanceDB yet, so two of their steps would come back empty. Both still accept --config examples/khora.embedded.yaml if you want to run degraded.

Batch ingestion with remember_batch

A kb.remember() loop is fine for a handful of records but pays three avoidable taxes at scale: the embedder cache re-warms per call, entity dedup runs per-document with no cross-doc scope, and concurrency is whatever your caller wired up. remember_batch fixes all three: one shared embedder cache, cross-document dedup via EntityIndex, and a max_concurrent ceiling on in-flight LLM calls (≈5–10 on a laptop, higher on the standard stack).
result = await kb.remember_batch(
    docs,                       # list of {"content", "title", "source"} dicts
    namespace=ns_id,
    max_concurrent=5,           # docs in flight against the LLM at once
    on_progress=on_progress,
    entity_types=["CONCEPT", "EVENT"],
    relationship_types=["RELATES_TO"],
)
print(f"{result.processed} processed, {result.skipped} skipped, {result.failed} failed")
print(f"chunks={result.chunks}, entities={result.entities}, relationships={result.relationships}")
Takeaway: the returned BatchResult carries processed / skipped / failed counts plus rolled-up chunks / entities / relationships totals.
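To make the max_concurrent ceiling concrete, here is a minimal sketch of the general pattern: a semaphore that bounds how many coroutines run at once. It is not Khora's implementation. run_bounded and fake_extract are hypothetical names, and fake_extract stands in for one LLM-backed extraction call.

```python
import asyncio


async def run_bounded(items, worker, max_concurrent=5):
    """Run worker(item) for every item with at most max_concurrent in flight."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def guarded(item):
        async with semaphore:
            return await worker(item)

    # gather returns results in the same order as the inputs
    return await asyncio.gather(*(guarded(item) for item in items))


async def demo():
    in_flight = 0
    peak = 0

    async def fake_extract(doc):  # hypothetical stand-in for one LLM call
        nonlocal in_flight, peak
        in_flight += 1
        peak = max(peak, in_flight)
        await asyncio.sleep(0.01)
        in_flight -= 1
        return doc["title"]

    docs = [{"title": f"doc-{i}"} for i in range(20)]
    titles = await run_bounded(docs, fake_extract, max_concurrent=5)
    return titles, peak


titles, peak = asyncio.run(demo())
print(len(titles), "docs processed, peak concurrency", peak)
```

Raising the ceiling shortens wall-clock time only until the LLM provider's rate limit becomes the bottleneck, which is why a small value suits a laptop.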
As of v0.17 the on_progress callback isn’t useful for a live progress bar: VectorCypher fires it once per document, but the calls burst at the end of the underlying asyncio.gather() rather than streaming. Print the doc count and elapsed time after the batch returns instead.
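One way to report the count and elapsed time after the batch returns is a small timing wrapper around the awaited call. This is a sketch: timed and fake_batch are hypothetical, and fake_batch only imitates the processed / skipped / failed fields of the result so the snippet runs without Khora.

```python
import asyncio
import time
from types import SimpleNamespace


async def timed(awaitable):
    """Await something and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = await awaitable
    return result, time.perf_counter() - start


async def fake_batch(docs):  # hypothetical stand-in for kb.remember_batch(...)
    await asyncio.sleep(0.02)
    return SimpleNamespace(processed=len(docs), skipped=0, failed=0)


docs = [{"content": "...", "title": f"doc-{i}", "source": "demo"} for i in range(3)]
result, elapsed = asyncio.run(timed(fake_batch(docs)))
print(f"{result.processed} docs in {elapsed:.2f}s")
```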

Recall with filters

recall() exposes three arguments that change the result set itself, not just its ordering. Knowing them removes most application-side post-filtering:
1. limit: cap the response at the engine level

Cheaper than asking for 100 chunks and trimming to 5 in Python.
result = await kb.recall("ingest pipeline work", namespace=ns_id, limit=2)
2. min_similarity: a real semantic-quality cutoff

Drops chunks whose raw cosine similarity is below the threshold. It operates before score normalization, so it’s a genuine quality gate, unlike thresholding the post-normalization chunk.score.
result = await kb.recall(
    "memory leak in ingestion service",
    namespace=ns_id,
    limit=10,
    min_similarity=0.4,
)
3. mode: choose the retrieval channels

SearchMode.{VECTOR, GRAPH, HYBRID, ALL, KEYWORD}. HYBRID (the default) fuses vector + graph + BM25. KEYWORD is BM25-only lexical match.
from khora.query.engine import SearchMode

result = await kb.recall(query, namespace=ns_id, limit=3, mode=SearchMode.KEYWORD)
Why this matters: VectorCypher populates all three channels, so mode actually differentiates between vector, graph, and keyword retrieval.
recall() also accepts start_time / end_time, but as of v0.17 the source_timestamp= you pass at ingest doesn’t reliably propagate to a chunk’s occurred_at, so time-windowed recall via these kwargs is unreliable today.
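The difference between a raw-cosine cutoff like min_similarity and a threshold on normalized scores is easiest to see with numbers. The sketch below assumes min-max normalization, which is an assumption for illustration only: this page does not say which scheme Khora uses. The cosine values are made up.

```python
def min_max_normalize(scores):
    """Rescale scores to [0, 1]; the best hit always becomes 1.0."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [1.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


raw_cosines = [0.31, 0.28, 0.25]  # made-up values: every hit is a weak match
threshold = 0.4

# Cutoff on raw cosine: nothing is good enough, so nothing survives.
kept_raw = [s for s in raw_cosines if s >= threshold]

# Cutoff after normalization: the least-bad hits look strong and slip through.
normalized = min_max_normalize(raw_cosines)
kept_normalized = [s for s in normalized if s >= threshold]

print("kept by raw cutoff:", len(kept_raw))
print("kept by normalized cutoff:", len(kept_normalized))
```

Under this assumption, normalization stretches even a uniformly weak result set across the full range, so only the raw cutoff can return an empty result when nothing relevant exists.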

Ontology config

entity_types and relationship_types are required kwargs on every remember(), but they’re guidance, not a hard schema. The extractor treats your list as a strong hint (it can still emit types outside it), and passing empty lists doesn’t turn extraction off. It removes the guidance entirely, so Khora falls back to unbounded extraction and infers its own taxonomy from the content, with loosely-cased model-chosen labels (you’ll see Person and EVENT side by side). So pass them deliberately. This example runs the same paragraph through two ontologies and compares the resulting entity-type histograms:
# Generic — blunt, catch-all labels you must disambiguate downstream
entity_types=["PERSON", "ORGANIZATION", "CONCEPT", "LOCATION", "EVENT", "PRODUCT", "TECHNOLOGY"]
relationship_types=["RELATES_TO", "PART_OF", "MENTIONS"]

# Domain-specific — typed the way a recruiter actually thinks
entity_types=["CANDIDATE", "EMPLOYER", "ROLE", "SKILL", "HIRING_MANAGER", "INTERVIEW_LOOP"]
relationship_types=["APPLIED_TO", "WORKED_AT", "HAS_SKILL", "SCHEDULED_FOR", "MANAGED_BY"]

# Inspect what was extracted, either way:
from collections import Counter

entities = await kb.list_entities(namespace=ns_id, limit=50)
histogram = Counter(e.entity_type for e in entities)
Takeaway: same model, same text, but the domain-specific run yields an entity-type histogram you can query against. Picking the ontology is the single highest-leverage extraction decision you make. This is the groundwork for the iterative-ontology workflow planned in Tier 40 (open extraction → inspect → refine → re-extract). When you need the system prompt, dedup, and inference rules versioned alongside the types, package them as an ExpertiseConfig. See example 06 (Ontology as an ExpertiseConfig) below and the Expertise & ontologies guide.
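Because unguided extraction can emit loosely-cased labels, it can help to fold case before counting when you compare two runs. The sketch below is self-contained: the Entity dataclass is a hypothetical stand-in in which only the entity_type attribute mirrors the snippet above, and both entity lists are hand-written rather than real extraction output.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Entity:  # hypothetical stand-in; only entity_type mirrors the docs snippet
    name: str
    entity_type: str


def type_histogram(entities):
    """Count entities per type, folding case so Person and PERSON merge."""
    return Counter(e.entity_type.upper() for e in entities)


generic_run = [
    Entity("Priya Patel", "Person"),
    Entity("Sam Chen", "PERSON"),
    Entity("Acme Robotics", "ORGANIZATION"),
]
domain_run = [
    Entity("Priya Patel", "CANDIDATE"),
    Entity("Sam Chen", "HIRING_MANAGER"),
    Entity("Acme Robotics", "EMPLOYER"),
]

for label, run in (("generic", generic_run), ("domain", domain_run)):
    print(label)
    for entity_type, count in type_histogram(run).most_common():
        print(f"  {entity_type:<16}{count}")
```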

Reading entities & relationships back

VectorCypher builds a graph at write time (one LLM call per remember produces entities + relationships keyed back to the source chunks). This example reads that graph back three ways:
recall = await kb.recall("Marie Curie discoveries and family", namespace=ns_id, limit=5)

recall.chunks          # [1] textual evidence
recall.entities        # [2] extracted nodes, inline on the projection
recall.relationships   # [3] edges between them, scored

# The dedicated reads — these work on BOTH backends:
all_entities = await kb.list_entities(namespace=ns_id, limit=20)
# marie: the Marie Curie entity, e.g. picked out of all_entities above
neighbours = await kb.find_related_entities(marie.id, namespace=ns_id, max_depth=1, limit=10)
Takeaway: recall().entities / .relationships give you the graph inline with retrieval. list_entities and find_related_entities are the explicit “show me the graph” reads. On embedded sqlite_lance the inline lists come back empty. The example detects this and falls through to the list/traverse APIs, which is why it defaults to PostgreSQL + Neo4j.
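The fall-through described above can be sketched as a small helper that prefers the inline entities and uses list_entities only when they are empty. entities_for and FakeEmbeddedKB are hypothetical: the stub imitates an embedded backend with empty inline lists so the snippet runs without Khora, and the helper assumes only the recall and list_entities keyword arguments shown on this page.

```python
import asyncio
from types import SimpleNamespace


async def entities_for(kb, query, namespace, limit=20):
    """Prefer entities inline on recall; fall back to list_entities if empty."""
    result = await kb.recall(query, namespace=namespace, limit=limit)
    if result.entities:
        return result.entities, "inline"
    return await kb.list_entities(namespace=namespace, limit=limit), "list_entities"


class FakeEmbeddedKB:  # hypothetical stub: inline entities empty, list read works
    async def recall(self, query, namespace, limit):
        return SimpleNamespace(chunks=["..."], entities=[], relationships=[])

    async def list_entities(self, namespace, limit):
        return ["Marie Curie", "Pierre Curie"][:limit]


entities, source = asyncio.run(
    entities_for(FakeEmbeddedKB(), "Marie Curie discoveries and family", "demo-ns")
)
print(source, entities)
```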

Exploring the graph

The widest tour in the tier runs seven reads against the same Curie-family graph, so you can pick the right call for the task:
  1. list_entities(namespace): enumerate everything
  2. list_entities(entity_type="PERSON"): filter to one type
  3. search_entities(query, namespace): semantic lookup by name/description
  4. get_entity(entity_id, namespace): fetch one node by id
  5. find_related_entities(entity_id, max_depth=…): walk the edges outward
  6. recall(query, mode=SearchMode.GRAPH): graph-channel retrieval
  7. An ASCII tree built from (1) + (5)
persons = await kb.list_entities(namespace=ns_id, entity_type="PERSON", limit=50)
hits    = await kb.search_entities("physicist who discovered radium", namespace=ns_id, limit=3)
node    = await kb.get_entity(marie.id, namespace=ns_id)
depth2  = await kb.find_related_entities(marie.id, namespace=ns_id, max_depth=2, limit=40)
Takeaway: list_entities / get_entity / find_related_entities read from the graph store and work on every backend. search_entities and recall(mode=GRAPH) need entity vectors, so they degrade to empty on embedded, hence the PostgreSQL + Neo4j default. max_depth is how you control how far a traversal fans out.
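As an illustration of step 7 and of how max_depth limits fan-out, here is a minimal sketch that renders an ASCII tree from an adjacency mapping. The mapping shape and render_tree are hypothetical: this page does not show what find_related_entities returns, so the edges below are hand-written.

```python
def render_tree(edges, root, max_depth):
    """Return ASCII-tree lines for nodes reachable from root within max_depth hops."""
    lines = [root]
    seen = {root}

    def walk(node, prefix, depth):
        if depth == max_depth:
            return
        # Skip nodes already drawn so cycles cannot recurse forever.
        children = [c for c in edges.get(node, []) if c not in seen]
        seen.update(children)
        for i, child in enumerate(children):
            last = i == len(children) - 1
            lines.append(prefix + ("+-- " if last else "|-- ") + child)
            walk(child, prefix + ("    " if last else "|   "), depth + 1)

    walk(root, "", 0)
    return lines


# Hand-written edges for illustration, not real extraction output.
edges = {
    "Marie Curie": ["Pierre Curie", "Irène Joliot-Curie", "Radium"],
    "Irène Joliot-Curie": ["Frédéric Joliot-Curie"],
}
print("\n".join(render_tree(edges, "Marie Curie", max_depth=2)))
```

With max_depth=1 the tree stops at Marie Curie's direct neighbours. With max_depth=2 it also reaches Frédéric Joliot-Curie through Irène.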

Ontology as an ExpertiseConfig

Example 03 used bare entity_types / relationship_types lists. When you outgrow that, an ExpertiseConfig packages the whole recruiting domain into one reusable, versioned object: a system prompt, typed entities with identifiers (for cross-source dedup), typed relationships, a correlation rule (merge the same candidate seen in two notes), and an inference rule that derives edges the text never stated:
from khora import ExpertiseConfig, EntityTypeConfig, RelationshipTypeConfig
from khora.extraction.skills import InferenceRule, InferenceCondition

ontology = ExpertiseConfig(
    name="recruiting",
    system_prompt="People who applied are CANDIDATE; companies are EMPLOYER; roles are ROLE; …",
    entity_types=[EntityTypeConfig(name="CANDIDATE", identifiers=["email", "name"]), ...],
    relationship_types=[
        RelationshipTypeConfig(name="APPLIED_TO", source_types=["CANDIDATE"], target_types=["ROLE"]),
        RelationshipTypeConfig(name="ROLE_AT",    source_types=["ROLE"],      target_types=["EMPLOYER"]), ...],
    inference_rules=[InferenceRule(
        name="candidate_targets_employer",
        when=[InferenceCondition(relationship="APPLIED_TO", source_type="CANDIDATE", target_type="ROLE"),
              InferenceCondition(relationship="ROLE_AT",    source_type="ROLE",      target_type="EMPLOYER")],
        then_relationship="TARGETS", then_source="first.source", then_target="second.target")],
)

await kb.remember(text, namespace=ns_id, expertise=ontology,
                  entity_types=ontology.get_entity_type_names(),        # the kwargs are still required
                  relationship_types=ontology.get_relationship_type_names())
The example ingests a short hiring note, driven by the ontology object. The example also turns event extraction off (VectorCypherConfig(store_events=False)), so no EVENT / PARTICIPATED_IN appear. The ASSOCIATED_WITH edges are co-occurrence links that Khora adds, and they aren’t configurable on VectorCypher. See Ingestion for both knobs. Entities come back typed the way the ontology declares; the output below is from one run, and LLM output varies:
entity-type histogram (driven by the ExpertiseConfig):
  CANDIDATE        1    Priya Patel
  EMPLOYER         1    Acme Robotics
  ROLE             1    backend engineering
  SKILL            1    Python
  HIRING_MANAGER   1    Sam Chen

relationships: APPLIED_TO · ROLE_AT · HAS_SKILL · MANAGED_BY  (+ ASSOCIATED_WITH co-occurrence)
Takeaway: bare lists are a per-call hint. An ExpertiseConfig is the ontology as a first-class artifact: define the prompt, identifiers (dedup), correlation rules, and inference rules once, version them, and reuse them across every remember(). The APPLIED_TO + ROLE_AT ⇒ TARGETS inference rule is the kind of rule that expansion uses to derive candidate→employer edges that no chunk states. Full guide: Expertise & ontologies.
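The logic of the APPLIED_TO + ROLE_AT ⇒ TARGETS rule can be sketched over plain (source, relationship, target) triples. This is an illustration of the two-hop join only, not Khora's inference engine: infer_targets is a hypothetical helper, and it ignores the source_type / target_type checks that the InferenceCondition entries declare.

```python
def infer_targets(triples):
    """Derive (candidate, TARGETS, employer) from APPLIED_TO followed by ROLE_AT."""
    # Index the second hop: role -> employers it belongs to.
    role_employers = {}
    for source, rel, target in triples:
        if rel == "ROLE_AT":
            role_employers.setdefault(source, []).append(target)

    # Join the first hop against that index.
    inferred = []
    for source, rel, target in triples:
        if rel == "APPLIED_TO":
            for employer in role_employers.get(target, []):
                inferred.append((source, "TARGETS", employer))
    return inferred


triples = [
    ("Priya Patel", "APPLIED_TO", "backend engineering"),
    ("backend engineering", "ROLE_AT", "Acme Robotics"),
    ("Priya Patel", "HAS_SKILL", "Python"),
]
print(infer_targets(triples))
```

The derived edge takes its source from the first condition and its target from the second, matching then_source="first.source" and then_target="second.target" in the rule above.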

Next steps

Workloads

End-to-end scenarios that compose these APIs into real applications.

VectorCypher

The engine behind these examples: hybrid vector + graph + keyword retrieval.