Extraction reads your documents and pulls out entities and relationships. The two lists every remember() call takes, entity_types and relationship_types, tell Khora which entities and relationships to extract. Flat lists can work for simple content, but they say nothing about how to describe a type, when two mentions are the same entity, or what to infer. ExpertiseConfig is the next level up: a complete, reusable domain ontology that tells Khora not only which types to extract but how to describe them, how to recognise the same entity across sources, what new edges to infer, how confident to be, and which prompt to extract with.
# The minimum every remember() needs — the types as anonymous, inline lists:
await kb.remember(text, namespace=ns,
                  entity_types=["PERSON", "ORG"], relationship_types=["WORKS_AT"])
The sections below build a reusable ExpertiseConfig (named ontology in the examples) and pass it via expertise= on that same call. The section “A minimal ontology in Python” shows the whole define-then-use flow in one block.

What an ontology bundles

One ExpertiseConfig carries all of this, most of it optional, with sensible defaults:
| Piece | What it controls |
|---|---|
| entity_types | The types to extract, each with an attribute schema, identifiers, and aliases |
| relationship_types | Edge types, with source/target constraints and direction |
| correlation_rules | Cross-source entity unification (merge the same person from Slack and email) |
| inference_rules | New edges derived from existing ones (a when → then graph-pattern DSL) |
| confidence | Per-ontology thresholds that filter low-confidence output |
| expansion | How aggressively to unify entities and infer relationships |
| system_prompt / extraction_prompt | The Jinja2 prompt templates the extractor renders |
| events / facts | Event / atomic-fact extraction toggles (default on) |
| name / version / extends | Identity, versioning, and inheritance from other ontologies |

A minimal ontology in Python

The three building blocks (ExpertiseConfig, EntityTypeConfig, RelationshipTypeConfig) are importable straight from khora:
from khora import ExpertiseConfig, EntityTypeConfig, RelationshipTypeConfig

ontology = ExpertiseConfig(
    name="product_engineering",
    description="People, teams, and services in a product org.",
    system_prompt="Extract the people, teams, and services discussed in engineering docs.",
    entity_types=[
        EntityTypeConfig(name="PERSON",  description="An engineer or stakeholder."),
        EntityTypeConfig(name="TEAM",    description="An engineering team."),
        EntityTypeConfig(name="SERVICE", description="A deployable service or component."),
    ],
    relationship_types=[
        RelationshipTypeConfig(name="MEMBER_OF", description="A person belongs to a team.",
                               source_types=["PERSON"], target_types=["TEAM"]),
        RelationshipTypeConfig(name="OWNS", description="A team owns a service.",
                               source_types=["TEAM"], target_types=["SERVICE"]),
    ],
)

await kb.remember(
    text,
    namespace=ns,
    expertise=ontology,
    entity_types=ontology.get_entity_type_names(),          # still required — see below
    relationship_types=ontology.get_relationship_type_names(),
)
entity_types and relationship_types are required on every remember(), even when you pass expertise=; the expertise object doesn’t replace them. Pass ontology.get_entity_type_names() and ontology.get_relationship_type_names() so the lists stay in sync with the ontology. Note also that EntityTypeConfig and RelationshipTypeConfig take the type label as name=, not type=.

Entity types: attributes, identifiers, aliases

An EntityTypeConfig is more than a label. Three fields shape extraction and matching:
  • attributes: {required: [...], optional: [...]}, a soft schema that nudges the LLM to pull the fields you care about.
  • identifiers: the attributes that identify the same entity across documents (e.g. email for a person, repo_url for a service). These drive deduplication.
  • aliases: alternative type labels, so a model that emits COMPANY still lands on your ORGANIZATION type.
- name: PERSON
  description: "An engineer or stakeholder."
  attributes:
    required: [name]
    optional: [email, role]
  identifiers: [email, name]      # two people sharing an email are one entity
  aliases: [ENGINEER, EMPLOYEE]
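The sketch below shows, in plain Python, what aliases and identifiers buy you when merging mentions. It is a hypothetical illustration rather than Khora’s deduplication code: normalize_type and dedup_key are invented names, and the policy that the first identifier present on a mention (email before name) becomes its merge key, compared case-insensitively, is an assumption.

```python
from typing import Optional

# Hypothetical illustration of aliases and identifiers; not Khora's code.
PERSON_CONFIG = {
    "name": "PERSON",
    "identifiers": ["email", "name"],  # tried in order (assumption)
    "aliases": ["ENGINEER", "EMPLOYEE"],
}


def normalize_type(label: str, config: dict) -> Optional[str]:
    """Map an emitted type label onto the configured type via its aliases."""
    label = label.upper()
    if label == config["name"] or label in config["aliases"]:
        return config["name"]
    return None


def dedup_key(attributes: dict, config: dict) -> Optional[tuple]:
    """Return (identifier, normalized value) for the first identifier present."""
    for field in config["identifiers"]:
        value = attributes.get(field)
        if value:
            return field, str(value).strip().lower()
    return None


mentions = [
    ("ENGINEER", {"name": "Ada Lovelace", "email": "ada@acme.com"}),
    ("EMPLOYEE", {"name": "Ada L.", "email": "ADA@acme.com"}),
    ("PERSON", {"name": "Bob"}),
]

entities: dict = {}
for label, attrs in mentions:
    if normalize_type(label, PERSON_CONFIG) == "PERSON":
        key = dedup_key(attrs, PERSON_CONFIG)
        entities.setdefault(key, []).append(attrs)

print(len(entities))  # 2
```

Under that policy the two Ada mentions collapse into one entity through their shared email, while Bob, who has no email, is keyed by name.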

Relationship types

source_types / target_types constrain which entity types an edge may connect (["*"] means any). bidirectional: true marks symmetric edges like COLLABORATES_WITH.
- name: MEMBER_OF
  source_types: [PERSON]
  target_types: [TEAM]
- name: COLLABORATES_WITH
  source_types: [PERSON]
  target_types: [PERSON]
  bidirectional: true
A relationship type’s source_types / target_types must reference entity type names that exist in the same ontology (or *). An edge that points at an undeclared type is dropped.
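Since an edge that points at an undeclared type is dropped, it can pay to lint an ontology before loading it. The sketch below is a hypothetical check in plain Python, not a Khora function; the undeclared_endpoints name, the dict layout, and the MENTIONS relationship are invented for illustration. It reports every source_types / target_types entry that names neither a declared entity type nor *.

```python
# Hypothetical lint for relationship endpoints; not part of Khora.
def undeclared_endpoints(entity_types, relationship_types):
    """Return (relationship, type) pairs that reference an undeclared entity type."""
    declared = set(entity_types) | {"*"}  # "*" means any type
    problems = []
    for rel in relationship_types:
        for endpoint in rel["source_types"] + rel["target_types"]:
            if endpoint not in declared:
                problems.append((rel["name"], endpoint))
    return problems


entity_types = ["PERSON", "TEAM"]
relationship_types = [
    {"name": "MEMBER_OF", "source_types": ["PERSON"], "target_types": ["TEAM"]},
    {"name": "OWNS", "source_types": ["TEAM"], "target_types": ["SERVICE"]},
    {"name": "MENTIONS", "source_types": ["*"], "target_types": ["PERSON"]},
]

print(undeclared_endpoints(entity_types, relationship_types))  # [('OWNS', 'SERVICE')]
```

Here OWNS is flagged because SERVICE was never declared as an entity type, so its edges would be dropped.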

Cross-source entity unification (correlation rules)

Correlation rules merge the same real-world entity seen through different sources, the canonical “the Slack @ada and the Gmail ada@acme.com are one person” problem. A rule matches on match_fields (or a regex pattern), scoped to entity_types:
correlation_rules:
  - name: dedupe_people_by_email
    description: "People sharing an email are the same entity."
    match_fields: [email]
    entity_types: [PERSON]
    confidence: 0.9
Tie a rule to stable identifiers (email, url, id) at high confidence (0.85–0.95). Match on names only at lower confidence (0.7–0.8), since names collide.
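The following plain-Python sketch shows what a match_fields rule like dedupe_people_by_email does conceptually: it groups in-scope entities whose match fields agree and attaches the rule’s confidence to each proposed merge. It is an illustration under stated assumptions, not Khora’s unifier: the entity records and IDs are hypothetical, values are compared case-insensitively, entities missing a match field are skipped, and regex pattern rules are not modelled.

```python
from collections import defaultdict

# Hypothetical sketch of a match_fields correlation rule; not Khora's code.
rule = {
    "name": "dedupe_people_by_email",
    "match_fields": ["email"],
    "entity_types": ["PERSON"],
    "confidence": 0.9,
}

entities = [
    {"id": "slack:U123", "type": "PERSON", "name": "@ada", "email": "ada@acme.com"},
    {"id": "gmail:42", "type": "PERSON", "name": "Ada Lovelace", "email": "Ada@Acme.com"},
    {"id": "slack:U456", "type": "PERSON", "name": "@bob"},
    {"id": "crm:acme", "type": "ORGANIZATION", "name": "Acme", "email": "ada@acme.com"},
]


def correlate(entities, rule):
    """Group in-scope entities whose match_fields values are all present and equal."""
    groups = defaultdict(list)
    for entity in entities:
        if entity["type"] not in rule["entity_types"]:
            continue  # the rule is scoped to specific entity types
        values = [entity.get(field) for field in rule["match_fields"]]
        if any(v is None for v in values):
            continue  # cannot match on a field the entity lacks
        key = tuple(str(v).strip().lower() for v in values)
        groups[key].append(entity["id"])
    # Only groups with more than one member describe a merge.
    return [(ids, rule["confidence"]) for ids in groups.values() if len(ids) > 1]


print(correlate(entities, rule))  # [(['slack:U123', 'gmail:42'], 0.9)]
```

The ORGANIZATION record shares the same email but is left alone, because the rule is scoped to PERSON.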

Inferring new edges (inference rules)

Inference rules derive relationships that were never written down, from ones that were. Each rule is a when → then: a list of conditions to match, and the edge to create. The matcher walks chains of relationships and supports three linking shapes:
| Shape | Pattern | Example |
|---|---|---|
| Transitive | prev.target → next.source | A MEMBER_OF Team, Team OWNS Service ⇒ A CONTRIBUTES_TO Service |
| Shared-target | prev.target == next.target | A MEMBER_OF Team, B MEMBER_OF Team ⇒ A COLLABORATES_WITH B |
| Shared-source | prev.source == next.source | A OWNS X, A OWNS Y ⇒ relate X and Y |
then.source / then.target pick which matched entities form the new edge, by ordinal (first.source, second.target, …):
inference_rules:
  - name: teammates_collaborate          # shared-target
    when:
      - {relationship: MEMBER_OF, source_type: PERSON, target_type: TEAM}
      - {relationship: MEMBER_OF, source_type: PERSON, target_type: TEAM}
    then: {relationship: COLLABORATES_WITH, source: first.source, target: second.source}
    confidence: 0.5
  - name: contribute_to_owned_services   # transitive
    when:
      - {relationship: MEMBER_OF, source_type: PERSON, target_type: TEAM}
      - {relationship: OWNS,      source_type: TEAM,   target_type: SERVICE}
    then: {relationship: CONTRIBUTES_TO, source: first.source, target: second.target}
    confidence: 0.6
Inference runs only when an ontology with inference_rules is loaded and expansion.relationship_inference is on with an inference_mode other than none. A plain remember() with no expertise does no inference: the inferrer logs “No expertise or inference rules configured, skipping inference” and creates nothing. Keep each ontology to 2–4 rules, because broad rules can explode the edge count.
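To see how when → then rules produce edges, here is a minimal pure-Python sketch of a two-condition matcher, run against the Ada/Bob/Payments example used later on this page. It is an illustration, not Khora’s matcher: the explicit "link" key (naming a linking shape from the table above), the SYMMETRIC set standing in for bidirectional: true, and all function names are assumptions for this sketch; the real rule format shown above has no "link" field.

```python
from itertools import permutations

# Hypothetical sketch of two-condition inference rules; not Khora's matcher.
node_types = {"Ada": "PERSON", "Bob": "PERSON",
              "Payments": "TEAM", "billing-api": "SERVICE"}
edges = [  # (source, relationship, target)
    ("Ada", "MEMBER_OF", "Payments"),
    ("Bob", "MEMBER_OF", "Payments"),
    ("Payments", "OWNS", "billing-api"),
]
SYMMETRIC = {"COLLABORATES_WITH"}  # stands in for bidirectional: true

# How the first matched edge must connect to the second one.
LINKS = {
    "transitive": lambda a, b: a[2] == b[0],     # prev.target -> next.source
    "shared_target": lambda a, b: a[2] == b[2],  # prev.target == next.target
    "shared_source": lambda a, b: a[0] == b[0],  # prev.source == next.source
}


def matches(edge, cond):
    """Does an edge satisfy one `when` condition (relationship plus endpoint types)?"""
    source, rel, target = edge
    return (rel == cond["relationship"]
            and node_types[source] == cond["source_type"]
            and node_types[target] == cond["target_type"])


def resolve(ref, first, second):
    """Turn 'first.source' / 'second.target' into an entity name."""
    which, end = ref.split(".")
    edge = first if which == "first" else second
    return edge[0] if end == "source" else edge[2]


def infer(rule, edges):
    first_cond, second_cond = rule["when"]
    inferred = set()
    for first, second in permutations(edges, 2):  # ordered pairs of distinct edges
        if not (matches(first, first_cond) and matches(second, second_cond)):
            continue
        if not LINKS[rule["link"]](first, second):
            continue
        src = resolve(rule["then"]["source"], first, second)
        dst = resolve(rule["then"]["target"], first, second)
        if src == dst:
            continue  # never infer a self-loop
        if rule["then"]["relationship"] in SYMMETRIC:
            src, dst = sorted((src, dst))  # one canonical order for symmetric edges
        inferred.add((src, rule["then"]["relationship"], dst, rule["confidence"]))
    return inferred


member = {"relationship": "MEMBER_OF", "source_type": "PERSON", "target_type": "TEAM"}
owns = {"relationship": "OWNS", "source_type": "TEAM", "target_type": "SERVICE"}
rules = [
    {"name": "teammates_collaborate", "link": "shared_target", "when": [member, member],
     "then": {"relationship": "COLLABORATES_WITH", "source": "first.source", "target": "second.source"},
     "confidence": 0.5},
    {"name": "contribute_to_owned_services", "link": "transitive", "when": [member, owns],
     "then": {"relationship": "CONTRIBUTES_TO", "source": "first.source", "target": "second.target"},
     "confidence": 0.6},
]

results = [edge for rule in rules for edge in sorted(infer(rule, edges))]
for edge in results:
    print(edge)
```

The sketch infers that Ada collaborates with Bob and that both contribute to billing-api. It also shows why broad rules are costly: a shared-target rule pairs every two members of a team, so the inferred edge count grows quadratically with team size.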

Confidence thresholds

The ontology carries its own confidence thresholds, and the extractor drops anything scored below them. This makes confidence a per-ontology precision knob:
confidence:
  min_entity: 0.5         # extracted entities below this are discarded
  min_relationship: 0.5   # extracted relationships below this are discarded
  min_inferred: 0.4       # inferred relationships below this are discarded
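The three thresholds act as independent filters, one per kind of output. The sketch below assumes each extracted item carries a numeric confidence and that a score exactly equal to the threshold is kept (the comments above say only items below it are discarded); the Thresholds dataclass mirrors the YAML keys but is a hypothetical stand-in, not Khora’s class, and the sample items are invented.

```python
from dataclasses import dataclass


# Hypothetical mirror of the confidence block; not Khora's class.
@dataclass
class Thresholds:
    min_entity: float = 0.5
    min_relationship: float = 0.5
    min_inferred: float = 0.4


def keep(items, threshold):
    """Drop items whose confidence is below the threshold (equal passes)."""
    return [item for item in items if item["confidence"] >= threshold]


t = Thresholds()
entities = [{"name": "Ada", "confidence": 0.92}, {"name": "the team", "confidence": 0.31}]
relationships = [{"type": "MEMBER_OF", "confidence": 0.5}]
inferred = [{"type": "COLLABORATES_WITH", "confidence": 0.45},
            {"type": "CONTRIBUTES_TO", "confidence": 0.35}]

kept_entities = keep(entities, t.min_entity)
kept_relationships = keep(relationships, t.min_relationship)
kept_inferred = keep(inferred, t.min_inferred)
print(len(kept_entities), len(kept_relationships), len(kept_inferred))  # 1 1 1
```

Setting min_inferred lower than the extraction thresholds, as the example does, lets speculative inferred edges through while still pruning the weakest ones.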

Expansion behavior

expansion controls the optional expansion phase of ingestion:
expansion:
  enabled: true
  inference_mode: smart        # smart | incremental | batch | none
  relationship_inference: true
  cross_tool_unification: true
  depth: 2                     # transitive inference passes
  preload_existing: true       # smart mode: load existing entities for cross-doc dedup

Authoring in YAML and loading it

For anything beyond a couple of types, YAML is the natural home. It keeps the whole ontology in one reviewable file. Load it with ExpertiseLoader:
from khora.extraction.skills import ExpertiseLoader

ontology = ExpertiseLoader().load_file("ontologies/product_engineering.yaml")
# or a bundled starting point:
ontology = ExpertiseLoader().load_builtin("general")

await kb.remember(text, namespace=ns, expertise=ontology,
                  entity_types=ontology.get_entity_type_names(),
                  relationship_types=ontology.get_relationship_type_names())
A complete ontology pulling the pieces together:
name: product_engineering
version: "1.0.0"
description: "People, teams, and services in a product org."

system_prompt: |
  You extract the people, teams, and services discussed in internal engineering
  documents. Capture a person's email or a service's repository URL as an identifier
  when present, so the same entity mentioned in different documents is recognised as one.

entity_types:
  - name: PERSON
    description: "An engineer or stakeholder."
    attributes: {required: [name], optional: [email, role]}
    identifiers: [email, name]
    aliases: [ENGINEER, EMPLOYEE]
  - name: TEAM
    description: "An engineering team or squad."
    attributes: {required: [name], optional: [mission]}
    identifiers: [name]
  - name: SERVICE
    description: "A deployable service or component."
    attributes: {required: [name], optional: [repo_url, language]}
    identifiers: [repo_url, name]
    aliases: [COMPONENT, MICROSERVICE]

relationship_types:
  - {name: MEMBER_OF, source_types: [PERSON], target_types: [TEAM]}
  - {name: OWNS,      source_types: [TEAM],   target_types: [SERVICE]}
  - {name: CONTRIBUTES_TO, source_types: [PERSON], target_types: [SERVICE]}
  - {name: COLLABORATES_WITH, source_types: [PERSON], target_types: [PERSON], bidirectional: true}

correlation_rules:
  - name: dedupe_people_by_email
    match_fields: [email]
    entity_types: [PERSON]
    confidence: 0.9

inference_rules:
  - name: teammates_collaborate
    when:
      - {relationship: MEMBER_OF, source_type: PERSON, target_type: TEAM}
      - {relationship: MEMBER_OF, source_type: PERSON, target_type: TEAM}
    then: {relationship: COLLABORATES_WITH, source: first.source, target: second.source}
    confidence: 0.5
  - name: contribute_to_owned_services
    when:
      - {relationship: MEMBER_OF, source_type: PERSON, target_type: TEAM}
      - {relationship: OWNS,      source_type: TEAM,   target_type: SERVICE}
    then: {relationship: CONTRIBUTES_TO, source: first.source, target: second.target}
    confidence: 0.6

confidence: {min_entity: 0.5, min_relationship: 0.5, min_inferred: 0.4}
expansion:  {enabled: true, inference_mode: smart, relationship_inference: true, depth: 2}
Ingesting one document (“Ada (ada@acme.com) and Bob are engineers on the Payments team. The Payments team owns the billing-api service.”) with this ontology extracts PERSON/TEAM/SERVICE entities and the MEMBER_OF / OWNS edges, then infers Ada COLLABORATES_WITH Bob and Ada/Bob CONTRIBUTES_TO billing-api.

Composing ontologies with extends

Ontologies are versioned and can inherit from one another: a config lists one or more parents under extends. The loader resolves the chain and merges it: entity and relationship types are added or overridden by name, rules are combined (the later definition wins on a name clash), and prompts, confidence, and expansion are overridden by the child:
name: hiring
extends: [ontologies/base_people.yaml]   # inherits PERSON, ORGANIZATION, …
system_prompt: |
  {{ parent_prompt }}
  Additionally, extract job candidates and roles.
  Known types: {% for t in entity_types %}{{ t.name }} {% endfor %}
entity_types:
  - name: CANDIDATE
    description: "A job applicant."
System prompts are Jinja2 templates rendered against the ontology, so a child can wrap its parent’s prompt with {{ parent_prompt }} and loop over the merged entity_types. Loading hiring above yields the types PERSON, ORGANIZATION, and CANDIDATE, plus a prompt that embeds the parent’s text followed by the live type list.
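The merge rules described for extends can be modelled with a few dictionaries. This is a hypothetical sketch, not ExpertiseLoader’s code: it treats each ontology as a plain dict, merges list sections by name with the child winning on a clash, lets the child replace scalar and block settings wholesale (an assumption for confidence and expansion), and leaves out Jinja2 rendering of {{ parent_prompt }} entirely. The base_people contents are invented for illustration.

```python
# Hypothetical sketch of extends-style merging; not ExpertiseLoader's code.
LIST_SECTIONS = ("entity_types", "relationship_types", "correlation_rules", "inference_rules")
OVERRIDE_SECTIONS = ("name", "version", "system_prompt", "extraction_prompt",
                     "confidence", "expansion")


def merge_by_name(parent_items, child_items):
    """Add-or-override by 'name': parent order is kept, new names are appended."""
    merged = {item["name"]: item for item in parent_items}
    for item in child_items:
        merged[item["name"]] = item  # the later (child) definition wins on a clash
    return list(merged.values())


def merge_ontology(parent, child):
    merged = dict(parent)
    for key in LIST_SECTIONS:
        merged[key] = merge_by_name(parent.get(key, []), child.get(key, []))
    for key in OVERRIDE_SECTIONS:
        if key in child:
            merged[key] = child[key]  # child replaces the parent's value wholesale
    return merged


base_people = {
    "name": "base_people",
    "system_prompt": "Extract people and the organizations they belong to.",
    "entity_types": [{"name": "PERSON", "description": "A person."},
                     {"name": "ORGANIZATION", "description": "An organization."}],
}
hiring = {
    "name": "hiring",
    "entity_types": [{"name": "CANDIDATE", "description": "A job applicant."}],
}

merged = merge_ontology(base_people, hiring)
print([t["name"] for t in merged["entity_types"]])  # ['PERSON', 'ORGANIZATION', 'CANDIDATE']
```

Because Python dicts keep insertion order, an overridden type stays in its parent’s position, while new child types are appended at the end.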

The built-in type hierarchy

Inference and relationship rules match through a built-in subtype map, so a rule written for a general type also fires on more specific ones: EMPLOYEE and EXTERNAL_PERSON satisfy a rule expecting PERSON; COMPANY, DEPARTMENT, and TEAM satisfy ORGANIZATION; and CALL satisfies EVENT. You can therefore write rules against broad types and still match a richly typed graph.
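A subtype map of this kind reduces to a dictionary lookup. The sketch below is illustrative only: SUBTYPES contains just the pairs named in the paragraph above, not Khora’s full built-in hierarchy, and satisfies is a hypothetical helper name.

```python
# Hypothetical sketch of subtype matching; lists only the pairs named on this page.
SUBTYPES = {
    "PERSON": {"EMPLOYEE", "EXTERNAL_PERSON"},
    "ORGANIZATION": {"COMPANY", "DEPARTMENT", "TEAM"},
    "EVENT": {"CALL"},
}


def satisfies(actual: str, expected: str) -> bool:
    """True if an entity of type `actual` satisfies a rule expecting `expected`."""
    return actual == expected or actual in SUBTYPES.get(expected, set())


print(satisfies("EMPLOYEE", "PERSON"))    # True
print(satisfies("TEAM", "ORGANIZATION"))  # True
print(satisfies("CALL", "PERSON"))        # False
```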

Engine notes

VectorCypher supports ontology-driven typed extraction via the entity_types / relationship_types kwargs, and runs expansion, so it honours correlation_rules, inference_rules, and the expansion block.

Ingestion

Where the ontology plugs in: the three-phase write path.

Workloads example

A runnable ExpertiseConfig in the resume-search walkthrough.

API reference

ExpertiseConfig and the ontology dataclasses.

Retrieval

The read path that queries what extraction produced.