Rust Acceleration

The CPU-bound parts of Khora (pairwise cosine similarity, Levenshtein distance, PageRank, BM25 scoring, keyword extraction, entity resolution) dominate at scale, and Python’s GIL prevents true parallelism for them. The optional khora-accel Rust extension (built on PyO3) handles these natively, with zero-copy NumPy access, GIL release during compute, and Rayon work-stealing parallelism across cores. It’s entirely optional: if Rust isn’t installed, Khora falls back transparently.

The 3-tier fallback

Every accelerated operation has three implementations. The fastest available is chosen automatically at import:

Tier	Backend	Notes
0	Rust (`khora-accel`)	Fastest: rayon parallelism, GIL release, zero-copy NumPy
1	NumPy / RapidFuzz	Good, widely available: vectorized + C-backed
2	Pure Python	Always works: stdlib only (`math`, `difflib`, `re`)

Force a tier with the KHORA_ACCEL_BACKEND env var: rust, numpy, or python (unset = auto-detect). python is handy for debugging.

Installing

pip install "khora[rust]"     # or: pip install khora-accel

The khora-accel wheel is pinned in lockstep with Khora’s own version. Install the matching pair. Verify it loaded:

import khora_accel
khora_accel.levenshtein_similarity("hello", "hallo")   # 0.8

What’s accelerated

khora-accel ships ~40 functions across vector math, string similarity, ranking, and graph ops. Indicative speedups over pure Python:

Category	Operations	Speedup
String similarity	Levenshtein, Jaro-Winkler, batch variants	10–40×
Entity resolution	3-stage batch matching (exact → alias → fuzzy)	10–30×
PageRank / PPR	Skeleton core selection, graph scoring	5–15×
Vector math	Cosine, batch/pairwise cosine, dot product, MMR	5–10×
BM25	Index build + search	3–8×
Keyword extraction	Tokenize + stopword filter	3–5×
RRF fusion	Reciprocal rank fusion + normalization	2–5×

These power entity dedup, skeleton indexing, MMR diversity selection, and RRF fusion, so the benefit shows up across both ingestion and query.

Thread pool tuning

Rayon uses one global thread pool per process. Size it for your workload before any parallel work runs:

import khora_accel

khora_accel.configure_thread_pool(mode="query")    # fewer threads, lower latency
khora_accel.configure_thread_pool(mode="ingest")   # more threads, higher throughput
khora_accel.configure_thread_pool(num_threads=8)    # explicit

When it matters

Large-scale ingestion (>1,000 docs): entity resolution and pairwise cosine dominate, and Rayon scales near-linearly across cores.
Skeleton indexing: PageRank and keyword extraction run every ingest batch.
High-volume query: BM25 and RRF fusion benefit at >10k indexed documents.
Small workloads (under ~100 docs): the NumPy/Python tiers are plenty, and Rust is unnecessary (but harmless).

Integrations

Wire Khora into CrewAI, LangGraph, Google ADK, OpenAI Agents, or LlamaIndex.

Performance & scaling

Where Rust acceleration fits among the other scaling levers.

Getting started

Concepts

Operations

Experimental Features

Integrations

Reference

Examples

Rust Acceleration

The 3-tier fallback

Installing

What’s accelerated

Thread pool tuning

When it matters

Integrations

Performance & scaling

​The 3-tier fallback

​Installing

​What’s accelerated

​Thread pool tuning

​When it matters

Integrations

Performance & scaling

The 3-tier fallback

Installing

What’s accelerated

Thread pool tuning

When it matters