Skip to main content
The CPU-bound parts of Khora (pairwise cosine similarity, Levenshtein distance, PageRank, BM25 scoring, keyword extraction, entity resolution) dominate at scale, and Python’s GIL prevents true parallelism for them. The optional khora-accel Rust extension (built on PyO3) handles these natively, with zero-copy NumPy access, GIL release during compute, and Rayon work-stealing parallelism across cores. It’s entirely optional: if Rust isn’t installed, Khora falls back transparently.

The 3-tier fallback

Every accelerated operation has three implementations. The fastest available is chosen automatically at import:
TierBackendNotes
0Rust (khora-accel)Fastest: rayon parallelism, GIL release, zero-copy NumPy
1NumPy / RapidFuzzGood, widely available: vectorized + C-backed
2Pure PythonAlways works: stdlib only (math, difflib, re)
Force a tier with the KHORA_ACCEL_BACKEND env var: rust, numpy, or python (unset = auto-detect). python is handy for debugging.

Installing

pip install "khora[rust]"     # or: pip install khora-accel
The khora-accel wheel is pinned in lockstep with Khora’s own version. Install the matching pair. Verify it loaded:
import khora_accel
khora_accel.levenshtein_similarity("hello", "hallo")   # 0.8

What’s accelerated

khora-accel ships ~40 functions across vector math, string similarity, ranking, and graph ops. Indicative speedups over pure Python:
CategoryOperationsSpeedup
String similarityLevenshtein, Jaro-Winkler, batch variants10–40×
Entity resolution3-stage batch matching (exact → alias → fuzzy)10–30×
PageRank / PPRSkeleton core selection, graph scoring5–15×
Vector mathCosine, batch/pairwise cosine, dot product, MMR5–10×
BM25Index build + search3–8×
Keyword extractionTokenize + stopword filter3–5×
RRF fusionReciprocal rank fusion + normalization2–5×
These power entity dedup, skeleton indexing, MMR diversity selection, and RRF fusion, so the benefit shows up across both ingestion and query.

Thread pool tuning

Rayon uses one global thread pool per process. Size it for your workload before any parallel work runs:
import khora_accel

khora_accel.configure_thread_pool(mode="query")    # fewer threads, lower latency
khora_accel.configure_thread_pool(mode="ingest")   # more threads, higher throughput
khora_accel.configure_thread_pool(num_threads=8)    # explicit

When it matters

  • Large-scale ingestion (>1,000 docs): entity resolution and pairwise cosine dominate, and Rayon scales near-linearly across cores.
  • Skeleton indexing: PageRank and keyword extraction run every ingest batch.
  • High-volume query: BM25 and RRF fusion benefit at >10k indexed documents.
  • Small workloads (under ~100 docs): the NumPy/Python tiers are plenty, and Rust is unnecessary (but harmless).
extension

Integrations

Wire Khora into CrewAI, LangGraph, Google ADK, OpenAI Agents, or LlamaIndex.
speed

Performance & scaling

Where Rust acceleration fits among the other scaling levers.