| Operation | Returns |
|---|---|
| recall | Ranked chunks and entities: raw context for your own prompt. |
| ask | A synthesized answer with cited sources, generated from your memories. |
Use recall when your application is responsible for the final prompt or rendering. Use ask when you want a finished answer.
Recall
A recall response includes:
- A pre-formatted context_text ready to paste into an LLM prompt
- A list of ranked chunks with similarity scores
- Related entities and relationships from the knowledge graph
- The original query and namespace_id
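To make the response fields above concrete, here is a minimal sketch of how an application might type the result and paste context_text into its own prompt. Only context_text, query, and namespace_id are named on this page; the other field names (chunks, score, entities, relationships) and the buildPrompt helper are assumptions for illustration, not the SDK's actual types.

```typescript
// Hypothetical shape of a recall response. Only context_text, query, and
// namespace_id appear in the docs; every other field name is an assumption.
interface RecallChunk {
  text: string;
  score: number; // similarity score; assumed to mean higher is more relevant
}

interface RecallResponse {
  context_text: string;
  chunks: RecallChunk[];
  entities: { name: string }[];
  relationships: { source: string; target: string; type: string }[];
  query: string;
  namespace_id: string;
}

// Hypothetical helper: place the pre-formatted context ahead of the question.
function buildPrompt(res: RecallResponse): string {
  return [
    "Answer using only the context below.",
    "",
    "Context:",
    res.context_text,
    "",
    "Question: " + res.query,
  ].join("\n");
}

// Made-up sample data, purely for illustration.
const sampleResponse: RecallResponse = {
  context_text: "Alice moved to Berlin in March.",
  chunks: [{ text: "Alice moved to Berlin in March.", score: 0.91 }],
  entities: [{ name: "Alice" }, { name: "Berlin" }],
  relationships: [{ source: "Alice", target: "Berlin", type: "lives_in" }],
  query: "Where does Alice live?",
  namespace_id: "ns_example",
};

const prompt = buildPrompt(sampleResponse);
```

Because context_text is already formatted, the application only decides where it sits in the prompt; the ranked chunks remain available for display or inspection.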
Search modes
mode controls which retrieval channels run.
| Mode | What runs | When to use |
|---|---|---|
| vector | Vector similarity only | Fastest. Good for short, semantic queries. |
| graph | Graph traversal only | When the query references named entities and you want connected information. |
| hybrid (default) | Vector + graph + keyword, fused via Reciprocal Rank Fusion | Best general-purpose. Use this unless you have a reason not to. |
| all | Every available channel | Slower, returns more context. Use when you want maximum coverage. |
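The hybrid row mentions Reciprocal Rank Fusion. As a rough illustration of the general technique (not the platform's internal implementation), each channel contributes 1 / (k + rank) for every item it returns, and items are ordered by their summed score. The constant k = 60 is a commonly used value and an assumption here; the function name and chunk IDs are hypothetical.

```typescript
// Reciprocal Rank Fusion: sum 1 / (k + rank) across channels, then sort.
// k = 60 is a conventional default, not a value confirmed by this page.
function reciprocalRankFusion(rankings: string[][], k = 60): string[] {
  const scores: Record<string, number> = {};
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const rank = index + 1; // ranks are 1-based
      scores[id] = (scores[id] || 0) + 1 / (k + rank);
    });
  }
  // Highest fused score first.
  return Object.keys(scores).sort((a, b) => scores[b] - scores[a]);
}

// Made-up per-channel result lists, best hit first.
const vectorHits = ["c1", "c2", "c3"];
const graphHits = ["c3", "c1", "c4"];
const keywordHits = ["c1", "c2", "c4"];

const fused = reciprocalRankFusion([vectorHits, graphHits, keywordHits]);
```

An item that ranks well in several channels (c1 here) rises above one that ranks first in only a single channel, which is why fusion tends to be a good general-purpose default.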
Tuning recall
| Parameter | Effect |
|---|---|
| limit | Maximum number of chunks to return. Default 10. |
| mode | Which retrieval channels run. See above. |
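One way to keep these parameters explicit in application code is a small helper that fills in the documented defaults (limit 10, mode hybrid) before a request is sent. This is a sketch only: buildRecallRequest is a hypothetical helper, and the positive-integer check on limit is an assumed client-side sanity check, not documented platform behavior.

```typescript
type RecallMode = "vector" | "graph" | "hybrid" | "all";

interface RecallOptions {
  query: string;
  limit?: number;
  mode?: RecallMode;
}

interface RecallRequest {
  query: string;
  limit: number;
  mode: RecallMode;
}

// Hypothetical helper: apply the documented defaults and a basic sanity check.
function buildRecallRequest(opts: RecallOptions): RecallRequest {
  const limit = opts.limit === undefined ? 10 : opts.limit;
  // Assumption: limit should be a positive integer.
  if (!(limit >= 1) || Math.floor(limit) !== limit) {
    throw new Error("limit must be a positive integer");
  }
  const mode: RecallMode = opts.mode === undefined ? "hybrid" : opts.mode;
  return { query: opts.query, limit: limit, mode: mode };
}

const defaults = buildRecallRequest({ query: "project deadlines" });
const narrow = buildRecallRequest({ query: "project deadlines", limit: 3, mode: "vector" });
```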
Temporal queries
Pass explicit time bounds to scope retrieval by event time. This bypasses the natural-language temporal detection that would otherwise parse phrases like “last week” out of the query text.
The TypeScript SDK accepts a Date or an ISO-8601 string and serializes appropriately.
| Bounds | Effect |
|---|---|
| Both set | Closed window: only memories with event time in [start_time, end_time]. |
| start_time only | Half-open forward: everything from that timestamp onward. |
| end_time only | Half-open backward: everything up to that timestamp. |
| Neither | No temporal filter; natural-language detection on the query text still applies. |
Passing an explicit time bound disables natural-language temporal detection on the query, so phrases like “yesterday” in the query string are ignored. Use one or the other, not both.
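The window semantics in the table can be sketched in a few lines. The example below normalizes a Date or ISO-8601 string and checks whether an event time falls inside the bounds, treating both ends as inclusive, as the [start_time, end_time] notation suggests. The helpers toIso and inWindow are hypothetical and only model the filtering rule locally; they are not SDK functions.

```typescript
type TimeInput = Date | string;

interface TimeBounds {
  start_time?: TimeInput;
  end_time?: TimeInput;
}

// Normalize a Date or ISO-8601 string to a canonical ISO-8601 UTC string.
function toIso(value: TimeInput): string {
  const date = value instanceof Date ? value : new Date(value);
  if (isNaN(date.getTime())) {
    throw new Error("Invalid timestamp: " + String(value));
  }
  return date.toISOString();
}

function toMillis(value: TimeInput): number {
  return new Date(toIso(value)).getTime();
}

// Local model of the table: both set = closed window, one set = open-ended
// on the other side, neither set = no temporal filter.
function inWindow(eventTime: TimeInput, bounds: TimeBounds): boolean {
  const t = toMillis(eventTime);
  if (bounds.start_time !== undefined && t < toMillis(bounds.start_time)) return false;
  if (bounds.end_time !== undefined && t > toMillis(bounds.end_time)) return false;
  return true;
}

const march: TimeBounds = {
  start_time: new Date("2024-03-01T00:00:00Z"),
  end_time: "2024-03-31T23:59:59Z",
};

const insideMarch = inWindow("2024-03-15T12:00:00Z", march);
const afterMarch = inWindow("2024-04-02T00:00:00Z", march);
```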
Ask
ask runs the same hybrid retrieval as recall, then passes the results to an LLM that produces a written answer with cited sources.
Tuning synthesis
| Field | Effect |
|---|---|
| min_recall_limit | Minimum number of memories to recall before synthesizing. |
| max_recall_limit | Maximum number of memories to recall. |
| total_tokens_limit | Maximum total tokens to use for synthesis. |
| enabled_tools | Which retrieval tools the synthesizer is allowed to use. Common values: memory_recall, entity_search, entity_explore, web_search. |
from / until work the same way as on recall — they cap the time window for any internal recall the synthesizer issues.
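As an illustration of how the synthesis fields fit together, the sketch below types them as an options object and runs a few client-side consistency checks. The field names come from the table above; the AskOptions interface, the checkAskOptions helper, and the specific checks are assumptions, not the platform's own validation. The from / until bounds are left out for brevity.

```typescript
// Field names follow the table; the interface itself is a hypothetical typing.
interface AskOptions {
  min_recall_limit?: number;
  max_recall_limit?: number;
  total_tokens_limit?: number;
  enabled_tools?: string[]; // e.g. "memory_recall", "entity_search"
}

// Hypothetical client-side checks; returns a list of human-readable problems.
function checkAskOptions(opts: AskOptions): string[] {
  const problems: string[] = [];
  if (
    opts.min_recall_limit !== undefined &&
    opts.max_recall_limit !== undefined &&
    opts.min_recall_limit > opts.max_recall_limit
  ) {
    problems.push("min_recall_limit exceeds max_recall_limit");
  }
  if (opts.total_tokens_limit !== undefined && opts.total_tokens_limit <= 0) {
    problems.push("total_tokens_limit must be positive");
  }
  if (opts.enabled_tools !== undefined && opts.enabled_tools.length === 0) {
    problems.push("enabled_tools is empty, so the synthesizer has nothing to retrieve with");
  }
  return problems;
}

const okOptions = checkAskOptions({
  min_recall_limit: 5,
  max_recall_limit: 20,
  total_tokens_limit: 4000,
  enabled_tools: ["memory_recall", "entity_search"],
});

const badOptions = checkAskOptions({ min_recall_limit: 30, max_recall_limit: 10 });
```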
Recall vs ask: when to use which
Use recall when
- You control the final prompt.
- You want the cheapest, fastest read.
- You’re combining Deyta Platform context with other context.
- You need raw access to chunks for inspection or display.
Use ask when
- You want a finished answer.
- You’re building a chat or Q&A surface where the user expects narrative output.
- You’re OK with the extra LLM cost and latency.
What’s next
Managing memories
Forget, audit, and inspect what’s in a namespace.
API Reference
Full request and response schemas for recall and ask.