Deyta Platform gives you two ways to read from a namespace:
  • recall: ranked chunks and entities, raw context for your own prompt.
  • ask: a synthesized answer with cited sources, generated from your memories.
Use recall when your application is responsible for the final prompt or rendering. Use ask when you want a finished answer.

Recall

const result = await deyta.memory.recall({
  namespace_id: ns.id,
  query: "what do we know about the project?",
  limit: 10,
});

// Drop straight into a prompt
const prompt = `Context:\n${result.context_text}\n\nQuestion: …`;

// Or inspect raw matches
for (const match of result.results) {
  console.log(match.score, match.chunk.content);
}
The response includes:
  • A pre-formatted context_text ready to paste into an LLM prompt
  • A list of ranked chunks with similarity scores
  • Related entities and relationships from the knowledge graph
  • The original query and namespace_id
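To make the shape concrete, here is a sketch of the response described above. The field names beyond `context_text`, `results`, `score`, `chunk.content`, `query`, and `namespace_id` are assumptions for illustration, not the documented schema.

```typescript
// Hypothetical shape of a recall response, inferred from the fields
// described above. `entities` is an assumed name for the graph results.
interface RecallResult {
  context_text: string;                                      // pre-formatted prompt context
  results: { score: number; chunk: { content: string } }[];  // ranked chunks
  entities: { name: string; type: string }[];                // related graph entities
  query: string;
  namespace_id: string;
}

// A sample response, used to show how the pieces compose into a prompt.
const result: RecallResult = {
  context_text: "The project ships in Q3.",
  results: [{ score: 0.91, chunk: { content: "The project ships in Q3." } }],
  entities: [{ name: "Project Atlas", type: "project" }],
  query: "what do we know about the project?",
  namespace_id: "ns_123",
};

const prompt = `Context:\n${result.context_text}\n\nQuestion: When do we ship?`;
```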

Search modes

mode controls which retrieval channels run.
await deyta.memory.recall({
  namespace_id: ns.id,
  query: "deployment incidents in Q1",
  mode: "hybrid",
});
  • vector: vector similarity only. Fastest; good for short, semantic queries.
  • graph: graph traversal only. Use when the query references named entities and you want connected information.
  • hybrid (default): vector + graph + keyword, fused via Reciprocal Rank Fusion. Best general-purpose; use this unless you have a reason not to.
  • all: every available channel. Slower and returns more context; use when you want maximum coverage.
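For intuition, Reciprocal Rank Fusion combines the ranked lists from each channel without comparing their raw scores: a document earns 1 / (k + rank) from every list it appears in, and the sums are re-sorted. This is a generic sketch of the technique (k = 60 is the conventional default), not Deyta Platform's internal implementation.

```typescript
// Reciprocal Rank Fusion: each document scores 1 / (k + rank) in every
// ranked list it appears in; per-list scores are summed, then re-sorted.
function rrf(rankings: string[][], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, i) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + i + 1));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}

// "b" wins: it is ranked in the top two by both channels.
const fused = rrf([
  ["a", "b", "c"], // e.g. vector channel
  ["b", "c", "a"], // e.g. keyword channel
]);
```

Because only ranks matter, channels with incomparable scoring scales (cosine similarity vs. BM25) can be fused without normalization.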

Tuning recall

  • limit: maximum number of chunks to return. Default 10.
  • mode: which retrieval channels run. See above.
For finer control over fusion weights, recency bias, HyDE expansion, and reranking, configure the org-level defaults in the console rather than per-query.

Temporal queries

Pass explicit time bounds to scope retrieval by event time. This bypasses the natural-language temporal detection that would otherwise parse phrases like “last week” out of the query text.
const result = await deyta.memory.recall({
  namespace_id: ns.id,
  query: "deployment incidents",
  from: new Date("2026-01-01T00:00:00Z"),
  until: new Date("2026-03-31T23:59:59Z"),
});
The SDK accepts either a Date or an ISO-8601 string and serializes appropriately.
You can pass either or both bounds:
  • Both set: closed window. Only memories with event time in [from, until].
  • from only: half-open forward. Everything from that timestamp onward.
  • until only: half-open backward. Everything up to that timestamp.
  • Neither: no temporal filter; natural-language detection on the query text still applies.
Passing an explicit time bound disables natural-language temporal detection, so phrases like "yesterday" in the query string are ignored. For any given query, rely on explicit bounds or natural-language phrasing, not both.
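The Date-or-string flexibility mentioned above could be implemented with a small normalizer. This is a sketch of the idea under the assumption that the wire format is a canonical ISO-8601 UTC string; it is not Deyta Platform's actual serialization code.

```typescript
// Accept either a Date or an ISO-8601 string and emit a canonical
// ISO-8601 UTC string (assumed wire representation).
function toIsoTimestamp(value: Date | string): string {
  const date = value instanceof Date ? value : new Date(value);
  if (Number.isNaN(date.getTime())) {
    throw new TypeError(`Invalid timestamp: ${String(value)}`);
  }
  return date.toISOString();
}

// Both inputs normalize to the same wire value.
const a = toIsoTimestamp(new Date("2026-01-01T00:00:00Z"));
const b = toIsoTimestamp("2026-01-01T00:00:00Z");
```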

Ask

ask runs the same hybrid retrieval as recall, then passes the results to an LLM that produces a written answer with cited sources.
const answer = await deyta.memory.ask({
  namespace_id: ns.id,
  query: "What are the key project milestones?",
});

console.log(answer.answer);
console.log(answer.sources);
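A typical chat surface renders the answer followed by its citations. The item shape inside `sources` is not documented here, so the `id` and `snippet` fields below are assumptions for illustration only.

```typescript
// Hypothetical shape of an ask response; only `answer` and `sources`
// are documented above, and the per-source fields are assumed.
interface AskAnswer {
  answer: string;
  sources: { id: string; snippet: string }[];
}

const answer: AskAnswer = {
  answer: "The key milestones are the beta in May and GA in September.",
  sources: [
    { id: "mem_1", snippet: "Beta launch scheduled for May." },
    { id: "mem_2", snippet: "GA targeted for September." },
  ],
};

// Render the answer followed by a numbered citation list.
const rendered = [
  answer.answer,
  ...answer.sources.map((s, i) => `[${i + 1}] ${s.snippet} (${s.id})`),
].join("\n");
```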

Tuning synthesis

await deyta.memory.ask({
  namespace_id: ns.id,
  query: "summarize the latest deployment incidents",
  config: {
    min_recall_limit: 3,
    max_recall_limit: 20,
    total_tokens_limit: 4000,
    enabled_tools: ["memory_recall", "entity_search"],
  },
  from: new Date("2026-04-01T00:00:00Z"),
  until: new Date("2026-04-30T23:59:59Z"),
});
  • min_recall_limit: minimum number of memories to recall before synthesizing.
  • max_recall_limit: maximum number of memories to recall.
  • total_tokens_limit: maximum total tokens to use for synthesis.
  • enabled_tools: which retrieval tools the synthesizer is allowed to use. Common values: memory_recall, entity_search, entity_explore, web_search.
from / until work the same way as on recall — they cap the time window for any internal recall the synthesizer issues.

Recall vs ask: when to use which

Use recall when

  • You control the final prompt.
  • You want the cheapest, fastest read.
  • You’re combining Deyta Platform context with other context.
  • You need raw access to chunks for inspection or display.

Use ask when

  • You want a finished answer.
  • You’re building a chat or Q&A surface where the user expects narrative output.
  • You’re OK with the extra LLM cost and latency.

What’s next

Managing memories

Forget, audit, and inspect what’s in a namespace.

API Reference

Full request and response schemas for recall and ask.