donto-memory
Persistent memory for AI agents — built on the donto evidence substrate. Bitemporal. Paraconsistent. Policy-gated. No silent rewrites.
POST /memorize takes raw text
(and optionally images — OCR + vision extraction run
automatically) and stores it as ~50–250 ontological statements an LLM extracts
from it. POST /recall takes a query and returns
everything we know that matches, with policy + ranking applied.
Read the agent guide for the full
contract, or
the integration spec
for ship-order patterns.
1. What this is
A persistent memory layer for long-lived AI agents. You write memories as plain text. We store the text and break it down into typed ontological statements using an LLM, save everything to the substrate, and serve it back on recall — with policy gating, identity-lens resolution, and bitemporal time-travel for free.
The shape of a memory
When you call POST /memorize with one sentence of
text, you get back ~30 facts. With a paragraph, ~100–250. Each
fact is a typed triple
(subject, predicate, object) the substrate stores
with full bitemporal provenance.
Example. You send: "I met Annie Davis at the Cooktown Festival in October 1979."
The substrate gets:
(agent:you, ex:met, person:annie-davis)
(agent:you, ex:metAt, event:cooktown-festival-1979)
(person:annie-davis, rdf:type, ex:Person)
(person:annie-davis, ex:hasName, "Annie Davis")
(event:cooktown-festival-1979, rdf:type, ex:Festival)
(event:cooktown-festival-1979, ex:occurredAt, "1979-10")
(event:cooktown-festival-1979, ex:locatedIn, place:cooktown)
(place:cooktown, rdf:type, ex:Place)
... and ~50 more from the inferential + conceivable apertures
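As a rough illustration of how an agent might consume triples like these, here is a minimal JavaScript sketch. It stores the example triples as plain objects and groups them by subject. The { subject, predicate, object } field names are an assumption made for the example; the actual response shape is documented in the agent guide.

```javascript
// Hypothetical in-memory shape for the example triples; field names are illustrative only.
const triples = [
  { subject: "agent:you", predicate: "ex:met", object: "person:annie-davis" },
  { subject: "agent:you", predicate: "ex:metAt", object: "event:cooktown-festival-1979" },
  { subject: "person:annie-davis", predicate: "rdf:type", object: "ex:Person" },
  { subject: "person:annie-davis", predicate: "ex:hasName", object: "Annie Davis" },
  { subject: "event:cooktown-festival-1979", predicate: "rdf:type", object: "ex:Festival" },
  { subject: "event:cooktown-festival-1979", predicate: "ex:occurredAt", object: "1979-10" },
  { subject: "event:cooktown-festival-1979", predicate: "ex:locatedIn", object: "place:cooktown" },
  { subject: "place:cooktown", predicate: "rdf:type", object: "ex:Place" },
];

// Group triples into subject -> predicate -> [objects], preserving insertion order.
function groupBySubject(rows) {
  const bySubject = new Map();
  for (const { subject, predicate, object } of rows) {
    if (!bySubject.has(subject)) bySubject.set(subject, new Map());
    const props = bySubject.get(subject);
    if (!props.has(predicate)) props.set(predicate, []);
    props.get(predicate).push(object);
  }
  return bySubject;
}

const view = groupBySubject(triples);
console.log(view.get("event:cooktown-festival-1979").get("ex:occurredAt")); // [ '1979-10' ]
```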
What this is NOT
- Not a vector database. We use predicate alignment + identity lenses for semantic similarity. Vector embeddings are an optional follow-on (M11.x).
- Not a chat history. You can dump every turn in via /memorize, but the value-add is the structured recall, the contradiction handling, and the policy gate.
- Not a model. We store facts about what your user has said, what you've inferred, and what your user prefers. The reasoning still happens in you.
2. Three commitments
- No silent rewrite. Reconsolidation derives new claims from recalled ones; it never overwrites originals. The original chunks remain queryable forever, and derived summaries are additional claims with a supersedes argument edge back to the source.
- Read events are not belief events. A recall writes an access event to donto_x_memory_access and updates recall state in donto_x_memory_state. It never writes a row to the substrate's donto_statement table. Bitemporal discipline is preserved: tx_time reflects belief, not access.
- Policy-aware by default. Every recall passes a holder agent and an action through to POST /recall on the substrate. The substrate checks each row against the holder's attestation for that action and records the verdict in action_allowed. donto-memory passes those rows through unchanged.
3. Quickstart
The minimal save-and-recall loop:
# Save a memory. We extract ~80-250 ontological statements
# (z-ai/glm-5 via OpenRouter, five parallel "apertures").
curl -X POST https://memories.apexpots.com/memorize \
-H 'Content-Type: application/json' \
-d '{
"holder": "agent:my-bot",
"session_id": "conversation-2026-05-28",
"text": "The user told me they prefer vegetarian restaurants and live in Brooklyn."
}'
# Recall.
curl -X POST https://memories.apexpots.com/recall \
-H 'Content-Type: application/json' \
-d '{
"holder": "agent:my-bot",
"action": "read_content",
"query": "vegetarian",
"limit": 20
}'
That's the whole loop. Read the rest of this page for the deep contract.
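Here is the same loop in JavaScript, as a minimal sketch that assumes Node 18+ (which provides a global fetch). The memorize, recall, and postJson helpers and the fetchImpl parameter are hypothetical conveniences for illustration. The request bodies mirror the curl calls above.

```javascript
// Base URL from the quickstart; swap in your own deployment.
const BASE_URL = "https://memories.apexpots.com";

// POST a JSON body and parse the JSON reply. fetchImpl defaults to the global
// fetch (Node 18+) and is a parameter only so tests can stub it.
async function postJson(path, body, fetchImpl = globalThis.fetch) {
  const res = await fetchImpl(`${BASE_URL}${path}`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  });
  if (!res.ok) throw new Error(`${path} failed with HTTP ${res.status}`);
  return res.json();
}

// Hypothetical helpers mirroring the two curl calls above.
const memorize = (text, sessionId, fetchImpl) =>
  postJson("/memorize", { holder: "agent:my-bot", session_id: sessionId, text }, fetchImpl);

const recall = (query, fetchImpl) =>
  postJson("/recall", { holder: "agent:my-bot", action: "read_content", query, limit: 20 }, fetchImpl);
```

If you pass a stub as fetchImpl, you can unit-test agent code without calling the live service.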
4. How save works
POST /memorize performs four steps in order, plus an optional multimodal pass when the request includes images:
- Episodic storage. Your raw text is asserted verbatim as a mem:episodic/chunk statement under ctx:memory/episodic/session/<session_id>. This is the canonical bytes-on-disk record of what you sent.
- LLM extraction. The default model (z-ai/glm-5) processes the chunk in one of two modes:
  - single: one prompt. ~20-40 facts. ~30-100 s on z-ai/glm-5.
  - exhaustive: five parallel prompts. ~80-250 facts. ~60-180 s. This is the default.
- Semantic ingest. Every extracted fact becomes a typed statement filed under ctx:memory/claims/session/<session_id>, linked back to the episodic chunk via mem:claim/derived_from.
- Multimodal (optional). If you pass an images: [...] array of http(s) URLs or data:image/...;base64,… data URLs, donto-memory first runs an OCR pre-pass. The transcribed words are appended to the episodic chunk as [OCR text from image #N]\n<words> blocks, so screenshots and signs become searchable via /recall?query=… like any other memory. It then switches the structured extractor to the OpenAI multimodal format, using the configured DONTO_MEMORY_LLM_VISION_MODEL (currently openai/gpt-4o-mini in production). As a result, the episodic chunk contains your text plus the OCR transcripts, and the extracted facts describe both the visible text and the visual content. mode: "single" is the sweet spot for images, because exhaustive multiplies token cost across five apertures. See agent guide §4 for the full request shape.
- Receipt. You receive the episodic record IDs, every semantic record ID, the per-aperture yields, and the LLM token usage. See the agent guide reference for the full response shape.
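The following is a small client-side sketch of building a /memorize body. It assumes the fields described on this page (holder, session_id, text, mode, images). The buildMemorizeBody helper is hypothetical and checks only two documented rules: text must be non-empty (otherwise the server answers 400), and each image must be an http(s) URL or a data:image URL. When images are attached and no mode is given, it defaults to mode: "single", following the advice above.

```javascript
// Hypothetical helper: validate and assemble a /memorize request body.
function buildMemorizeBody({ holder, sessionId, text, images = [], mode }) {
  if (typeof text !== "string" || text.trim() === "") {
    // The service rejects missing or empty text with 400 Bad Request.
    throw new Error("text must be a non-empty string");
  }
  for (const url of images) {
    // Accepted image forms per this page: http(s) URLs or data:image/... URLs.
    if (!/^https?:\/\//.test(url) && !url.startsWith("data:image/")) {
      throw new Error(`unsupported image URL: ${url.slice(0, 40)}`);
    }
  }
  const body = { holder, session_id: sessionId, text };
  if (images.length > 0) body.images = images;
  // With images, "single" avoids multiplying vision-token cost across five apertures.
  // Without images and without an explicit mode, omit it so the server default applies.
  const chosen = mode ?? (images.length > 0 ? "single" : undefined);
  if (chosen) body.mode = chosen;
  return body;
}
```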
The five apertures (exhaustive mode)
Each aperture is an independent LLM call with its own system prompt and confidence band. The substrate's content-hash dedup runs across the union of all five outputs.
| Aperture | Captures | Confidence | Modality |
|---|---|---|---|
| surface | What the text explicitly states. | 0.95–1.0 | asserted |
| linguistic | Clause decomposition. Every NP→entity, VP→event, modifier→property. | 0.85–1.0 | asserted |
| presupposition | What the text takes for granted. | 0.7–0.95 | hypothesis_only |
| inferential | Common-knowledge consequences of stated facts. | 0.4–0.7 | inferred |
| conceivable | Claims that could plausibly hold given entity types. | ~0.85 | hypothesis_only |
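To illustrate deduplication across the union, here is a sketch that hashes a canonical serialization of each (subject, predicate, object) triple with SHA-256 and keeps the first copy it sees. The canonical form (fields joined by a NUL character) and the rule for which aperture's copy survives are assumptions. This page does not describe the substrate's actual hashing or merge rules.

```javascript
const { createHash } = require("node:crypto");

// Hypothetical canonical key: SHA-256 over subject, predicate, object joined by NUL.
function contentHash({ subject, predicate, object }) {
  return createHash("sha256").update([subject, predicate, object].join("\u0000")).digest("hex");
}

// Merge per-aperture outputs, keeping the first copy of each distinct triple
// and tagging it with the aperture that produced it.
function dedupAcrossApertures(byAperture) {
  const seen = new Map();
  for (const [aperture, facts] of Object.entries(byAperture)) {
    for (const fact of facts) {
      const key = contentHash(fact);
      if (!seen.has(key)) seen.set(key, { ...fact, aperture });
    }
  }
  return [...seen.values()];
}
```

Because the key depends only on content, the same triple always maps to the same key, whichever aperture or call produced it.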
5. How recall works
POST /recall performs six steps:
- Module dispatch. Every enabled module runs its own retrieval against the substrate.
- Policy gate. Every candidate row passes through the substrate's Trust Kernel. If holder is not attested for action on the row's source policy, the row comes back with action_allowed=false.
- Identity-lens resolution. If lens_name is set, the substrate returns the cluster representative for each subject/object.
- Bitemporal time-travel. If as_of_tx is set, you get the rows the substrate believed at that timestamp.
- Fusion. Module candidates merge via Reciprocal Rank Fusion (k=60). Cross-module overlap boosts rank.
- Side effects. Each returned row writes an access event, bumps its recall state, and enqueues a reconsolidation task. None of these touch substrate belief state.
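Reciprocal Rank Fusion scores each row as the sum of 1/(k + rank) over every module list that contains it, with k = 60 as stated above. The sketch below assumes 1-based ranks and that each module hands back an ordered list of record IDs. That input shape is hypothetical and is not donto-memory's internal type.

```javascript
// Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank_d), rank 1-based.
function reciprocalRankFusion(rankedLists, k = 60) {
  const scores = new Map();
  for (const list of rankedLists) {
    list.forEach((id, index) => {
      const rank = index + 1;
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  // Highest fused score first; ties keep first-seen order (Array.prototype.sort is stable).
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id, score]) => ({ id, score }));
}

const fused = reciprocalRankFusion([
  ["rec:a", "rec:b", "rec:c"], // e.g. episodic module
  ["rec:c", "rec:d"],          // e.g. semantic-claim module
]);
console.log(fused.map((r) => r.id)); // [ 'rec:c', 'rec:a', 'rec:b', 'rec:d' ]
```

With k = 60, the 1/(k + rank) term changes slowly with rank. As a result, a row that appears in the top 61 of two modules' lists outscores the first-ranked row of any single list, which is how cross-module overlap boosts rank.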
6. Memory modules
Three modules ship by default. Each defines a particular memory form and function, and contributes to recall via its own retrieve method.
- episodic: verbatim event/chunk recall. Each ingest writes one mem:episodic/chunk statement with the raw text as a literal. Use for raw user utterances.
- semantic-claim: extracted typed claims. Each becomes one substrate statement with source_record_iri pointing back to the episodic record. Use for structured facts you already have.
- preference: user preferences. Updates are append-only: a new value creates a new statement plus a supersedes argument edge to the prior one. Both live forever; recall returns the latest.
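The append-only preference behaviour can be sketched as follows. The sketch assumes each stored preference carries an id, key, value, tx_time, and optional supersedes pointing at the record it replaced. These field names are hypothetical. Nothing is deleted: the current value for a key is simply the record that no other record supersedes.

```javascript
// Resolve the current value per key from an append-only history.
// A record is current if no other record names it in `supersedes`.
function currentPreferences(records) {
  const superseded = new Set(records.map((r) => r.supersedes).filter(Boolean));
  const current = {};
  for (const r of records) {
    if (superseded.has(r.id)) continue;
    const prev = current[r.key];
    // Defensive tie-break: if two heads exist for a key, keep the later tx_time.
    if (!prev || Date.parse(r.tx_time) > Date.parse(prev.tx_time)) current[r.key] = r;
  }
  return Object.fromEntries(Object.entries(current).map(([k, r]) => [k, r.value]));
}

const history = [
  { id: "p1", key: "preferred_tone", value: "formal", tx_time: "2026-05-01T10:00:00Z" },
  { id: "p2", key: "preferred_tone", value: "casual", tx_time: "2026-05-20T09:00:00Z", supersedes: "p1" },
];
console.log(currentPreferences(history)); // { preferred_tone: 'casual' }
```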
7. API surface
15 endpoints. Full schemas are in openapi.json; the interactive version is in Swagger UI. The operator routes (/api, /jobs, /explore) are covered separately under Operator surfaces.
| Endpoint | Purpose |
|---|---|
| GET / | This homepage |
| GET /health | Liveness |
| GET /version | Service version + substrate contract floor |
| GET /substrate | Echo substrate /discovery handshake |
| GET /modules | Registered memory modules |
| POST /memorize | The save endpoint. Episodic + LLM extraction. |
| POST /memorize/batch | Same flow for multiple items |
| POST /ingest/{module} | Bypass LLM. Direct module write. |
| POST /recall | The read endpoint. Returns a Memory Evidence Bundle. |
| POST /reconsolidate/enqueue | Manually queue a record for sleep-path processing |
| GET /reconsolidate/queue | Head-of-queue inspector |
| GET /openapi.json | OpenAPI 3.1 spec |
| GET /docs | Swagger UI |
| GET /agent.md | Markdown guide aimed at AI agents |
| GET /llms.txt | Same content as /agent.md, plain text |
8. Identity lenses
The substrate stores every entity reference verbatim. When two references actually refer to the same real-world entity, that's recorded as a weighted identity edge — never collapsed at storage. At query time, the identity lens parameter controls how strict the equivalence judgement is.
Default seeded lenses:
- strict_identity_v1: only edges with confidence ≥ 0.98.
- likely_identity_v1: ≥ 0.85.
- exploratory_identity_v1: ≥ 0.60.
Most agent workloads should leave lens_name: null (no expansion). Use likely_identity_v1 when surfacing memories about a person whose name varies across sources.
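Identity is stored as weighted edges and only interpreted at query time, so a lens amounts to a confidence threshold. The following is a minimal sketch using the seeded thresholds above. The edge shape ({ a, b, confidence }) and the union-find clustering are assumptions for illustration, not the substrate's implementation.

```javascript
// Thresholds of the default seeded lenses, as listed above.
const LENS_THRESHOLDS = {
  strict_identity_v1: 0.98,
  likely_identity_v1: 0.85,
  exploratory_identity_v1: 0.6,
};

// Group entity references into clusters using only the edges the lens admits.
function clustersUnderLens(edges, lensName) {
  const threshold = LENS_THRESHOLDS[lensName];
  if (threshold === undefined) throw new Error(`unknown lens: ${lensName}`);
  const parent = new Map();
  const find = (x) => {
    if (!parent.has(x)) parent.set(x, x);
    while (parent.get(x) !== x) {
      parent.set(x, parent.get(parent.get(x))); // path halving
      x = parent.get(x);
    }
    return x;
  };
  for (const { a, b, confidence } of edges) {
    find(a);
    find(b);
    // Edges below the lens threshold are kept in storage but ignored here.
    if (confidence >= threshold) parent.set(find(a), find(b));
  }
  const clusters = new Map();
  for (const x of parent.keys()) {
    const root = find(x);
    if (!clusters.has(root)) clusters.set(root, []);
    clusters.get(root).push(x);
  }
  return [...clusters.values()];
}
```

The same stored edges yield different clusters under different lenses, which is why equivalence is never collapsed at storage time.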
9. Bitemporal time-travel
Every claim has two times:
- valid_time: when the fact was true in the world.
- tx_time: when we believed it.
Recall with "as_of_tx": "2026-05-01T00:00:00Z" returns the rows that were believed at that timestamp. If a fact was retracted on 2026-05-15, an as_of_tx=2026-05-10 query still sees it. This is the "what did we know on date X?" pattern.
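The as_of_tx rule can be sketched as an interval test on transaction time. The sketch assumes each row carries a tx_from (when the belief was recorded) and an optional tx_to (when it was retracted). Those field names are hypothetical, and the substrate performs this filtering server-side.

```javascript
// A row was believed at instant t if it had been recorded by t and not yet retracted.
// tx_to == null means the belief is still current. Intervals are half-open: [tx_from, tx_to).
function believedAt(rows, asOfTx) {
  const t = Date.parse(asOfTx);
  return rows.filter(
    (r) => Date.parse(r.tx_from) <= t && (r.tx_to == null || t < Date.parse(r.tx_to))
  );
}

const rows = [
  { id: "claim:1", tx_from: "2026-05-01T00:00:00Z", tx_to: "2026-05-15T00:00:00Z" }, // retracted 2026-05-15
  { id: "claim:2", tx_from: "2026-05-15T00:00:00Z", tx_to: null },
];
console.log(believedAt(rows, "2026-05-10T00:00:00Z").map((r) => r.id)); // [ 'claim:1' ]
```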
10. Policy actions
Every recall asks for a specific action. The substrate gates each row based on the source's policy capsule and the holder's attestation. The actions:
| Action | When to ask |
|---|---|
| read_metadata | See that the row exists. Default-permitted. |
| read_content | Read actual values. The common agent recall case. |
| quote | Include verbatim in a user-visible answer. |
| view_anchor_location | See where in the source the claim was extracted from. |
| derive_claims | Extract new derived statements. |
| derive_embeddings | Generate embeddings. |
| translate | Translate the content. |
| summarize | Produce a summary. |
| export_claims | Include in a release. |
| train_model | Use in model training. |
| publish_release | Include in a citable release. |
| share_with_third_party | Pass to another agent/system. |
| federated_query | Cross-instance read. |
| request_deletion | Initiate tombstoning. Heavy. |
Most agent workflows want read_content (or
quote if you'll be showing the text directly).
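The policy gate flags rows with action_allowed rather than hiding the decision, so a client should check that flag before using a row's content. The sketch below assumes the bundle exposes its rows as an array under rows, with an action_allowed boolean on each. The rows field name is an assumption; see the agent guide for the real Memory Evidence Bundle shape.

```javascript
// Split a recall bundle into rows the holder may use for the requested action
// and rows it may not. A missing flag is treated as denied (fail-closed).
function partitionByPolicy(bundle) {
  const allowed = [];
  const denied = [];
  for (const row of bundle.rows ?? []) {
    (row.action_allowed === true ? allowed : denied).push(row);
  }
  return { allowed, denied };
}
```

Failing closed on a missing flag mirrors the substrate's default policy for content-exposing actions.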
11. Cost expectations
LLM token cost dominates /memorize. Recall is free
of LLM cost.
| Mode | LLM calls | Tokens | ~Cost (OpenRouter glm-5) | Latency |
|---|---|---|---|---|
| single | 1 | ~400 + ~5-8K | $0.015–$0.030 | 30–100 s |
| exhaustive | 5 parallel | ~2K + ~15-25K | $0.04–$0.08 | 60–180 s |
Recall latency depends on session size. Per-user sessions with a few dozen records return in 30–80 ms. Big channel-scoped sessions with hundreds to thousands of records take 3–5 s, because the substrate's policy gate scans every candidate. The hot-path composer adds ~30–100 ms on top. On standard hardware the substrate side dominates; donto-memory's own fusion and access bookkeeping add ~10–20 ms.
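For budgeting, the per-call ranges in the table above scale linearly with volume. Below is a back-of-the-envelope sketch that uses only the table's figures. OpenRouter prices change, so treat the output as a rough estimate. The estimateMemorizeCost helper is hypothetical.

```javascript
// Per-call cost ranges in USD, copied from the table above.
const COST_PER_CALL = {
  single: [0.015, 0.03],
  exhaustive: [0.04, 0.08],
};

// Estimated [low, high] USD for a number of /memorize calls, rounded to cents.
// Recall adds no LLM cost, so it is not counted.
function estimateMemorizeCost(calls, mode = "exhaustive") {
  const range = COST_PER_CALL[mode];
  if (!range) throw new Error(`unknown mode: ${mode}`);
  return range.map((perCall) => Math.round(perCall * calls * 100) / 100);
}

console.log(estimateMemorizeCost(1000, "exhaustive")); // [ 40, 80 ]
```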
12. Failure modes
| Symptom | Cause | Fix |
|---|---|---|
| 400 Bad Request | Missing/empty text | Send non-empty text |
| 500 + warnings | LLM call failed; episodic still saved | Retry or enqueue reconsolidation |
| Timeout | Exhaustive mode takes 60–180 s | Use mode=single for low-latency paths |
| Recall returns 0 rows for a memory you just saved | Wrong holder/session/filter | Drop the filter, verify holder match |
| Every row has action_allowed=false | Default policy is fail-closed for content-exposing actions | Request read_metadata (always allowed) or get an attestation |
Retries are safe. The substrate dedups by content hash, so repeated /memorize of the same chunk produces only one episodic statement. Semantic extraction may produce slightly different facts on each call. If any of them contradict each other, donto preserves both sides rather than silently picking one.
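Content-hash dedup makes /memorize retries safe, so a client can wrap the call in a retry with exponential backoff. This is a sketch that assumes Node 18+ global fetch. The memorizeWithRetry name, the choice to retry only network errors and HTTP 5xx, the attempt count, and the delays are illustrative choices, not service recommendations.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Retry transient failures (network errors, HTTP 5xx). Give up immediately on 4xx,
// since e.g. a 400 for empty text will not succeed on retry.
async function memorizeWithRetry(url, body, { attempts = 3, baseDelayMs = 1000, fetchImpl = globalThis.fetch } = {}) {
  let lastError;
  for (let attempt = 0; attempt < attempts; attempt++) {
    try {
      const res = await fetchImpl(url, {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify(body),
      });
      if (res.ok) return res.json();
      if (res.status < 500) throw Object.assign(new Error(`HTTP ${res.status}`), { retryable: false });
      lastError = new Error(`HTTP ${res.status}`);
    } catch (err) {
      if (err.retryable === false) throw err;
      lastError = err;
    }
    // Back off 1x, 2x, 4x ... baseDelayMs between attempts (no sleep after the last one).
    if (attempt < attempts - 1) await sleep(baseDelayMs * 2 ** attempt);
  }
  throw lastError;
}
```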
13. Architecture
┌────────────────────────────────────────────────────────────┐
│ donto-memory (single Rust binary, three subcommands) │
│ │
│ donto-memory api donto-memory worker │
│ (axum :7900) (tokio loop, queue drain) │
│ │ │ │
│ ▼ ▼ │
│ ┌────────────────────────────────────────────────────┐ │
│ │ donto_memory_core (library) │ │
│ │ modules: episodic / semantic-claim / preference │ │
│ │ hot_path: recall composer + RRF fusion │ │
│ │ sleep_path: reflect + apply DontoDelta │ │
│ │ substrate: reqwest → dontosrv │ │
│ │ overlays: tokio-postgres helpers │ │
│ │ extract: 5-aperture LLM via OpenAI-compatible │ │
│ └────────────────────────┬───────────────────────────┘ │
└────────────────────────────┼───────────────────────────────┘
▼
┌─────────────────────────────┐
│ donto (substrate) │
│ dontosrv :7879 │
│ any donto instance │
└─────────────────────────────┘
Two long-running processes plus five overlay tables in the substrate's database:
| Overlay table | Role |
|---|---|
| donto_x_memory_module | Registered modules |
| donto_x_memory_record | One row per unit of memory |
| donto_x_memory_access | Append-only access events |
| donto_x_memory_state | Bitemporal-versioned recall state |
| donto_x_memory_reconsolidation_queue | Sleep-path work items |
14. Substrate handshake
donto-memory requires a donto substrate at contract
version 0.1.0-m10 or higher. The handshake at startup
checks GET /discovery/contract-version on the
substrate and refuses to run against an older substrate.
You can verify the binding with GET /substrate on
donto-memory — it echoes the substrate's
/discovery/contract-version and
/discovery/substrate-health responses.
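The startup check is a version comparison. The sketch below shows how a client or health probe might apply the same floor to the contract version echoed by GET /substrate. It assumes versions look like MAJOR.MINOR.PATCH with an optional -mN milestone suffix, as in 0.1.0-m10. The comparison rules (numeric core first, then milestone number, with a plain release ranking above its milestones, as in semver) are an assumption, not the service's documented algorithm.

```javascript
// Parse "0.1.0-m10" into { core: [0, 1, 0], milestone: 10 }. Milestone is Infinity
// when absent, so a plain release sorts after all of its -mN pre-releases.
function parseContractVersion(v) {
  const match = /^(\d+)\.(\d+)\.(\d+)(?:-m(\d+))?$/.exec(v.trim());
  if (!match) throw new Error(`unrecognized contract version: ${v}`);
  const [, major, minor, patch, milestone] = match;
  return {
    core: [Number(major), Number(minor), Number(patch)],
    milestone: milestone === undefined ? Infinity : Number(milestone),
  };
}

// True if `version` is at or above `floor`. Milestones compare numerically,
// so m9 is older than m10 (a plain string comparison would get this wrong).
function meetsContractFloor(version, floor = "0.1.0-m10") {
  const a = parseContractVersion(version);
  const b = parseContractVersion(floor);
  for (let i = 0; i < 3; i++) {
    if (a.core[i] !== b.core[i]) return a.core[i] > b.core[i];
  }
  return a.milestone >= b.milestone;
}
```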
15. Cookbook
Long-term user profile
// Whenever the user shares a profile fact, memorize it.
// Not awaited: exhaustive extraction can take minutes, so don't block the reply on it.
fetch("/memorize", { method: "POST", headers: { "Content-Type": "application/json" }, body: JSON.stringify({
  holder: "agent:my-bot",
  session_id: `user/${user_id}`,
  text: user_message,
  mode: "exhaustive"
})}).catch(console.error);
// Before generating a response, recall.
const bundle = await fetch("/recall", { method: "POST", headers: { "Content-Type": "application/json" }, body: JSON.stringify({
  holder: "agent:my-bot",
  session_id: `user/${user_id}`,
  action: "read_content",
  limit: 50
})}).then(r => r.json());
Preference tracking
// When the user says "I prefer X", use the preference module directly.
await fetch("/ingest/preference", { method: "POST", headers: { "Content-Type": "application/json" }, body: JSON.stringify({
  holder: "agent:my-bot",
  key: "preferred_tone",
  value: "casual"
})});
// Later, retrieve.
const prefs = await fetch("/recall", { method: "POST", headers: { "Content-Type": "application/json" }, body: JSON.stringify({
  holder: "agent:my-bot",
  module_iris: ["mem:module/preference"],
  limit: 100
})}).then(r => r.json());
"What did I know last Tuesday?"
const bundle = await fetch("/recall", { method: "POST", headers: { "Content-Type": "application/json" }, body: JSON.stringify({
  holder: "agent:my-bot",
  as_of_tx: "2026-05-21T00:00:00Z",
  subject: "ex:annie-davis"
})}).then(r => r.json());
Semantic-similar across aliases
const bundle = await fetch("/recall", { method: "POST", headers: { "Content-Type": "application/json" }, body: JSON.stringify({
  holder: "agent:my-bot",
  lens_name: "likely_identity_v1",
  subject: "ex:annie-davis"
})}).then(r => r.json());
// Returns rows about the canonical Annie + her known aliases
// (Mrs Watson, Mary Watson, etc.) under the `likely` identity lens.
16. Source documents (donto blob)
/memorize stores your text as an xsd:string literal inside a mem:episodic/chunk statement, which is fine for short utterances. For anything you might later want to tombstone, share verbatim, or re-version, register a source document on the substrate first. The substrate already holds ~50,000 content-addressed blobs (about 6 GB), and your messages join that pool.
The two-step substrate flow before calling /memorize:
# 1. Register a source + policy on the substrate.
curl -X POST $DONTOSRV/sources/register \
-H 'Content-Type: application/json' \
-d '{
"iri": "doc:my-bot/discord-message/123",
"source_kind": "agent-message",
"policy_iri": "policy:user-conversation",
"media_type": "text/markdown"
}'
# → { "document_id": "", "iri": "doc:my-bot/...", ... }
# 2. Attach the body. SHA-256 dedup happens here.
curl -X POST $DONTOSRV/documents/revision \
-H 'Content-Type: application/json' \
-d '{ "document_id": "", "body": "the raw message text…" }'
# 3. Memorize, anchored to the document IRI.
curl -X POST https://memories.apexpots.com/memorize \
-H 'Content-Type: application/json' \
-d '{
"holder": "agent:my-bot",
"text": "the raw message text…",
"source_record_iri": "doc:my-bot/discord-message/123"
}'
Bulk files (PDFs, transcripts, archives) are simpler via the
CLI: donto blob upload <path> or
donto blob sync <dir> for a recursive walk.
See the agent guide §13
for the full pattern, policy IRIs, and when to skip this layer
entirely.
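In the three calls above, the document_id returned by step 1 feeds step 2, and the source IRI is reused in step 3. The following sketch chains them and assumes Node 18+ fetch. dontosrv stands in for your substrate's base URL ($DONTOSRV above). The registerAndMemorize name is hypothetical, and the document_id response field is taken from the example output above.

```javascript
// Hypothetical helper chaining the three steps above. dontosrv and memoryUrl are base URLs.
async function registerAndMemorize({ dontosrv, memoryUrl, holder, iri, text, fetchImpl = globalThis.fetch }) {
  const post = async (url, body) => {
    const res = await fetchImpl(url, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(body),
    });
    if (!res.ok) throw new Error(`${url} failed with HTTP ${res.status}`);
    return res.json();
  };

  // 1. Register the source and its policy; the reply carries the new document_id.
  const { document_id } = await post(`${dontosrv}/sources/register`, {
    iri,
    source_kind: "agent-message",
    policy_iri: "policy:user-conversation",
    media_type: "text/markdown",
  });

  // 2. Attach the body; SHA-256 dedup happens on the substrate here.
  await post(`${dontosrv}/documents/revision`, { document_id, body: text });

  // 3. Memorize, anchored to the registered document IRI.
  return post(`${memoryUrl}/memorize`, { holder, text, source_record_iri: iri });
}
```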
17. Operator surfaces (/jobs, /explore)
/jobs/* and /explore/* expose every
memorized text + recall query body. They're observability tools
for the operator, not public agent surfaces. On any deployment
reachable from the public internet, set
DONTO_MEMORY_OPS_TOKEN=<random-token> on the
runtime — that gates those 10 routes behind a bearer token:
# anonymous → 401
curl https://memories.apexpots.com/jobs # → 401
# with the token via Authorization header
curl -H 'Authorization: Bearer <token>' \
https://memories.apexpots.com/jobs # → 200
# or via query string (handy for browser bookmarks)
curl 'https://memories.apexpots.com/jobs?token=<token>' # → 200
When the env var is unset, the routes are open (this preserves the
local-dev workflow). When it is set, the comparison is constant-time
so the token can't be probed for length or prefix. The agent
contract (/, /agent.md,
/llms.txt, /openapi.json,
/docs, /memorize, /recall,
/ingest/*, /modules,
/version, /health) is
never gated.
GET /api returns a categorized endpoint summary
and an ops_token_required boolean so monitoring
can detect whether the gate is active.
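Operators who put a similar gate in front of their own tooling can use the sketch below, which implements the documented behaviour in Node. It reads the token from either the Authorization header or ?token=, leaves the routes open when no token is configured, and compares in constant time. Both values are hashed with SHA-256 first so that crypto.timingSafeEqual always receives two 32-byte buffers, which keeps the token's length out of the timing. This illustrates the technique and is not donto-memory's own (Rust) implementation. The opsAuthorized and tokenFromRequest helpers and their inputs are hypothetical.

```javascript
const { createHash, timingSafeEqual } = require("node:crypto");

// Compare two strings in time independent of where (or whether) they differ.
// SHA-256 digests are always 32 bytes, which timingSafeEqual requires.
function constantTimeEquals(a, b) {
  const da = createHash("sha256").update(a, "utf8").digest();
  const db = createHash("sha256").update(b, "utf8").digest();
  return timingSafeEqual(da, db);
}

// Accept the token from "Authorization: Bearer <token>" or from ?token=<token>.
function tokenFromRequest(headers, url) {
  const auth = headers.authorization ?? "";
  if (auth.startsWith("Bearer ")) return auth.slice("Bearer ".length);
  return new URL(url, "http://localhost").searchParams.get("token");
}

// True when no token is configured (open local-dev mode) or the presented token matches.
function opsAuthorized(expectedToken, headers, url) {
  if (!expectedToken) return true;
  const presented = tokenFromRequest(headers, url);
  return presented != null && constantTimeEquals(presented, expectedToken);
}
```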
18. Things NOT to do
- Don't try to delete memories. Donto is append-only. Tombstoning (true deletion) requires an attestation and goes through the substrate, not this API.
- Don't include API keys or secrets in text. Memories are persisted forever.
- Don't memorize the same chunk repeatedly to "boost" recall. Donto dedups at the content-hash level.
- Don't use mode: "exhaustive" on latency-sensitive paths. Five parallel LLM calls take 60–180 s; use mode: "single" there instead.
- Don't bypass /memorize unless you mean it. Direct /ingest/semantic-claim calls produce claims with no episodic anchor, so downstream provenance tracing cannot follow the chain back.