donto-memory

Persistent memory for AI agents — built on the donto evidence substrate. Bitemporal. Paraconsistent. Policy-gated. No silent rewrites.

If you are an AI agent reading this page: you have two endpoints to know. POST /memorize takes raw text (and optionally images — OCR + vision extraction run automatically) and stores it as ~50–250 ontological statements an LLM extracts from it. POST /recall takes a query and returns everything we know that matches, with policy + ranking applied. Read the agent guide for the full contract, or the integration spec for ship-order patterns.


1 What this is

A persistent memory layer for long-lived AI agents. You write memories as plain text. We store the text and break it down into typed ontological statements using an LLM, save everything to the substrate, and serve it back on recall — with policy gating, identity-lens resolution, and bitemporal time-travel for free.

The shape of a memory

When you call POST /memorize with one sentence of text, you get back ~30 facts. With a paragraph, ~100–250. Each fact is a typed triple (subject, predicate, object) the substrate stores with full bitemporal provenance.

Example. You send: "I met Annie Davis at the Cooktown Festival in October 1979."

The substrate gets:

(agent:you,        ex:met,         person:annie-davis)
(agent:you,        ex:metAt,       event:cooktown-festival-1979)
(person:annie-davis, rdf:type,     ex:Person)
(person:annie-davis, ex:hasName,   "Annie Davis")
(event:cooktown-festival-1979, rdf:type, ex:Festival)
(event:cooktown-festival-1979, ex:occurredAt, "1979-10")
(event:cooktown-festival-1979, ex:locatedIn,  place:cooktown)
(place:cooktown, rdf:type, ex:Place)
... and ~50 more from the inferential + conceivable apertures

What this is NOT

Not a vector database. We use predicate alignment + identity lenses for semantic similarity. Vector embeddings are an optional follow-on (M11.x).
Not a chat history. You can dump every turn in via /memorize, but the value-add is the structured recall, the contradiction handling, and the policy gate.
Not a model. We store facts about what your user has said, what you've inferred, what your user prefers. The reasoning still happens in you.

2 Three commitments

No silent rewrite. Reconsolidation derives new claims from recalled ones — it never overwrites originals. The original chunks remain queryable forever; derived summaries are additional claims with a supersedes argument edge back to the source.
Read events are not belief events. A recall bumps a counter in donto_x_memory_access, not anywhere in the substrate's donto_statement table. Bitemporal discipline preserved: tx_time reflects belief, not access.
Policy-aware by default. Every recall passes a holder agent and an action through to POST /recall on the substrate. The substrate returns rows the holder is attested for; donto-memory passes them through unchanged.

3 Quickstart

The minimal save-and-recall loop, in 30 seconds:

# Save a memory. We extract ~80-250 ontological statements
# (z-ai/glm-5 via OpenRouter, five parallel "apertures").
curl -X POST https://memories.apexpots.com/memorize \
  -H 'Content-Type: application/json' \
  -d '{
    "holder":     "agent:my-bot",
    "session_id": "conversation-2026-05-28",
    "text":       "The user told me they prefer vegetarian restaurants and live in Brooklyn."
  }'

# Recall.
curl -X POST https://memories.apexpots.com/recall \
  -H 'Content-Type: application/json' \
  -d '{
    "holder":     "agent:my-bot",
    "action":     "read_content",
    "query":      "vegetarian",
    "limit":      20
  }'

That's the whole loop. Read the rest of this page for the deep contract.

4 How save works

POST /memorize performs five steps in order:

Episodic storage. Your raw text is asserted verbatim as a mem:episodic/chunk statement under ctx:memory/episodic/session/<session_id>. This is the canonical bytes-on-disk record of what you sent.
LLM extraction. The default model (z-ai/glm-5) processes the chunk in one of two modes:
- single: One prompt. ~20-40 facts. ~30-100 s on z-ai/glm-5.
- exhaustive: Five parallel prompts. ~80-250 facts. ~60-180 s. This is the default.
Semantic ingest. Every extracted fact becomes a typed statement filed under ctx:memory/claims/session/<session_id>, linked back to the episodic chunk via mem:claim/derived_from.
Multimodal (optional). Pass an images: [...] array of http(s) URLs or data:image/...;base64,… data URLs. donto-memory first runs an OCR pre-pass: transcribed words are appended to the episodic chunk as [OCR text from image #N]\n<words> blocks, so screenshots and signs become searchable (via /recall?query=…). It then switches the structured extractor to OpenAI multimodal format using the configured DONTO_MEMORY_LLM_VISION_MODEL (currently openai/gpt-4o-mini in production). The episodic chunk then contains your text plus the OCR transcripts, and the extracted facts describe both the visible text and the visual content. mode: "single" is the sweet spot for images; exhaustive multiplies token cost across 5 apertures. See agent guide §4 for the full request shape.
Receipt. You receive the episodic record IDs, every semantic record ID, the per-aperture yields, and the LLM token usage. See the agent guide reference for the full response shape.
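A minimal sketch of assembling a /memorize body on the client, assuming the field names used on this page (holder, session_id, text, mode, images). The helper name buildMemorizeRequest and its validation rules are hypothetical and not part of the service; choosing "single" when images are attached simply follows the cost advice in the multimodal step.

```javascript
// Hypothetical client-side helper; not part of donto-memory itself.
function buildMemorizeRequest({ holder, sessionId, text, images = [], mode }) {
  if (typeof text !== "string" || text.trim() === "") {
    // The service answers 400 Bad Request for missing or empty text.
    throw new Error("text must be a non-empty string");
  }
  // Only http(s) URLs and data:image/... URLs are described as accepted.
  const accepted = /^(https?:\/\/|data:image\/)/i;
  for (const url of images) {
    if (!accepted.test(url)) throw new Error(`unsupported image URL: ${url}`);
  }
  const body = { holder, session_id: sessionId, text };
  // Assumption: prefer "single" with images to avoid multiplying vision cost.
  const chosen = mode ?? (images.length > 0 ? "single" : undefined);
  if (chosen !== undefined) body.mode = chosen;
  if (images.length > 0) body.images = images;
  return body;
}
```

When mode is omitted and no images are attached, the sketch leaves mode out of the body so the server default (exhaustive) applies.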

The five apertures (exhaustive mode)

Each aperture is an independent LLM call with its own system prompt and confidence band. The substrate's content-hash dedup runs across the union of all five outputs.

Aperture	Captures	Confidence	Modality
surface	What the text explicitly states.	0.95–1.0	asserted
linguistic	Clause decomposition. Every NP→entity, VP→event, modifier→property.	0.85–1.0	asserted
presupposition	What the text takes for granted.	0.7–0.95	hypothesis_only
inferential	Common-knowledge consequences of stated facts.	0.4–0.7	inferred
conceivable	Claims that could plausibly hold given entity types.	~0.85	hypothesis_only
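One way an agent might use the modality column is to keep asserted facts apart from inferred and hypothesis-only ones before reasoning over them. The sketch below copies the aperture-to-modality mapping from the table; the `aperture` field on each fact is an assumed shape for illustration, not a documented response field.

```javascript
// Aperture -> modality, copied from the table above.
const APERTURE_MODALITY = new Map([
  ["surface", "asserted"],
  ["linguistic", "asserted"],
  ["presupposition", "hypothesis_only"],
  ["inferential", "inferred"],
  ["conceivable", "hypothesis_only"],
]);

// Group facts by modality. The `aperture` property is hypothetical.
function groupByModality(facts) {
  const groups = { asserted: [], inferred: [], hypothesis_only: [] };
  for (const fact of facts) {
    const modality = APERTURE_MODALITY.get(fact.aperture);
    if (modality === undefined) {
      throw new Error(`unknown aperture: ${fact.aperture}`);
    }
    groups[modality].push(fact);
  }
  return groups;
}
```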

5 How recall works

POST /recall performs six steps:

Module dispatch. Every enabled module runs its own retrieval against the substrate.
Policy gate. Every candidate row passes through the substrate's Trust Kernel. If holder is not attested for action on the row's source policy, action_allowed=false.
Identity-lens resolution. If lens_name is set, the substrate returns the cluster representative for each subject/object.
Bitemporal time-travel. If as_of_tx is set, you get the rows the substrate believed at that timestamp.
Fusion. Module candidates merge via Reciprocal Rank Fusion (k=60). Cross-module overlap boosts rank.
Side effects. Each row writes an access event + bumps recall state + enqueues a reconsolidation task. None of these touch substrate belief state.
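The fusion step can be illustrated with the standard Reciprocal Rank Fusion formula, score(id) = Σ 1 / (k + rank). This is a sketch of the general technique rather than the service's code: the function name, the 1-based rank convention, and the record IDs are assumptions, and each input list is assumed to contain an ID at most once.

```javascript
// Reciprocal Rank Fusion: sum 1 / (k + rank) over every list an ID appears in.
// k = 60 matches the value quoted above; ranks start at 1 (an assumption).
function reciprocalRankFusion(rankedLists, k = 60) {
  const scores = new Map();
  for (const list of rankedLists) {
    list.forEach((id, index) => {
      const rank = index + 1;
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id, score]) => ({ id, score }));
}

const fused = reciprocalRankFusion([
  ["rec:a", "rec:b", "rec:c"], // e.g. candidates from one module
  ["rec:b", "rec:d"],          // e.g. candidates from another module
]);
console.log(fused.map((r) => r.id));
// → [ 'rec:b', 'rec:a', 'rec:d', 'rec:c' ]
```

rec:b ranks first because it appears in both lists (1/62 + 1/61), which is the cross-module overlap boost described in the fusion step.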

6 Memory modules

Three modules ship by default. Each defines a particular memory form + function and contributes to recall via its own retrieve method.

episodic token · experiential

Verbatim event/chunk recall. Each ingest writes one mem:episodic/chunk statement with the raw text as a literal. Use for raw user utterances.

semantic-claim structured · factual

Extracted typed claims. Each becomes one substrate statement with a source_record_iri pointing back to the episodic chunk. Use for structured facts you already have.

preference structured · preference

User preferences. Updates are append-only: a new value creates a new statement plus a supersedes argument edge to the prior. Both live forever; recall returns the latest.
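The append-only behaviour of the preference module can be modelled in a few lines. This is an in-memory illustration only: the class name, the numeric IDs, and the `supersedes` field are hypothetical stand-ins for substrate statements and argument edges.

```javascript
// In-memory model of append-only preference updates (illustrative only).
class PreferenceLog {
  constructor() {
    this.statements = []; // existing entries are never edited or removed
  }

  // Append a new value and link it to the value it replaces, if any.
  set(key, value) {
    const prior = this.latest(key);
    const statement = {
      id: this.statements.length + 1,
      key,
      value,
      supersedes: prior ? prior.id : null,
    };
    this.statements.push(statement);
    return statement;
  }

  // The latest value is the one no other statement supersedes.
  latest(key) {
    const superseded = new Set(
      this.statements.map((s) => s.supersedes).filter((id) => id !== null)
    );
    return (
      this.statements.find((s) => s.key === key && !superseded.has(s.id)) ?? null
    );
  }

  // Every value ever recorded for the key, oldest first.
  history(key) {
    return this.statements.filter((s) => s.key === key);
  }
}
```

Setting preferred_tone to "casual" and later to "formal" leaves two statements in history, while latest returns only the "formal" one.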

7 API surface

15 endpoints. Full schemas in openapi.json; interactive in Swagger UI.

Endpoint	Purpose
GET `/`	This homepage
GET `/health`	Liveness
GET `/version`	Service version + substrate contract floor
GET `/substrate`	Echo substrate /discovery handshake
GET `/modules`	Registered memory modules
POST `/memorize`	The save endpoint. Episodic + LLM extraction.
POST `/memorize/batch`	Same flow for multiple items
POST `/ingest/{module}`	Bypass LLM. Direct module write.
POST `/recall`	The read endpoint. Memory Evidence Bundle.
POST `/reconsolidate/enqueue`	Manually queue a record for sleep-path processing
GET `/reconsolidate/queue`	Head-of-queue inspector
GET `/openapi.json`	OpenAPI 3.1 spec
GET `/docs`	Swagger UI
GET `/agent.md`	Markdown guide aimed at AI agents
GET `/llms.txt`	Same content as `/agent.md`, plain text

8 Identity lenses

The substrate stores every entity reference verbatim. When two references actually refer to the same real-world entity, that's recorded as a weighted identity edge — never collapsed at storage. At query time, the identity lens parameter controls how strict the equivalence judgement is.

Default seeded lenses:

strict_identity_v1 — only edges with confidence ≥ 0.98.
likely_identity_v1 — ≥ 0.85.
exploratory_identity_v1 — ≥ 0.60.

Most agent workloads should leave lens_name: null — no expansion. Use likely_identity_v1 when surfacing memories about a person whose name varies across sources.
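To make the thresholds concrete, the sketch below filters a list of weighted identity edges by lens. The edge shape ({ a, b, confidence }) and the function name are assumptions, and it follows direct edges only; the substrate resolves full clusters and returns a cluster representative, which this simplification does not attempt.

```javascript
// Confidence floors for the default seeded lenses listed above.
const LENS_THRESHOLDS = new Map([
  ["strict_identity_v1", 0.98],
  ["likely_identity_v1", 0.85],
  ["exploratory_identity_v1", 0.6],
]);

// Return the IRIs treated as equivalent to `iri` under a lens.
// `edges` is a hypothetical list of { a, b, confidence } identity edges.
function equivalents(iri, edges, lensName = null) {
  if (lensName === null) return [iri]; // lens_name: null means no expansion
  const threshold = LENS_THRESHOLDS.get(lensName);
  if (threshold === undefined) throw new Error(`unknown lens: ${lensName}`);
  const result = new Set([iri]);
  for (const { a, b, confidence } of edges) {
    if (confidence < threshold) continue; // edge too weak for this lens
    if (a === iri) result.add(b);
    if (b === iri) result.add(a);
  }
  return [...result];
}
```

With an edge of confidence 0.9 between two references, likely_identity_v1 and exploratory_identity_v1 treat them as the same entity while strict_identity_v1 does not.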

9 Bitemporal time-travel

Every claim has two times:

valid_time — when the fact was true in the world.
tx_time — when we believed it.

Recall with "as_of_tx": "2026-05-01T00:00:00Z" returns the rows that were believed at that instant. If a fact was retracted on 2026-05-15, an as_of_tx=2026-05-10 query still sees it. This is the "what did we know on date X?" pattern.
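The as_of_tx rule amounts to an interval check on transaction time. The sketch below assumes each row carries a half-open belief interval [tx_from, tx_to), with tx_to set to null while the row is still believed; those field names are illustrative, not the substrate's schema.

```javascript
// Return the rows believed at `asOfTx` (an ISO-8601 timestamp).
// tx_from / tx_to are assumed field names for the belief interval.
function believedAt(rows, asOfTx) {
  const t = Date.parse(asOfTx);
  return rows.filter((row) => {
    const from = Date.parse(row.tx_from);
    const to = row.tx_to === null ? Infinity : Date.parse(row.tx_to);
    return from <= t && t < to; // half-open: retracted exactly at `to`
  });
}

const rows = [
  // Asserted on 2026-04-20, retracted on 2026-05-15.
  { id: "fact-1", tx_from: "2026-04-20T00:00:00Z", tx_to: "2026-05-15T00:00:00Z" },
];
console.log(believedAt(rows, "2026-05-10T00:00:00Z").length); // → 1
console.log(believedAt(rows, "2026-05-20T00:00:00Z").length); // → 0
```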

10 Policy actions

Every recall asks for a specific action. The substrate gates each row based on the source's policy capsule + the holder's attestation. Fourteen actions:

Action	When to ask
`read_metadata`	See that the row exists. Default-permitted.
`read_content`	Read actual values. The common agent recall case.
`quote`	Include verbatim in a user-visible answer.
`view_anchor_location`	See where in the source the claim was extracted.
`derive_claims`	Extract new derived statements.
`derive_embeddings`	Generate embeddings.
`translate`	Translate the content.
`summarize`	Produce a summary.
`export_claims`	Include in a release.
`train_model`	Use in model training.
`publish_release`	Include in a citable release.
`share_with_third_party`	Pass to another agent/system.
`federated_query`	Cross-instance read.
`request_deletion`	Initiate tombstoning. Heavy.

Most agent workflows want read_content (or quote if you'll be showing the text directly).
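On the client side, the guidance above reduces to picking an action and then honouring the action_allowed flag on each returned row. The helper names below are hypothetical, and the sketch assumes recall rows arrive as plain objects carrying that flag.

```javascript
// Pick the policy action to request, following the guidance above.
function chooseAction({ showVerbatim = false } = {}) {
  return showVerbatim ? "quote" : "read_content";
}

// Keep only rows the substrate marked as permitted. Anything without an
// explicit `action_allowed: true` is dropped, mirroring a fail-closed policy.
function permittedRows(rows) {
  return rows.filter((row) => row.action_allowed === true);
}
```

Treating a missing flag as a denial keeps the client at least as strict as the substrate.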

11 Cost expectations

LLM token cost dominates /memorize. Recall is free of LLM cost.

Mode	LLM calls	Tokens	~Cost (OpenRouter glm-5)	Latency
`single`	1	~400 + ~5-8K	$0.015–$0.030	30–100 s
`exhaustive`	5 parallel	~2K + ~15-25K	$0.04–$0.08	60–180 s

Recall latency depends on session size: per-user sessions with a few dozen records return in 30–80 ms; big channel-scoped sessions with hundreds to thousands of records take 3–5 s because the substrate's policy gate scans every candidate. The hot-path composer adds ~30–100 ms on top. On standard hardware the substrate side dominates; we add ~10–20 ms of fusion + access bookkeeping.
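For budgeting, the per-call figures in the table scale linearly with the number of /memorize calls. A small sketch of that arithmetic follows; the function name is hypothetical and the prices are just the table's approximate bounds.

```javascript
// Approximate per-call cost bounds in USD, taken from the table above.
const COST_PER_CALL_USD = new Map([
  ["single", { low: 0.015, high: 0.03 }],
  ["exhaustive", { low: 0.04, high: 0.08 }],
]);

// Estimate the LLM spend for `calls` memorize requests in a given mode.
function estimateMemorizeCost(mode, calls) {
  const band = COST_PER_CALL_USD.get(mode);
  if (band === undefined) throw new Error(`unknown mode: ${mode}`);
  return { low: band.low * calls, high: band.high * calls };
}
```

At these rates, 1,000 exhaustive calls come to roughly $40–$80, and the same volume in single mode to roughly $15–$30.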

12 Failure modes

Symptom	Cause	Fix
400 Bad Request	Missing/empty `text`	Send non-empty text
500 + warnings	LLM call failed; episodic still saved	Retry or enqueue reconsolidation
Timeout	Exhaustive mode takes 60–180 s	Use `mode=single` for lower-latency paths
Recall returns 0 rows for a memory you just saved	Wrong holder/session/filter	Drop the filter, verify holder match
Every row has `action_allowed=false`	Default policy is fail-closed for content-exposing actions	Request `read_metadata` (always allowed) or get an attestation

Retries are safe. The substrate dedups by content hash; repeated /memorize of the same chunk produces only one episodic statement. Semantic extraction may produce variant facts on each call. Variants are not expected to contradict one another, and if they do, donto preserves the contradiction rather than resolving it silently.
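Because retries are safe, a client can wrap its /memorize call in a simple retry loop. The sketch below is a generic helper, not part of any donto client library; the name withRetry, the default attempt count, and the linear backoff are all assumptions.

```javascript
// Retry an async operation a fixed number of times with linear backoff.
// `operation` is any function returning a promise, e.g. a /memorize call.
async function withRetry(operation, { attempts = 3, delayMs = 1000 } = {}) {
  let lastError;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      if (attempt < attempts) {
        // Wait a little longer after each failure before trying again.
        await new Promise((resolve) => setTimeout(resolve, delayMs * attempt));
      }
    }
  }
  throw lastError; // every attempt failed
}
```

Given exhaustive-mode latency, the operation passed in should set its own timeout generously, otherwise the wrapper may retry calls that were still in progress.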

13 Architecture

┌────────────────────────────────────────────────────────────┐
│  donto-memory (single Rust binary, three subcommands)      │
│                                                            │
│   donto-memory api      donto-memory worker                │
│   (axum :7900)          (tokio loop, queue drain)          │
│           │                       │                        │
│           ▼                       ▼                        │
│   ┌────────────────────────────────────────────────────┐   │
│   │   donto_memory_core (library)                      │   │
│   │   modules: episodic / semantic-claim / preference  │   │
│   │   hot_path: recall composer + RRF fusion           │   │
│   │   sleep_path: reflect + apply DontoDelta           │   │
│   │   substrate: reqwest → dontosrv                    │   │
│   │   overlays: tokio-postgres helpers                 │   │
│   │   extract: 5-aperture LLM via OpenAI-compatible    │   │
│   └────────────────────────┬───────────────────────────┘   │
└────────────────────────────┼───────────────────────────────┘
                             ▼
                ┌─────────────────────────────┐
                │   donto (substrate)         │
                │   dontosrv :7879            │
                │   any donto instance        │
                └─────────────────────────────┘

Two long-running processes plus five overlay tables in the substrate's database:

Overlay table	Role
`donto_x_memory_module`	Registered modules
`donto_x_memory_record`	One row per unit of memory
`donto_x_memory_access`	Append-only access events
`donto_x_memory_state`	Bitemporal-versioned recall state
`donto_x_memory_reconsolidation_queue`	Sleep-path work items

14 Substrate handshake

donto-memory requires a donto substrate at contract version 0.1.0-m10 or higher. The handshake at startup checks GET /discovery/contract-version on the substrate and refuses to run against an older substrate.

You can verify the binding with GET /substrate on donto-memory — it echoes the substrate's /discovery/contract-version and /discovery/substrate-health responses.
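A client or monitor could perform the same floor check against the version string it reads back. The sketch below assumes contract versions always look like MAJOR.MINOR.PATCH-mN and that they order numerically field by field; both are assumptions based only on the single version string shown here, and the function names are hypothetical.

```javascript
// Parse "0.1.0-m10" into [0, 1, 0, 10]. The format is an assumption.
function parseContractVersion(version) {
  const match = /^(\d+)\.(\d+)\.(\d+)-m(\d+)$/.exec(version);
  if (!match) throw new Error(`unrecognized contract version: ${version}`);
  return match.slice(1).map(Number);
}

// True when `version` is at or above the required floor.
function meetsFloor(version, floor = "0.1.0-m10") {
  const actual = parseContractVersion(version);
  const required = parseContractVersion(floor);
  for (let i = 0; i < actual.length; i++) {
    if (actual[i] !== required[i]) return actual[i] > required[i];
  }
  return true; // identical versions satisfy the floor
}
```

Comparing the milestone as a number matters: a plain string comparison would rank "m9" above "m10".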

15 Cookbook

Long-term user profile

// Whenever the user shares a profile fact, memorize it.
fetch("/memorize", {
  method:  "POST",
  headers: { "Content-Type": "application/json" },
  body:    JSON.stringify({
    holder:     "agent:my-bot",
    session_id: `user/${user_id}`,
    text:       user_message,
    mode:       "exhaustive"
  })
});

// Before generating a response, recall.
const bundle = await fetch("/recall", {
  method:  "POST",
  headers: { "Content-Type": "application/json" },
  body:    JSON.stringify({
    holder:     "agent:my-bot",
    session_id: `user/${user_id}`,
    action:     "read_content",
    limit:      50
  })
}).then(r => r.json());

Preference tracking

// When the user says "I prefer X", use the preference module directly.
await fetch("/ingest/preference", {
  method:  "POST",
  headers: { "Content-Type": "application/json" },
  body:    JSON.stringify({
    holder: "agent:my-bot",
    key:    "preferred_tone",
    value:  "casual"
  })
});

// Later, retrieve.
const prefs = await fetch("/recall", {
  method:  "POST",
  headers: { "Content-Type": "application/json" },
  body:    JSON.stringify({
    holder:      "agent:my-bot",
    module_iris: ["mem:module/preference"],
    limit:       100
  })
}).then(r => r.json());

"What did I know last Tuesday?"

const bundle = await fetch("/recall", {
  method:  "POST",
  headers: { "Content-Type": "application/json" },
  body:    JSON.stringify({
    holder:   "agent:my-bot",
    as_of_tx: "2026-05-21T00:00:00Z",
    subject:  "ex:annie-davis"
  })
}).then(r => r.json());

Semantic-similar across aliases

const bundle = await fetch("/recall", {
  method:  "POST",
  headers: { "Content-Type": "application/json" },
  body:    JSON.stringify({
    holder:    "agent:my-bot",
    lens_name: "likely_identity_v1",
    subject:   "ex:annie-davis"
  })
}).then(r => r.json());
// Returns rows about the canonical Annie + her known aliases
// (Mrs Watson, Mary Watson, etc.) under the `likely` identity lens.

16 Source documents (`donto blob`)

/memorize stores your text as an xsd:string literal inside a mem:episodic/chunk statement, which is fine for short utterances. For anything you might later want to tombstone, share verbatim, or re-version, register a source document on the substrate first. The substrate already holds ~50,000 content-addressed blobs / 6 GB; your messages join that pool.

The two-step substrate flow before calling /memorize:

# 1. Register a source + policy on the substrate.
curl -X POST $DONTOSRV/sources/register \
  -H 'Content-Type: application/json' \
  -d '{
    "iri":         "doc:my-bot/discord-message/123",
    "source_kind": "agent-message",
    "policy_iri":  "policy:user-conversation",
    "media_type":  "text/markdown"
  }'
# → { "document_id": "", "iri": "doc:my-bot/...", ... }

# 2. Attach the body, passing the document_id returned by step 1.
#    SHA-256 dedup happens here.
curl -X POST $DONTOSRV/documents/revision \
  -H 'Content-Type: application/json' \
  -d '{ "document_id": "", "body": "the raw message text…" }'

# 3. Memorize, anchored to the document IRI.
curl -X POST https://memories.apexpots.com/memorize \
  -H 'Content-Type: application/json' \
  -d '{
    "holder":            "agent:my-bot",
    "text":              "the raw message text…",
    "source_record_iri": "doc:my-bot/discord-message/123"
  }'

Bulk files (PDFs, transcripts, archives) are simpler via the CLI: donto blob upload <path> or donto blob sync <dir> for a recursive walk. See the agent guide §13 for the full pattern, policy IRIs, and when to skip this layer entirely.

17 Operator surfaces (`/jobs`, `/explore`)

/jobs/* and /explore/* expose every memorized text + recall query body. They're observability tools for the operator, not public agent surfaces. On any deployment reachable from the public internet, set DONTO_MEMORY_OPS_TOKEN=<random-token> on the runtime — that gates those 10 routes behind a bearer token:

# anonymous → 401
curl https://memories.apexpots.com/jobs                # → 401

# with the token via Authorization header
curl -H 'Authorization: Bearer <token>' \
  https://memories.apexpots.com/jobs                   # → 200

# or via query string (handy for browser bookmarks)
curl 'https://memories.apexpots.com/jobs?token=<token>'  # → 200

When the env var is unset the routes are open (preserves the local-dev workflow). When set, the comparison is constant-time so the token can't be probed for length or prefix. The agent contract (/, /agent.md, /llms.txt, /openapi.json, /docs, /memorize, /recall, /ingest/*, /modules, /version, /health) is never gated.

GET /api returns a categorized endpoint summary and an ops_token_required boolean so monitoring can detect whether the gate is active.

18 Things NOT to do

Don't try to delete memories. Donto is append-only. Tombstoning (true deletion) requires an attestation and goes through the substrate, not this API.
Don't include API keys or secrets in text. Memories are persisted forever.
Don't memorize the same chunk repeatedly to "boost" recall. Donto dedups at the content-hash level.
Don't rely on mode: "exhaustive" for sub-second latency. Five parallel LLM calls still take 60–180 s.
Don't bypass /memorize unless you mean it. Direct /ingest/semantic-claim calls produce claims with no episodic anchor — downstream provenance trace cannot follow the chain back.

donto-memory v0.1.0 · contract 0.1.0-m10 · Apache-2.0 OR MIT · Built on the donto evidence substrate · source