memories.apexpots.com · v0.1.0 · contract 0.1.0-m10

donto-memory

Persistent memory for AI agents — built on the donto evidence substrate. Bitemporal. Paraconsistent. Policy-gated. No silent rewrites.

If you are an AI agent reading this page: you have two endpoints to know. POST /memorize takes raw text (and optionally images — OCR + vision extraction run automatically) and stores it as ~50–250 ontological statements an LLM extracts from it. POST /recall takes a query and returns everything we know that matches, with policy + ranking applied. Read the agent guide for the full contract, or the integration spec for ship-order patterns.

1. What this is

A persistent memory layer for long-lived AI agents. You write memories as plain text. We store the text and break it down into typed ontological statements using an LLM, save everything to the substrate, and serve it back on recall — with policy gating, identity-lens resolution, and bitemporal time-travel for free.

The shape of a memory

When you call POST /memorize with one sentence of text, you get back ~30 facts. With a paragraph, ~100–250. Each fact is a typed triple (subject, predicate, object) the substrate stores with full bitemporal provenance.

Example. You send: "I met Annie Davis at the Cooktown Festival in October 1979."

The substrate gets:

(agent:you,        ex:met,         person:annie-davis)
(agent:you,        ex:metAt,       event:cooktown-festival-1979)
(person:annie-davis, rdf:type,     ex:Person)
(person:annie-davis, ex:hasName,   "Annie Davis")
(event:cooktown-festival-1979, rdf:type, ex:Festival)
(event:cooktown-festival-1979, ex:occurredAt, "1979-10")
(event:cooktown-festival-1979, ex:locatedIn,  place:cooktown)
(place:cooktown, rdf:type, ex:Place)
... and ~50 more from the inferential + conceivable apertures
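
A minimal sketch of how a client might hold a few of these triples in memory. The `{ subject, predicate, object }` field names and the `groupBySubject` helper are illustrative assumptions, not part of the donto-memory API.

```javascript
// Hypothetical in-memory shape for some of the triples listed above.
const facts = [
  { subject: "agent:you", predicate: "ex:met", object: "person:annie-davis" },
  { subject: "person:annie-davis", predicate: "rdf:type", object: "ex:Person" },
  { subject: "person:annie-davis", predicate: "ex:hasName", object: "Annie Davis" },
  { subject: "place:cooktown", predicate: "rdf:type", object: "ex:Place" },
];

// Group facts by subject so everything known about one entity sits together.
function groupBySubject(triples) {
  const bySubject = new Map();
  for (const t of triples) {
    if (!bySubject.has(t.subject)) bySubject.set(t.subject, []);
    bySubject.get(t.subject).push(t);
  }
  return bySubject;
}

const grouped = groupBySubject(facts);
console.log(grouped.get("person:annie-davis").length); // 2
```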

What this is NOT

2. Three commitments

  1. No silent rewrite. Reconsolidation derives new claims from recalled ones — it never overwrites originals. The original chunks remain queryable forever; derived summaries are additional claims with a supersedes argument edge back to the source.
  2. Read events are not belief events. A recall bumps a counter in donto_x_memory_access, not anywhere in the substrate's donto_statement table. Bitemporal discipline preserved: tx_time reflects belief, not access.
  3. Policy-aware by default. Every recall passes a holder agent and an action through to POST /recall on the substrate. The substrate returns rows the holder is attested for; donto-memory passes them through unchanged.

3. Quickstart

The minimal save-and-recall loop, in 30 seconds:

# Save a memory. We extract ~80-250 ontological statements
# (z-ai/glm-5 via OpenRouter, five parallel "apertures").
curl -X POST https://memories.apexpots.com/memorize \
  -H 'Content-Type: application/json' \
  -d '{
    "holder":     "agent:my-bot",
    "session_id": "conversation-2026-05-28",
    "text":       "The user told me they prefer vegetarian restaurants and live in Brooklyn."
  }'

# Recall.
curl -X POST https://memories.apexpots.com/recall \
  -H 'Content-Type: application/json' \
  -d '{
    "holder":     "agent:my-bot",
    "action":     "read_content",
    "query":      "vegetarian",
    "limit":      20
  }'

That's the whole loop. Read the rest of this page for the deep contract.

4. How save works

POST /memorize performs five steps in order (the fourth runs only when images are supplied):

  1. Episodic storage. Your raw text is asserted verbatim as a mem:episodic/chunk statement under ctx:memory/episodic/session/<session_id>. This is the canonical bytes-on-disk record of what you sent.
  2. LLM extraction. The default model (z-ai/glm-5) processes the chunk via one of two modes:
    • single: one prompt. ~20-40 facts. ~30-100 s on z-ai/glm-5.
    • exhaustive: five parallel prompts. ~80-250 facts. ~60-180 s. This is the default.
  3. Semantic ingest. Every extracted fact becomes a typed statement filed under ctx:memory/claims/session/<session_id>, linked back to the episodic chunk via mem:claim/derived_from.
  4. Multimodal (optional). Pass an images: [...] array of http(s) URLs or data:image/...;base64,… data URLs. donto-memory first runs an OCR pre-pass: transcribed words are appended to the episodic chunk as [OCR text from image #N]\n<words> blocks, so screenshots and signs become searchable through the query field of /recall. It then switches the structured extractor to the OpenAI multimodal format, using the model configured in DONTO_MEMORY_LLM_VISION_MODEL (currently openai/gpt-4o-mini in production). The episodic chunk now contains your text plus the OCR transcripts, and the extracted facts describe both the visible text and the visual content. mode: "single" is the recommended setting for images, because exhaustive multiplies token cost across 5 apertures. See agent guide §4 for the full request shape.
  5. Receipt. You receive the episodic record IDs, every semantic record ID, the per-aperture yields, and the LLM token usage. See the agent guide reference for the full response shape.
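
The OCR block format from step 4 can be sketched as a small string-building helper. This is an illustration only: the `appendOcrBlocks` name, numbering images from 1, and the blank line between blocks are assumptions, and the service's exact spacing may differ.

```javascript
// Append OCR transcripts to the episodic text using the block format
// described above: "[OCR text from image #N]\n<words>".
// Assumptions: images are numbered from 1 and blocks are separated by a
// blank line.
function appendOcrBlocks(text, transcripts) {
  const blocks = transcripts.map(
    (words, i) => `[OCR text from image #${i + 1}]\n${words}`
  );
  return [text, ...blocks].join("\n\n");
}

const chunk = appendOcrBlocks("Photo of the shop front.", ["OPEN 9-5", "No parking"]);
console.log(chunk);
```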

The five apertures (exhaustive mode)

Each aperture is an independent LLM call with its own system prompt and confidence band. Substrate's content-hash dedup runs across the union.

| Aperture | Captures | Confidence | Modality |
| --- | --- | --- | --- |
| surface | What the text explicitly states. | 0.95–1.0 | asserted |
| linguistic | Clause decomposition. Every NP→entity, VP→event, modifier→property. | 0.85–1.0 | asserted |
| presupposition | What the text takes for granted. | 0.7–0.95 | hypothesis_only |
| inferential | Common-knowledge consequences of stated facts. | 0.4–0.7 | inferred |
| conceivable | Claims that could plausibly hold given entity types. | ~0.85 | hypothesis_only |
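
To make "dedup runs across the union" concrete, here is a rough client-side sketch. It assumes a fact is identified by its (subject, predicate, object) triple and that the first aperture to produce a fact wins. Both are assumptions, since the substrate's actual content hash and tie-breaking rule are not described on this page.

```javascript
// Merge per-aperture outputs and drop duplicates.
// Assumptions: the JSON-encoded triple stands in for the substrate's
// content hash, and the first aperture to emit a fact is kept.
function dedupAcrossApertures(yields) {
  const seen = new Map();
  for (const [aperture, facts] of Object.entries(yields)) {
    for (const f of facts) {
      const key = JSON.stringify([f.subject, f.predicate, f.object]);
      if (!seen.has(key)) seen.set(key, { ...f, aperture });
    }
  }
  return [...seen.values()];
}

const merged = dedupAcrossApertures({
  surface: [
    { subject: "agent:you", predicate: "ex:met", object: "person:annie-davis" },
  ],
  linguistic: [
    { subject: "agent:you", predicate: "ex:met", object: "person:annie-davis" },
    { subject: "person:annie-davis", predicate: "ex:hasName", object: "Annie Davis" },
  ],
});
console.log(merged.length); // 2
```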

5. How recall works

POST /recall performs six steps:

  1. Module dispatch. Every enabled module runs its own retrieval against the substrate.
  2. Policy gate. Every candidate row passes through the substrate's Trust Kernel. If holder is not attested for action on the row's source policy, action_allowed=false.
  3. Identity-lens resolution. If lens_name is set, the substrate returns the cluster representative for each subject/object.
  4. Bitemporal time-travel. If as_of_tx is set, you get the rows the substrate believed at that timestamp.
  5. Fusion. Module candidates merge via Reciprocal Rank Fusion (k=60). Cross-module overlap boosts rank.
  6. Side effects. Each row writes an access event + bumps recall state + enqueues a reconsolidation task. None of these touch substrate belief state.
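
The fusion step can be illustrated with a generic Reciprocal Rank Fusion function. This is the textbook formula with 1-based ranks and k=60, not necessarily the service's exact implementation; the record IDs are made up.

```javascript
// Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank),
// with rank starting at 1. k = 60 matches the value quoted above.
function reciprocalRankFusion(rankedLists, k = 60) {
  const scores = new Map();
  for (const list of rankedLists) {
    list.forEach((id, index) => {
      const rank = index + 1;
      scores.set(id, (scores.get(id) || 0) + 1 / (k + rank));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id, score]) => ({ id, score }));
}

// Hypothetical candidate lists from two modules.
const fused = reciprocalRankFusion([
  ["rec:a", "rec:b", "rec:c"], // e.g. episodic
  ["rec:b", "rec:d"],          // e.g. semantic-claim
]);
console.log(fused.map((r) => r.id)); // [ 'rec:b', 'rec:a', 'rec:d', 'rec:c' ]
```

Here rec:b appears in both lists, so its two reciprocal-rank terms add up and it outranks rec:a, which is first in only one list. That is the cross-module overlap boost.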

6. Memory modules

Three modules ship by default. Each defines a particular memory form + function and contributes to recall via its own retrieve method.

episodic token · experiential

Verbatim event/chunk recall. Each ingest writes one mem:episodic/chunk statement with the raw text as a literal. Use for raw user utterances.

semantic-claim structured · factual

Extracted typed claims. Each becomes one substrate statement with source_record_iri back to the episodic. Use for structured facts you already have.

preference structured · preference

User preferences. Updates are append-only: a new value creates a new statement plus a supersedes argument edge to the prior. Both live forever; recall returns the latest.
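
The append-only behaviour of the preference module can be pictured with a small in-memory log. This is only a sketch: the entry shape, the numeric IDs, and the `createPreferenceLog` helper are hypothetical and do not reflect how the substrate stores statements or argument edges.

```javascript
// Append-only preference log. Setting a key never mutates an earlier
// entry; it adds a new entry that points back at the one it supersedes.
function createPreferenceLog() {
  const entries = [];
  return {
    set(key, value) {
      const prior = this.latest(key);
      const entry = {
        id: entries.length + 1,
        key,
        value,
        supersedes: prior ? prior.id : null,
      };
      entries.push(entry);
      return entry;
    },
    // Latest entry for a key: the last one written.
    latest(key) {
      for (let i = entries.length - 1; i >= 0; i--) {
        if (entries[i].key === key) return entries[i];
      }
      return null;
    },
    // The full history stays available.
    history(key) {
      return entries.filter((e) => e.key === key);
    },
  };
}

const prefLog = createPreferenceLog();
prefLog.set("preferred_tone", "formal");
prefLog.set("preferred_tone", "casual");
console.log(prefLog.latest("preferred_tone").value); // casual
```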

7. API surface

15 endpoints. Full schemas in openapi.json; interactive in Swagger UI.

| Endpoint | Purpose |
| --- | --- |
| GET / | This homepage |
| GET /health | Liveness |
| GET /version | Service version + substrate contract floor |
| GET /substrate | Echo substrate /discovery handshake |
| GET /modules | Registered memory modules |
| POST /memorize | The save endpoint. Episodic + LLM extraction. |
| POST /memorize/batch | Same flow for multiple items |
| POST /ingest/{module} | Bypass LLM. Direct module write. |
| POST /recall | The read endpoint. Memory Evidence Bundle. |
| POST /reconsolidate/enqueue | Manually queue a record for sleep-path processing |
| GET /reconsolidate/queue | Head-of-queue inspector |
| GET /openapi.json | OpenAPI 3.1 spec |
| GET /docs | Swagger UI |
| GET /agent.md | Markdown guide aimed at AI agents |
| GET /llms.txt | Same content as /agent.md, plain text |

8. Identity lenses

The substrate stores every entity reference verbatim. When two references actually refer to the same real-world entity, that's recorded as a weighted identity edge — never collapsed at storage. At query time, the identity lens parameter controls how strict the equivalence judgement is.

Default seeded lenses:

Most agent workloads should leave lens_name: null — no expansion. Use likely_identity_v1 when surfacing memories about a person whose name varies across sources.

9. Bitemporal time-travel

Every claim has two times:

Recall with "as_of_tx": "2026-05-01T00:00:00Z" returns the rows that were believed as of that timestamp. If a fact was retracted on 2026-05-15, an as_of_tx=2026-05-10 query still sees it. This is the "what did we know on date X?" pattern.
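
The as_of_tx rule can be sketched as a filter over transaction-time intervals. The `tx_from` and `tx_to` field names are hypothetical (this page does not show the substrate's column names), and `tx_to: null` is assumed to mean "still believed".

```javascript
// A row is visible at `asOfTx` when it was recorded at or before that
// instant and had not yet been retracted.
function visibleAsOf(rows, asOfTx) {
  const t = Date.parse(asOfTx);
  return rows.filter((row) => {
    const from = Date.parse(row.tx_from);
    const to = row.tx_to === null ? Infinity : Date.parse(row.tx_to);
    return from <= t && t < to;
  });
}

const sampleRows = [
  // Recorded in April, retracted on 2026-05-15.
  { id: "fact-1", tx_from: "2026-04-01T00:00:00Z", tx_to: "2026-05-15T00:00:00Z" },
  // Recorded after the first query date and never retracted.
  { id: "fact-2", tx_from: "2026-05-12T00:00:00Z", tx_to: null },
];

console.log(visibleAsOf(sampleRows, "2026-05-10T00:00:00Z").map((r) => r.id)); // [ 'fact-1' ]
console.log(visibleAsOf(sampleRows, "2026-05-20T00:00:00Z").map((r) => r.id)); // [ 'fact-2' ]
```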

10. Policy actions

Every recall asks for a specific action. The substrate gates each row based on the source's policy capsule + the holder's attestation. Sixteen actions exist; the fourteen documented here are:

| Action | When to ask |
| --- | --- |
| read_metadata | See that the row exists. Default-permitted. |
| read_content | Read actual values. The common agent recall case. |
| quote | Include verbatim in a user-visible answer. |
| view_anchor_location | See where in the source the claim was extracted. |
| derive_claims | Extract new derived statements. |
| derive_embeddings | Generate embeddings. |
| translate | Translate the content. |
| summarize | Produce a summary. |
| export_claims | Include in a release. |
| train_model | Use in model training. |
| publish_release | Include in a citable release. |
| share_with_third_party | Pass to another agent/system. |
| federated_query | Cross-instance read. |
| request_deletion | Initiate tombstoning. Heavy. |

Most agent workflows want read_content (or quote if you'll be showing the text directly).
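
On the client side, a caller will usually want to separate rows it may use from rows the gate withheld. The sketch below relies only on the action_allowed flag mentioned on this page; the rest of the row shape (`id`, `text`) and the `partitionByPolicy` helper are hypothetical.

```javascript
// Separate rows the holder may use from rows the policy gate withheld.
function partitionByPolicy(rows) {
  const allowed = [];
  const withheld = [];
  for (const row of rows) {
    (row.action_allowed ? allowed : withheld).push(row);
  }
  return { allowed, withheld };
}

const gated = partitionByPolicy([
  { id: "rec:1", action_allowed: true, text: "prefers vegetarian restaurants" },
  { id: "rec:2", action_allowed: false },
]);
console.log(gated.allowed.map((r) => r.id)); // [ 'rec:1' ]
```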

11. Cost expectations

LLM token cost dominates /memorize. Recall is free of LLM cost.

| Mode | LLM calls | Tokens | ~Cost (OpenRouter glm-5) | Latency |
| --- | --- | --- | --- | --- |
| single | 1 | ~400 + ~5-8K | $0.015–$0.030 | 30–100 s |
| exhaustive | 5 parallel | ~2K + ~15-25K | $0.04–$0.08 | 60–180 s |

Recall latency depends on session size: per-user sessions with a few dozen records return in 30–80 ms; big channel-scoped sessions with hundreds to thousands of records take 3–5 s because the substrate's policy gate scans every candidate. The hot-path composer adds ~30–100 ms on top. On standard hardware the substrate side dominates; we add ~10–20 ms of fusion + access bookkeeping.
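
For budgeting, the per-call ranges above scale linearly with call volume. A rough estimator, using only the figures from the cost table (the `estimateMemorizeCost` helper and the 1,000-call volume are illustrative):

```javascript
// Per-call cost ranges in USD, copied from the cost table above.
const COST_PER_CALL = {
  single: { low: 0.015, high: 0.03 },
  exhaustive: { low: 0.04, high: 0.08 },
};

// Estimate the LLM spend for a number of /memorize calls in one mode.
function estimateMemorizeCost(mode, calls) {
  const range = COST_PER_CALL[mode];
  if (!range) throw new Error(`unknown mode: ${mode}`);
  return { low: range.low * calls, high: range.high * calls };
}

const estimate = estimateMemorizeCost("exhaustive", 1000);
console.log(`$${estimate.low.toFixed(2)} to $${estimate.high.toFixed(2)}`); // $40.00 to $80.00
```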

12. Failure modes

| Symptom | Cause | Fix |
| --- | --- | --- |
| 400 Bad Request | Missing/empty text | Send non-empty text |
| 500 + warnings | LLM call failed; episodic still saved | Retry or enqueue reconsolidation |
| Timeout | Exhaustive mode takes 60–180 s | Use mode=single for low-latency paths |
| Recall returns 0 rows for a memory you just saved | Wrong holder/session/filter | Drop the filter, verify holder match |
| Every row has action_allowed=false | Default policy is fail-closed for content-exposing actions | Request read_metadata (always allowed) or get an attestation |

Retries are safe. The substrate dedups by content hash, so repeating /memorize on the same chunk produces only one episodic statement. Semantic extraction may produce variant facts on each call. These normally do not contradict one another, and if they do, donto preserves the contradiction.

13. Architecture

┌────────────────────────────────────────────────────────────┐
│  donto-memory (single Rust binary, three subcommands)      │
│                                                            │
│   donto-memory api      donto-memory worker                │
│   (axum :7900)          (tokio loop, queue drain)          │
│           │                       │                        │
│           ▼                       ▼                        │
│   ┌────────────────────────────────────────────────────┐   │
│   │   donto_memory_core (library)                      │   │
│   │   modules: episodic / semantic-claim / preference  │   │
│   │   hot_path: recall composer + RRF fusion           │   │
│   │   sleep_path: reflect + apply DontoDelta           │   │
│   │   substrate: reqwest → dontosrv                    │   │
│   │   overlays: tokio-postgres helpers                 │   │
│   │   extract: 5-aperture LLM via OpenAI-compatible    │   │
│   └────────────────────────┬───────────────────────────┘   │
└────────────────────────────┼───────────────────────────────┘
                             ▼
                ┌─────────────────────────────┐
                │   donto (substrate)         │
                │   dontosrv :7879            │
                │   any donto instance        │
                └─────────────────────────────┘

Two long-running processes plus five overlay tables in the substrate's database:

| Overlay table | Role |
| --- | --- |
| donto_x_memory_module | Registered modules |
| donto_x_memory_record | One row per unit of memory |
| donto_x_memory_access | Append-only access events |
| donto_x_memory_state | Bitemporal-versioned recall state |
| donto_x_memory_reconsolidation_queue | Sleep-path work items |

14. Substrate handshake

donto-memory requires a donto substrate at contract version 0.1.0-m10 or higher. The handshake at startup checks GET /discovery/contract-version on the substrate and refuses to run against an older substrate.

You can verify the binding with GET /substrate on donto-memory — it echoes the substrate's /discovery/contract-version and /discovery/substrate-health responses.
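
A client that wants to run the same floor check itself could compare version strings as below. The ordering rule is an assumption: this sketch treats versions as MAJOR.MINOR.PATCH-mN and compares the numeric parts first, then the milestone number. The actual comparison used by donto-memory is not specified on this page.

```javascript
// Parse a contract version such as "0.1.0-m10" into comparable numbers.
// Assumption: versions always look like MAJOR.MINOR.PATCH-mN.
function parseContractVersion(version) {
  const match = /^(\d+)\.(\d+)\.(\d+)-m(\d+)$/.exec(version);
  if (!match) throw new Error(`unrecognised contract version: ${version}`);
  return match.slice(1).map(Number);
}

// True when `actual` is at or above `floor`.
function meetsContractFloor(actual, floor = "0.1.0-m10") {
  const a = parseContractVersion(actual);
  const f = parseContractVersion(floor);
  for (let i = 0; i < a.length; i++) {
    if (a[i] !== f[i]) return a[i] > f[i];
  }
  return true;
}

console.log(meetsContractFloor("0.1.0-m10")); // true
console.log(meetsContractFloor("0.1.0-m9"));  // false
```

Comparing the milestone numerically matters here: a plain string comparison would rank "m9" above "m10".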

15. Cookbook

Long-term user profile

// Whenever the user shares a profile fact, memorize it.
// The JSON body needs an explicit Content-Type, as in the curl examples.
fetch("/memorize", {
  method:  "POST",
  headers: { "Content-Type": "application/json" },
  body:    JSON.stringify({
    holder:     "agent:my-bot",
    session_id: `user/${user_id}`,
    text:       user_message,
    mode:       "exhaustive"
  })
});

// Before generating a response, recall.
const bundle = await fetch("/recall", {
  method:  "POST",
  headers: { "Content-Type": "application/json" },
  body:    JSON.stringify({
    holder:     "agent:my-bot",
    session_id: `user/${user_id}`,
    action:     "read_content",
    limit:      50
  })
}).then(r => r.json());

Preference tracking

// When the user says "I prefer X", use the preference module directly.
await fetch("/ingest/preference", {
  method:  "POST",
  headers: { "Content-Type": "application/json" },
  body:    JSON.stringify({
    holder: "agent:my-bot",
    key:    "preferred_tone",
    value:  "casual"
  })
});

// Later, retrieve.
const prefs = await fetch("/recall", {
  method:  "POST",
  headers: { "Content-Type": "application/json" },
  body:    JSON.stringify({
    holder:      "agent:my-bot",
    module_iris: ["mem:module/preference"],
    limit:       100
  })
}).then(r => r.json());

"What did I know last Tuesday?"

const bundle = await fetch("/recall", {
  method:  "POST",
  headers: { "Content-Type": "application/json" },
  body:    JSON.stringify({
    holder:   "agent:my-bot",
    as_of_tx: "2026-05-21T00:00:00Z",
    subject:  "ex:annie-davis"
  })
}).then(r => r.json());

Semantic-similar across aliases

const bundle = await fetch("/recall", {
  method:  "POST",
  headers: { "Content-Type": "application/json" },
  body:    JSON.stringify({
    holder:    "agent:my-bot",
    lens_name: "likely_identity_v1",
    subject:   "ex:annie-davis"
  })
}).then(r => r.json());
// Returns rows about the canonical Annie + her known aliases
// (Mrs Watson, Mary Watson, etc.) under the `likely` identity lens.

16. Source documents (donto blob)

/memorize stores your text as an xsd:string literal inside a mem:episodic/chunk statement, which is fine for short utterances. For anything you might later want to tombstone, share verbatim, or re-version, register a source document on the substrate first. The substrate already holds ~50,000 content-addressed blobs / 6 GB; your messages join that pool.

The two-step substrate flow before calling /memorize:

# 1. Register a source + policy on the substrate.
curl -X POST $DONTOSRV/sources/register \
  -H 'Content-Type: application/json' \
  -d '{
    "iri":         "doc:my-bot/discord-message/123",
    "source_kind": "agent-message",
    "policy_iri":  "policy:user-conversation",
    "media_type":  "text/markdown"
  }'
# → { "document_id": "", "iri": "doc:my-bot/...", ... }

# 2. Attach the body. SHA-256 dedup happens here.
curl -X POST $DONTOSRV/documents/revision \
  -H 'Content-Type: application/json' \
  -d '{ "document_id": "", "body": "the raw message text…" }'

# 3. Memorize, anchored to the document IRI.
curl -X POST https://memories.apexpots.com/memorize \
  -H 'Content-Type: application/json' \
  -d '{
    "holder":            "agent:my-bot",
    "text":              "the raw message text…",
    "source_record_iri": "doc:my-bot/discord-message/123"
  }'

Bulk files (PDFs, transcripts, archives) are simpler via the CLI: donto blob upload <path> or donto blob sync <dir> for a recursive walk. See the agent guide §13 for the full pattern, policy IRIs, and when to skip this layer entirely.

17. Operator surfaces (/jobs, /explore)

/jobs/* and /explore/* expose every memorized text + recall query body. They're observability tools for the operator, not public agent surfaces. On any deployment reachable from the public internet, set DONTO_MEMORY_OPS_TOKEN=<random-token> on the runtime — that gates those 10 routes behind a bearer token:

# anonymous → 401
curl https://memories.apexpots.com/jobs                # → 401

# with the token via Authorization header
curl -H 'Authorization: Bearer <token>' \
  https://memories.apexpots.com/jobs                   # → 200

# or via query string (handy for browser bookmarks)
curl 'https://memories.apexpots.com/jobs?token=<token>'  # → 200

When the env var is unset the routes are open (preserves the local-dev workflow). When set, the comparison is constant-time so the token can't be probed for length or prefix. The agent contract (/, /agent.md, /llms.txt, /openapi.json, /docs, /memorize, /recall, /ingest/*, /modules, /version, /health) is never gated.
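
To illustrate what a constant-time comparison means, here is a sketch that never exits early on the first mismatch. It is not the service's code, and JavaScript engines give no hard timing guarantees. Production code would normally use a vetted primitive such as Node's crypto.timingSafeEqual on fixed-length digests of both values.

```javascript
// Compare two strings without short-circuiting on the first mismatch.
// The loop always runs over the longer input, and a length difference is
// folded into the result instead of causing an early return.
function constantTimeEqual(a, b) {
  const len = Math.max(a.length, b.length);
  let diff = a.length ^ b.length;
  for (let i = 0; i < len; i++) {
    // charCodeAt returns NaN past the end of a string; "| 0" turns that into 0.
    diff |= (a.charCodeAt(i) | 0) ^ (b.charCodeAt(i) | 0);
  }
  return diff === 0;
}

console.log(constantTimeEqual("s3cret-token", "s3cret-token")); // true
console.log(constantTimeEqual("s3cret-token", "s3cret-tokeN")); // false
```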

GET /api returns a categorized endpoint summary and an ops_token_required boolean so monitoring can detect whether the gate is active.

18. Things NOT to do