A scannable specification of tamper-evident records of agent execution — for CCOs, Heads of AI Risk, Internal Audit, and the platform engineers who’ll implement the capture.
Your agents are already in production. Within twenty-four months, someone — an auditor, a regulator, opposing counsel — will ask you to reconstruct a specific agent decision end-to-end and prove the log hasn’t been touched.
The tools you bought to ship them were not built for that conversation. Observability is mutable and 14-day. GRC tools are checklists. AI governance is policy-layer. None of them indexes the agent run as evidence.
It’s a data-model gap, not a feature gap. Spans, controls, and policy registries are the wrong primitives. The right primitive is the agent execution graph — prompts, tools, retrievals, approvals, refusals, side effects — hashed, signed, and externally anchored.
Boring primitives, conservatively combined. SHA-256 hash chain · daily Merkle root signed in an HSM · weekly anchor to Sigstore Rekor · storage in S3 Object Lock compliance mode. The auditor verifies offline, with a Go binary we publish.
We lead with the deadlines that are dated and live. DORA since Jan 2025. April 2026 interagency MRM principles. GDPR Article 22. SOX, HIPAA. The EU AI Act high-risk window opens 2 Dec 2027 — now fixed, not standards-conditional. If none of those is within twelve months of you, we’re the wrong product. We’ll say so on the call.
Runfile is design-partner-stage. The architecture in §04–§08 is specified, partially implemented, and being validated with a small named set of design partners. Where a component ships, we say so. Where it’s v1.5, we say so.
Three audiences should read this in three different orders. Pick yours.
In 2025 and 2026, regulated firms moved AI agents into production. The questions that follow have not changed in twenty years of financial-services audit: who acted, on whose behalf, when, on what basis — and can you prove the record has not been altered.
The tools the engineering teams bought were built for a different question.
Datadog, Langfuse, LangSmith, Arize. 14–15 day retention by default. Mutable spans. No control mapping. Optimised for finding why a prompt regressed last Tuesday.
Vanta, Drata, OneTrust. Index (control_id, evidence_artifact, owner). Excellent for SOC 2 evidence rooms. Cannot represent an agent run as a first-class object.
Credo AI, Holistic AI, ValidMind. Index the AI system as a registered object. Good for the model-risk function. Not the runtime record of what the agent did on Tuesday at 11:42.
Runfile. Indexes the agent execution graph. Hash-chained, signed, externally anchored, control-mapped, retained for the obligation in scope. Verifiable offline, with a binary we publish.
The gap isn’t capability. It’s the data model. Runfile indexes the right object.
A useful way to test any compliance tool: write down the question the auditor will ask, then check whether the tool can answer it.
Take a credit-decisioning agent at a UK retail bank. It pulls a bureau file, computes debt-to-income, runs a policy threshold, and either approves, declines, or escalates. Three months after go-live, a declined customer files with the Ombudsman. Internal Audit is asked to reconstruct.
Show me every action this agent took on behalf of customer 8041 between 11:42 and 11:43 GMT on 14 March 2026. Include the prompt, the model version, the retrieved bureau response, the policy that fired, the human approval if any, and the final decision. Prove the log has not been modified since the action was taken. Map each event to GDPR Article 22, SS1/23, and the Consumer Duty fair-value test.
— Internal Audit reconstruction request, paraphrased
Every element is concrete. No judgement calls. The answers either exist or they do not. The auditor expects a citation, not a dashboard.
The vocabulary changes by regulator. The shape does not. A SOX §404 question on a claims-routing agent. A DORA Article 17 incident reconstruction at an EU asset manager. An FDA 21 CFR Part 11 protocol-amendment audit at a pharma. All ask the same thing of the same object: the agent execution graph, signed, complete, control-mapped, retained, and verifiable.
The auditor’s job is to ask the question. Runfile’s job is to make the answer producible in a form they’ll accept.
Three adjacent categories each solve a real problem. None of them solves the one in §02. The reason is the shape of their data model, not their capability.
Datadog, Langfuse, LangSmith, Braintrust, Arize. Data model: (trace_id, span_id, input, output, latency, model, tokens). Three blockers as evidence:
We’re not trying to be a better Datadog. We feed off their OTel emissions where they’re already running.
Vanta hit $300M ARR in April 2026. Drata at $100M. OneTrust at half a billion. Excellent for SOC 2 evidence rooms, where the unit is a state-of-the-world at a point in time. An agent run isn’t a point in time. It’s a sequence. The GRC model cannot represent a sequence as a first-class object.
Credo AI, Holistic AI, ValidMind. Index the AI system itself — its policies, its risk assessment, its bias evaluation, its registry mapping. The right product for the model-risk function. Not the runtime record.
| Layer | Object indexed | Vendors | Audit-grade for runs? |
|---|---|---|---|
| Engineering debug | OpenTelemetry trace | Datadog · Langfuse · LangSmith · Arize | No |
| GRC | (control, evidence_artifact) | Vanta · Drata · OneTrust | No |
| AI governance | AI system posture | Credo AI · Holistic AI · ValidMind | No |
| Agent assurance | Agent execution graph | Runfile | Yes — built for it |
Three categories. Three indexes. None of them is the auditor’s object.
Every agent invocation is recorded as one bounded object: the run. A run has a beginning, an end, an outcome, and a directed graph of events. The graph is the unit of evidence.
Within each run, Runfile records the following event classes. Each event is structured, typed, and validated against a schema before it is hashed into the chain.
A tool call may be triggered by an LLM call, which may be triggered by an orchestrator step. Parent-child edges are first-class. The auditor’s question is a graph traversal. A trace is a list; a run is a graph.
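The traversal described above can be sketched in a few lines. This is a minimal illustration, not Runfile's schema: the `Event` fields, the event kinds, and the sample run are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Event:
    event_id: str
    kind: str                  # hypothetical kinds: "orchestrator_step", "llm_call", "tool_call"
    parent_id: Optional[str]   # None for the root event of the run


def causal_path(events: list[Event], event_id: str) -> list[Event]:
    """Follow parent edges from one event back to the root, returned root-first."""
    by_id = {e.event_id: e for e in events}
    path: list[Event] = []
    seen: set[str] = set()     # guards against a malformed, cyclic input
    current = by_id.get(event_id)
    while current is not None and current.event_id not in seen:
        seen.add(current.event_id)
        path.append(current)
        current = by_id.get(current.parent_id) if current.parent_id else None
    return list(reversed(path))


def children(events: list[Event], event_id: str) -> list[Event]:
    """Everything a given event directly triggered."""
    return [e for e in events if e.parent_id == event_id]


# Illustrative run: one orchestrator step, one LLM call, two tool calls.
RUN = [
    Event("step-1", "orchestrator_step", None),
    Event("llm-1", "llm_call", "step-1"),
    Event("tool-1", "tool_call", "llm-1"),
    Event("tool-2", "tool_call", "llm-1"),
]
```

"What caused this tool call?" is `causal_path(RUN, "tool-1")`. "What did this LLM call trigger?" is `children(RUN, "llm-1")`. A flat list of spans can answer neither without first reconstructing the edges.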
Schema authored in TypeScript + Zod, single source of truth. Generates JSON Schema (Python/Pydantic), Go structs (event processor, verifier CLI), TS types. Wire format is canonical JSON per RFC 8785 — the precondition for hashing. Auditor-facing exports emit JSON-LD with a stable context. Schema migrations are additive only for the lifetime of the retention obligation.
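Why canonical JSON is the precondition for hashing: two serialisations of the same event must produce the same bytes, or the hashes disagree. The sketch below only approximates RFC 8785 with Python's standard library (sorted keys, no insignificant whitespace, UTF-8). Full RFC 8785 also prescribes ECMAScript number formatting and UTF-16 key ordering, which this example does not implement. The field names are illustrative.

```python
import hashlib
import json


def canonical_bytes(event: dict) -> bytes:
    # Approximation of RFC 8785, adequate for ASCII keys and integer values only.
    return json.dumps(
        event, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


def event_hash(event: dict) -> str:
    return hashlib.sha256(canonical_bytes(event)).hexdigest()


# Same event, different key order as emitted by two different producers.
a = {"run_id": "r-1", "kind": "tool_call", "seq": 2}
b = {"seq": 2, "kind": "tool_call", "run_id": "r-1"}
```

Both dictionaries canonicalise to the same byte string, so `event_hash(a) == event_hash(b)`. Without canonicalisation, a Python producer and a Go verifier could hash the same event to different values.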
Spans are lists. Runs are graphs. You can’t traverse a list to answer the auditor’s question.
Trust here means cryptographic, not contractual. The auditor should not need to trust Runfile’s policies or our staff. They should be able to verify the data alone, offline, with a binary they downloaded from our public GitHub releases.
Per-event. Tamper with one payload → its hash changes → every later chain entry diverges → the day’s Merkle root mismatches the signed manifest. Detected from any later event onward.
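The per-event property can be demonstrated with a toy chain. The entry construction here (SHA-256 over the previous entry hash concatenated with the payload hash, starting from an all-zero genesis value) is an assumption for illustration. The production chain format is not specified in this document.

```python
import hashlib
import json
from typing import Optional

GENESIS = "0" * 64  # hypothetical starting value for the chain


def payload_hash(payload: dict) -> str:
    raw = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(raw).hexdigest()


def link(prev_entry: str, payload: dict) -> str:
    return hashlib.sha256((prev_entry + payload_hash(payload)).encode("ascii")).hexdigest()


def build_chain(payloads: list[dict]) -> list[str]:
    entries, prev = [], GENESIS
    for payload in payloads:
        prev = link(prev, payload)
        entries.append(prev)
    return entries


def first_divergence(payloads: list[dict], entries: list[str]) -> Optional[int]:
    """Recompute the chain from the payloads; return the first index that disagrees."""
    prev = GENESIS
    for i, (payload, recorded) in enumerate(zip(payloads, entries)):
        prev = link(prev, payload)
        if prev != recorded:
            return i
    return None


payloads = [{"seq": 0, "decision": "pull_bureau"},
            {"seq": 1, "decision": "decline"},
            {"seq": 2, "decision": "notify"}]
chain = build_chain(payloads)

# Tamper with the middle payload after the chain was recorded.
tampered = [payloads[0], {"seq": 1, "decision": "approve"}, payloads[2]]
```

`first_divergence(payloads, chain)` returns `None`. `first_divergence(tampered, chain)` returns `1`, and every recomputed entry from that index onward also disagrees with the recorded chain.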
Daily. The Merkle root is signed by a per-tenant KMS key (FIPS 140-2 Level 3 HSM). Re-checkable offline with the public key we publish.
External. The weekly meta-root is anchored to Sigstore Rekor. Even Runfile, with full database access, structurally cannot alter a past root without producing a meta-root that contradicts the public log. The auditor verifies us without us.
Compliance mode is the stricter Object Lock variant. Not even the AWS root account can delete or modify an object within the retention window. Runfile is removed from the threat model. An insider with full root credentials cannot alter a single payload. Verifiable independently via the customer’s CloudTrail.
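For platform engineers, a sketch of what a compliance-mode write looks like at the S3 API level. `ObjectLockMode` and `ObjectLockRetainUntilDate` are real `put_object` parameters in boto3. The bucket name, key layout, and helper functions are hypothetical, and the bucket is assumed to have Object Lock enabled at creation. The sketch builds the arguments only, so it runs without AWS credentials.

```python
from datetime import datetime, timezone


def add_years(moment: datetime, years: int) -> datetime:
    try:
        return moment.replace(year=moment.year + years)
    except ValueError:
        # 29 February in a non-leap target year: fall back to 28 February.
        return moment.replace(year=moment.year + years, day=28)


def object_lock_put_args(bucket: str, key: str, body: bytes,
                         captured_at: datetime, retention_years: int) -> dict:
    """Arguments for an S3 put_object call that locks the payload in compliance mode."""
    return {
        "Bucket": bucket,
        "Key": key,
        "Body": body,
        "ObjectLockMode": "COMPLIANCE",
        "ObjectLockRetainUntilDate": add_years(captured_at, retention_years),
    }


args = object_lock_put_args(
    bucket="example-evidence-bucket",          # hypothetical name
    key="runs/2026/03/14/run-8041.json",       # hypothetical key layout
    body=b"{}",
    captured_at=datetime(2026, 3, 14, 11, 42, tzinfo=timezone.utc),
    retention_years=7,                         # e.g. the SOX retention period
)
# With credentials configured: boto3.client("s3").put_object(**args)
```

Once written this way, the retain-until date can be extended but not shortened, and the mode cannot be downgraded for the duration of the window.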
Customers ask: can we hold the signing key and sign our own logs? No. If the agent runtime can sign its own logs, the logs aren’t credible evidence against the customer — same problem as a defendant signing their own affidavit. The separation between the runtime and the signing infrastructure is the chain-of-custody claim. We won’t break it on request.
Boring primitives, conservatively combined. An evidence package signed today remains verifiable if Runfile ceased to exist tomorrow.
The capture SDK is the only Runfile component that runs inside the customer’s environment. Open source, Apache 2.0.
LangGraph, OpenAI Agents SDK, Anthropic Claude SDK, Model Context Protocol (MCP).
LangGraph.js, Claude SDK TS. Mastra and Vercel AI SDK land in v1.5.
One @capture decorates an agent function. Every framework-native event captured automatically.
If you already emit OTel GenAI semantic conventions, we ride alongside — no agent code changes.
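To make the single-decorator claim concrete, here is a toy stand-in. It is not the Runfile SDK: the `capture` implementation, the event shapes, and the in-memory `CAPTURED` list are all hypothetical, and it records only run start and finish rather than framework-native events. The 0.45 threshold is invented for the example.

```python
import functools
import uuid

CAPTURED: list[dict] = []  # stand-in for the transport to the event store


def capture(fn):
    """Wrap an agent function so each invocation is recorded as one bounded run."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        run_id = str(uuid.uuid4())
        CAPTURED.append({"run_id": run_id, "kind": "run_started", "agent": fn.__name__})
        try:
            result = fn(*args, **kwargs)
        except Exception as exc:
            # A failed run is still evidence; record it, then re-raise unchanged.
            CAPTURED.append({"run_id": run_id, "kind": "run_failed",
                             "error": type(exc).__name__})
            raise
        CAPTURED.append({"run_id": run_id, "kind": "run_finished", "outcome": result})
        return result
    return wrapper


@capture
def credit_decision(debt_to_income: float) -> str:
    # Illustrative policy threshold only.
    return "decline" if debt_to_income > 0.45 else "approve"
```

Calling `credit_decision(0.5)` returns `"decline"` as before and leaves two events in `CAPTURED` sharing one `run_id`. The agent code itself is unchanged.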
Deterministic tokenisation at the SDK boundary. The mapping lives in the Runfile Token Vault — separate service, separate KMS key, separate IAM boundary, separate audit log from the event store. Three properties follow:
The Vault is its own service deliberately. Combining auth, PII reidentification, and signing behind one IAM boundary would concentrate three different secret classes with one blast radius. Different access patterns, different audit requirements, different threat models.
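Deterministic tokenisation can be illustrated with a keyed HMAC. This is an assumption about one reasonable construction, not a description of the Token Vault's actual scheme. The class, the token format, and the in-memory mapping are hypothetical, and a real vault would hold the key in KMS rather than in process memory.

```python
import hashlib
import hmac


class TokenVault:
    """In-memory stand-in for a separate tokenisation service."""

    def __init__(self, key: bytes) -> None:
        self._key = key
        self._mapping: dict[str, str] = {}   # token -> original value

    def tokenise(self, field: str, value: str) -> str:
        # Keyed and deterministic: same key, field, and value give the same token.
        digest = hmac.new(self._key, f"{field}:{value}".encode("utf-8"),
                          hashlib.sha256).hexdigest()
        token = f"tok_{field}_{digest[:32]}"
        self._mapping[token] = value
        return token

    def reidentify(self, token: str) -> str:
        # In a real deployment this call sits behind its own IAM boundary and audit log.
        return self._mapping[token]


vault = TokenVault(key=b"example-key-not-for-production")
t1 = vault.tokenise("customer_id", "8041")
t2 = vault.tokenise("customer_id", "8041")
t3 = vault.tokenise("customer_id", "8042")
```

Because `t1 == t2`, events about the same customer still join across runs in the event store. Because the token is keyed, holding the event store alone does not allow the original value to be recovered.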
The SDK is open. The PII stays put. The signing key is on our side of the wall by design.
Some frameworks are dateable. Some are principles-based. We lead the sales motion with the dateable ones and offer the principles-based ones as supported, not promised.
| Framework | Date | What it demands | Retention |
|---|---|---|---|
| DORA | Live 17 Jan 2025 | 48-hour ICT incident report; ICT third-party register; audit trail. BaFin (mid-2025) brought AI explicitly into scope; UK CTP regime parallel from 1 Jan 2025. | ~5 years |
| Fed / FDIC / OCC MRM | Live 17 Apr 2026 | Principles-based, risk-tiered, technology-neutral interagency guidance. SR 11-7’s three pillars (independent validation, ongoing monitoring, documentation) remain the operating template. | n/a |
| GDPR Art. 22 & 30 | Live May 2018 | Human oversight on solely-automated decisions with legal or significant effect; records of processing under Article 30. | Sector law |
| SOX §404 | Live 2002 | When an agent triggers or affects a financial control (revenue recognition, journal entries, period close), the agent action becomes SOX-relevant. | 7 years |
| HIPAA OCR audit logs | Live 1996 | When an agent is the actor touching PHI, the covered entity needs an identifiable agent principal, the action, the PHI fields touched, and any human approval. | 6 years |
| EU AI Act Art. 12 & 26(6) | Standalone 2 Dec 2027; embedded 2 Aug 2028 | Automatic recording of events over the lifetime of the system; deployers retain logs. Fixed dates per the May 2026 Digital Omnibus; no longer standards-conditional. | ≥ 6 mo floor |
| TRAIGA (Texas) | Effective 1 Jan 2026 | Intentional-misuse focus rather than broad high-risk categorisation. Audit-ready evidence on AG investigation. | n/a |
Colorado AI Act has moved three times in nine months — effective 1 Feb 2026, delayed to 30 Jun 2026 (SB 25B-004), then stayed 27 Apr 2026 by federal magistrate order in the xAI matter. SB 189 would push to 1 Jan 2027 if signed. We support whatever the final Act requires. We don’t market a Colorado date.
| Tier | Trigger | Max fine |
|---|---|---|
| 1 | Article 5 prohibited practices | €35M / 7% of global turnover |
| 2 | Article 16 breaches (incl. Art. 12 logging) | €15M / 3% of global turnover |
| 3 | Incorrect information to authorities | €7.5M / 1% of global turnover |
YAML repository, version-controlled, signed at release. Public to customers. The mapping version applied to a run is recorded in that run’s manifest, so evidence produced today can be re-verified two years from now against the mapping in force at capture.
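The re-verification property depends on one lookup: resolve controls through the mapping version named in the run's manifest, not through the latest release. A sketch follows, with Python dictionaries standing in for the YAML repository. The version strings, event kinds, and event-to-control assignments are illustrative assumptions, not the published mapping.

```python
# Two hypothetical releases of the control mapping.
MAPPING_RELEASES = {
    "2026.05": {
        "human_approval": ["GDPR Art. 22"],
        "llm_call": ["EU AI Act Art. 12"],
    },
    "2026.09": {
        "human_approval": ["GDPR Art. 22", "SS1/23"],
        "llm_call": ["EU AI Act Art. 12"],
    },
}


def controls_for_run(manifest: dict, event_kinds: list[str]) -> dict[str, list[str]]:
    """Resolve controls using the mapping release pinned in the run manifest."""
    mapping = MAPPING_RELEASES[manifest["mapping_version"]]
    return {kind: mapping.get(kind, []) for kind in event_kinds}


# A run captured while release 2026.05 was in force.
manifest = {"run_id": "r-1", "mapping_version": "2026.05"}
resolved = controls_for_run(manifest, ["human_approval", "llm_call"])
```

Even after release `2026.09` ships, `resolved` still reflects `2026.05`, so the evidence is checked against the mapping that applied at capture.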
DORA is live. April 2026 MRM is live. The AI Act high-risk window opens 2 Dec 2027 — fixed, not standards-conditional. The window to be ready is now.
A third party — auditor, regulator, opposing counsel — can verify a Runfile evidence package without trusting Runfile. The package is designed for that property from the inside out.
Scope, event count, integrity status, controls in scope, Merkle root, Rekor entry #, QR to permanent identifier.
Full canonical-JSON event graph, one run per file, parent-child edges preserved.
Every relevant daily Merkle manifest. The inclusion proof against the public Sigstore Rekor log. The public key.
Versioned control-mapping repository as applied. README in plain English on how to verify.
Go binary, single statically-linked executable, public GitHub release. Zero dependency on Runfile’s running infrastructure.
$ runfile verify ./acme_credit_review_2026_q1.zip
Six checks:
Output: a signed PDF + a JSON detail file. Attach both to the workpaper.
Air-gapped audit environments can’t depend on the internet. Runfile’s verifier doesn’t require it — we ship the Rekor inclusion proofs with the package. The auditor is verifying using the public key, canonical JSON, SHA-256, the Merkle construction (RFC 6962), and the Rekor format — all open standards. Runfile is not in the trust loop.
Internal Audit doesn’t buy the platform because they trust Runfile. They buy it because they don’t need to.
A whitepaper that doesn’t list its disqualifiers shouldn’t be trusted. Here’s where we’re the wrong answer.
Runfile records; it does not gate. Buy Lakera or Galileo. Their refusal events show up in our chain as first-class evidence of effective oversight.
High-cardinality, mutable, 14-day spans are the right shape for that job. Datadog, Langfuse, LangSmith, Arize, Braintrust. Runfile rides alongside.
Vanta, Drata, OneTrust. Runfile feeds into them. We’re not them.
Most early-stage AI products do not need Runfile yet. Buy LangSmith, ship the product, come back when the auditor calls.
The case for paying for tamper-evident evidence today is weaker if the demand-driving regulation is more than a year out. We’ll say so on the call.
We don’t. The category was burned recently by a startup faking AI-generated evidence. Agents act, we record, auditors verify.
Selling to a buyer for whom Runfile is wrong is worse than not selling at all. The disqualifiers are how we filter our pipeline.
The v1 architecture is designed to permit v1.5 capabilities as additive changes — not rewrites. Schema fields exist now and are null. IAM boundaries are already in place. Evidence signed today remains verifiable through v1.5 and beyond.
| Capability | v1 | v1.5 |
|---|---|---|
| Nitro Enclave signer | KMS HSM signing | Attestation document with code provenance |
| Multi-region | One region per tenant | Multi-region replication via Terraform stack |
| BYO-cloud | Runfile AWS account | Ingest in customer’s AWS account, cross-account roles |
| TypeScript SDK | Partial (LangGraph.js, Claude SDK TS) | Parity (Mastra, Vercel AI SDK) |
| Self-service signup | Sales-led provisioning | Starter tier self-service |
| Auth | Scoped API keys | OAuth2 client credentials, mTLS |
| SIEM streaming | Runfile audit-of-audit only | Customer-side Splunk / Sentinel / Datadog Security |
| Two-approver workflows | Single approver | Multi-approver for Enterprise tier |
v1 is shippable today. v1.5 is additive, not a rewrite.
The argument: the agent execution graph is the right data object for AI compliance, and nobody ships it today.
It’s contestable. Other models exist — OTel traces, control-evidence pairs, AI-system-as-object — and any of them may turn out to be the right shape too. We’ve made our case; the market will adjudicate.
What is not contestable is the question. The auditor will ask: what did this agent do, on behalf of this customer, at this time, and can you prove the record is intact. The question is twenty years old in financial services. It doesn’t become a different question when the actor is an LLM with a tool belt.
Runfile is what we’re building to answer it. If you’re the CCO, Head of AI Risk, or Head of Internal Audit at a bank, insurer, or asset manager in the US, UK, or EU, we’d like to hear from you. Internal Audit signs off on what we ship; bring them to the call.
Whitepaper v1.0 · May 2026. Updated when the regulatory landscape shifts materially or v1.5 ships. Latest at runfile.ai/whitepaper; previous versions archived.