Comparison — Runfile

§01 — At a glance

Where Runfile sits on the matrix.

Three adjacent categories. None of them ship what an auditor accepts as evidence. The first column shows the auditor’s actual questions.

Auditor question	Dev observability (Datadog · Langfuse · LangSmith · Arize)	GRC platforms (Vanta · Drata · OneTrust)	AI governance (Credo AI · Holistic AI · ValidMind)	Runfile
Reconstruct the decision end-to-end	◐ Spans, 14-day	✗ Not modelled	◐ Policy layer	✓ Execution graph
Prove the log hasn’t been touched	✗ Mutable	✗ Mutable	✗ Mutable	✓ Hash-chained · signed · anchored
Map an event to a specific control	✗ No mapping	◐ Checklist-grade	◐ Policy-grade	✓ Event → predicate
Retain for 6 months to 10 years	✗ 14-day default	◐ Document-grade	◐ Document-grade	✓ Tiered, S3 Object Lock
Verify offline, by a third party	✗	✗	✗	✓ verify.sh + public anchor
Produce a single auditor-facing artefact	✗ Dashboards	◐ Evidence rooms	◐ Policy reports	✓ Signed PDF + JSONL + verifier
Data residency for EU / UK / US	✓	✓	◐ Patchy	✓ Day one, single-tenant
Bills on what you ship, not on volume	✗ Per-span / per-GB	✓ Seat / framework	✓ Framework	✓ Per execution + retention tier
Who buys it	Platform Eng	CISO · Compliance	Head of AI Governance	CCO + Internal Audit
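
What "hash-chained" buys you in the matrix above can be shown in a few lines. This page does not describe how Runfile builds its chain, so treat the following as a minimal sketch of the general technique only. The names `GENESIS`, `entry_hash`, `append` and `verify` are hypothetical, and signing and public anchoring are left out entirely.

```python
import hashlib
import json

# Assumed placeholder "previous hash" for the first entry in a chain.
GENESIS = "0" * 64


def entry_hash(prev_hash: str, payload: dict) -> str:
    """Hash the previous entry's hash together with a canonical JSON payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append(log: list, payload: dict) -> None:
    """Append a payload, linking it to the hash of the entry before it."""
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"payload": payload, "prev": prev, "hash": entry_hash(prev, payload)})


def verify(log: list) -> bool:
    """Recompute every link; any edited, removed or reordered entry breaks the chain."""
    prev = GENESIS
    for entry in log:
        if entry["prev"] != prev:
            return False
        if entry["hash"] != entry_hash(prev, entry["payload"]):
            return False
        prev = entry["hash"]
    return True


log = []
append(log, {"step": "prompt", "text": "approve refund?"})
append(log, {"step": "tool", "name": "lookup_order"})
append(log, {"step": "approval", "by": "reviewer"})
assert verify(log)

# Editing one recorded step after the fact is detected on the next verification.
log[1]["payload"]["name"] = "delete_order"
assert not verify(log)
```

Verification needs only the log itself and a hash function, which is why a third party can run it offline. A chain alone does not stop someone from rewriting every entry and recomputing the hashes; that is the gap signing and an external anchor are meant to close.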

§02 — Read in plain language

The honest version.

We’re happy to recommend the right tool for the right job. Runfile isn’t a debugger or a checklist. Here’s what each adjacent tool is good at.

§ Datadog · Langfuse · LangSmith · Arize

Built to debug. Not to defend.

Best-in-class for engineering: traces, evals, prompt iteration, latency hunts. The data model is the OpenTelemetry trace: (trace_id, span_id, input, output, latency). Default retention is 14–15 days. The logs are mutable, billed by span volume, and not mapped to any control. Keep them; Runfile reads OTel and rides alongside.

§ Vanta · Drata · OneTrust

Built for checklists. Not for runs.

Vanta hit $300M ARR in April 2026; Drata is at $100M; OneTrust at half a billion. They are excellent at SOC 2 evidence rooms and the policy/inventory layer. The data model is (control_id, evidence_artifact, owner, status). They do not model the agent run. Runfile feeds them, not the other way around.

§ Credo AI · Holistic AI · ValidMind

Built for policy. Not for proof.

Credo AI’s “Agent Registry,” Holistic’s bias and red-team toolkits, and ValidMind’s bank-MRM evidence are real and useful. They generate audit artefacts at the policy layer. The agent execution graph (the actual sequence of prompts, tools, retrievals, refusals and approvals) is not their object. It is ours.
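
The three data models named in this section can be put side by side. The first two records use the field tuples quoted above. The third is a hypothetical sketch of an execution-graph node: `Step`, `StepKind`, `parents` and `replay_order` are illustrative names, not Runfile's schema.

```python
from dataclasses import dataclass, field
from enum import Enum
from graphlib import TopologicalSorter


@dataclass
class Span:
    """Observability record: (trace_id, span_id, input, output, latency)."""
    trace_id: str
    span_id: str
    input: str
    output: str
    latency_ms: float


@dataclass
class ControlEvidence:
    """GRC record: (control_id, evidence_artifact, owner, status)."""
    control_id: str
    evidence_artifact: str
    owner: str
    status: str


class StepKind(Enum):
    PROMPT = "prompt"
    TOOL = "tool"
    RETRIEVAL = "retrieval"
    REFUSAL = "refusal"
    APPROVAL = "approval"


@dataclass
class Step:
    """Hypothetical execution-graph node; field names are assumptions."""
    step_id: str
    kind: StepKind
    parents: list[str] = field(default_factory=list)  # steps this one depends on


def replay_order(steps: list[Step]) -> list[str]:
    """Return step ids so that every step comes after the steps it depends on."""
    graph = {s.step_id: s.parents for s in steps}
    return list(TopologicalSorter(graph).static_order())


# Steps stored out of order still replay in dependency order.
run = [
    Step("s4", StepKind.APPROVAL, ["s3"]),
    Step("s2", StepKind.RETRIEVAL, ["s1"]),
    Step("s1", StepKind.PROMPT),
    Step("s3", StepKind.TOOL, ["s2"]),
]
print(replay_order(run))
```

The point of the third shape is the edges: a span or an evidence record describes one item, while a graph of steps lets a reviewer walk a decision from the first prompt to the final approval.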

§03 — The honest disqualifier

When Runfile isn’t the right answer.

If none of these regulations will apply to you within the next twelve months, the case for Runfile is weaker. Buy a good observability tool, ship your product, and come back when the auditor calls.

— 01

You’re a consumer app with no regulated workload and no contractual evidence obligations.

Buy LangSmith or Langfuse.

— 02

You need real-time guardrails — block a bad output before it leaves the model.

Buy Lakera or Galileo. Runfile records; it does not gate.

— 03

You need a SOC 2 evidence room and policy library, with no agents in scope yet.

Vanta or Drata is the right shape today.

Different data model. Different buyer.

Bring the comparison to your auditor.
