Visual Summary
Post 57 · Agents & Systems · 2026
Post 57 · Agents & Systems · Google Cloud · June 2026 · OKF v0.1
Open Knowledge Format
The Missing Layer for AI Agents
Models give agents reasoning. Tools give agents execution. But without curated organizational knowledge, agents become fast, confident, and risky — assembling context from incompatible silos on every query. The Open Knowledge Format (OKF) is Google Cloud's answer: a vendor-neutral spec that packages knowledge as plain Markdown directories, readable by humans, parseable by agents, distributable without any proprietary tooling.
Required fields: 1 (type)
Lines in the entire spec: 451
Stars on Karpathy's LLM Wiki: 5k+
Reference implementations: 3
License: Apache 2.0
The USB-C Moment for AI Knowledge
Every agent builder is solving the same context-assembly problem from scratch. Every catalog vendor reinvents the same data models. OKF is the USB-C connector — one portable format that any agent, any catalog, any organization can read without proprietary adapters.
What OKF Is Not
OKF does not describe data (that's what tables, APIs, and schemas do). It describes the knowledge around data: what a table means, how a metric is calculated, who owns what, what caveats exist. The context that turns raw data into organizational intelligence.
The Fragmentation Problem
✖ Current Reality
⚠  Agent query: "How do I compute weekly active users?" — must search 5 incompatible systems, get 5 inconsistent answers, synthesise manually.
📔 Confluence Wiki
WAU = distinct users who logged in. Last updated 2023. No SQL. May be stale.
💬 Slack #data-eng
"We changed the WAU definition last quarter — see Jira ticket DS-1047"
📊 Data Catalog
Table: events. No description. Owner: unknown. 4 columns documented.
💾 SQL File
SELECT user_id, COUNT(*) … WHERE event_type = 'session_start' — no comments
🧠 Senior Engineer
"Oh, the real definition uses 7-day rolling window, not calendar week. Talk to Priya."
🤖 AI Agent Result
Assembled 3 conflicting definitions. Chose one arbitrarily. Fast, confident, wrong.
The Tribal Knowledge Tax
Critical context lives in senior engineers' heads — metric edge cases, join caveats, deprecated fields. OKF's LLM-maintainable format means agents can help write and update documentation, keeping it current at near-zero cost.
Wiki Abandonment Cycle
Wikis get created. Wikis drift. Wikis get abandoned. OKF's core insight (from Karpathy's LLM Wiki): LLMs don't get bored, don't forget cross-references, and can update 15 files in one pass. The maintenance barrier disappears.
The Knowledge Gap in Agents
Models → reasoning power. MCP tools → execution power. OKF → enterprise judgment. Without the third layer, agents extrapolate from general training data and get org-specific answers wrong. Fast, confident, wrong.
Concept Document Anatomy
Every non-reserved .md file in an OKF bundle is a "concept document".
tables/orders.md OKF Concept Document — BigQuery Table
---
type: BigQuery Table
title: Orders
description: One row per completed customer order.
resource: https://console.cloud.google.com/bigquery?p=acme&d=sales&t=orders
tags: [sales, revenue, core]
timestamp: 2026-05-28T14:30:00Z
---

# Schema

| Column       | Type   | Description                                  |
|--------------|--------|----------------------------------------------|
| order_id     | STRING | Globally unique order identifier.            |
| customer_id  | STRING | FK to [customers](/tables/customers.md).     |
| order_date   | DATE   | UTC date the order was placed.               |
| total_usd    | FLOAT  | Pre-tax order total in US dollars.           |

# Joins

Joined with [customers](/tables/customers.md) on `customer_id`.

# Citations

[1] [BigQuery export schema](https://support.google.com/...)
[2] [Revenue definitions handbook](/metrics/revenue.md)
YAML Frontmatter
type is the only required field. All other keys (title, description, resource, tags, timestamp) are optional conventions. Unknown keys must be preserved by consumers.
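To make the frontmatter rule concrete, here is a minimal sketch of how a consumer might split a concept document into frontmatter and body, then check for the required type field. It is not the spec's reference parser: it assumes flat `key: value` lines only (a real consumer would use a full YAML parser), and the function names and the `x-team` key are hypothetical. Values are kept as raw strings, so unknown keys pass through untouched.

```python
from typing import Dict, Tuple


def split_frontmatter(text: str) -> Tuple[Dict[str, str], str]:
    """Split a concept document into (frontmatter, body).

    Simplification: only flat `key: value` lines are handled, not full YAML.
    Every key is kept, known or not, so unknown keys are preserved.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("document does not start with a frontmatter fence")
    try:
        end = next(i for i in range(1, len(lines)) if lines[i].strip() == "---")
    except StopIteration:
        raise ValueError("frontmatter fence is never closed") from None

    meta: Dict[str, str] = {}
    for line in lines[1:end]:
        if not line.strip():
            continue
        # Split on the first colon only, so URLs and timestamps stay intact.
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"not a key: value line: {line!r}")
        meta[key.strip()] = value.strip()
    body = "\n".join(lines[end + 1:])
    return meta, body


def has_required_type(meta: Dict[str, str]) -> bool:
    """True when the only required field, type, is present and non-empty."""
    return bool(meta.get("type", "").strip())


# `x-team` is a made-up key standing in for any unknown frontmatter key.
DOC = """---
type: BigQuery Table
title: Orders
x-team: growth
---
# Schema
"""

meta, body = split_frontmatter(DOC)
print(has_required_type(meta))
print(meta["x-team"])
```

Keeping values as opaque strings is the simplest way to honor the "preserve unknown keys" rule in a sketch like this. A production consumer would need real YAML typing for lists such as tags.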
Schema Section
Optional # Schema heading. Recommended Markdown table with Column / Type / Description. Cross-links in descriptions build the knowledge graph.
Joins Section
Optional # Joins heading. Markdown links to related concept documents. Link semantics (FK, depends-on, etc.) are conveyed by prose, not link syntax.
Citations Section
Optional # Citations heading. Numbered list of external sources backing claims. Helps agents verify and trace provenance of documented knowledge.
Conformance Rules
✓ Every .md has parseable YAML frontmatter
✓ Every frontmatter has a non-empty type field
✓ Reserved filenames (index.md, log.md) follow their structures

Consumers MUST NOT reject bundles with:
missing optional fields, unknown type values, unknown frontmatter keys, broken cross-links, missing index.md
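The rules above are deliberately lenient, and a small checker shows how lenient. The sketch below works on an in-memory bundle (a dict from path to file text) and reports only the two hard failures: unparseable frontmatter and an empty or missing type. It assumes reserved files are exempt from the frontmatter rule, does not validate the internal structure of index.md or log.md, and reads frontmatter as flat `key: value` lines rather than full YAML. All names and sample files are hypothetical.

```python
from typing import Dict, List, Optional

RESERVED = {"index.md", "log.md"}


def frontmatter_type(text: str) -> Optional[str]:
    """Return the type value, "" if it is absent, or None if there is no
    well-formed frontmatter block (flat key: value lines assumed)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return None
    for end in range(1, len(lines)):
        if lines[end].strip() == "---":
            break
    else:
        return None  # opening fence was never closed
    for line in lines[1:end]:
        key, sep, value = line.partition(":")
        if sep and key.strip() == "type":
            return value.strip()
    return ""


def check_bundle(files: Dict[str, str]) -> List[str]:
    """Collect conformance errors. Unknown types, unknown keys, broken
    links, and a missing index.md are all tolerated on purpose."""
    errors: List[str] = []
    for path, text in sorted(files.items()):
        name = path.rsplit("/", 1)[-1]
        if not path.endswith(".md") or name in RESERVED:
            continue
        doc_type = frontmatter_type(text)
        if doc_type is None:
            errors.append(f"{path}: frontmatter missing or unparseable")
        elif not doc_type:
            errors.append(f"{path}: empty or missing type")
    return errors


BUNDLE = {
    # Broken cross-link: tolerated.
    "tables/orders.md": "---\ntype: BigQuery Table\n---\nSee [gone](/tables/gone.md).\n",
    # Unknown type value and unknown key: tolerated.
    "metrics/wau.md": "---\ntype: Made Up Kind\nx-owner: data-eng\n---\n",
    # No type field: rejected.
    "metrics/bad.md": "---\ntitle: No type here\n---\n",
    # No frontmatter at all: rejected.
    "notes/raw.md": "just text",
}

errors = check_bundle(BUNDLE)
for err in errors:
    print(err)
```

Note that the sample bundle has no index.md anywhere, and the checker raises nothing about it.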
Reserved Filenames
index.md — directory listing, no frontmatter, format: * [Title](url) - description

log.md — chronological change history, ISO 8601 date headings (YYYY-MM-DD), newest-first, prose entries with bold prefixes: **Update**, **Creation**, **Deprecation**
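The index.md entry format is regular enough to parse with one regular expression. The sketch below is an illustration, not a reference parser: `IndexEntry` and `parse_index` are hypothetical names, the second sample entry is made up, and lines that do not match the entry format are silently skipped because this summary does not say how a consumer should treat them.

```python
import re
from typing import List, NamedTuple


class IndexEntry(NamedTuple):
    title: str
    url: str
    description: str


# Matches lines of the form: * [Title](url) - description
ENTRY = re.compile(
    r"^\*\s+\[(?P<title>[^\]]+)\]\((?P<url>[^)]+)\)\s+-\s+(?P<description>.*)$"
)


def parse_index(text: str) -> List[IndexEntry]:
    """Extract entries from an index.md body, ignoring non-entry lines."""
    entries: List[IndexEntry] = []
    for line in text.splitlines():
        m = ENTRY.match(line.strip())
        if m:
            entries.append(
                IndexEntry(m["title"], m["url"], m["description"].strip())
            )
    return entries


INDEX = """* [Orders](orders.md) - One row per completed customer order.
* [Customers](customers.md) - One row per customer.
"""

entries = parse_index(INDEX)
for entry in entries:
    print(entry.url, "->", entry.description)
```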
Bundle Structure & Knowledge Graph
An OKF bundle is a directory tree — distributable as a git repo, tarball, zip, or cloud storage prefix
Typical Bundle Layout
analytics-bundle/
├── index.md           # root listing (reserved)
├── log.md             # change history (reserved)
├── datasets/
│   ├── index.md       # section listing
│   └── sales_db.md    # type: Dataset
├── tables/
│   ├── index.md
│   ├── orders.md      # type: BigQuery Table
│   └── customers.md   # type: BigQuery Table
└── metrics/
    ├── index.md
    └── wau.md         # type: Metric
Knowledge Graph (Cross-Links)
Markdown links between concept documents form a directed knowledge graph.
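One way to picture the graph is to extract every Markdown link from every document and treat each as a directed edge. The sketch below does that over an in-memory bundle. It rests on two assumptions not confirmed by this summary: links starting with `/` are resolved from the bundle root (as in the orders example), and other links are resolved relative to the linking document's directory. External URLs and links with fragments are skipped, and `build_graph` is a hypothetical name.

```python
import posixpath
import re
from typing import Dict, Set

# Captures the target of an inline Markdown link: [text](target)
LINK = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")


def build_graph(files: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each document path to the set of bundle paths it links to."""
    graph: Dict[str, Set[str]] = {}
    for path, text in files.items():
        targets: Set[str] = set()
        for href in LINK.findall(text):
            if "://" in href or not href.endswith(".md"):
                continue  # external URL or not a concept document
            if href.startswith("/"):
                # Assumption: a leading slash means "from the bundle root".
                target = href.lstrip("/")
            else:
                target = posixpath.normpath(
                    posixpath.join(posixpath.dirname(path), href)
                )
            targets.add(target)
        graph[path] = targets
    return graph


FILES = {
    "tables/orders.md": (
        "FK to [customers](/tables/customers.md). "
        "[1] [BigQuery export schema](https://support.google.com/...)"
    ),
    "tables/customers.md": "Orders live in [orders](orders.md).",
    "metrics/revenue.md": "Built on [orders](../tables/orders.md).",
}

graph = build_graph(FILES)
for source, targets in graph.items():
    print(source, "->", sorted(targets))
```

An edge to a file that does not exist is still recorded. That matches the conformance rule that broken cross-links must not cause a bundle to be rejected.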
Progressive Disclosure Navigation
An agent managing token budget navigates the bundle progressively: root index.md → finds metrics/ → loads metrics/index.md → loads metrics/wau.md. Only the 3 files needed are ever read. No bulk-loading required. This makes OKF naturally token-efficient for agentic workflows.
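The navigation path above can be sketched as a walk over index files. In the sketch below, a plain title match stands in for the agent's judgment about which entry to follow, index URLs are assumed to be relative to the index file's own directory, and the bundle contents and the `navigate` function are made up for illustration.

```python
import re
from typing import Dict, List

# Captures the title and URL of an index entry: * [Title](url) - description
ENTRY = re.compile(r"^\*\s+\[([^\]]+)\]\(([^)]+)\)")

BUNDLE: Dict[str, str] = {
    "index.md": (
        "* [Tables](tables/index.md) - table docs\n"
        "* [Metrics](metrics/index.md) - metric definitions\n"
    ),
    "tables/index.md": "* [Orders](orders.md) - orders table\n",
    "tables/orders.md": "---\ntype: BigQuery Table\n---\n",
    "metrics/index.md": "* [WAU](wau.md) - weekly active users\n",
    "metrics/wau.md": "---\ntype: Metric\n---\n",
}


def navigate(bundle: Dict[str, str], titles: List[str]) -> List[str]:
    """Follow one index entry per title, starting at the root index.md.

    Returns the ordered list of files visited along the way.
    """
    current = "index.md"
    reads = [current]
    for wanted in titles:
        # Assumption: entry URLs are relative to the current index file.
        base = current.rsplit("/", 1)[0] + "/" if "/" in current else ""
        for line in bundle[current].splitlines():
            m = ENTRY.match(line)
            if m and m.group(1).lower() == wanted.lower():
                current = base + m.group(2)
                reads.append(current)
                break
        else:
            raise KeyError(f"no entry titled {wanted!r} in {current}")
    return reads


reads = navigate(BUNDLE, ["Metrics", "WAU"])
document = BUNDLE[reads[-1]]  # the only concept document actually loaded
print(reads)
print(f"{len(reads)} of {len(BUNDLE)} files read")
```

In this toy bundle the walk touches three of five files. The saving grows with bundle size, since the files under tables/ are never opened.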
Where OKF Fits
The three-layer model for AI agents — OKF fills the knowledge layer that most teams neglect
Layer 3: Reasoning. Foundation models (GPT-4, Claude, Gemini). Gives agents thinking ability.
Layer 2: Execution. MCP tools, APIs, SQL engines, code interpreters. Gives agents action ability.
Layer 1: Knowledge. OKF fills this layer with curated organizational context. Gives agents enterprise judgment.
OKF vs. Adjacent Standards
OKF + MCP = Knowledge + Tools
MCP gives agents live access to tools at runtime. OKF gives agents curated context about what those tools mean. Together: agents act correctly (MCP) on the right understanding (OKF). Neither replaces the other.
OKF vs. RAG
RAG derives knowledge from raw document chunks at query time. OKF stores curated, version-controlled concepts. OKF is the curation layer that makes RAG more accurate — explicit definitions replace ambiguous embeddings.
CLAUDE.md / AGENTS.md hierarchy
CLAUDE.md / AGENTS.md = project-level agent memory (60k+ repos). OKF = org-wide knowledge graphs. MCP = real-time tool integrations. OKF sits between the per-project and the live-tool layers.
Reference Implementations (Apache 2.0)
1. BigQuery Enrichment Agent — 2-pass LLM pipeline: Pass 1 drafts OKF documents for every table from BigQuery metadata; Pass 2 crawls authoritative docs to add citations and join paths.

2. Static HTML Visualizer (viz.html) — Cytoscape.js knowledge graph viewer. Self-contained file, no backend, data never leaves the browser.

3. CLI tool (kcmd) — TypeScript, bidirectional sync between local OKF files and Google Cloud Knowledge Catalog.
The Open-Core Strategy
OKF gives away the cheap part — a file format any text editor can open. Simultaneously, Google Cloud updated Cloud Knowledge Catalog to ingest OKF bundles and serve them to Vertex AI agents automatically.

This is the classic open-format / managed-service strategy: format is free and portable; enterprise infrastructure (audit trails, RBAC, automated refresh, agent integration) lives in the managed service.
Standards Landscape
OKF deliberately inverts the complexity of prior standards. Standards compared: DCAT, Frictionless, Data Contracts, Delta Sharing, and MCP.
Live OKF Document Builder
An interactive form generates a conformant OKF concept document in real time from these fields: type (the only required field; producers choose their own type names), title, description, resource, tags, and schema rows. The document is not conformant until the type field is filled in.
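The builder's behavior can be approximated in a few lines. This is a rough sketch, not the page's actual implementation: the function name and signature are hypothetical, values are written verbatim (a real generator would quote YAML-special characters), and the only hard requirement enforced is a non-empty type.

```python
from typing import List, Optional, Tuple


def build_concept_document(
    type_: str,
    title: Optional[str] = None,
    description: Optional[str] = None,
    resource: Optional[str] = None,
    tags: Optional[List[str]] = None,
    schema_rows: Optional[List[Tuple[str, str, str]]] = None,
) -> str:
    """Render an OKF-style concept document from form fields.

    Raises ValueError when the required type field is empty.
    """
    if not type_.strip():
        raise ValueError("not conformant: add type field")

    lines = ["---", f"type: {type_.strip()}"]
    # Optional fields are emitted only when they were filled in.
    for key, value in (
        ("title", title),
        ("description", description),
        ("resource", resource),
    ):
        if value:
            lines.append(f"{key}: {value}")
    if tags:
        lines.append(f"tags: [{', '.join(tags)}]")
    lines.append("---")

    if schema_rows:
        lines += ["", "# Schema", "", "| Column | Type | Description |", "|---|---|---|"]
        lines += [f"| {col} | {typ} | {desc} |" for col, typ, desc in schema_rows]
    return "\n".join(lines) + "\n"


doc = build_concept_document(
    "BigQuery Table",
    title="Orders",
    tags=["sales", "revenue", "core"],
    schema_rows=[("order_id", "STRING", "Globally unique order identifier.")],
)
print(doc)
```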
Practice Exercises
Three browser exercises + one live lab to solidify your OKF understanding
1  OKF Conformance Checker
Below are 4 concept document frontmatter snippets. For each one, decide whether it is conformant or non-conformant with OKF v0.1 rules, then explain why.
2  Layer Matcher — OKF, MCP, RAG, or Data Contract?
For each scenario, select whether OKF, MCP, RAG, or a Data Contract is the primary solution.
3  OKF Structure Quiz
Test your understanding of OKF's reserved filenames, conformance rules, and versioning.
  Live Lab — OKF Document Generator
Describe a database table (name, purpose, main columns) in plain text. gpt-4o-mini generates a complete, conformant OKF concept document for it — including proper frontmatter, schema table, join hints, and citations. Optional OpenAI key (~$0.001/run).