Institutional Memory
Getting Expert Knowledge Into LLM Agents
LLMs are trained on public internet data, but the knowledge that makes organizations run lives in human heads. This post covers how to extract tacit expert knowledge, structure it, and feed it to LLM agents, from classical elicitation theory to recent LLM interviewer systems.
The Knowledge That LLMs Will Never See
Training data covers what humans published. It doesn't cover what they never wrote down.
An LLM trained on the internet has absorbed an enormous amount of public knowledge: papers, books, documentation, StackOverflow. But consider what it cannot know: the specific heuristics your best sales rep uses to read a prospect's hesitation, the design choices your senior engineer has learned to avoid after a decade of production incidents but never documented, or the institutional norms your most tenured manager knows are non-negotiable even though no policy doc says so.
This gap has a name: Polanyi's Paradox. Philosopher Michael Polanyi (1966) observed: "We can know more than we can tell." The knowledge embedded in expertise is largely tacit: procedural, contextual, hard to articulate, and never written down. And that's exactly the knowledge organizations most desperately need their AI agents to have.
The map of what an LLM agent can and cannot know has a clear gap at the top.
[Figure: layers of agent knowledge. Surviving caption for the model-weights layer: "Encoded in weights at training time. No retrieval needed."]
Tacit vs. Explicit Knowledge
Not all undocumented knowledge is equally hard to capture. The spectrum from fully tacit to fully explicit determines which approach you need.
Knowledge management theory distinguishes two poles. Explicit knowledge is codified — it can be written, stored, and transferred as documents. Tacit knowledge is embodied — it lives in practice, intuition, and experience, and resists verbalization. Most real expert knowledge sits somewhere in between.
The philosopher Gilbert Ryle (1949) distinguished "knowing that" (explicit facts) from "knowing how" (skilled practice). An expert surgeon "knows that" the radial artery is at the wrist, but also "knows how" to feel when a suture is correct — the latter is tacit and cannot easily be transferred via text.
The SECI Model — Updated for LLMs
Nonaka & Takeuchi's 1995 SECI model describes how tacit knowledge becomes explicit. Three research groups have now extended it specifically for generative AI.
The SECI model (Socialization → Externalization → Combination → Internalization) describes the cycle by which organizations create and transfer knowledge. Originally it described human-to-human knowledge transfer. In 2024–2026, three independent research groups proposed extensions specifically for LLM-augmented knowledge management.
GRAI Framework (2025, VINE Journal): Splits the SECI quadrants into 8 fields that distinguish the human perspective from the machine perspective. Emphasizes "receptive" AI that learns from interaction, not just generation.
GenAI SECI (2026, AHFE): Introduces "Digital Fragmented Knowledge", a new concept that integrates explicit and tacit knowledge within cyberspace, and provides a full architectural spec for an LLM-augmented SECI implementation.
The Expert-to-Agent Knowledge Pipeline
Five stages connect what's in an expert's head to what an LLM agent can use. Each stage has distinct tools and failure modes.
There is no single magic step. Getting tacit expert knowledge into an agent memory system requires a pipeline of five distinct phases, each with its own tooling, verification challenges, and failure modes.
This is where the knowledge lives — and why this problem is hard. The expert's knowledge is implicit, contextual, and procedural. They "know how" to do things without being able to articulate the underlying rules. It includes:
- Decision heuristics — "I never trust a customer who asks about refund policy before asking about features"
- Pattern recognition — sensing when a patient's chart "looks wrong" even without pinpointing why
- Workarounds — knowing that the official process breaks down in edge case X and what to do instead
- Undocumented constraints — institutional rules-of-thumb never written into any policy
- Failure modes to avoid — hard-won lessons from past mistakes that never made it into post-mortems
The core challenge: the expert often doesn't know what they know. Asking "what do you do?" rarely surfaces the most valuable knowledge — you need structured elicitation techniques to draw it out.
Elicitation is the act of externalization: making tacit knowledge articulable. Four major approaches, covered in detail in the Capture Methods section:
- Conversational Interview Agents — LLM conducts a structured interview with the expert, asking probing follow-up questions. Zuin et al. (2025) showed 94.9% recall via a self-critical MDP-style agent loop.
- Think-Aloud Protocol — expert verbalizes their reasoning while performing a task. Classic technique from cognitive psychology, now adapted as a prompting framework (THiNK, 2025).
- Critical Decision Method (CDM) — retrospective interview technique: "Tell me about a time when you faced a difficult decision." Probes for the cues, options, and rules the expert used. High yield for tacit decision knowledge.
- Cognitive Task Analysis — structured decomposition of a complex task into sub-goals, cues, and decision points. Labor-intensive but produces highly actionable knowledge structures.
Failure mode: Experts articulate rationalized post-hoc stories about their decisions, not actual decision processes. Multiple sessions, case probing, and hypothetical scenarios help overcome this.
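To make the interview-agent idea above more concrete, here is a minimal sketch of a probing loop built around the three things CDM probes for (cues, options, rules). It is not the Zuin et al. agent or any published protocol. The probe wording, the `run_cdm_interview` function, and the injected `ask_expert` and `is_covered` callables are all assumptions made for illustration; in practice `is_covered` would be an LLM critic or a human analyst.

```python
from typing import Callable, Dict, List

OPENING_QUESTION = "Tell me about a time when you faced a difficult decision."

# Hypothetical probe bank: one follow-up per CDM category.
# The wording is illustrative, not a validated interview protocol.
CDM_PROBES: Dict[str, str] = {
    "cues": "What did you notice that told you something was off?",
    "options": "What other courses of action did you consider?",
    "rules": "What rule of thumb did you rely on to decide?",
}


def run_cdm_interview(
    ask_expert: Callable[[str], str],
    is_covered: Callable[[str, str], bool],
    max_turns: int = 6,
) -> Dict[str, List[str]]:
    """Ask about one incident, then probe until every category has an answer.

    ask_expert sends a question to the expert and returns the reply.
    is_covered judges whether a reply addresses a category (assumed to be
    supplied by the caller, e.g. an LLM critic or a human reviewer).
    """
    notes: Dict[str, List[str]] = {category: [] for category in CDM_PROBES}
    missing = list(CDM_PROBES)

    for turn in range(max_turns):
        # First turn opens the incident; later turns target the first gap.
        question = OPENING_QUESTION if turn == 0 else CDM_PROBES[missing[0]]
        reply = ask_expert(question)

        # File the reply under every category it addresses.
        for category in CDM_PROBES:
            if is_covered(category, reply):
                notes[category].append(reply)

        missing = [c for c in CDM_PROBES if not notes[c]]
        if not missing:
            break  # every category has at least one answer
    return notes
```

The turn budget matters: an expert who cannot articulate a rule will not produce one on the sixth ask, so an uncovered category at the end is a signal to switch technique (for example, to think-aloud during a live task) rather than to keep probing.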
Raw interview transcripts are not directly useful to an agent. They need to be transformed into a form the agent can retrieve and reason over. Four main structuring paradigms:
- Knowledge Graphs (KGs) — entities and relationships extracted from interview transcripts. LLM-empowered KG construction (Bian, 2025) automates this. Microsoft GraphRAG builds hierarchical clusters from private corpora, enabling "whole-dataset" queries vector search cannot answer.
- BPMN Process Diagrams — Radhakrishnan (2025) showed Gemini 2.5 Pro can convert an interview conversation into a BPMN 2.0 process diagram in ~12 minutes. Useful for procedural tacit knowledge.
- Atomic Knowledge Units (AKUs) — Bakal (2026) proposes structured, governance-aware units encoding: what to do, which tools to use, constraints to respect, and where to go next. AKUs form a composable knowledge graph agents traverse at runtime.
- Fine-tuning datasets — elicited knowledge converted into (instruction, response) pairs for domain fine-tuning. High cost, high benefit for deeply specialized agents.
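As a rough illustration of the Atomic Knowledge Unit idea above, the sketch below models a unit with the four parts the description lists (what to do, which tools to use, constraints to respect, where to go next) and walks the resulting graph. The class name, field names, and `walk` function are hypothetical and chosen for this example; they are not taken from Bakal's specification.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class KnowledgeUnit:
    """One atomic unit. Field names are illustrative, not a published schema."""
    unit_id: str
    action: str                                            # what to do
    tools: List[str] = field(default_factory=list)         # which tools to use
    constraints: List[str] = field(default_factory=list)   # constraints to respect
    next_ids: List[str] = field(default_factory=list)      # where to go next


def walk(units: Dict[str, KnowledgeUnit], start_id: str) -> List[str]:
    """Follow the first next_ids link from start_id and return visited ids.

    Stops at a unit with no successor, an unknown id, or a repeated id,
    so a cyclic graph cannot loop forever.
    """
    path: List[str] = []
    current = start_id
    while current in units and current not in path:
        path.append(current)
        successors = units[current].next_ids
        if not successors:
            break
        current = successors[0]
    return path


# Hypothetical incident-response knowledge, invented for the example.
UNITS = {
    "triage": KnowledgeUnit(
        "triage", "Check error rate before touching config",
        tools=["dashboard"], constraints=["read-only access"],
        next_ids=["rollback"],
    ),
    "rollback": KnowledgeUnit(
        "rollback", "Roll back the last deploy",
        tools=["deploy-cli"], constraints=["needs on-call approval"],
        next_ids=["postmortem"],
    ),
    "postmortem": KnowledgeUnit("postmortem", "Write up the incident"),
}
```

With this data, `walk(UNITS, "triage")` returns `["triage", "rollback", "postmortem"]`. A real agent would choose among several successors based on context; following only the first link keeps the sketch short.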
This is the most critical and most neglected stage. Multiple papers (Kuks et al. 2025, Shaposhnyk et al. 2025) report that LLM interviewers can hallucinate or fail to catch false information inserted by the expert. Verification is non-negotiable before feeding captured knowledge into an agent.
- Expert sign-off — the original expert reads the structured output and confirms accuracy. Simple, effective, but creates a bottleneck and scheduling burden.
- Dual-LLM validation — Shaposhnyk et al. use GPT-4o and Claude as mutual validators: each proposes causal relationships; the other verifies. 10 of 12 proposed relationships confirmed by both models, but hallucination risk remains.
- Adversarial insertion testing — Kuks et al. intentionally inserted false statements to test detection. Finding: the LLM interviewer did NOT reliably filter them. Human expert review before deployment is mandatory.
- Scope constraints — informed consent, IP rights, and data erasure rights must be established before capturing expert knowledge. Expert Mind (Cervera 2026) builds this into its core design constraints.
Key insight: The output should be treated like a draft, not ground truth. Build a human-in-the-loop review step into every pipeline before the knowledge enters the agent's memory.
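The "draft, not ground truth" rule can be enforced in code rather than by convention. The sketch below chains two of the checks listed above: cross-validation by several validators, then an explicit expert sign-off flag that gates what reaches memory. The validators are plain callables standing in for model calls; `Draft`, `cross_validate`, and `release_to_memory` are hypothetical names, and this is not the Shaposhnyk et al. implementation.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple

Relation = Tuple[str, str]  # (cause, effect)


@dataclass
class Draft:
    """A machine-validated relation that still awaits expert sign-off."""
    relation: Relation
    expert_approved: bool = False


def cross_validate(
    proposed: Iterable[Relation],
    validators: Iterable[Callable[[Relation], bool]],
) -> List[Draft]:
    """Keep only relations that every validator accepts.

    Each validator is assumed to wrap a separate model call. Agreement
    between models lowers hallucination risk but does not remove it,
    so the result is still a list of drafts.
    """
    checks = list(validators)
    if not checks:
        raise ValueError("at least one validator is required")
    return [Draft(r) for r in proposed if all(check(r) for check in checks)]


def release_to_memory(drafts: Iterable[Draft]) -> List[Relation]:
    """Return only the relations a human expert has signed off on."""
    return [d.relation for d in drafts if d.expert_approved]
```

Because `expert_approved` defaults to `False`, forgetting the review step yields an empty memory load instead of unreviewed knowledge.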
Once verified, the structured knowledge is loaded into the agent's memory system. The memory form determines how it's accessed:
- Textual RAG — knowledge stored as text chunks, retrieved by cosine similarity at query time. Cheapest to implement; works well for loosely structured knowledge.
- GraphRAG — knowledge stored as a knowledge graph with LLM-generated summaries. Enables multi-hop reasoning and whole-dataset queries. Microsoft GraphRAG (2024) is open-source.
- Fine-tuning — knowledge baked into model weights. Best for stable, high-frequency knowledge (domain terminology, core procedures). Risk: catastrophic forgetting.
- Knowledge Editing — targeted weight updates for specific facts. Best for corrections and small-scope updates without full retraining.
- Hybrid — the Nurture-First Agent (Zhang 2026) uses a Three-Layer Cognitive Architecture that separates knowledge by volatility and personalization level, combining retrieval and parametric approaches.
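To show what "multi-hop reasoning" over graph-structured memory means in practice, here is a minimal sketch that stores knowledge as triples and finds a chain of relations between two entities with breadth-first search. This is plain graph traversal, not Microsoft GraphRAG's algorithm (which relies on community summaries); the `Graph` type, the `find_path` function, and the sample triples are assumptions made for illustration.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

# subject -> list of (relation, object)
Graph = Dict[str, List[Tuple[str, str]]]
Triple = Tuple[str, str, str]


def find_path(
    graph: Graph, start: str, goal: str, max_hops: int = 3
) -> Optional[List[Triple]]:
    """Return the shortest chain of triples from start to goal, or None.

    Breadth-first search guarantees the fewest hops; max_hops bounds
    how far the search may wander from the start entity.
    """
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        if len(path) >= max_hops:
            continue
        for relation, neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, path + [(node, relation, neighbor)]))
    return None


# Hypothetical institutional knowledge, invented for the example.
GRAPH: Graph = {
    "billing-service": [("depends_on", "ledger-db")],
    "ledger-db": [("owned_by", "platform-team")],
    "platform-team": [("escalates_to", "cto-office")],
}
```

A question like "who owns what billing depends on?" needs two hops here: `find_path(GRAPH, "billing-service", "platform-team")` returns the `depends_on` triple followed by the `owned_by` triple. A chunk-level similarity search would have to find both facts in the same retrieved passage to answer it.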
Three Capture Methods
How you extract tacit knowledge depends on its type, the expert's availability, and how structured the output needs to be.
• Kuks, Finkel, Wurster (2025) — Industry 4.0 Science — Semi-structured interviews on 9 topics via ChatGPT-5 Voice Mode; completeness 2.89/4.0; hallucination risk identified.
• Rank et al. (2025) — Industry 5.0 literature review — Reviews LLM conversational agents for tacit KE in manufacturing. Highlights operator acceptance as a critical challenge.
• PU-ADKA (Wu et al., 2025) — EMNLP — Selectively queries the most appropriate expert from a team based on availability and knowledge boundaries; validated on drug development datasets. arXiv:2508.17202
• THiNK (2025) — arXiv:2505.20184 — Applies think-aloud protocol as an LLM prompting framework, having the model articulate thought processes in real time, bridging cognitive science and LLM prompting.
• Freire, Wellsandt et al. (PMC 2024) — RAG system over factory documentation + worker knowledge; GPT-4 achieved 97.5% factuality; workers preferred asking colleagues over AI, raising adoption design challenges.
• Data Therapist (Shin et al. 2025) — arXiv:2505.00455 — Mixed-initiative Q&A + interactive annotation at multiple granularity levels; elicits tacit knowledge about data provenance from accounting, political science, and CS security experts.
• AutoElicit (Capstick et al. 2024) — arXiv:2411.17284 — LLMs elicit Bayesian prior distributions from domain experts; 6 months of labeling effort saved on UTI prediction from sensor data.
• Shaposhnyk et al. (2025) — arXiv:2504.10397 — Dual GPT-4o + Claude mutual validation for causal network elicitation; 10/12 relationships confirmed by both models.
• Knowledge Activation / AKUs (Bakal 2026) — arXiv:2603.14805 — Atomic Knowledge Units as composable institutional knowledge primitives for agentic software development.
• LLM-empowered KG Construction Survey (Bian 2025) — arXiv:2510.20345 — Schema-based vs. schema-free approaches; reviews Ontogenia, CQbyCQ, EDC, AutoSchemaKG.
Enterprise RAG Evolution
Even before capturing tacit knowledge, organizations have large amounts of explicit but private knowledge. How RAG handles this has evolved significantly.
Organizations have vast stores of already-documented private knowledge — wikis, Confluence pages, email threads, meeting notes, code comments, Slack conversations. The challenge here is not elicitation but retrieval: how do you build a system that lets an agent query across all of this coherently? The state of the art has evolved through four generations.
Naive RAG — chunk documents → embed chunks → store in FAISS vector store → retrieve top-K by cosine similarity → inject into prompt. The baseline used by 80.5% of enterprise RAG implementations (MDPI Systematic Review, 2024).
Advanced RAG — adds query rewriting, hybrid search (BM25 + dense), re-ranking, and hierarchical chunk structures. Addresses the most common failure modes of naive RAG.
GraphRAG — Microsoft Research (2024). LLM builds a knowledge graph from private documents. Bottom-up hierarchical clustering creates semantic communities and pre-computed summaries. Enables "whole-dataset" queries vector search cannot answer.
Agentic KG-RAG — 2025+ frontier. Agents actively manage and expand the knowledge graph, sense knowledge gaps, proactively query experts, and maintain a living knowledge base. The convergence of the capture pipeline and the retrieval system.
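The naive RAG baseline at the start of this progression (chunk, embed, store, retrieve top-K by cosine similarity, inject into the prompt) is small enough to sketch end to end. The version below is a toy under stated assumptions: `embed` uses word counts instead of a dense embedding model, a Python list stands in for the FAISS vector store, and all function names are hypothetical.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def chunk(text: str, size: int = 40) -> List[str]:
    """Split text into consecutive chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts. A real system would call a
    dense embedding model here."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: List[str], k: int = 2) -> List[Tuple[float, str]]:
    """Score every chunk against the query and return the top k."""
    q = embed(query)
    scored = [(cosine(q, embed(c)), c) for c in chunks]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]


def build_prompt(query: str, chunks: List[str], k: int = 2) -> str:
    """Inject the retrieved chunks into the prompt as context."""
    context = "\n".join(text for _, text in retrieve(query, chunks, k))
    return f"Context:\n{context}\n\nQuestion: {query}"
```

The toy also exposes a weakness that advanced RAG targets: with exact word matching, "approves" in a query does not match "approved" in a document, which is the kind of gap that query rewriting, hybrid search, and re-ranking are meant to close.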
Challenges & Failure Modes
Building institutional memory pipelines is harder than it looks.
Mitigations: Use think-aloud during actual task performance (not retrospective interview). Ask about exceptions ("When does your usual approach fail?"). Present hypothetical cases. Cross-validate with behavioral data (call recordings, decision logs) to spot inconsistencies with the stated rules.
Mitigations: Mandatory human expert sign-off on all structured outputs before loading into agent memory. Dual-LLM validation (GPT-4o + Claude cross-checking each other's extractions). Never treat the LLM's structured output as ground truth — treat it as a draft for expert review.
Mitigations: Establish knowledge contribution agreements before any elicitation begins. Build erasure support into the storage architecture (versioned knowledge bases). Store knowledge attributions so you know which entries came from which expert. Involve legal counsel early.
Mitigations: Attach expiry dates or review triggers to knowledge entries. Build a continuous refresh loop — periodic re-interviews or automated monitoring of domain signals (regulatory changes, new research). Distinguish stable architectural knowledge from volatile operational knowledge and version them separately.
Mitigations: Frame knowledge capture as legacy preservation, not replacement. Give experts visibility and control over what's stored and how it's used. Make the interface low-friction — voice-based, integrated into existing workflows. Show concrete value back to the expert (e.g., the agent can now handle repetitive questions, freeing them for high-value work).
Mitigations: Tag knowledge entries with source expert, context, and recency. Build contradiction detection into the structuring stage (LLM flags when a new entry conflicts with an existing one). For high-stakes contradictions, escalate to a committee review rather than automatically resolving. Present the agent with the disagreement explicitly: "Expert A says X, Expert B says Y — apply both rules and flag for human review."
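Several of the mitigations above come down to metadata on each knowledge entry: source expert, recency, a review date, and a way to surface disagreements. A minimal sketch, assuming entries share an exact `topic` key; in a real pipeline an LLM would judge whether two entries conflict, so the string comparison here is only a stand-in. `Entry`, `due_for_review`, `find_conflicts`, and `describe` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Tuple


@dataclass(frozen=True)
class Entry:
    topic: str       # what the rule is about
    rule: str        # the captured knowledge
    expert: str      # attribution, also needed for erasure requests
    recorded: date   # recency
    review_by: date  # expiry or review trigger


def due_for_review(entries: List[Entry], today: date) -> List[Entry]:
    """Entries whose review date has arrived or passed."""
    return [e for e in entries if e.review_by <= today]


def find_conflicts(entries: List[Entry]) -> List[Tuple[Entry, Entry]]:
    """Pairs of entries on the same topic that state different rules.

    Exact topic matching is a simplification of LLM-based
    contradiction detection.
    """
    conflicts = []
    for i, first in enumerate(entries):
        for second in entries[i + 1:]:
            if first.topic == second.topic and first.rule != second.rule:
                conflicts.append((first, second))
    return conflicts


def describe(conflict: Tuple[Entry, Entry]) -> str:
    """Render a disagreement explicitly instead of silently resolving it."""
    a, b = conflict
    return (
        f"{a.expert} says {a.rule!r}, {b.expert} says {b.rule!r}: "
        "flag for human review."
    )
```

Keeping both entries and emitting the disagreement as text means the agent sees the conflict at retrieval time, and a reviewer can trace each side back to its source.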