Post 51 — Institutional Memory & LLM Agents

Institutional Memory
Getting Expert Knowledge Into LLM Agents

LLMs are trained on public internet data — but the knowledge that makes organizations run lives in human heads. This post covers how to extract tacit expert knowledge, structure it, and feed it to LLM agents — from classical elicitation theory to cutting-edge LLM interviewer systems.

Based on: 30+ papers & reports (2023–2026)
Category: Agents & Systems
Level: Intermediate
Post: 51 of 51

The Knowledge That LLMs Will Never See

Training data covers what humans published. It doesn't cover what they never wrote down.

An LLM trained on the internet knows an enormous amount about public knowledge — papers, books, documentation, StackOverflow. But consider what it cannot know: the specific heuristics your best sales rep uses to read a prospect's hesitation, the undocumented design decisions your senior engineer has learned to avoid after a decade of production incidents, or the institutional norms your most tenured manager knows are non-negotiable even though no policy doc says so.

This gap has a name: Polanyi's Paradox. Philosopher Michael Polanyi (1966) observed that "we can know more than we can tell." The knowledge embedded in expertise is largely tacit: procedural, contextual, hard to articulate, and rarely written down. That is exactly the knowledge organizations most need their AI agents to have.

Why This Is Getting Urgent Now
The retirement crisis is converging with the AI adoption wave. ~10,000 Baby Boomers retire daily (APQC). Fortune 500 companies lose an estimated $31.5 billion annually from knowledge loss due to employee turnover. Manufacturers will need 3.8 million new workers by 2033 with 1.9 million positions potentially unfilled. That tacit knowledge — oral traditions, undocumented meeting decisions, informal rules-of-thumb — walks out the door with each retiree.
10,000: Baby Boomers retire daily (US)
$31.5B: annual Fortune 500 knowledge loss from turnover
95%: enterprise AI pilots that fail to scale due to LLM statelessness
94.9%: full-knowledge recall by LLM interviewer agent (Zuin et al. 2025)

The map of what an LLM agent can and cannot know has a clear gap at the top:

What LLMs Already Know
Public internet: Wikipedia, ArXiv, StackOverflow, GitHub, news, books, documentation.

Encoded in weights at training time. No retrieval needed.
What LLMs Cannot Know
Private org knowledge: internal wikis, Slack, meeting notes, undocumented processes, and most critically — the tacit knowledge that was never written down.

Tacit vs. Explicit Knowledge

Not all undocumented knowledge is equally hard to capture. The spectrum from fully tacit to fully explicit determines which approach you need.

Knowledge management theory distinguishes two poles. Explicit knowledge is codified — it can be written, stored, and transferred as documents. Tacit knowledge is embodied — it lives in practice, intuition, and experience, and resists verbalization. Most real expert knowledge sits somewhere in between.

The philosopher Gilbert Ryle (1949) distinguished "knowing that" (explicit facts) from "knowing how" (skilled practice). An expert surgeon "knows that" the radial artery is at the wrist, but also "knows how" to feel when a suture is correct — the latter is tacit and cannot easily be transferred via text.

The knowledge spectrum, from one pole to the other: Fully Tacit, Semi-Tacit, Mixed, Semi-Explicit, Fully Explicit.
Polanyi's Paradox — The AI Angle
LLMs may actually be uniquely positioned to crack Polanyi's Paradox. Where traditional interviews force experts to articulate knowledge in one shot, an LLM interviewer can ask follow-up questions, probe inconsistencies, generate hypotheticals ("what would you do if X?"), and over many interactions reconstruct the underlying model the expert is using — even when the expert couldn't state it directly. A 2025 IJCNN paper (Zuin et al.) showed an LLM agent achieved 94.9% full-knowledge recall across a synthetic company without ever contacting the original specialist directly — by interviewing downstream employees who had absorbed fragments of the expert's knowledge.

The SECI Model — Updated for LLMs

Nonaka & Takeuchi's 1995 SECI model describes how tacit knowledge becomes explicit. Three research groups have now extended it specifically for generative AI.

The SECI model (Socialization → Externalization → Combination → Internalization) describes the cycle by which organizations create and transfer knowledge. Originally it described human-to-human knowledge transfer. In 2024–2026, three independent research groups proposed extensions specifically for LLM-augmented knowledge management.

SECI quadrants (vertical axis: tacit → explicit; horizontal axis: individual → collective)
S (top left): Socialization · Tacit → Tacit · Individual
Expert apprenticeship, observation, shadowing. Knowledge shared through shared experience, not words.
E (top right): Externalization · Tacit → Explicit · Collective
Writing down processes, creating SOPs, documenting decisions. The hardest and most valuable step.
I (bottom left): Internalization · Explicit → Tacit · Individual
Learning by doing with documented knowledge. Reading SOPs and developing intuition through practice.
C (bottom right): Combination · Explicit → Explicit · Collective
Combining documents, reports, databases. Merging and recombining explicit knowledge stores.
Three LLM Extensions to SECI
HAC-SECI (2024, Tokyo University of Science) — Dual-loop: Inner Loop = humans give knowledge to AI (agent growth); Outer Loop = AI's accumulated knowledge helps humans recognize and develop their own thinking.

GRAI Framework (2025, VINE Journal) — Splits each SECI quadrant into 8 fields distinguishing human perspective vs. machine perspective. "Receptive" AI that learns from interaction, not just generation.

GenAI SECI (2026, AHFE): introduces "Digital Fragmented Knowledge", a new concept integrating explicit and tacit knowledge within cyberspace, and gives a full architectural spec for LLM-augmented SECI implementation.

The Expert-to-Agent Knowledge Pipeline

Five stages connect what's in an expert's head to what an LLM agent can use. Each stage has distinct tools and failure modes.

There is no single magic step. Getting tacit expert knowledge into an agent memory system requires a pipeline of five distinct phases, each with its own tooling, verification challenges, and failure modes.

🧠 Expert Knowledge: in the brain
🎙️ Elicitation: interviews / think-aloud
🔧 Structuring: KGs / BPMN / AKUs
Verification: expert sign-off
💾 Agent Memory: RAG / fine-tune / KG
Stage 0 · Expert Knowledge (In the Brain)

This is where the knowledge lives — and why this problem is hard. The expert's knowledge is implicit, contextual, and procedural. They "know how" to do things without being able to articulate the underlying rules. It includes:

  • Decision heuristics — "I never trust a customer who asks about refund policy before asking about features"
  • Pattern recognition — sensing when a patient's chart "looks wrong" even without pinpointing why
  • Workarounds — knowing that the official process breaks down in edge case X and what to do instead
  • Undocumented constraints — institutional rules-of-thumb never written into any policy
  • Failure modes to avoid — hard-won lessons from past mistakes that never made it into post-mortems

The core challenge: the expert often doesn't know what they know. Asking "what do you do?" rarely surfaces the most valuable knowledge — you need structured elicitation techniques to draw it out.

Stage 1 · Elicitation (Drawing It Out)

Elicitation is the act of externalization: making tacit knowledge articulable. Four major approaches are listed here; the first two are covered in detail in the Capture Methods section:

  • Conversational Interview Agents — LLM conducts a structured interview with the expert, asking probing follow-up questions. Zuin et al. (2025) showed 94.9% recall via a self-critical MDP-style agent loop.
  • Think-Aloud Protocol — expert verbalizes their reasoning while performing a task. Classic technique from cognitive psychology, now adapted as a prompting framework (THiNK, 2025).
  • Critical Decision Method (CDM) — retrospective interview technique: "Tell me about a time when you faced a difficult decision." Probes for the cues, options, and rules the expert used. High yield for tacit decision knowledge.
  • Cognitive Task Analysis — structured decomposition of a complex task into sub-goals, cues, and decision points. Labor-intensive but produces highly actionable knowledge structures.

Failure mode: Experts articulate rationalized post-hoc stories about their decisions, not actual decision processes. Multiple sessions, case probing, and hypothetical scenarios help overcome this.

Stage 2 · Structuring (Making It Usable)

Raw interview transcripts are not directly useful to an agent. They need to be transformed into a form the agent can retrieve and reason over. Four main structuring paradigms:

  • Knowledge Graphs (KGs) — entities and relationships extracted from interview transcripts. LLM-empowered KG construction (Bian, 2025) automates this. Microsoft GraphRAG builds hierarchical clusters from private corpora, enabling "whole-dataset" queries vector search cannot answer.
  • BPMN Process Diagrams — Radhakrishnan (2025) showed Gemini 2.5 Pro can convert an interview conversation into a BPMN 2.0 process diagram in ~12 minutes. Useful for procedural tacit knowledge.
  • Atomic Knowledge Units (AKUs) — Bakal (2026) proposes structured, governance-aware units encoding: what to do, which tools to use, constraints to respect, and where to go next. AKUs form a composable knowledge graph agents traverse at runtime.
  • Fine-tuning datasets — elicited knowledge converted into (instruction, response) pairs for domain fine-tuning. High cost, high benefit for deeply specialized agents.
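
To make the AKU idea concrete, here is a minimal Python sketch of a unit carrying the four fields listed above (what to do, which tools, which constraints, where to go next) and a traversal an agent might run at runtime. The class name, field names, and the incident-response example are hypothetical illustrations, not the schema from Bakal (2026).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class AtomicKnowledgeUnit:
    """Hypothetical AKU shape; the field names are illustrative only."""
    unit_id: str
    action: str                                            # what to do
    tools: List[str] = field(default_factory=list)         # which tools to use
    constraints: List[str] = field(default_factory=list)   # constraints to respect
    next_ids: List[str] = field(default_factory=list)      # where to go next


def traverse(units: Dict[str, AtomicKnowledgeUnit], start_id: str) -> List[str]:
    """Walk the graph depth-first from start_id, returning actions in visit order."""
    seen: Set[str] = set()
    actions: List[str] = []
    stack = [start_id]
    while stack:
        uid = stack.pop()
        if uid in seen or uid not in units:
            continue
        seen.add(uid)
        actions.append(units[uid].action)
        # Reverse so the first listed successor is visited first.
        stack.extend(reversed(units[uid].next_ids))
    return actions


units = {
    "triage": AtomicKnowledgeUnit("triage", "Classify the incident severity",
                                  tools=["pager"], next_ids=["rollback", "notify"]),
    "rollback": AtomicKnowledgeUnit("rollback", "Roll back the last deploy",
                                    constraints=["never during a data migration"]),
    "notify": AtomicKnowledgeUnit("notify", "Notify the on-call lead"),
}
plan = traverse(units, "triage")
```

One plausible benefit of keeping units this small is that an expert can confirm or reject them one at a time during verification.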
Stage 3 · Verification (The Bottleneck)

This is the most critical and most neglected stage. Multiple papers (Kuks et al. 2025, Shaposhnyk et al. 2025) report that LLM interviewers can hallucinate or fail to catch false information inserted by the expert. Verification is non-negotiable before feeding captured knowledge into an agent.

  • Expert sign-off — the original expert reads the structured output and confirms accuracy. Simple, effective, but creates a bottleneck and scheduling burden.
  • Dual-LLM validation — Shaposhnyk et al. use GPT-4o and Claude as mutual validators: each proposes causal relationships; the other verifies. 10 of 12 proposed relationships confirmed by both models, but hallucination risk remains.
  • Adversarial insertion testing — Kuks et al. intentionally inserted false statements to test detection. Finding: the LLM interviewer did NOT reliably filter them. Human expert review before deployment is mandatory.
  • Scope constraints — informed consent, IP rights, and data erasure rights must be established before capturing expert knowledge. Expert Mind (Cervera 2026) builds this into its core design constraints.

Key insight: The output should be treated like a draft, not ground truth. Build a human-in-the-loop review step into every pipeline before the knowledge enters the agent's memory.
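
The review gate can be sketched in a few lines of Python. This is an illustration under assumptions: the `Claim` class, the status strings, and the two toy validators are hypothetical stand-ins (a real setup would wrap two different LLMs, as in the dual-LLM validation bullet above). What it shows is that agreement between validators never bypasses expert sign-off.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Claim:
    text: str
    expert_signed_off: bool = False


# A validator returns True when it accepts the claim. These are toy stand-ins;
# in practice each would call a different LLM.
Validator = Callable[[str], bool]


def non_empty(text: str) -> bool:
    return bool(text.strip())


def no_unbounded_absolutes(text: str) -> bool:
    # Hypothetical heuristic: treat "always"/"never" as suspicious.
    lowered = text.lower()
    return "always" not in lowered and "never" not in lowered


def review(claim: Claim, validators: List[Validator]) -> str:
    """Return 'flagged', 'needs_expert', or 'admit' for one captured claim."""
    if not all(validate(claim.text) for validate in validators):
        return "flagged"          # at least one validator objected
    if not claim.expert_signed_off:
        return "needs_expert"     # validator agreement alone is not enough
    return "admit"


validators = [non_empty, no_unbounded_absolutes]
draft = Claim("Escalate after 3 failed calls")
signed = Claim("Escalate after 3 failed calls", expert_signed_off=True)
suspicious = Claim("Always escalate immediately", expert_signed_off=True)
```

Only claims that return "admit" would be loaded into agent memory; everything else goes back to a human.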

Stage 4 · Agent Memory (The Destination)

Once verified, the structured knowledge is loaded into the agent's memory system. The memory form determines how it's accessed:

  • Textual RAG — knowledge stored as text chunks, retrieved by cosine similarity at query time. Cheapest to implement; works well for loosely structured knowledge.
  • GraphRAG — knowledge stored as a knowledge graph with LLM-generated summaries. Enables multi-hop reasoning and whole-dataset queries. Microsoft GraphRAG (2024) is open-source.
  • Fine-tuning — knowledge baked into model weights. Best for stable, high-frequency knowledge (domain terminology, core procedures). Risk: catastrophic forgetting.
  • Knowledge Editing — targeted weight updates for specific facts. Best for corrections and small-scope updates without full retraining.
  • Hybrid — the Nurture-First Agent (Zhang 2026) uses a Three-Layer Cognitive Architecture that separates knowledge by volatility and personalization level, combining retrieval and parametric approaches.

Three Capture Methods

How you extract tacit knowledge depends on its type, the expert's availability, and how structured the output needs to be.

Method 1
Conversational Interview Agents
An LLM agent conducts a structured interview with the domain expert. The agent formulates targeted questions, updates its knowledge state after each answer, identifies gaps, and asks follow-up questions. The conversation continues until a completeness threshold is reached.
1
Greeting & scope-setting — agent introduces itself, explains the goal, and establishes what domain/task the interview will cover.
2
Gap-targeting questions — agent asks broad questions first ("How do you approach X?"), then drills down on gaps in its current knowledge model ("You mentioned Y — can you give an example of when that rule breaks down?").
3
Response integration — after each answer, agent updates its internal knowledge state (a running document or structured representation).
4
Self-critical evaluation: agent scores its own knowledge completeness (0–10), identifies remaining gaps, and decides whether to ask more or terminate. Zuin et al. model this as a Susceptible-Infectious (SI) epidemiological process with waning infectivity β(t) = β₀e^(−γt).
5
Knowledge synthesis — at termination, agent generates a structured summary of everything learned, tagged by topic, for storage in the knowledge base.
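
The five steps above can be sketched as a simple loop. This is a toy simulation under stated assumptions, not the agent from Zuin et al.: the expert is a scripted lookup table, the completeness score is just the fraction of topics covered scaled to 0–10, and every name (`run_interview`, `scripted_expert`, `SCRIPT`) is hypothetical. A real system would replace both the questioner and the self-scoring with LLM calls.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical scripted expert: maps a topic keyword to a canned answer.
SCRIPT = {
    "refunds": "Over 500 dollars needs a manager.",
    "outages": "Page the on-call lead first.",
    "onboarding": "Pair new hires with a senior for two weeks.",
    "pricing": "Never discount in the first call.",
    "renewals": "Start the conversation 90 days out.",
}


def scripted_expert(question: str) -> str:
    for keyword, answer in SCRIPT.items():
        if keyword in question:
            return answer
    return ""


def run_interview(
    topics: List[str],
    ask_expert: Callable[[str], str],
    threshold: float = 8.0,
    max_turns: int = 20,
) -> Tuple[Dict[str, str], int]:
    """Question the expert until the self-scored completeness reaches threshold."""
    knowledge: Dict[str, str] = {}      # running knowledge state
    turns = 0
    while topics and turns < max_turns:
        gaps = [t for t in topics if t not in knowledge]
        score = 10.0 * (len(topics) - len(gaps)) / len(topics)  # self-score, 0-10
        if score >= threshold or not gaps:
            break                        # complete enough: terminate
        topic = gaps[0]                  # gap-targeting: first open gap
        answer = ask_expert(f"How do you handle {topic}?")
        turns += 1
        # Response integration; an empty answer is recorded so we do not loop on it.
        knowledge[topic] = answer or "(no answer given)"
    return knowledge, turns


knowledge, turns = run_interview(list(SCRIPT), scripted_expert)
```

With the default threshold of 8.0 the loop stops once four of the five topics are covered, which illustrates why the threshold choice decides what is left uncaptured.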
Key Papers
• Zuin et al. (2025) — IJCNN — LLM agent interviews employees across org hierarchy; 94.9% knowledge recall without contacting original specialist. arXiv:2507.03811
• Kuks, Finkel, Wurster (2025) — Industry 4.0 Science — Semi-structured interviews on 9 topics via ChatGPT-5 Voice Mode; completeness 2.89/4.0; hallucination risk identified.
• Rank et al. (2025) — Industry 5.0 literature review — Reviews LLM conversational agents for tacit KE in manufacturing. Highlights operator acceptance as a critical challenge.
• PU-ADKA (Wu et al., 2025) — EMNLP — Selectively queries the most appropriate expert from a team based on availability and knowledge boundaries; validated on drug development datasets. arXiv:2508.17202
Method 2
Think-Aloud & Reflective Prompting
The expert verbalizes their reasoning while performing a task. Instead of asking "what do you know?", this method captures knowledge in action — when the tacit knowledge is actually being used. An LLM can then analyze the verbalized stream to extract decision rules and heuristics.
1
Setup — expert is given a realistic task to perform (reviewing a case, making a diagnosis, debugging code). They narrate their thought process out loud in real time.
2
Verbalization capture — audio transcribed and cleaned. Each verbalized thought tagged with the action it accompanied ("noticed the patient's potassium was low → immediately flagged cardiac risk").
3
LLM rule extraction — LLM processes the tagged transcript to extract: cues (what the expert noticed), decisions (what they did), and rules (the general principle behind the decision).
4
Reflective prompting loop — LLM surfaces extracted rules back to the expert ("It looks like you always check X before doing Y — is that correct?"). Expert confirms, corrects, or adds nuance.
5
Visualization feedback — Freire et al. (CHI 2023) found that showing experts visualizations of their own data while they narrate significantly improves the depth of knowledge surfaced — the visual prompts recall of otherwise-unspoken knowledge.
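
Steps 2 to 4 can be illustrated with a small Python sketch. It assumes the transcript has already been tagged in the "cue → decision" form shown in step 2, and it uses exact-match counting as a crude stand-in for the LLM rule extraction in step 3. Function names and sample lines are hypothetical.

```python
from collections import Counter
from typing import List, Optional, Tuple

ARROW = "→"


def parse_tagged_line(line: str) -> Optional[Tuple[str, str]]:
    """Split 'cue → decision' into a (cue, decision) pair; None if untagged."""
    if ARROW not in line:
        return None
    cue, decision = line.split(ARROW, 1)
    return cue.strip(), decision.strip()


def confirmation_questions(lines: List[str], min_count: int = 2) -> List[str]:
    """Turn cue/decision pairs seen at least min_count times into questions."""
    pairs = [p for p in map(parse_tagged_line, lines) if p is not None]
    counts = Counter(pairs)
    return [
        f"It looks like when you {cue}, you {decision}. Is that correct?"
        for (cue, decision), seen in counts.items()
        if seen >= min_count
    ]


transcript = [
    "notice low potassium → flag cardiac risk",
    "see a late refill → call the pharmacy",
    "just thinking about lunch",
    "notice low potassium → flag cardiac risk",
]
questions = confirmation_questions(transcript)
```

Only the repeated pattern is surfaced back to the expert; one-off observations stay in the transcript until they recur.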
Key Papers
• Freire, Wang et al. (CHI 2023) — Intelligent assistant elicits tacit knowledge from manufacturing workers via voice + data visualization; first HCI work combining LLMs with reflective prompting for tacit KE. ACM DL
• THiNK (2025) — arXiv:2505.20184 — Applies think-aloud protocol as an LLM prompting framework, having the model articulate thought processes in real time, bridging cognitive science and LLM prompting.
• Freire, Wellsandt et al. (PMC 2024) — RAG system over factory documentation + worker knowledge; GPT-4 achieved 97.5% factuality; workers preferred asking colleagues over AI, raising adoption design challenges.
• Data Therapist (Shin et al. 2025) — arXiv:2505.00455 — Mixed-initiative Q&A + interactive annotation at multiple granularity levels; elicits tacit knowledge about data provenance from accounting, political science, and CS security experts.
Method 3
Structured Elicitation into Formal Artifacts
Rather than free-form interviews, this approach elicits knowledge directly into structured outputs — process diagrams, knowledge graphs, Bayesian priors, or codified rule systems. The structure is defined in advance; the LLM fills it in through targeted questioning.
1
Choose an artifact type — BPMN (business process), ontology (taxonomy of concepts), Bayesian network (causal relationships), or Atomic Knowledge Units (decision rules + tool bindings).
2
Seed the structure — LLM proposes an initial skeleton based on the domain ("Here's a draft process diagram for handling a customer complaint — does this match how you do it?").
3
Iterative gap-filling — expert reviews each node/edge/rule and either confirms or corrects. LLM asks targeted questions for each gap ("What happens if the customer escalates at step 3?").
4
Dual-model validation — a second LLM independently reviews the final artifact, flags inconsistencies, and proposes clarifications. Shaposhnyk et al. (2025) showed GPT-4o and Claude acting as mutual validators reduced logical inconsistencies vs. expert-only elicitation.
5
Artifact storage — the final formal artifact (BPMN XML, RDF knowledge graph, JSON AKUs) becomes a machine-readable knowledge base the agent can query precisely.
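
As a rough sketch of step 3, iterative gap-filling can be treated as a scan over the draft artifact for unconfirmed steps and dangling transitions. The dictionary layout and the `open_questions` helper are hypothetical; a real pipeline would operate on BPMN XML or a graph store and would phrase its questions with an LLM.

```python
from typing import Dict, List

# Hypothetical draft process: step name -> confirmation flag and successor steps.
draft = {
    "receive complaint": {"confirmed": True, "next": ["classify"]},
    "classify": {"confirmed": False, "next": ["escalate", "resolve"]},
    "resolve": {"confirmed": True, "next": []},
}


def open_questions(process: Dict[str, dict]) -> List[str]:
    """List questions for unconfirmed steps and for successors not yet defined."""
    questions: List[str] = []
    for step, info in process.items():
        if not info["confirmed"]:
            questions.append(f"Does the step '{step}' match how you do it?")
        for successor in info["next"]:
            if successor not in process:
                questions.append(
                    f"What happens at '{successor}', which follows '{step}'?"
                )
    return questions


questions = open_questions(draft)
```

The elicitation loop would end when this list comes back empty, at which point the artifact moves on to validation.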
Key Papers
• Radhakrishnan (2025) — arXiv:2512.05122 — Gemini 2.5 Pro converts interview dialogue into BPMN 2.0 process diagrams in ~12 minutes. Validated on SME manufacturing processes.
• AutoElicit (Capstick et al. 2024) — arXiv:2411.17284 — LLMs elicit Bayesian prior distributions from domain experts; 6 months of labeling effort saved on UTI prediction from sensor data.
• Shaposhnyk et al. (2025) — arXiv:2504.10397 — Dual GPT-4o + Claude mutual validation for causal network elicitation; 10/12 relationships confirmed by both models.
• Knowledge Activation / AKUs (Bakal 2026) — arXiv:2603.14805 — Atomic Knowledge Units as composable institutional knowledge primitives for agentic software development.
• LLM-empowered KG Construction Survey (Bian 2025) — arXiv:2510.20345 — Schema-based vs. schema-free approaches; reviews Ontogenia, CQbyCQ, EDC, AutoSchemaKG.

Enterprise RAG Evolution

Even before capturing tacit knowledge, organizations have large amounts of explicit but private knowledge. How RAG handles this has evolved significantly.

Organizations have vast stores of already-documented private knowledge — wikis, Confluence pages, email threads, meeting notes, code comments, Slack conversations. The challenge here is not elicitation but retrieval: how do you build a system that lets an agent query across all of this coherently? The state of the art has evolved through four generations.

Naive RAG — chunk documents → embed chunks → store in FAISS vector store → retrieve top-K by cosine similarity → inject into prompt. The baseline used by 80.5% of enterprise RAG implementations (MDPI Systematic Review, 2024).

Strength
Simple to implement. Works well for factual lookups from structured docs.
Weakness
Can't answer "whole-dataset" queries. No understanding of relationships between documents. Misses context scattered across many chunks.
Tacit Knowledge Gap
Completely fails for undocumented knowledge — there's nothing to chunk and embed if it was never written down.
Stack
LangChain + FAISS or Elasticsearch. GPT-based models used in 63.6% of enterprise implementations.
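
The naive pipeline fits in a few lines. The Python sketch below is deliberately dependency-free: word counts stand in for a learned embedding model and a sorted list stands in for a FAISS index, so it shows the shape of chunk → embed → retrieve → inject rather than a production setup. All function names and sample documents are hypothetical.

```python
import math
from collections import Counter
from typing import List


def chunk(text: str, size: int = 12) -> List[str]:
    """Split text into fixed-size word windows."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts with edge punctuation stripped."""
    return Counter(w.strip(".,?!").lower() for w in text.split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: List[str], k: int = 2) -> List[str]:
    """Return the k chunks most similar to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def build_prompt(query: str, context: List[str]) -> str:
    return "Context:\n" + "\n".join(context) + f"\n\nQuestion: {query}"


chunks = [
    "Refunds over 500 dollars need manager approval.",
    "The staging cluster is rebuilt every Monday.",
    "Escalate outages to the on-call lead.",
]
top = retrieve("Who approves refunds?", chunks, k=1)
prompt = build_prompt("Who approves refunds?", top)
```

Note that the query matches only on the literal word "refunds"; "approves" and "approval" do not overlap in this toy embedding, which is one reason real systems use dense embeddings.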

Advanced RAG — adds query rewriting, hybrid search (BM25 + dense), re-ranking, and hierarchical chunk structures. Addresses the most common failure modes of naive RAG.

Key Additions
Query expansion, parent-child chunking, re-ranking (cross-encoder), MMR diversity, hypothetical document embeddings (HyDE).
Weakness
Still fundamentally text-matching. Multi-hop reasoning across documents (A refers to B which modifies C) is unreliable.
Tacit Knowledge Gap
Still only works on documented knowledge. Better at finding the right chunk but cannot synthesize undocumented expertise.
Best For
Large, heterogeneous document collections (support KB, legal docs, technical manuals).
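
Of the additions listed above, MMR (maximal marginal relevance) is compact enough to sketch. The Python below greedily picks chunks that are relevant to the query but not redundant with chunks already chosen. The similarity values are made-up inputs, and `lam` (the relevance versus diversity trade-off) is a tunable assumption.

```python
from typing import List


def mmr(query_sim: List[float], pairwise_sim: List[List[float]],
        k: int, lam: float = 0.5) -> List[int]:
    """Greedy MMR: return indices of k candidates balancing relevance and novelty.

    query_sim[i] is the similarity of candidate i to the query;
    pairwise_sim[i][j] is the similarity between candidates i and j.
    """
    selected: List[int] = []
    candidates = list(range(len(query_sim)))
    while candidates and len(selected) < k:
        def score(i: int) -> float:
            # Redundancy is the closest match among already selected items.
            redundancy = max((pairwise_sim[i][j] for j in selected), default=0.0)
            return lam * query_sim[i] - (1 - lam) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Candidates 0 and 1 are near-duplicates; candidate 2 is less relevant but distinct.
query_sim = [0.9, 0.85, 0.5]
pairwise_sim = [
    [1.0, 0.95, 0.1],
    [0.95, 1.0, 0.1],
    [0.1, 0.1, 1.0],
]
diverse = mmr(query_sim, pairwise_sim, k=2)             # [0, 2]
relevance_only = mmr(query_sim, pairwise_sim, k=2, lam=1.0)  # [0, 1]
```

With `lam=0.5` the near-duplicate is skipped in favor of the distinct chunk; with `lam=1.0` the selection falls back to plain top-K by relevance.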

GraphRAG — Microsoft Research (2024). LLM builds a knowledge graph from private documents. Bottom-up hierarchical clustering creates semantic communities and pre-computed summaries. Enables "whole-dataset" queries vector search cannot answer.

Key Innovation
Graph structure captures entity relationships across documents. Pre-summaries at each cluster level allow macro-level queries ("What are the 5 main themes in our incident database?").
Strengths
Source provenance for every assertion. Outperforms baseline RAG on comprehensiveness, diversity, and empowerment. Open-source on GitHub.
Tacit Knowledge Bridge
If captured interview transcripts are fed in, GraphRAG can extract entity relationships automatically — partially automating the Structuring stage of the pipeline.
Cost
High upfront indexing cost. LLM API calls for every entity and relationship extraction. Best for large, stable knowledge bases.
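
To illustrate why a graph helps with multi-hop questions, here is a minimal Python sketch of a triple store with breadth-first path search. It is not the Microsoft GraphRAG API and it omits community detection and pre-computed summaries; the triples and function names are hypothetical.

```python
from collections import defaultdict, deque
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, object)


def build_graph(triples: List[Triple]) -> Dict[str, List[Tuple[str, str]]]:
    graph: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for subj, rel, obj in triples:
        graph[subj].append((rel, obj))
    return graph


def find_path(graph: Dict[str, List[Tuple[str, str]]], start: str, goal: str,
              max_hops: int = 3) -> Optional[List[Triple]]:
    """Breadth-first search for the shortest chain of triples from start to goal."""
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        if len(path) >= max_hops:
            continue
        for rel, nxt in graph.get(node, []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append((nxt, path + [(node, rel, nxt)]))
    return None


triples = [
    ("Runbook A", "refers_to", "Policy B"),
    ("Policy B", "modifies", "Procedure C"),
]
graph = build_graph(triples)
path = find_path(graph, "Runbook A", "Procedure C")
```

The returned path doubles as provenance: each hop names the relation that links one document or entity to the next.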

Agentic KG-RAG — 2025+ frontier. Agents actively manage and expand the knowledge graph, sense knowledge gaps, proactively query experts, and maintain a living knowledge base. The convergence of the capture pipeline and the retrieval system.

New Capabilities
Proactive gap detection ("We don't have documentation for the escalation process"). Automatic scheduling of expert interviews. Continuous ingestion from Slack, email, meeting transcripts.
AKU Architecture (Bakal 2026)
Atomic Knowledge Units form a composable graph agents traverse at runtime. Eliminates manual context reconstruction. Compresses onboarding time and reduces cross-team friction.
Nurture-First Agents (Zhang 2026)
Three-Layer Cognitive Architecture separates knowledge by volatility + personalization. Knowledge Crystallization Cycle consolidates dialogue into structured reusable assets.
Expert Mind (Cervera 2026)
Structured interviews + think-aloud sessions → vector store → conversational query interface. Targets knowledge loss when SMEs retire. Includes consent and IP rights framework.

Challenges & Failure Modes

Building institutional memory pipelines is harder than it looks.

🎭
The Rationalization Problem — Experts Describe Ideal Behavior, Not Actual Behavior
When asked "how do you do X?", experts generate a plausible, logically coherent narrative, which may not match what they actually do. Nisbett & Wilson (1977) documented this limit of introspection: people confidently report on mental processes they cannot actually observe. An expert radiologist may say "I check for density differences" but actually pattern-match subconsciously in ways they cannot articulate.

Mitigations: Use think-aloud during actual task performance (not retrospective interview). Ask about exceptions ("When does your usual approach fail?"). Present hypothetical cases. Cross-validate with behavioral data (call recordings, decision logs) to spot inconsistencies with the stated rules.
🌀
Hallucination During Capture — LLM Fills Gaps with Plausible Fictions
An LLM interviewer will confabulate when the expert's answer is ambiguous or incomplete. More dangerously, it may fail to flag false information the expert deliberately inserts. Kuks et al. (2025) tested this explicitly: the LLM-interviewer did NOT reliably detect intentionally false statements. The structured output looks confident regardless of accuracy.

Mitigations: Mandatory human expert sign-off on all structured outputs before loading into agent memory. Dual-LLM validation (GPT-4o + Claude cross-checking each other's extractions). Never treat the LLM's structured output as ground truth — treat it as a draft for expert review.
🔒
IP Rights, Consent, and Data Governance
Who owns captured expert knowledge? The expert? The organization? If an expert leaves and requests erasure of their knowledge contributions, what does the organization do? This is an unsolved legal and governance question. Expert Mind (Cervera 2026) identifies informed consent, IP rights, and data erasure rights as core design constraints that most systems ignore.

Mitigations: Establish knowledge contribution agreements before any elicitation begins. Build erasure support into the storage architecture (versioned knowledge bases). Store knowledge attributions so you know which entries came from which expert. Involve legal counsel early.
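
Attribution and erasure can be sketched as a small in-memory store. This Python example is a hypothetical illustration (class and method names are invented here, not taken from Expert Mind): every entry records which expert contributed it, so an erasure request can be honored and logged.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Entry:
    entry_id: str
    expert_id: str     # attribution: who contributed this entry
    text: str


class KnowledgeStore:
    def __init__(self) -> None:
        self._entries: Dict[str, Entry] = {}
        self.audit_log: List[str] = []

    def add(self, entry: Entry) -> None:
        self._entries[entry.entry_id] = entry

    def by_expert(self, expert_id: str) -> List[Entry]:
        return [e for e in self._entries.values() if e.expert_id == expert_id]

    def erase_expert(self, expert_id: str) -> int:
        """Delete every entry attributed to expert_id; return how many were removed."""
        doomed = [e.entry_id for e in self.by_expert(expert_id)]
        for entry_id in doomed:
            del self._entries[entry_id]
        self.audit_log.append(f"erased {len(doomed)} entries for {expert_id}")
        return len(doomed)

    def __len__(self) -> int:
        return len(self._entries)


store = KnowledgeStore()
store.add(Entry("e1", "alice", "Never deploy on Fridays before a holiday."))
store.add(Entry("e2", "alice", "Check the cache before blaming the database."))
store.add(Entry("e3", "bob", "Vendor X invoices always arrive late."))
removed = store.erase_expert("alice")
```

Derived artifacts such as embeddings, graph summaries, or fine-tuned weights would need their own erasure path, which this sketch does not cover.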
📉
Knowledge Decay — Captured Knowledge Goes Stale
Expert knowledge captured today may be wrong next year. Regulations change, best practices evolve, and experts themselves update their views. A static knowledge base grows stale and potentially dangerous if agents rely on it for high-stakes decisions. Zuin et al. model this as an epidemiological SI process with waning infectivity β(t) = β₀e^(−γt): knowledge "infectiousness" decays over time without reinforcement.

Mitigations: Attach expiry dates or review triggers to knowledge entries. Build a continuous refresh loop — periodic re-interviews or automated monitoring of domain signals (regulatory changes, new research). Distinguish stable architectural knowledge from volatile operational knowledge and version them separately.
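
One way to turn an exponential decay model into a review trigger, offered here as an assumption rather than anything Zuin et al. propose: pick a threshold θ and schedule a review when β(t) falls to it, which happens at t = ln(β₀/θ)/γ. The Python sketch below computes that time; the parameter values (a 12-month half-life, a 0.5 threshold) are illustrative only.

```python
import math


def infectivity(beta0: float, gamma: float, t: float) -> float:
    """beta(t) = beta0 * exp(-gamma * t)."""
    return beta0 * math.exp(-gamma * t)


def review_due_at(beta0: float, gamma: float, threshold: float) -> float:
    """Time at which beta(t) decays to threshold: t = ln(beta0 / threshold) / gamma."""
    if not (0 < threshold <= beta0) or gamma <= 0:
        raise ValueError("need 0 < threshold <= beta0 and gamma > 0")
    return math.log(beta0 / threshold) / gamma


gamma = math.log(2) / 12          # assumed half-life of 12 months
due = review_due_at(beta0=1.0, gamma=gamma, threshold=0.5)
```

Volatile operational knowledge would get a larger γ (earlier review) than stable architectural knowledge, matching the separation suggested in the mitigations.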
🤝
Expert Adoption Resistance
Experts may resist knowledge capture for multiple reasons: fear that capturing their knowledge makes them replaceable, distrust of AI systems, discomfort with having their decision-making scrutinized, or simply the time burden. Rank et al. (Industry 5.0, 2025) identify operator acceptance as a critical challenge for LLM-based KE in manufacturing. Freire et al. (PMC 2024) found that factory workers preferred asking human colleagues over AI even when both were available.

Mitigations: Frame knowledge capture as legacy preservation, not replacement. Give experts visibility and control over what's stored and how it's used. Make the interface low-friction — voice-based, integrated into existing workflows. Show concrete value back to the expert (e.g., the agent can now handle repetitive questions, freeing them for high-value work).
🌊
Distributed, Contradictory Knowledge Across Multiple Experts
In a real organization, different experts hold different (and sometimes contradictory) views on the same question. Expert A says "always escalate after 3 calls"; Expert B says "only escalate if the issue is technical." Both are drawing on genuine experience. Whose view should the agent follow? The PU-ADKA paper (Wu et al. 2025) addresses expert selection — choosing the right expert for a specific query — but the contradiction resolution problem remains open.

Mitigations: Tag knowledge entries with source expert, context, and recency. Build contradiction detection into the structuring stage (LLM flags when a new entry conflicts with an existing one). For high-stakes contradictions, escalate to a committee review rather than automatically resolving. Present the agent with the disagreement explicitly: "Expert A says X, Expert B says Y — apply both rules and flag for human review."
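
The tagging and contradiction-surfacing mitigations can be sketched as follows. This Python example is hypothetical throughout: the `Rule` fields mirror the tags suggested above (source expert, context, recency), and comparing statement strings is a crude stand-in for the semantic conflict check an LLM would perform in the structuring stage.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Rule:
    topic: str
    statement: str
    expert: str
    context: str
    year: int


def find_conflicts(rules: List[Rule]) -> Dict[str, List[Rule]]:
    """Group rules by topic and keep topics with more than one distinct statement."""
    by_topic: Dict[str, List[Rule]] = defaultdict(list)
    for rule in rules:
        by_topic[rule.topic].append(rule)
    return {
        topic: group
        for topic, group in by_topic.items()
        if len({r.statement for r in group}) > 1
    }


def render_for_agent(topic: str, group: List[Rule]) -> str:
    """Present the disagreement explicitly, most recent first."""
    lines = [f"Experts disagree on '{topic}':"]
    for r in sorted(group, key=lambda r: r.year, reverse=True):
        lines.append(f"- {r.expert} ({r.context}, {r.year}): {r.statement}")
    lines.append("Flag for human review.")
    return "\n".join(lines)


rules = [
    Rule("escalation", "Always escalate after 3 calls", "Expert A", "billing", 2023),
    Rule("escalation", "Only escalate if the issue is technical", "Expert B", "support", 2025),
    Rule("refunds", "Manager approves refunds over 500 dollars", "Expert A", "billing", 2024),
]
conflicts = find_conflicts(rules)
briefing = render_for_agent("escalation", conflicts["escalation"])
```

The briefing text is what the agent would see in its context: both positions with their provenance, and no automatic winner.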