Pure neural networks can't reason. Pure symbolic systems can't perceive. Neither alone is enough for real-world AI.
Deep Learning — Perception champion
+
Symbolic AI — Reasoning champion
=
NeSy — Best of both
Click each panel to see where each approach fails
Neural networks: great at patterns, poor at logic
Given a logical chain — "All birds fly. Tweety is a bird. Does Tweety fly?" — a neural network learns statistical patterns but cannot guarantee the correct logical inference. It might answer "yes" for common birds yet stumble on exceptions like penguins, because it has no explicit representation of the rule itself — only correlations mined from training data.
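The guarantee a symbolic system offers here fits in a few lines. This is a minimal forward-chaining sketch (the predicate names and rule format are illustrative, not any particular library's API):

```python
# Minimal forward chaining over explicit rules: the conclusion follows
# from the rule by construction, not from co-occurrence statistics.
facts = {("bird", "tweety")}
rules = [
    ("bird", "flies"),  # "All birds fly" as (premise, conclusion)
]

def forward_chain(facts, rules):
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for premise, conclusion in rules:
            for pred, entity in list(derived):
                if pred == premise and (conclusion, entity) not in derived:
                    derived.add((conclusion, entity))
                    changed = True
    return derived

print(("flies", "tweety") in forward_chain(facts, rules))  # True
```

The answer is derivable and auditable — but, as the panel notes, the rule itself is wrong for penguins, which is exactly the brittleness the symbolic side inherits.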
What Neural Does Well
Pattern recognition · Image classification · Speech synthesis ·
Language generation · Statistical inference from large datasets.
Trained end-to-end on data. No hand-crafted rules needed.
What Symbolic Does Well
Logical deduction · Constraint satisfaction · Explainable decisions ·
Knowledge representation · Guaranteed correctness within a formal system.
Rules are explicit, verifiable, and editable by humans.
Three Waves of AI — and Why the Third Is Different
AI has reset itself twice. The third wave doesn't discard the past — it synthesizes it.
Click each wave to expand its story
1st Wave — Expert Systems (1950s–1980s)
The first wave of AI was symbolic through and through. Researchers wrote explicit rules: IF patient has fever AND cough THEN suspect flu. Systems like MYCIN (medical diagnosis) and SHRDLU (natural language) showed impressive narrow competence. But they were brittle — any fact outside the hand-crafted knowledge base caused failure. The knowledge acquisition bottleneck killed this era: you could never write enough rules to cover the real world.
Humans know millions of implicit facts without being told. AI systems don't. This is the deepest unsolved problem — and the clearest case for NeSy.
Physical Causality
Why Neural Fails
Neural models learn statistical co-occurrence patterns — "glass" and "floor" appear together in training data about accidents, so they correlate. But the model has no causal model: it doesn't know why glass breaks, or that dropping causes falling, or that brittle materials shatter on impact. Adversarial framings easily fool it.
Why Symbolic Fails
You cannot write down every common-sense rule. Lenat's Cyc project spent 30+ years encoding common sense and still covered only a fraction of human knowledge. The knowledge acquisition bottleneck is fundamental: common sense is too vast, too contextual, and too implicit to enumerate.
Why NeSy Helps
Neural component learns statistical regularities from text and sensor data (implicit common sense). Symbolic component grounds these in a causal knowledge graph — ConceptNet, ATOMIC — providing structural constraints. The combination generalises beyond training data while remaining logically consistent.
Kahneman's dual-process theory maps directly onto neural vs symbolic AI. We need both to be intelligent.
Recognize a face
Instantaneous. Effortless. No conscious reasoning.
→ Click to classify
Hear a familiar voice
Pattern matching from experience. No rule lookup.
→ Click to classify
Read emotional tone
You know someone is angry before they finish the sentence.
→ Click to classify
System 1 → Neural Networks
Fast · Automatic · Pattern-based · Unconscious
Neural networks are trained to be superb System 1 machines — recognizing patterns in images, text, and audio instantly. But System 1 can be fooled by adversarial examples and has no ability to reason about why.
System 2 → Symbolic AI
Slow · Deliberate · Rule-based · Explainable
Symbolic systems are built for System 2 — following logical chains, applying constraints, planning. But they require perception to be pre-solved: they need symbols to already exist before they can reason.
The Sheth-Roy-Gaur paper's central insight: AI must do what humans do — perceive the world, then reason about it.
Step through the pipeline
Step 1 — Raw Input
The pipeline starts with raw sensory data: an X-ray image, a driving scene, a patient's text record. This is the unstructured input that only neural networks can handle at scale.
Machine Perception
Large-scale pattern recognition from raw data using neural networks trained with self-supervised learning. Converts unstructured input (pixels, tokens) into structured symbols (concepts, entities, relationships). This is where deep learning excels.
Machine Cognition
More complex computation: using knowledge of the environment to guide reasoning, analogy, and long-term planning. Takes symbols from perception and applies domain knowledge (from knowledge graphs, ontologies, logical rules) to reach explainable conclusions.
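The two stages can be sketched as a toy pipeline. The neural stage is stubbed out with a keyword lookup, and all function and symbol names here are hypothetical:

```python
# Toy perception -> cognition pipeline. perceive() stands in for a
# trained neural model; reason() applies explicit domain rules.
def perceive(raw_input):
    # A real system would run a neural network over pixels or tokens;
    # this stub emits discrete symbols from keywords.
    symptoms = []
    if "fever" in raw_input:
        symptoms.append("Fever")
    if "cough" in raw_input:
        symptoms.append("Cough")
    return symptoms

RULES = {frozenset({"Fever", "Cough"}): "SuspectInfluenza"}

def reason(symbols):
    # Cognition: match perceived symbols against explicit rules.
    for premise, conclusion in RULES.items():
        if premise <= set(symbols):
            return conclusion
    return "NoConclusion"

note = "patient reports fever and a dry cough"
print(reason(perceive(note)))  # SuspectInfluenza
```

The division of labour is the point: perception turns unstructured input into symbols; cognition manipulates only symbols, so its steps can be inspected.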
Henry Kautz's taxonomy organizes neurosymbolic architectures by how deeply neural and symbolic components are integrated. Click each to explore.
TYPE 1
Symbolic Neural
BERT · GPT · RoBERTa
TYPE 2
Symbolic[Neural]
AlphaGo · game trees
TYPE 3
Neural | Symbolic
Neural-Concept Learner
TYPE 4
Neural:Symbolic→Neural
Data synthesis pipelines
TYPE 5
NeuralSymbolic
Neural Theorem Provers · LTN
TYPE 6
Neural[Symbolic]
Symbolic logic in neural nets
TYPE 1 · LOOSEST INTEGRATION
Symbolic Neural
Neural models where words or tokens are the inputs and outputs — the "symbols" are just discrete tokens, not formal logic. The symbolic layer is minimal: just a vocabulary and tokenization. The neural network does all the heavy lifting.
This is the loosest form of neurosymbolic AI — symbolic in name only.
This is the neurosymbolic sweet spot — combining neural perception with symbolic reasoning. Accuracy is high (neural handles complex patterns), interpretability is maintained (symbolic rules are explicit), data efficiency improves (symbolic priors reduce training data needs), and reasoning power is strong.
The two fundamental operations for bridging the neural-symbolic gap. Lowering moves knowledge into neural space; lifting extracts structure from it.
Lowering — Embedding Symbolic Knowledge into Neural Space
Lowering takes structured symbolic knowledge (knowledge graphs, logical rules, ontologies) and compresses it into continuous neural representations — embedding vectors that a neural network can work with.
Key technique: Knowledge Graph Embeddings — methods like TransE, RotatE, and DistMult embed KG entities and relations into a continuous high-dimensional vector space. The geometry of this space preserves relational structure, so reasoning can happen via vector arithmetic.
Limitation: Compression is lossy — some symbolic semantics (e.g., complex relation types) are difficult to preserve in continuous space.
Lowering Techniques
→ KG Embeddings: TransE, RotatE, DistMult
→ Logic Tensor Networks: logical formulas as tensor operations
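The TransE idea — a true triple (h, r, t) should satisfy h + r ≈ t — can be shown with toy embeddings. The 3-d vectors below are illustrative, not trained:

```python
import math

# TransE scores a triple (h, r, t) by -||h + r - t||: true triples
# score near zero, corrupted triples score lower (more negative).
emb = {
    "paris":      [0.9, 0.1, 0.0],
    "berlin":     [0.2, 0.8, 0.1],
    "france":     [1.0, 0.0, 0.5],
    "capital_of": [0.1, -0.1, 0.5],
}

def transe_score(h, r, t):
    diff = [emb[h][i] + emb[r][i] - emb[t][i] for i in range(3)]
    return -math.sqrt(sum(d * d for d in diff))

# The true triple outranks the corrupted one.
print(transe_score("paris", "capital_of", "france") >
      transe_score("berlin", "capital_of", "france"))  # True
```

Training pushes true triples toward score 0 and corrupted triples away — which is also where the lossiness comes from: negation and quantifiers have no natural home in this geometry.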
Markov Logic Networks — Soft Rules, Hard Constraints
MLNs unify first-order logic and Markov Random Fields. Every logical formula gets a weight: high weight = near-certain rule; low weight = soft tendency. Drag to explore.
The Key Insight
In classical logic, a single violated rule makes the whole system inconsistent. MLNs relax this: a formula with weight w makes a world eʷ times more probable for each grounding of the formula that the world satisfies. Hard rules have weight → ∞. Soft tendencies have weight ≈ 1–3.
MLN vs ProbLog vs NTP
ProbLog: annotated probabilities on ground facts (exact inference). MLN: weights on first-order formulas (approximate inference via MCMC). NTP: differentiable proof trees (gradient-based learning). Each trades expressivity, tractability, and learnability differently.
Drag the weight slider to see how formula weight affects world probability
Formula: Friends(x,y) ∧ Smokes(x) → Smokes(y)
"If x and y are friends, and x smokes, then y smokes." With a high weight this behaves almost like a hard rule — worlds that violate it become vanishingly improbable. With a low weight, it's a soft tendency — friends often (but not always) share smoking habits. This is the Smoking MLN, the classic demonstration from Richardson & Domingos (2006).
MLN formula syntax:
w Friends(x,y) ∧ Smokes(x) → Smokes(y) // weight w
∞ Person(x) → ∃y Friends(x,y) // hard constraint (weight → ∞)
2 Smokes(x) → Cancer(x) // soft rule
P(world) ∝ exp( Σᵢ wᵢ · fᵢ(world) ) // Gibbs distribution
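The Gibbs distribution above can be evaluated directly for a toy pair of worlds. The weight and grounding counts are illustrative:

```python
import math

# P(world) ∝ exp(sum_i w_i * n_i(world)), where n_i counts the true
# groundings of formula i in that world.
def unnormalised_prob(weights, counts):
    return math.exp(sum(w * n for w, n in zip(weights, counts)))

weights = [1.5]  # weight of Friends(x,y) ∧ Smokes(x) → Smokes(y)

pa = unnormalised_prob(weights, [3])  # world with 3 satisfied groundings
pb = unnormalised_prob(weights, [2])  # world with 2 satisfied groundings

# Each extra satisfied grounding multiplies the probability by e^w.
print(round(pa / pb, 2))  # 4.48, i.e. e^1.5
```

Note the probabilities are unnormalised — computing the partition function over all worlds is exactly the part that forces MLNs into approximate inference like MCMC.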
The key innovation of Type 5 NeSy systems: making symbolic inference differentiable so the neural component can be trained end-to-end from logical supervision.
The AND-OR Proof Tree
A Neural Theorem Prover (NTP) converts KB rules into an AND-OR proof tree. Each internal node is either an AND-node (all sub-goals must succeed) or an OR-node (at least one sub-goal must succeed). This tree structure mirrors logical inference — but every operation is implemented as a differentiable neural computation.
Forward Pass — Inference
During inference, scores flow downward through the proof tree: the root query is unified against KB rules, generating sub-goals, until leaf nodes hit ground facts. Each unification is a soft dot-product over learnable entity embeddings — not a hard match. Result: a probability score for the query.
Backward Pass — Learning
During training, gradients flow upward through the tree: the loss signal (query was true/false) propagates back through every AND/OR operation to the entity and relation embeddings. The embeddings adjust so that true queries score higher and false queries score lower — learning both the proof structure and the representations simultaneously.
AND-node score: s(A∧B) = min(s(A), s(B)) ← differentiable minimum
OR-node score: s(A∨B) = max(s(A), s(B)) ← differentiable maximum
Unification: s(p,q) = σ(embed(p)·embed(q)) ← neural soft match
∇loss flows through min/max/σ back to embed(·)
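The three scoring operations can be sketched in plain Python (the predicate embeddings are made-up 2-d vectors; a real NTP learns them by gradient descent):

```python
import math

# NTP-style soft proof scoring: unification is a sigmoid of an embedding
# dot product, an AND-node takes the min of sub-goal scores, an OR-node
# the max. All three operations admit gradients.
def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def unify(p, q, embed):
    return sigmoid(sum(a * b for a, b in zip(embed[p], embed[q])))

embed = {  # toy 2-d predicate embeddings, values illustrative
    "parent":  [2.0, 1.0],
    "father":  [1.8, 1.1],   # close to "parent" -> high unification score
    "capital": [-2.0, 0.5],  # unrelated -> low score
}

s_father = unify("parent", "father", embed)
s_capital = unify("parent", "capital", embed)
and_score = min(s_father, s_capital)  # AND-node: weakest sub-goal
or_score = max(s_father, s_capital)   # OR-node: best alternative proof
print(s_father > 0.9, s_capital < 0.1)  # True True
```

Because unification is a dot product rather than a hard symbol match, "father" can partially unify with "parent" — that softness is what lets the loss reshape the embeddings.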
Knowledge graphs encode facts as triples (subject, relation, object). They are the most common symbolic substrate in neurosymbolic systems. Click nodes to explore.
Click a node in the graph to see its connections and role in neurosymbolic reasoning.
Why KGs in NeSy?
Knowledge graphs provide the structured, human-understandable world model that symbolic systems need. A neurosymbolic system uses a neural network to perceive raw data, then grounds those perceptions in a knowledge graph to reason about what they mean and what to do next.
Medical KG Example
A patient has a high fever and cough. The neural model reads the clinical note and identifies these symptoms. The medical knowledge graph then connects:
Fever + Cough → Influenza (0.72)
Influenza → treat with → Antiviral
Patient age <5 → dosage → Paediatric
The decision is explainable because the KG chain is visible.
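The visibility of that chain is easy to make concrete. This sketch stores the panel's triples and recovers the chain by following edges (relation names are illustrative):

```python
# A knowledge graph as (subject, relation, object) triples. The reasoning
# chain is recovered by following edges, which is what makes it auditable.
KG = [
    ("Fever+Cough", "suggests", "Influenza"),
    ("Influenza", "treated_with", "Antiviral"),
    ("Age<5", "requires_dosage", "Paediatric"),
]

def follow(subject, relation):
    return [o for s, r, o in KG if s == subject and r == relation]

diagnosis = follow("Fever+Cough", "suggests")[0]
treatment = follow(diagnosis, "treated_with")[0]
print(diagnosis, "->", treatment)  # Influenza -> Antiviral
```

Every hop in the answer corresponds to a stored triple, so a clinician can check each link rather than trusting an opaque score.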
How does a neural representation become a symbol? The gap between continuous activations and discrete symbolic concepts is one of the deepest unsolved problems in NeSy AI.
Toggle between ungrounded and grounded states
The Ungrounded Gap
A neural network trained on chest X-rays develops an internal representation — a vector in high-dimensional space — that correlates with "pneumonia". But this vector has no inherent meaning. It's just numbers. For a symbolic reasoner to use it, someone must define: this vector region = the symbol PNEUMONIA. This mapping is the grounding step, and getting it right is non-trivial — especially at the boundaries of concepts and for rare conditions.
Approaches to Grounding
→ Threshold grounding: neural output > 0.5 → symbol is true
→ Soft grounding: neural output = probability of symbol truth (DeepProbLog)
→ Concept learning: learn concept boundaries jointly with downstream task
→ Human annotation: human experts label which neural clusters correspond to which symbols
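The first two policies differ only in what they keep from the neural output. A minimal sketch, with a hypothetical score and cutoff:

```python
# Two grounding policies over the same neural output. The 0.5 cutoff and
# the pneumonia score are illustrative.
def threshold_grounding(score, cutoff=0.5):
    # Hard grounding: the symbol is simply true or false.
    return score > cutoff

def soft_grounding(score):
    # Soft grounding (DeepProbLog-style): keep the score as the
    # probability that the symbol holds, and reason probabilistically.
    return score

pneumonia_score = 0.62  # hypothetical neural output for PNEUMONIA
print(threshold_grounding(pneumonia_score))  # True
print(soft_grounding(pneumonia_score))       # 0.62
```

Threshold grounding discards the model's uncertainty at the boundary — which is precisely where the failure modes below bite.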
When Grounding Fails
→ Ambiguous boundaries: "fever" vs "high fever" — where is the threshold?
→ Distribution shift: grounding learned on training data breaks on new populations
→ Polysemy: the same neural region encodes multiple concepts
Three landmark systems that demonstrate how neural and symbolic components combine in practice.
DeepProbLog
DeepProbLog extends ProbLog (a probabilistic logic programming language) with neural predicates — facts whose truth value is computed by a neural network rather than stored as a fixed probability.
Example: nn(mnist_net, [X], Y, [0, ..., 9]) :: digit(X, Y).
The neural network reads an image and outputs a probability distribution over the digit labels. The symbolic probabilistic logic program uses those probabilities in inference. This allows end-to-end training: gradients flow from the logical inference back through the neural network. Kautz Type: NeuralSymbolic (Type 5).
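How a neural predicate's distribution enters probabilistic inference can be shown in the spirit of DeepProbLog's MNIST-addition example. The two distributions below stand in for neural network outputs; this is a hand-rolled sketch, not DeepProbLog's actual API:

```python
# Probabilistic inference over neural predicates: the logic marginalises
# over every digit pair consistent with the query, weighting each pair by
# the networks' probabilities.
p_digit_a = {3: 0.7, 4: 0.3}  # net's belief about the first image
p_digit_b = {2: 0.6, 5: 0.4}  # net's belief about the second image

def prob_sum_equals(target):
    return sum(pa * pb
               for a, pa in p_digit_a.items()
               for b, pb in p_digit_b.items()
               if a + b == target)

# Only the pair (3, 2) sums to 5, so P = 0.7 * 0.6.
print(round(prob_sum_equals(5), 2))  # 0.42
```

Because the query probability is a differentiable function of the networks' outputs, supervision on the sum alone can train both digit classifiers.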
Safety-critical domains demand both accuracy (neural) and explainability (symbolic). NeSy is uniquely positioned here.
Healthcare: Explainable Clinical Decision Support
Neural networks can detect patterns in medical images (tumours, arrhythmias) with superhuman accuracy, but cannot explain their reasoning or guarantee they respect clinical guidelines.
A neurosymbolic system adds a medical knowledge graph on top: the neural component identifies potential findings from raw data, the symbolic layer checks them against guidelines (e.g., WHO treatment protocols, drug interaction rules, dosage constraints), and produces a recommendation with a full reasoning trace that a clinician can audit.
This addresses EU regulatory expectations for explainability in automated decision-making (e.g., GDPR Article 22), something a pure neural system cannot do.
Symbolic constraints act as a hard guardrail — the reasoner physically cannot generate an answer that violates the knowledge graph. Choose a scenario to see the difference live.
Query
Pure LLM Response
NeSy System Response
Why Does This Happen?
LLMs generate text by predicting the next probable token — they have no mechanism to enforce that output is factually grounded. They can generate any plausible-sounding text, even if it contradicts known facts. A NeSy system's symbolic reasoner only produces conclusions that are derivable from its knowledge graph — hallucination is structurally impossible within the symbolic layer. The risk moves to whether the KG itself is correct and complete.
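The structural guarantee is simple to sketch: a reasoner that only answers what is derivable, and refuses otherwise. The facts below are illustrative:

```python
# A reasoner that only answers queries derivable from its KG: unknown
# queries are refused rather than guessed, so hallucination cannot occur
# inside the symbolic layer.
KG = {("aspirin", "treats", "headache"),
      ("amoxicillin", "treats", "bacterial_infection")}

def answer(subject, relation, obj):
    if (subject, relation, obj) in KG:
        return "yes — derivable from the knowledge graph"
    return "cannot answer: not derivable from the knowledge graph"

print(answer("aspirin", "treats", "headache"))
print(answer("aspirin", "treats", "bacterial_infection"))
```

The refusal branch is the guardrail — and it also shows where the risk moves: if the KG itself is wrong or incomplete, the system is confidently silent or confidently wrong.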
The same medical question, two answers. Which one would you trust with your health?
Scenario
A 58-year-old patient presents with chest pain, shortness of breath, and elevated troponin levels. The AI system is asked: "What is the recommended treatment?"
Pure Neural LLM
Processes all tokens in parallel through 96 attention layers.
No explicit knowledge of cardiology guidelines.
"Consider aspirin and anticoagulation therapy."
No reasoning trace. Cannot verify against guidelines. May hallucinate dosages. Cannot explain why.
On tasks requiring systematic reasoning, neurosymbolic systems consistently outperform pure neural baselines. Click each benchmark for details.
Click a benchmark bar to learn more
CLEVR — Visual Reasoning
CLEVR is a diagnostic benchmark for visual question answering that requires compositional reasoning: "How many small red objects are to the left of the large metal cube?" End-to-end neural models struggle with this systematic compositional structure. Neurosymbolic systems (like the Neural-Symbolic Concept Learner, NS-CL) first perceive objects with a neural scene parser, then answer questions with a symbolic program executor — achieving near-perfect accuracy on out-of-distribution compositions that neural baselines fail to generalise to.
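The NS-CL split can be sketched end to end: a stubbed scene parser emits object symbols, then a symbolic program answers the question by filtering and counting. The scene and attribute names are illustrative:

```python
# NS-CL-style split: the list below stands in for a neural scene parser's
# output; the functions are the symbolic program executor.
scene = [
    {"size": "small", "color": "red",  "material": "rubber", "x": 1},
    {"size": "small", "color": "red",  "material": "metal",  "x": 2},
    {"size": "large", "color": "gray", "material": "metal",  "x": 5},
]

def filter_objs(objs, **attrs):
    return [o for o in objs if all(o[k] == v for k, v in attrs.items())]

def count_left_of(objs, anchor):
    return len([o for o in objs if o["x"] < anchor["x"]])

# "How many small red objects are to the left of the large metal cube?"
cube = filter_objs(scene, size="large", material="metal")[0]
small_red = filter_objs(scene, size="small", color="red")
print(count_left_of(small_red, cube))  # 2
```

Because the program is composed from reusable primitives (filter, count, spatial relation), novel attribute combinations at test time require no retraining — the source of the out-of-distribution robustness.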
Large-scale perceptual tasks (ImageNet, speech) where neural-only is sufficient · Tasks where reasoning structure is not well-defined · Domains without good knowledge graphs
Caveat
Numbers are representative of trends in the literature (2019–2024). Exact figures vary by system and test split. NeSy systems often trade raw perceptual accuracy for reasoning gains — the right choice depends on the task.
Modern LLMs are being extended with symbolic components — each augmentation is a form of neurosymbolic integration. Select a technique to see how it works.
Chain-of-Thought — Emergent Symbolic Reasoning
Chain-of-thought prompting (Wei et al., 2022) elicits step-by-step reasoning from LLMs: "Let's think step by step..." The model generates intermediate reasoning steps before answering. This is a form of emergent neurosymbolic behaviour — the LLM's neural layers approximate symbolic reasoning chains, improving performance on multi-step problems significantly.
Kautz type: closest to Type 1 (Symbolic Neural) — the "symbolic" component is the token-level reasoning chain, not formal logic. The reasoning isn't formally verified and can still contain logical errors, but it dramatically outperforms direct answering on tasks like grade-school math and commonsense QA.
Kautz Mapping
CoT → Type 1 · Tool Use → Type 2 (Symbolic[Neural]) · RAG → Type 3 (Neural|Symbolic) · NSP → Type 5 (NeuralSymbolic). Each augmentation moves the LLM further toward the symbolic end of the spectrum.
These augmentations are still limited — the LLM decides when to use tools (and can decide incorrectly), RAG retrieval quality is imperfect, and NSP requires human-designed program schemas. True end-to-end NeSy integration remains an open research area.
Choose your components across three layers. The system will tell you which Kautz type you've built and predict its tradeoff profile.
1. Neural Backbone
2. Knowledge Source
3. Symbolic Reasoner
Kautz Type 5 — NeuralSymbolic
CNN backbone perceives raw data, probabilistic logic grounds and reasons over a knowledge graph. End-to-end trainable with differentiable inference. Strong at both perception and reasoning. Closest system: DeepProbLog.
NeSy is not a silver bullet. These are the real failure modes that the field is actively working to solve.
Combinatorial explosion: the cost of symbolic reasoning grows exponentially with depth
⚠ Combinatorial Explosion
Symbolic reasoning is search through a space of logical proofs. As the number of facts and rules grows, the search space explodes exponentially. A knowledge graph with 10,000 entities and 100 relation types has 10 billion possible triples. Proof search to depth 3 becomes intractable. Approximate inference (beam search, Monte Carlo) helps but sacrifices completeness guarantees — the system may miss valid proofs.
⚠ Lossy Knowledge Graph Compression
Lowering (KG embedding) compresses the full semantics of a knowledge graph into continuous vectors. This compression is inherently lossy — complex relationship types (negation, quantifiers, conditional facts) are difficult to preserve. A KG embedding that says "aspirin treats headache" cannot easily represent "aspirin does NOT treat bacterial infections" — negation is structurally awkward in vector space.
⚠ Brittleness of Hand-crafted Rules
Symbolic rules are written by human experts and fail on any case outside their scope — exactly the knowledge acquisition bottleneck that killed first-wave AI. Medical guidelines cover typical presentations; edge cases, co-morbidities, and off-label drug uses require judgment that rules cannot provide. The tension between rule completeness and maintainability is a fundamental limitation of rule-based symbolic components.
⚠ Grounding Cascades
Errors in the neural perception layer cascade into the symbolic reasoner. If the neural component mis-grounds "elevated troponin" (calling it "normal" when it's borderline), the symbolic reasoner — operating on that incorrect symbol — will reach a confidently wrong conclusion. NeSy systems inherit neural error rates and then amplify them through logical chains. Error propagation across the neural-symbolic boundary is an active research problem.
⚠ Scalability to Open-World Settings
Classical symbolic AI operates under the Closed World Assumption — if a fact isn't in the KB, it's false. But the real world is open: unknown facts exist. NeSy systems struggle when users ask about entities or relations not in the knowledge graph. Large KGs (Wikidata has over 100 million items) help but don't fully solve this — new concepts emerge continuously, and keeping symbolic knowledge bases current at LLM data scale is an unsolved engineering challenge.
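The Closed World Assumption fits in two lines, which is also why its failure mode is so stark. The facts here are illustrative:

```python
# Closed World Assumption: absence from the KB is treated as falsehood —
# exactly what breaks in open-world settings.
KB = {("paris", "capital_of", "france")}

def holds_cwa(triple):
    return triple in KB  # not in KB -> assumed false

print(holds_cwa(("paris", "capital_of", "france")))   # True
print(holds_cwa(("ottawa", "capital_of", "canada")))  # False — though true in reality
```

The second query is true in the world but false under CWA, the gap every open-world NeSy system must somehow manage.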
The Research Frontier
Despite these limitations, neurosymbolic AI is among the most active research areas in AI safety and interpretable ML. The third wave is still early — publication counts grew 4.5× from 2020 to 2023. Open challenges include: scalable approximate symbolic inference, differentiable KG updates, automated grounding from raw data, and hybrid systems that can handle both open-world perception and formal safety guarantees simultaneously.
What's Next
Research Roadmap — Where the Field Is Going
Neurosymbolic AI is one of the fastest-growing areas of AI research. Here's the frontier: what's solved, what's active, and what's open.
Click a milestone to learn more
Click a milestone above
The neurosymbolic AI field spans from the earliest connectionist-symbolic hybrids in the 1990s through today's LLM-augmented systems. Each milestone on the timeline represents a major architectural or capability breakthrough.
Key Open Problems
→ Scalable symbolic inference beyond thousands of facts