The Collapse Problem · Section 1
When Meaning Dissolves in Embedding Space
Semantic collapse is the systematic erasure of operator-sensitive boundaries in continuous vector representations: the quiet loss of linguistic distinctions that matter for reasoning.
4 Collapse Types · 5 Core Metrics · 3 Embedding Invariants · 7 Open Questions
Illustrative: conceptual animation of semantic cluster collapse over training steps
The Core Claim
Standard embedding objectives optimize for distributional similarity, but distributional similarity is not the same as semantic equivalence. Operators like possibility, belief, and agency can vanish without trace in cosine distance.
Why It Matters
Collapsed embeddings mislead downstream reasoning. "It is possible that P" and "P" become near-neighbors; models trained on such representations lose the ability to distinguish modal facts from categorical assertions.
The Framework
Denham (2025) proposes formal diagnostics (entropy drift, triplet tests, cross-type leakage) and the Modal Proofing Kernel (MPK), a constraint architecture for preserving semantic boundaries.
What is an "operator-sensitive boundary"?
Natural language expressions carry logical operators that modify propositional content: modal operators (necessarily, possibly), epistemic operators (believes, knows), indexical operators (I, here, now), and agency operators (brings about, ensures). These operators create sharp distinctions in formal semantics: "P" and "possibly P" are different propositions with different truth conditions. Semantic collapse occurs when embedding models map these distinct expressions to nearly identical vectors, erasing the operator boundary.
How does collapse happen during training?
Training corpora associate operator-modified sentences with their propositional cores in similar contexts. "The patient may have pneumonia" and "the patient has pneumonia" co-occur with similar surrounding tokens (treatment plans, test orders). Contrastive objectives pull frequently co-occurring tokens together, while operator tokens appear infrequently enough that their geometric signal is drowned out. The result: the operator contribution to the embedding shrinks toward zero across training steps: entropic drift.
Collapse Taxonomy · Section 2
Four Ways Meaning Gets Erased
Denham identifies four distinct collapse types, each targeting a different class of semantic operator. Each type has characteristic sentence pairs and distinct downstream failure modes.
Illustrative: conceptual diagram of operator erasure in embedding space
Modal Collapse: The embedding of "It is possible that P" converges toward the embedding of "P". The possibility operator ◇ is geometrically erased.
Example pair: "It is possible that the treaty will be signed" → "The treaty will be signed"
Failure mode: Models assert modal facts as categorical truths; hallucination of certainty.
Type: Modal · Operator: ◇ / □ · Logic: Modal Logic S5
Hyperintensional Collapse
A special subcase: two propositions that are logically equivalent but differ in meaning (e.g., "The morning star is Venus" vs. "The evening star is Venus") collapse to the same vector. This destroys intensional distinctions essential for belief reasoning.
Formal Definition
For collapse type τ, let O(τ) be the set of sentences with operator τ, and B(τ) be their propositional bases. Collapse occurs when:
∀s ∈ O(τ): cos(embed(s), embed(base(s))) → 1 as training steps t → ∞.
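The convergence condition can be checked directly once sentence embeddings are available. A minimal sketch in Python/NumPy, using a hypothetical similarity threshold (0.95) as a practical stand-in for "cos → 1"; the toy vectors stand in for embed(s) and embed(base(s)):

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def is_collapsed(pairs, threshold=0.95):
    """Flag collapse when every operator-modified sentence embedding is
    nearly identical to its propositional base (cos close to 1)."""
    return all(cosine(s, base) >= threshold for s, base in pairs)

# Toy 2-D stand-ins for (embed(s), embed(base(s))) pairs.
healthy = [(np.array([1.0, 0.0]), np.array([0.6, 0.8]))]
collapsed = [(np.array([1.0, 0.01]), np.array([1.0, 0.0]))]
print(is_collapsed(healthy))    # False: operator boundary intact
print(is_collapsed(collapsed))  # True: boundary erased
```

The threshold makes the limit statement operational; in practice it would be tracked across checkpoints rather than fixed.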
Neighborhood Semantics · Section 3
k-NN Structure as a Semantic Proxy
If embeddings preserve meaning, a token's k-nearest neighbors should cluster by semantic type. Collapse contaminates these neighborhoods with type violations: semantically heterogeneous k-NN sets signal geometric boundary failure.
Illustrative: conceptual visualization of embedding neighborhood structure and type contamination
Healthy neighborhood: The k=8 nearest neighbors of an anchor modal sentence consist primarily of other modal sentences (same type). Cross-type contamination is low.
λ(t) ≈ 0.08 · Type purity: 87%
Possible worlds connection. In formal semantics, modal operators are evaluated over sets of possible worlds: ◇P is true iff P holds in at least one accessible world. Embedding neighborhoods are the geometric analogue: each neighbor is a "semantically accessible" expression. When modal and non-modal expressions mix in the same neighborhood, the model has lost the ability to distinguish actual from possible.
Surface Pressure σ(Ω)
A measure of neighborhood boundary porosity. For a semantic region Ω:
σ(Ω) = |{x ∈ ∂Ω : ∃y ∉ Ω, cos(x, y) > θ}| / |∂Ω|
High σ means the boundary is leaking: cross-type neighbors are penetrating the semantic region.
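The porosity measure counts boundary points that have a high-similarity neighbor outside the region. A toy sketch, assuming the boundary set ∂Ω and the outside points are given explicitly (the source does not specify how ∂Ω is estimated):

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def surface_pressure(boundary, outside, theta=0.9):
    """Fraction of boundary points of a semantic region with at least
    one high-similarity neighbor outside the region."""
    leaking = sum(
        1 for x in boundary if any(cosine(x, y) > theta for y in outside)
    )
    return leaking / len(boundary)

# Toy 2-D example: one of two boundary points leaks toward an outsider.
boundary = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
outside = [np.array([0.95, 0.05])]
print(surface_pressure(boundary, outside))  # 0.5
```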
Cross-Type Leakage λ(t)
For token t with semantic type τ(t), leakage is the fraction of k-nearest neighbors whose type differs:
λ(t) = |{u ∈ kNN(t) : τ(u) ≠ τ(t)}| / k
λ ranges from 0 (perfect purity) to 1 (total collapse).
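λ(t) needs only the anchor's type and its neighbors' types. A minimal sketch; the type labels here are hypothetical:

```python
def cross_type_leakage(anchor_type, neighbor_types):
    """Fraction of the k nearest neighbors whose semantic type differs
    from the anchor token's type (0 = perfect purity, 1 = total collapse)."""
    k = len(neighbor_types)
    return sum(1 for tau in neighbor_types if tau != anchor_type) / k

# A k=8 neighborhood with one cross-type intruder.
print(cross_type_leakage("modal", ["modal"] * 7 + ["categorical"]))  # 0.125
```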
Entropy & Drift · Section 4
Measuring the Dissolution of Meaning
Semantic entropy H(t) quantifies how uniformly a token's neighborhood is distributed across semantic types. As collapse progresses, H(t) increases: the neighborhood becomes entropically disordered. Entropic drift ΔH tracks this change over time.
-- Neighborhood entropy (from Denham 2025, Section 4)
H(t) = -Σ_τ q(τ | t) · log q(τ | t)
where q(τ | t) = fraction of k-nearest neighbors of t with semantic type τ
-- Entropic drift index
ΔH(t, Δ) = H(t + Δ) - H(t)
-- Low H(t): neighborhood is dominated by one semantic type (healthy)
-- High H(t): neighborhood is uniformly spread across types (collapsed)
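Both H(t) and ΔH follow directly from the definitions above. A minimal Python sketch, using hypothetical type labels for the k-NN sets:

```python
import math
from collections import Counter

def neighborhood_entropy(neighbor_types):
    """H(t) in nats: entropy of the semantic-type distribution over k-NN."""
    k = len(neighbor_types)
    counts = Counter(neighbor_types)
    return sum(-(c / k) * math.log(c / k) for c in counts.values())

def entropic_drift(types_before, types_after):
    """Delta-H: change in neighborhood entropy between two checkpoints."""
    return neighborhood_entropy(types_after) - neighborhood_entropy(types_before)

pure = ["modal"] * 12                         # one type: H = 0 (healthy)
mixed = ["modal"] * 6 + ["categorical"] * 6   # uniform over 2 types: H = log 2
print(neighborhood_entropy(pure))       # 0.0
print(entropic_drift(pure, mixed) > 0)  # True: entropy drifts upward
```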
Illustrative: conceptual visualization of neighborhood entropy as collapse progresses
Early training (low collapse): H(t) ≈ 0.31 nats. The anchor token's k=12 neighbors are predominantly of the same semantic type. Operator boundaries are intact.
H(t) ≈ 0.31 · ΔH ≈ +0.02 · Type purity: 91%
H(t) = 0
All k neighbors share the same semantic type as anchor t. Perfect type purity. Operator boundaries are fully preserved.
0 < H(t) < log(|T|)
Partial contamination. Some cross-type neighbors present, but the dominant type still matches the anchor. Moderate collapse: downstream tasks may still succeed.
H(t) ≈ log(|T|)
Maximum entropy. Neighbors are uniformly distributed across all semantic types. Complete collapse: the anchor token's type is geometrically invisible.
Relationship to information-theoretic collapse in representation learning
Denham's entropy measure is distinct from, but related to, dimensional collapse studied in SSL (e.g., SimSiam, VICReg). Dimensional collapse refers to representations living on a low-dimensional subspace. Semantic collapse refers to type distinctions collapsing within that subspace. A representation can avoid dimensional collapse while still exhibiting high semantic collapse: the two pathologies are orthogonal.
Triplet Diagnostics · Section 5
A–S vs. A–D: The Collapse Rate
The triplet framework provides the operational test for collapse. For each anchor token A, we ask: is A geometrically closer to same-type tokens S than to different-type tokens D? Collapse inverts this ordering.
-- Triplet framework (Denham 2025, Section 5)
Triplet: (A, S, D)
A = anchor token with semantic type τ
S = same-type neighbor: τ(S) = τ(A)
D = different-type neighbor: τ(D) ≠ τ(A)
-- Expected: sim(A, S) > sim(A, D) (type cohesion)
-- Collapse: sim(A, S) ≤ sim(A, D) (boundary inversion)
-- Collapse Rate
CR = |{A : sim(A, S) ≤ sim(A, D)}| / N
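CR reduces to counting inverted triplets. A minimal sketch with cosine similarity and toy 2-D embeddings standing in for real sentence vectors:

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def collapse_rate(triplets):
    """CR: fraction of (A, S, D) embedding triplets with inverted
    ordering, i.e. sim(A, S) <= sim(A, D)."""
    inverted = sum(1 for a, s, d in triplets if cosine(a, s) <= cosine(a, d))
    return inverted / len(triplets)

# Toy triplets: one of four shows boundary inversion.
a = np.array([1.0, 0.0])
ok = (a, np.array([0.9, 0.1]), np.array([0.1, 0.9]))
bad = (a, np.array([0.1, 0.9]), np.array([0.9, 0.1]))
print(collapse_rate([ok, ok, ok, bad]))  # 0.25
```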
Illustrative: conceptual visualization of triplet geometry and boundary inversion
CR = 10%: 10% of anchor tokens have their nearest same-type neighbor displaced beyond their nearest different-type neighbor. Mild collapse: operator boundaries are mostly intact.
CR = 0.10 · Healthy threshold: CR < 0.15
Fidelity AUC
A continuous version of CR. For each anchor A, compute the ROC curve discriminating S from D using cosine similarity. Fidelity AUC = 1.0 indicates perfect separation; AUC = 0.5 is chance (total collapse). The paper proposes Fidelity AUC as a complement to the discrete CR.
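Since ROC AUC equals the probability that a randomly drawn same-type similarity outranks a randomly drawn different-type similarity, it can be computed by pairwise comparison without an explicit ROC sweep. A sketch, assuming the S- and D-similarity lists have already been collected per anchor:

```python
def fidelity_auc(same_sims, diff_sims):
    """Rank-based AUC: probability that a random same-type similarity
    exceeds a random different-type similarity (ties score 0.5).
    1.0 = perfect separation, 0.5 = chance (total collapse)."""
    wins = 0.0
    for s in same_sims:
        for d in diff_sims:
            if s > d:
                wins += 1.0
            elif s == d:
                wins += 0.5
    return wins / (len(same_sims) * len(diff_sims))

print(fidelity_auc([0.9, 0.8], [0.3, 0.2]))  # 1.0 (perfect separation)
print(fidelity_auc([0.5, 0.5], [0.5, 0.5]))  # 0.5 (chance: total collapse)
```

This O(n·m) form is fine for diagnostics; a sorted rank-sum computation would scale better.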
Collapse Map
A visualization of CR distributed across semantic types. A collapse map plots per-type collapse rates, revealing which operator classes are most vulnerable. Modal operators typically collapse before epistemic operators due to frequency distribution in training data.
How does the triplet test relate to contrastive loss?
Standard triplet loss in metric learning trains exactly this ordering: anchor-positive distance < anchor-negative distance + margin. The semantic collapse triplet is conceptually identical, but applied diagnostically rather than as a training objective. If existing contrastive training used semantic type as the positive/negative selection criterion, it would directly combat semantic collapse. Most training pipelines use token co-occurrence or task labels instead, missing the operator structure.
Modal Proofing Kernel · Section 6
Preserving Semantic Boundaries by Design
The Modal Proofing Kernel (MPK) is a constraint architecture that enforces operator-sensitive geometry in embedding space. It rests on three invariants and operates through five distinct mechanisms.
Illustrative: conceptual diagram of MPK-enforced semantic region separation in embedding space
Operator Fidelity: For every operator O and proposition P, the embedding of O(P) must be geometrically separated from the embedding of P. Formally:
cos(embed(O(P)), embed(P)) < θ_O, where θ_O is an operator-specific threshold. This invariant directly prevents modal, epistemic, indexical, and agency collapse.
Five MPK Mechanisms
1. Typed Embeddings
Assign semantic type vectors to expressions at embedding time. The type vector is concatenated or added to the base embedding, creating a typed geometric signature that resists cosine-distance collapse.
2. Operator-Aware Contrastive Loss
Augment contrastive objectives with operator-sensitive negative sampling. "It is possible that P" must be treated as a hard negative for "P" rather than as a semantically similar positive.
3. Boundary Regularization
Add a regularization term penalizing embeddings that violate operator separation. The term is proportional to max(0, θ_O - separation(O(P), P)), a hinge loss on semantic boundaries.
4. Proofing Layers
Post-hoc adapter layers trained on typed contrastive pairs. Applied after base model training, proofing layers rearrange operator-sensitive regions without affecting non-operator geometry, making them compatible with frozen base models.
5. Compositional Grounding
Ground operator semantics in formal logical structures. Train auxiliary heads to predict the truth-conditional difference between O(P) and P across possible worlds. This multi-task signal provides explicit supervision for the operator's geometric contribution.
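Mechanism 3 (boundary regularization) can be sketched as a hinge penalty. The separation function is taken here as 1 − cosine similarity, an assumption, since the source leaves separation(·,·) abstract; θ_O = 0.3 is an arbitrary illustrative threshold:

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def boundary_regularizer(pairs, theta_o=0.3):
    """Sum of hinge penalties max(0, theta_O - separation(O(P), P)) over
    (embed(O(P)), embed(P)) pairs. Separation = 1 - cosine similarity
    is an illustrative choice, not the paper's definition."""
    total = 0.0
    for op_vec, base_vec in pairs:
        separation = 1.0 - cosine(op_vec, base_vec)
        total += max(0.0, theta_o - separation)
    return total

# A collapsed pair (identical vectors) pays the full theta_O penalty;
# a well-separated pair contributes nothing.
v = np.array([1.0, 0.0])
print(boundary_regularizer([(v, v)]))                     # 0.3
print(boundary_regularizer([(v, np.array([0.0, 1.0]))]))  # 0.0
```

In a training loop this term would be added to the contrastive objective; here it is a standalone function for clarity.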
Machine Semantics · Section 7
Toward Collapse-Aware Representation Learning
Semantic collapse is not a bug in any single model; it is a structural consequence of training on distributional co-occurrence without operator supervision. The paper concludes with a diagnostic toolkit and seven open questions.
The Five-Metric Diagnostic Suite
| Metric | Definition | Healthy Range | Collapse Signal |
|---|---|---|---|
| CR | Collapse Rate: fraction of triplets with inverted ordering | CR < 0.15 | CR > 0.40 |
| H(t) | Neighborhood entropy across semantic types | H < 0.5 nats | H ≈ log\|T\| |
| ΔH(t) | Entropic drift over training steps | ΔH ≈ 0 | ΔH > 0, monotone |
| λ(t) | Cross-type leakage in k-NN | λ < 0.10 | λ > 0.40 |
| Fidelity AUC | ROC AUC for S vs D separation at anchor | AUC > 0.85 | AUC ≈ 0.50 |
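The table's thresholds can be bundled into a single health check. A sketch, assuming a ±0.01 tolerance for "ΔH ≈ 0" (the monotonicity signal requires a full training trajectory, so only the instantaneous drift is banded here, and the log|T| bound on H(t) is left to the caller):

```python
def diagnose(cr, h_nats, delta_h, leakage, auc):
    """Classify each diagnostic metric as 'healthy' or 'collapse' against
    the suite's thresholds; anything in between is 'intermediate'."""
    def band(value, is_healthy, is_collapsed):
        if is_healthy(value):
            return "healthy"
        if is_collapsed(value):
            return "collapse"
        return "intermediate"

    return {
        "CR": band(cr, lambda v: v < 0.15, lambda v: v > 0.40),
        "H(t)": band(h_nats, lambda v: v < 0.5, lambda v: False),
        "dH": band(delta_h, lambda v: abs(v) < 0.01, lambda v: v >= 0.01),
        "lambda": band(leakage, lambda v: v < 0.10, lambda v: v > 0.40),
        "AUC": band(auc, lambda v: v > 0.85, lambda v: v <= 0.55),
    }

print(diagnose(cr=0.10, h_nats=0.31, delta_h=0.0, leakage=0.08, auc=0.93))
```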
Seven Open Questions (Denham 2025)
Q1: Is collapse monotone across training?
The paper conjectures that ΔH(t) is non-negative across training steps for standard contrastive objectives. Formal proof requires characterizing the gradient dynamics of operator-modified pairs, an open problem in optimization theory.
Q2: Does model scale slow or accelerate collapse?
Larger models have more representational capacity, which could preserve operator distinctions, or they could collapse more completely by fitting distributional co-occurrence more precisely. The relationship between scale and semantic collapse rate is empirically unstudied.
Q3: Can RLHF preserve or repair semantic boundaries?
Human feedback may implicitly penalize modal-categorical confusion (hallucination), providing an indirect repair signal. Whether RLHF systematically reduces CR or only suppresses surface-level hallucination symptoms is an open empirical question.
Q4: Are there language-specific collapse rates?
Morphologically rich languages encode operators through inflection (e.g., subjunctive mood in Romance languages). These languages may exhibit lower collapse rates because the operator signal is spread across more surface tokens. Typological variation in collapse is entirely open.
Q5: Does the MPK degrade non-operator geometry?
The MPK adds constraints that separate operator regions. Does this come at the cost of general semantic quality (e.g., analogy performance, downstream task accuracy)? The trade-off between operator fidelity and general representation quality is unquantified.
Q6: How does collapse interact with RAG and tool use?
Retrieval-augmented generation and tool-use pipelines introduce operator-rich queries (hypotheticals, conditionals, permission checks). If the retrieval embedding model has collapsed modal operators, retrieved context may be factually mismatched to the query's intended modality.
Q7: Can collapse be detected via probing without labeled type data?
All proposed metrics require typed sentence pairs (O(P) and P). Constructing typed datasets at scale is expensive. The paper calls for unsupervised or self-supervised collapse detection, potentially via clustering instability or mutual information between embeddings and logical parses.
The core argument. Semantic collapse is not a hallucination problem per se; it is a representation problem that hallucination symptoms reveal. Fixing hallucination at the output layer without repairing the collapsed representation is symptomatic treatment. The Modal Proofing Kernel and associated metrics target the geometric root cause: the operator-boundary erasure that makes categorical and modal assertions indistinguishable in the model's internal geometry.
Theoretical Foundation
The paper is explicitly theoretical. All metrics are formal proposals. No empirical measurements are reported. The diagnostic suite awaits implementation and validation on real models.
Connection to Formal Semantics
Denham bridges two communities ā distributional semantics (NLP) and model-theoretic semantics (linguistics/logic). The paper's contribution is framing the former's failure modes using the latter's vocabulary.
Implications for AI Safety
Agency collapse ("Agent X ensures P" → "P happens") has direct AI safety implications: models that cannot distinguish obligation from fact may reason incorrectly about constraints, permissions, and deontic norms.