Semantic Predetermination in Psychological Research
Launch Report — 171 Studies · 1,486 Pairs · Four Verdict Categories
In memory of Jan Smedslund (1929–2026) · May 2026
Background
Jan Smedslund argued throughout his career that most empirical findings in psychology are pseudo-empirical: knowable a priori from the conceptual language researchers use to define their constructs. If construct A and construct B are described in overlapping theoretical language, their empirical correlation follows from semantics rather than from discovery. This web tool, built as a memorial to Smedslund, operationalises that argument. For each uploaded paper, the tool extracts construct definitions, embeds them using a large language model, computes pairwise cosine similarities, and tests whether those similarities predict the empirical effect sizes the authors report.
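The similarity step of this pipeline can be illustrated with a minimal sketch. It assumes each construct definition has already been turned into a numeric vector; the three-dimensional toy vectors, the construct names, and the function names below are hypothetical and are not taken from the tool itself.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pairwise_cosines(embeddings):
    """Map every unordered pair of construct names to its cosine similarity."""
    return {
        (a, b): cosine_similarity(embeddings[a], embeddings[b])
        for a, b in combinations(sorted(embeddings), 2)
    }


# Toy vectors standing in for real definition embeddings (hypothetical values).
toy_embeddings = {
    "job_satisfaction": [0.9, 0.1, 0.2],
    "work_engagement": [0.8, 0.3, 0.1],
    "turnover_intention": [0.1, 0.9, 0.4],
}

similarities = pairwise_cosines(toy_embeddings)
for pair, sim in similarities.items():
    print(pair, round(sim, 3))
```

Each resulting cosine would then be paired with the effect size the authors report for the same two constructs, which is the input to the rank-based tests described in the rest of this report.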
The Four Verdict Categories
A key development since the initial pipeline was the identification of a fourth verdict category: Semantic Inflation. The original three categories — Semantically Structured, Partially Structured, and Empirically Independent — classify papers by how well semantic proximity predicts effect sizes. Semantic Inflation is a distinct pathology: papers where both cosine similarities and effect sizes are exceptionally high, because the constructs being measured are so conceptually overlapping that restriction-of-range artefacts inflate the correlations beyond what the semantic gradient test can meaningfully evaluate.
| Verdict | Criterion | Interpretation | Papers |
| Semantically Structured | AB concordance ≥60% and/or ABC ≥50% | Effect sizes follow the semantic ordering of construct definitions | 85 (46%) |
| Partially Structured | AB concordance 40–60% | Mixed evidence — some relationships are semantically determined, some are not | 48 (26%) |
| Empirically Independent | AB concordance <40% and/or negative rho | Effect ordering does not follow semantic proximity — genuinely empirical findings | 43 (23%) |
| Semantic Inflation | Mean cosine >0.5 AND mean |β| >0.3 | Constructs too similar to be meaningfully distinguished; restriction-of-range inflates effects | 8 (4%) |
Semantic Inflation is not semantic predetermination; it is something worse. In a predetermined study, the effect can be predicted from the definitions but the constructs remain conceptually distinct. In a semantically inflated study, the constructs are so similar that they collapse into each other: measuring A and measuring B amount to measuring the same thing twice. The inflated papers cluster in the upper-right quadrant of Figure 1, separated from the main distribution by their extreme cosine values (>0.5) combined with high average effects (>0.3). They are excluded from the pooled Spearman analysis.
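The criteria in the verdict table can be read as a small decision rule. The sketch below uses the thresholds from the table, but the order of the checks and the reading of "and/or" as a plain "or" are assumptions, as are the function and parameter names; the tool may resolve overlapping criteria differently.

```python
def classify_paper(mean_cosine, mean_abs_beta, ab_concordance,
                   abc_pass_rate=None, rho=None):
    """Assign one of the four verdicts to a paper.

    Rates are fractions in [0, 1]. abc_pass_rate and rho may be None when a
    paper has no mediation chains or no within-paper correlation.
    """
    # Assumption: the inflation check takes precedence over the other verdicts.
    if mean_cosine > 0.5 and mean_abs_beta > 0.3:
        return "Semantic Inflation"
    # Assumption: "and/or" is treated as "either criterion is sufficient".
    if ab_concordance >= 0.60 or (abc_pass_rate is not None and abc_pass_rate >= 0.50):
        return "Semantically Structured"
    if ab_concordance < 0.40 or (rho is not None and rho < 0):
        return "Empirically Independent"
    return "Partially Structured"


# Hypothetical paper-level summaries, for illustration only.
print(classify_paper(0.35, 0.25, 0.65))
print(classify_paper(0.35, 0.25, 0.50))
print(classify_paper(0.35, 0.25, 0.30))
print(classify_paper(0.70, 0.45, 0.50))
```

Checking inflation first mirrors the report's treatment of inflated papers as a separate group for which the concordance tests are not meaningful.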
Results at Launch
Summary
| Metric | Value | Note |
| Studies in database | 171 | 13 flagged as semantic inflation; 158 in clean analysis |
| Construct pairs | 1,486 | 1,404 pairs after excluding inflated studies |
| Analysed papers | 184 | 45 ineligible (meta-analyses, validation, conceptual) |
| Pooled signed ρ (full DB) | 0.274 (p = 5.1e-27) | All 171 studies |
| Pooled signed ρ (clean) | 0.263 (p = 1.0e-23) | Excl. semantic inflation — primary result |
| Pooled unsigned ρ (clean) | 0.139 (p = 1.5e-07) | Magnitude prediction |
| Signed > Unsigned gap | +0.124 | Direction predicted better than magnitude |
| A>B concordance | 58.8% (8,775 comparisons) | Above 50% chance baseline |
| A>B>C mediation gradient | 230/420 = 54.8% | 37 papers with mediation chains |
| Mean empirical R² | 0.316 (n=114) | Benchmark: 0.428 |
| Semantically Structured | 85 / 184 (46%) | |
| Partially Structured | 48 / 184 (26%) | |
| Empirically Independent | 43 / 184 (23%) | Of potential empirical interest |
| Semantic Inflation | 8 / 184 (4%) | Excluded from main analysis |
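The A>B concordance figure in the table above can be understood as a pairwise ordering check. The sketch below assumes one plausible definition: for every two construct pairs within a paper, the comparison counts as concordant when the pair with the higher cosine also has the larger effect size. The tie handling, the choice of signed versus absolute effects, and all names are assumptions rather than the tool's documented behaviour.

```python
from itertools import combinations


def ab_concordance(pairs):
    """Count concordant orderings among construct pairs of one paper.

    pairs: list of (cosine, effect_size) tuples.
    Returns (concordant, total) comparisons.
    """
    concordant = 0
    total = 0
    for (cos_a, eff_a), (cos_b, eff_b) in combinations(pairs, 2):
        # Assumption: ties carry no ordering information and are skipped.
        if cos_a == cos_b or eff_a == eff_b:
            continue
        total += 1
        if (cos_a > cos_b) == (eff_a > eff_b):
            concordant += 1
    return concordant, total


# Hypothetical (cosine, effect size) values for four construct pairs.
toy_pairs = [(0.62, 0.48), (0.41, 0.30), (0.35, 0.33), (0.20, 0.05)]

concordant, total = ab_concordance(toy_pairs)
print(f"{concordant}/{total} = {concordant / total:.1%}")
```

Under this definition a rate of 50% is what random ordering would produce, which is why the table lists 50% as the chance baseline.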
The scatter plot: four verdict zones
Figure 1 plots every study at the paper level — mean cosine similarity on the x-axis, mean absolute effect size on the y-axis — colour-coded by verdict. The main cluster of studies spans cosines from about 0.2 to 0.5, with mean effect sizes between 0.1 and 0.5. Within this cluster, the trend is upward: papers with higher average cosine similarity also tend to show larger average effects, consistent with the semantic predetermination prediction.
The eight studies in the upper-right quadrant (diamonds, purple) form a clearly separated group: high cosines above 0.5 combined with high effects above 0.3. These are the semantically inflated papers. Their separation from the main cluster is visually unambiguous in Figure 1 and matches the uploaded plot from the web tool. They are flagged in the database and excluded from the pooled Spearman.
Figure 1. Paper-level scatter: mean cosine similarity vs mean |β|, colour-coded by verdict. Red = Semantically Structured; amber = Partially Structured; green = Empirically Independent; purple diamonds = Semantic Inflation. The shaded region marks the inflation cutoff (cosine >0.5 AND |β| >0.3). n = 171 studies / 1,486 pairs.
Clean-corpus Spearman and verdict distribution
Figure 2 (left) shows the pair-level Spearman scatter after excluding the 13 inflated studies. The signed Spearman on the clean corpus is ρ = 0.263 (p = 1.03e-23), across 1,404 pairs from 158 studies. The unsigned correlation is ρ = 0.139 (p = 1.54e-07). Both are highly significant, and the gap of +0.124 between them (0.263 − 0.139) indicates that direction is encoded in definitions more reliably than magnitude.
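The difference between the signed and unsigned correlations can be made concrete with a small sketch. It assumes that "signed" means correlating cosines with effect sizes that keep their sign, and "unsigned" means correlating cosines with absolute effect sizes; that reading, the toy numbers, and the helper names are assumptions. Spearman's ρ is computed here as the Pearson correlation of average ranks, using only the standard library.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences with non-zero variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical cosines and signed effect sizes for five construct pairs.
cosines = [0.10, 0.25, 0.30, 0.45, 0.55]
betas = [-0.40, -0.10, 0.05, 0.20, 0.35]

signed_rho = spearman(cosines, betas)
unsigned_rho = spearman(cosines, [abs(b) for b in betas])
print(round(signed_rho, 3), round(unsigned_rho, 3))
```

In this toy data, low-cosine pairs carry negative effects, so ranking by signed effect tracks the cosines closely while ranking by magnitude does not. A positive signed-minus-unsigned gap in the corpus would arise from the same kind of pattern.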
Figure 2 (right) shows the verdict distribution across 184 papers. Semantically Structured papers (46%) are those where the effect ordering closely follows the semantic ordering of construct definitions. Empirically Independent papers (23%) are the most theoretically valuable: they represent findings that were not already guaranteed by the language used to describe the constructs. Papers in this category deserve closer empirical scrutiny, as they are more likely to contain genuine discoveries.
Figure 2. Left: pair-level Spearman scatter on the clean corpus (excl. semantic inflation). Right: verdict distribution across 184 analysed papers. Red = Semantically Structured (46%); amber = Partially Structured (26%); green = Empirically Independent (23%); purple = Semantic Inflation (4%).
Semantic Inflation: a Distinct Pathology
Semantic inflation arises when a study’s constructs are defined so similarly that they cannot be meaningfully distinguished at the conceptual level. The correlation between their measures then reflects shared language rather than either genuine semantic predetermination or a genuine empirical relationship. The restriction of range in cosine space means that neither the A>B nor the A>B>C test is meaningful: when all pairs have nearly identical cosines, the ordinal rank comparison produces noise.
The cutoff of mean cosine >0.5 combined with mean |β| >0.3 was identified empirically from inspection of the paper-level scatter plot shown in Figure 1. The inflated cluster is visually separated from the main distribution, not merely a continuous extension of it. This suggests the cutoff is capturing a qualitatively different phenomenon rather than an arbitrary threshold within a continuous distribution. The 13 studies identified as inflated include several familiar examples: Ryan D. Duffy (2017, mean cosine 0.706), Philippa Davie (2020, 0.610), and Ann Eklund (2020, 0.571). In each case the constructs — despite being given distinct theoretical names — draw on the same semantic field so deeply that the embedding model cannot distinguish them.
Cross-Cultural Generality
The signal has proved consistent across 15 or more countries and seven research domains. The strongest individual within-study Spearman values include papers from Norway (ρ = 0.934), Japan (ρ = 0.900), Spain (ρ = 0.894), Iceland (ρ = 0.888), Canada (ρ = 0.886), and the UK/Greece (ρ = 0.857). Papers from Ethiopia, Indonesia, Malaysia, India, and Jordan — submitted in batches from the Global South — showed a combined signed ρ of 0.434, the strongest batch-level result in the project’s development. Twenty-one papers achieve perfect A>B>C (100% of mediation chains pass), spanning Japan, Norway, Canada, China, Indonesia, South Korea, the Netherlands, and the United States.
This cross-cultural consistency is the most theoretically significant finding. A result confined to Anglo-American social psychology could be attributed to the writing conventions of that tradition. A result that appears equally in Ethiopian, Icelandic, Indonesian, and Japanese research cannot be a parochial artefact. It reflects something structural about how psychological constructs are defined in theoretical language — a structure shared globally through journal publishing conventions regardless of country of origin.
Using the Web Tool
The tool accepts any empirical paper in PDF format. It extracts construct definitions from theoretical sections using Claude, embeds them, and within a few minutes returns:
- A semantic verdict in one of the four categories described above.
- The construct definitions as extracted — allowing the author to verify or correct them.
- A cosine heatmap showing pairwise semantic proximity across all constructs.
- The A>B concordance rate for the paper.
- The A>B>C pass rate for each mediation chain, where applicable.
- Comparison to the benchmark database: how this paper sits relative to the corpus mean.
- An option to contribute the paper’s construct pairs to the growing benchmark database.
The tool is dedicated to Jan Smedslund, who spent his career arguing that psychology needed to distinguish conceptual necessity from empirical discovery. The benchmark database will continue to grow through contributed papers, eventually supporting domain-stratified verdicts and a mixed-effects model that accounts for variation between research traditions. The evidence presented here (171 studies, 1,486 pairs, ρ = 0.274 with p = 5.1 × 10⁻²⁷ across the full database) suggests he was right.
May 2026 · Jan Ketil Arnulf (BI Norwegian Business School) · Developed with Claude (Anthropic) · In memory of Jan Smedslund (1929–2026)

