The Semantic Theory of Survey Response



Words like “leadership” and “motivation” are constructs. This means that they are constructed: we invented them. Just like other constructions (or constructs) such as shareholding companies, car brands, and justice, they are only real as long as we act as if they are. The fact that they are constructed does not mean that such phenomena are nonsense. On the contrary, like most other inventions they serve useful purposes. Yet they remain linguistic constructions. We can now show that questionnaires, frequently used in research on leadership, tend to be predictable before we ask people to fill them out. This is because they mostly tell us how people talk about leadership and motivation, and not so much about what actually happens in practice. Using digital text algorithms, we can show how these words are constructed, and how research on leadership tends to be research on language more than anything else.

Most of these research articles are published as Open Access, which means that you can download them for free. Here are a few of them if you want to read more:

 

Can we trust what surveys tell us about leadership?

Around 2012, my friend Kai Larsen and I started wondering about the data stemming from Likert-scale surveys. In 2014, we published this article, demonstrating how most survey studies on leadership pick up self-evident data patterns. The relationships in the data are given a priori through language. This means that we can use computers to predict what people will answer. Here is the publication:

Arnulf, J. K., Larsen, K. R., Martinsen, O. L., & Bong, C. H. (2014). Predicting survey responses: how and why semantics shape survey statistics on organizational behaviour. PLoS ONE, 9(9), e106361. doi:10.1371/journal.pone.0106361
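
To give a feel for the logic, here is a minimal sketch in Python. It is only an illustration, not the procedure from the paper (which relied on established text algorithms such as latent semantic analysis); the embedding model and the example items are illustrative choices:

import numpy as np
from sentence_transformers import SentenceTransformer, util

# Illustrative survey items; the actual studies used published
# leadership and motivation instruments.
items = [
    "My leader communicates a clear vision of the future.",
    "My leader inspires me with his or her plans.",
    "I intend to look for a new job within the next year.",
]

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed model choice
embeddings = model.encode(items)

# Pairwise cosine similarities between the item texts: the "semantic" matrix.
semantic_matrix = util.cos_sim(embeddings, embeddings).numpy()
print(np.round(semantic_matrix, 2))

# observed_corr would be the inter-item correlation matrix from real survey
# responses, e.g. np.corrcoef(responses, rowvar=False). The semantic theory
# predicts that semantic_matrix and observed_corr are closely related.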

 

We may sometimes predict how people will score before they respond! 

After publishing this, we could demonstrate another weird phenomenon. If the structures in the data can be known in advance – before asking anyone – it should be possible to guess what people will answer before they respond! This is a bit more complicated, but in the article that follows, we have shown how it could in principle be possible. If we know a person’s first answers to a survey, we can use semantic algorithms to guess what the rest of the answers might be. The article is here:

Arnulf, J. K., Larsen, K. R., & Martinsen, Ø. L. (2018). Respondent Robotics: Simulating Responses to Likert-Scale Survey Items. Sage Open, 8(1), 1-18. doi:10.1177/2158244018764803
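
A toy version of that logic, with invented numbers (this is only a sketch of the principle, not the algorithm from the article):

import numpy as np

def predict_remaining(known_scores, sim_to_known):
    """known_scores: a respondent's answers to the first k items (Likert 1-5).
    sim_to_known: semantic similarity of each unanswered item to the k
    answered ones, shape (n_unanswered, k)."""
    weights = np.clip(sim_to_known, 0, None)  # ignore negative similarities
    return weights @ known_scores / weights.sum(axis=1)

known = np.array([5.0, 4.0, 2.0])        # answers to items 1-3
sims = np.array([[0.8, 0.6, 0.1],        # item 4 resembles items 1-2
                 [0.1, 0.2, 0.9]])       # item 5 resembles item 3
print(predict_remaining(known, sims))    # -> approximately [4.4, 2.6]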

 

But surveys on leadership and organizational behavior may be culture blind:

If the text algorithms can predict the data structures in one language, it is because the statistics simply reflect the meaning of the questions. Therefore, if the questionnaire is correctly translated (and the correlations are indeed due to semantics), the text algorithms will predict across languages. We tested this among Chinese, Pakistani, Norwegian, German, native English-speaking and many other respondents, and this is exactly what we found: the algorithms predicted the bulk of the statistics across all languages. There was virtually nothing left that could count as “culture”. Leadership surveys (and similar instruments) will be culture blind if they are based on semantic relationships:

Arnulf, J. K., & Larsen, K. R. (2020). Culture blind leadership research: How semantically determined survey data may fail to detect cultural differences. Frontiers in Psychology, 11(176). doi:10.3389/fpsyg.2020.00176
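
The logic of the cross-language test can be sketched in the same way, here with an off-the-shelf multilingual embedding model (an illustrative choice, not the setup of the study):

import numpy as np
from sentence_transformers import SentenceTransformer, util

# A multilingual embedding model; the model name is an assumption.
model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

english = ["My leader communicates a clear vision.",
           "My leader inspires me."]
norwegian = ["Lederen min kommuniserer en klar visjon.",
             "Lederen min inspirerer meg."]

emb_en = model.encode(english)
emb_no = model.encode(norwegian)
sim_en = util.cos_sim(emb_en, emb_en).numpy()
sim_no = util.cos_sim(emb_no, emb_no).numpy()

# If the translation preserves meaning, the two similarity matrices should
# be nearly identical, and so should the survey statistics they predict.
print(np.round(sim_en - sim_no, 2))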

 

People in different job types are equally motivated! Scores are similar because they read the questions differently:

We let 399 people from 18 very different job types fill out a questionnaire on motivation. According to previous theories, people with more autonomy, feedback and task variation should be more motivated than others. Moreover, pay for performance has been accused of destroying “intrinsic motivation”, the pleasure of doing work for its own sake. We found only weak indications of this. In our numbers, all respondents – priests, sex workers, CEOs and soldiers alike – were predominantly intrinsically motivated. They were less concerned with making money. They were also all committed to their organizations and working with high effort and quality. Looking at this with semantic algorithms, it appears that people in different jobs understand the questions in different ways. Different people in different situations may respond with the same score levels because they interpret the survey differently. This has consequences for how to compare motivational levels across job types. The full article is here:

Arnulf, J. K., Nimon, K., Larsen, K. R., Hovland, C. V., & Arnesen, M. (2020). The Priest, the Sex Worker, and the CEO: Measuring Motivation by Job Type. Frontiers in Psychology, 11, 1321. doi:10.3389/fpsyg.2020.01321

 

The statistics derived from some types of surveys do not actually reflect what the questions are about:

The most common understanding of questionnaires is that the responses reflect people’s attitudes, or rather their attitude strength. Someone responding with a score of 5 (indicating “strongly agree”) on a question displays a stronger attitude than someone responding with a 1 (indicating “do not agree at all”). The most common type of statistics applied to such responses explores how much responses to the various questions co-vary. For example, one may want to see if people who are satisfied with their managers are also less likely to quit their jobs. In this study from 2018 we found something strange. When the responses are semantically determined, the attitude strength is filtered out of the statistics. We could show that only the semantic relationships remained in the statistics from the respondents. Their attitude strength was gone. This is possibly the most difficult of the articles to understand, but also the one with the most problematic philosophical implications. The original is here (unfortunately not an open access publication):

Arnulf, J. K., Larsen, K. R., Martinsen, O. L., & Egeland, T. (2018). The failing measurement of attitudes: How semantic determinants of individual survey responses come to replace measures of attitude strength. Behavior Research Methods, 50(6), 2345-2365. doi:10.3758/s13428-017-0999-y
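
Part of the statistical mechanism is easy to demonstrate: correlation coefficients are computed on standardized scores, so they are blind to how strongly attitudes are expressed. A minimal numpy illustration with invented numbers (this shows only the general mechanism, not the paper’s analysis):

import numpy as np

rng = np.random.default_rng(1)
# Two correlated "items" answered by 200 respondents with lukewarm attitudes.
lukewarm = rng.normal(3.0, 1.0, size=(200, 2))
lukewarm[:, 1] = 0.7 * lukewarm[:, 0] + 0.3 * lukewarm[:, 1]

# The same people expressing uniformly stronger attitudes: every score is
# shifted and stretched by the same linear transformation.
strong = 1.0 + 1.2 * lukewarm

print(np.corrcoef(lukewarm, rowvar=False)[0, 1])
print(np.corrcoef(strong, rowvar=False)[0, 1])   # exactly the same value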

 

The development of language about leadership and motivation can be traced across time and social groups:

To the extent that survey results are predictable, it is because of their embeddedness in language. We can therefore try to trace how constructs like leadership, motivation and results have emerged over the years and among groups of people. The following article, published in 2018, shows how the development of workplace-related language also shapes responses to surveys on leadership:

Arnulf, J. K., Larsen, K. R., & Martinsen, Ø. L. (2018). Semantic algorithms can detect how media language shapes survey responses in organizational behaviour. PLoS ONE, 13(2), 1-26. doi:10.1371/journal.pone.0207643

 

People struggle to distinguish between leaders and heroes because these concepts share so much meaning:

Our ideas about leadership are so strongly determined by language that we tend to expect things of leaders simply because of the associations that the words evoke. One funny (or ominous) effect of this is how readily we come to believe that leaders are a kind of hero, or that heroes should also be leaders. Both ideas lead to exaggerated expectations about leaders. This in turn seems to make most people disappointed by real flesh-and-blood leaders. Our own bosses are usually disappointingly different from the linguistic stereotype. You can read about it in this article:

Arnulf, J. K., & Larsen, K. R. (2015). Overlapping semantics of leadership and heroism: Expectations of omnipotence, identification with ideal leaders and disappointment in real managers. Scandinavian Psychologist, 2(e3). doi:10.15714/scandpsychol.2.e3

 

This means that we can use digital algorithms to break free from our own cognitive limitations:

We obviously do not know what we already know. That is why we can compute the relationships in survey statistics without asking anyone and be surprised by the result. The American philosopher Daniel Dennett says about human speakers that we are “competent without comprehension”: most of us are able to speak a language, but cannot explain exactly how we do it. Language therefore contains a lot of knowledge that we could possibly use, but are unable to exploit consciously. In this way, we can get lost in our own linguistic constructions of the world. Language is like a huge labyrinth of words and meaningful expressions where we can “discover” insights that were in there all the time. Jan Smedslund is a Norwegian professor of psychology who has worked on this for decades. He has warned us that much social science is unable to escape this labyrinth. In his words, we are doing “pseudo-empirical” research, which mostly re-discovers what is necessarily true given the semantic premises in language. I have written a chapter in a book that pays homage to Smedslund’s lifetime of work. In this chapter, I try to show how the text algorithms may offer a way out of the labyrinth. We can possibly use the algorithms to explore the limitations of our own linguistic constructs. Here I lean a bit on the philosophers Gottlob Frege, Ludwig Wittgenstein, and Bertrand Russell. The book chapter is available here (regrettably not open access):

Arnulf, J. K. (2020). Wittgenstein’s revenge: How semantic algorithms can help survey research escape Smedslund’s labyrinth. In T. G. Lindstad, E. Stänicke, & J. Valsiner (Eds.), Respect for Thought; Jan Smedslund’s Legacy for Psychology (pp. 285-307). Cham: Springer.

 

And here, we used the semantic algorithms to explore the thinking of people who hold conspiracy beliefs:

Perhaps not very surprisingly, it turns out that people who hold strong conspiracy beliefs are also characterized by other unusual cognitive patterns. In this study, we found that many people who endorse conspiracy theories also display some of the oddities of thought found in people with psychoses. This does not imply that conspiracy beliefs are by themselves signs of psychosis. What it does mean, however, is that people who hold such beliefs are more often guided by associations that other people find hard to follow. Such people are also often hard to reach with lines of argument that would seem reasonable to others:

Arnulf, J. K., Robinson, C., & Furnham, A. (2022). Dispositional and ideological factor correlates of conspiracy thinking and beliefs. PLoS ONE, 17(10), e0273763. doi:10.1371/journal.pone.0273763

 

Can we use semantics to measure differences between people? Yes, we can!

With this book chapter, Kai Larsen and I had the opportunity to contribute to a project investigating the link between personality and situations in psychological research. It is well known that “strong situations” do not elicit much personality-dependent behavior. For example, most people refrain from answering the phone during a funeral. Conversely, “weak situations”, such as small talk, leave much more room for personality to determine behavior. In our chapter, we show how language can be understood as a “strong situation” that elicits quite similar behavior in most people (most people agree that a “bachelor” is an unmarried man). Still, there are systematic differences that make groups and individuals stand out in their characteristic language usage. We can use these differences to model individuals and groups in psychological profiling. The chapter can be found here (unfortunately not open access):

Arnulf, J. K., & Larsen, K. R. (2021). Semantic and ontological structures of psychological attributes. In D. Wood, S. J. Read, P. D. Harms, & A. Slaughter (Eds.), Measuring and modeling persons and situations (pp. 69-102). London, UK: Academic Press. 

 

Why is this important for all psychological research?

In a study published in 2022, we found a strange characteristic of scientific publications in psychology: a review of all research published in the years 1956-2022 indicates that there is no development in the predictive power of psychological theory. The so-called explained variance stays flat at an average of 42.8%, every year, since 1956. Why flat, and why precisely 42%? Using semantic algorithms to replicate 50 randomly chosen studies, we found that the number 42 is most likely caused by our own methods. If we think in terms of factor analysis, variables can be grouped as either belonging within a construct or predicting other constructs. It appears that if you divide the average cross-loadings of all variables by the average within-factor loadings (e.g., 0.30/0.70), you will usually home in on 0.42. This means that in most psychological research, the variables used will usually, on average, explain each other at a rate of 42%. The ratio itself turns out to be predictable through semantics, because it is determined by the semantic properties of variables and measurement items. To put it bluntly, psychology seems to be discovering over and over again that all variables can be explained by a 42% overlap in meaning with other variables. The details can be found here:

Smedslund, G., Arnulf, J. K., & Smedslund, J. (2022). Is psychological science progressing? Explained variance in Psycinfo articles during the period 1956 to 2022. Frontiers in Psychology, 13, 1089089. doi:10.3389/fpsyg.2022.1089089
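
The arithmetic behind that figure can be shown with a toy loading matrix (invented numbers, not data from the study):

import numpy as np

# Rows are items, columns are factors; items 1-3 belong to factor A,
# items 4-6 to factor B.
loadings = np.array([[0.72, 0.31],
                     [0.69, 0.28],
                     [0.70, 0.33],
                     [0.29, 0.71],
                     [0.32, 0.68],
                     [0.27, 0.70]])

within = np.r_[loadings[:3, 0], loadings[3:, 1]].mean()   # ~0.70
cross = np.r_[loadings[:3, 1], loadings[3:, 0]].mean()    # ~0.30
print(cross / within)   # ~0.43, the "42%" that keeps reappearing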

 

In 2024, we published a review of what all of the emerging research means:

The research listed on this page, as well as research coming from other teams and sources, indicates that our measurement methods are tapping a semantic grid, not the phenomena themselves. This suggests that we are doing research on representations of the world – the map – rather than on the world itself, the actual landscape. In this article, you can read a synthesis of what all this means. We argue that the age-old concept of “nomological networks” in construct validation is more appropriately termed “semantic networks”:

Arnulf, J. K., Olsson, U. H., & Nimon, K. (2024). Measuring the menu, not the food: “Psychometric” data may instead measure “lingometrics” (and miss its greatest potential). Frontiers in Psychology, 15. doi:10.3389/fpsyg.2024.1308098

 

Are we alone in working with these methods? No! A growing community of researchers has joined us, with applications in clinical psychology, voting behavior and social media research. A special issue on the use of text algorithms presents a multitude of applications, joining researchers from the US, Scotland, Sweden and Norway:

Arnulf, J. K., Larsen, K. R., Martinsen, Ø. L., & Nimon, K. F. (2021). Editorial: Semantic Algorithms in the Assessment of Attitudes and Personality. Frontiers in Psychology, 12(3046). doi:10.3389/fpsyg.2021.720559

 

Should you want to try out the semantic method itself, we explain it here:

This is a methods article, explaining the semantic algorithms and how to use them. A previous study on training from Human Resource Development is given as an example. You can also find data and computer syntax to play around with, so you can try it yourself:

Arnulf, J. K., Larsen, K., & Dysvik, A. (2018). Measuring Semantic Components in Training and Motivation: A Methodological Introduction to the Semantic Theory of Survey Response. Human Resource Development Quarterly, 30(1), 17-38. doi:10.1002/hrdq.21324
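
As a taste of what such syntax does, here is a minimal sketch of the final step: comparing a semantic similarity matrix with the observed inter-item correlations. The function name and all numbers are illustrative, not taken from the article:

import numpy as np

def semantic_fit(semantic_matrix, observed_corr):
    """Correlate item-pair semantic similarities with observed inter-item
    correlations, using the off-diagonal upper triangles only."""
    iu = np.triu_indices_from(semantic_matrix, k=1)
    return np.corrcoef(semantic_matrix[iu], observed_corr[iu])[0, 1]

# Invented numbers for three items: a semantic similarity matrix and an
# observed correlation matrix from (hypothetical) survey responses.
sem = np.array([[1.0, 0.8, 0.2],
                [0.8, 1.0, 0.3],
                [0.2, 0.3, 1.0]])
obs = np.array([[1.0, 0.7, 0.15],
                [0.7, 1.0, 0.2],
                [0.15, 0.2, 1.0]])
print(semantic_fit(sem, obs))   # high values mean semantics predict statistics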