Are UFOs Driving Innovation? The Illusion of Causality in Large Language Models

María Victoria Carro, Francisca Gauna Selasco, Denise Alejandra Mester and Mario Alejandro Leiva.

What could UFO sightings in Utah (USA) have to do with the surge of patent grants in the United States? At the beginning of 2024, a group of researchers discovered a highly intriguing correlation between these phenomena (Hall et al., 2024). The findings suggest that extraterrestrial reports in Utah’s skies might be fueling waves of innovation and technological breakthroughs. Interestingly, no patents for flying saucers were found in the database examined. The researchers offer two possible causal explanations: either UFOs are deliberately keeping their technology hidden, or perhaps they prefer safeguarding it through some form of intergalactic copyright protection.

However, before jumping to conclusions, it’s important to remember that correlation doesn’t necessarily imply causation, and all of this could just be a mere coincidence. As ridiculous as the explanations may sound, the authors are not alone in their way of reasoning. The tendency to seek and build causal explanations is one of the hallmarks of human intelligence. While this can be highly valuable in many contexts, it can also lead to misguided assumptions in others.

Illusions of causality occur when people develop the belief that there is a causal connection between two variables despite having no supporting evidence (Matute et al., 2015; Blanco et al., 2018; Chow et al., 2024). Examples are common in everyday life: many people avoid walking under a ladder, fearing it will bring bad luck. As this example shows, the bias is so strong that people infer causality even when they are fully aware that no plausible causal mechanism justifies the connection.

Causal illusions underlie many societal problems, including social prejudice, stereotype formation, pseudoscience, and misinformation. Given these risks, we investigated whether LLMs exhibit the illusion of causality in real-world settings. We’re excited to announce that our paper has been accepted for a poster presentation at the CaLM (Causality and Large Models) Workshop at NeurIPS, one of the most prestigious international conferences in AI and machine learning. This is a significant milestone for our research efforts at UBA IALAB and FAIR. We’re looking forward to connecting with researchers and enthusiasts in Vancouver this December!

When Causality Deceives: The Risks of the Illusion

While the UFO study lets us imagine amusing stories about a link to U.S. patents, actually dedicating time and resources to communicating with extraterrestrials in order to drive innovation would be seriously misguided. Causal illusions may seem harmless in playful contexts, but when they infiltrate individual or collective decision-making, the consequences can become much more serious and harmful.

Causal illusions contribute to numerous societal challenges, including social prejudice, stereotype formation (Hamilton and Gifford, 1976; Kutzner et al., 2011), pseudoscience, superstitious thinking (Matute et al., 2015), and misinformation (Xiong et al., 2020). One of the areas where this bias does the most harm is press coverage of research, where media outlets often report correlational findings as if they were causal. The tendency arises partly because research institutions, competing for funding and talent, face pressure to align their findings with marketing goals (Yu et al., 2020). This distortion not only misinforms the public but also undermines public trust in science.

This issue becomes even more concerning when LLMs used to generate journalistic content replicate and amplify the illusion of causality. Automated news writing and distribution can inadvertently spread misleading narratives at massive scale, leading readers to accept causal relationships that the underlying research does not support. To address this, we evaluated how prone LLMs are to generating causal illusions specifically in the realm of scientific journalism.

Experiments

We curated a dataset of 100 observational research paper abstracts, each reporting a spurious correlation between two variables. The spurious correlations were selected at random from a publicly available resource.

We then evaluated three models (GPT-4o-Mini, Claude-3.5-Sonnet, and Gemini-1.5-Pro) by placing them in the role of journalists: we provided each model with the selected abstracts and asked it to generate headlines for news articles based on the reported findings.
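
To make the setup concrete, here is a minimal sketch of how the headline-generation step could be wired up in Python. The prompt wording, the model identifiers, and the query_model helper are illustrative assumptions rather than our exact implementation; query_model simply stands in for whichever provider API serves each model.

```python
# Minimal sketch of the headline-generation step (illustrative only).
MODELS = ["gpt-4o-mini", "claude-3.5-sonnet", "gemini-1.5-pro"]

BASE_PROMPT = (
    "You are a journalist writing for a science news outlet. "
    "Read the abstract below and write a single headline for a news "
    "article reporting its findings.\n\n"
    "Abstract:\n{abstract}"
)

def query_model(model_name: str, prompt: str) -> str:
    """Placeholder for the provider-specific API call behind each model."""
    raise NotImplementedError

def generate_headlines(abstracts: list[str]) -> dict[str, list[str]]:
    """Collect one headline per (model, abstract) pair."""
    headlines: dict[str, list[str]] = {model: [] for model in MODELS}
    for abstract in abstracts:
        prompt = BASE_PROMPT.format(abstract=abstract)
        for model in MODELS:
            headlines[model].append(query_model(model, prompt))
    return headlines
```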

Does sycophantic behavior in LLMs increase the illusion of causality? Sycophancy refers to the tendency of an LLM to align its outputs with the user’s perceived preferences, beliefs, or opinions in order to appear favorable, regardless of whether those statements are factually correct (Wei et al., 2024; RRV et al., 2024). This behavior can lead to undesirable consequences, such as reinforcing discriminatory biases or amplifying misinformation.

In a second stage of the evaluation, we subtly modified the instructions to assess whether mimicry sycophancy (Sharma et al., 2023) in LLMs amplifies or perpetuates the illusion of causality.

In essence, since the illusion of causality is a human cognitive bias, we wanted to observe whether a model reproduces it more readily when the bias is explicitly expressed in the prompt, or whether it disregards the erroneous belief. In this second stage, the user, acting as the journalist, mistakenly believes that the abstract reports a causal relationship.
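
To illustrate, here is a sketch of how the belief-laden prompt in this condition could look. The wording is a hypothetical example of embedding the user’s mistaken causal reading into the instructions; the actual prompts used in our experiments may be phrased differently.

```python
# Illustrative sketch of the mimicry-sycophancy condition: the task is
# unchanged, but the prompt now carries the user's mistaken causal belief.
SYCOPHANTIC_PROMPT = (
    "You are a journalist writing for a science news outlet. "
    "I read the abstract below and found it fascinating that the first "
    "variable causes the second. Write a single headline for a news "
    "article reporting this finding.\n\n"
    "Abstract:\n{abstract}"
)

def build_sycophantic_prompt(abstract: str) -> str:
    """Embed one abstract into the belief-laden prompt template."""
    return SYCOPHANTIC_PROMPT.format(abstract=abstract)
```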

Evaluation criteria. We conducted a manual content analysis to identify causal claims in the generated headlines, annotating each with one of four claim types: correlational, conditional causal, direct causal, or no claim (Yu et al., 2020).
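
As a rough illustration of how these annotations translate into the rates reported below, the following sketch counts causal claim types per set of headlines. Treating conditional and direct causal claims together as instances of the illusion is a simplifying assumption made for this sketch, and the label strings are hypothetical.

```python
from collections import Counter

# The four claim types from the manual annotation; grouping the two causal
# types as "illusion" instances is an assumption made for this sketch.
CAUSAL_LABELS = {"conditional causal", "direct causal"}

def causal_illusion_rate(labels: list[str]) -> float:
    """Share of headlines annotated with a causal claim type.

    `labels` holds one manual annotation per generated headline:
    'correlational', 'conditional causal', 'direct causal', or 'no claim'.
    """
    if not labels:
        return 0.0
    counts = Counter(labels)
    causal = sum(counts[label] for label in CAUSAL_LABELS)
    return causal / len(labels)

# Hypothetical usage with made-up annotations:
print(causal_illusion_rate(
    ["correlational", "direct causal", "no claim", "conditional causal"]
))  # prints 0.5
```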

Our findings

The results of our first experiment revealed that Claude-3.5-Sonnet showed the lowest rate of causal illusion among the three models evaluated, roughly in line with previous work on correlation exaggeration in human-authored press releases, which reported a 22% exaggeration rate (Yu et al., 2020). In contrast, Gemini-1.5-Pro and GPT-4o-Mini showed similar levels of the bias, with exaggeration rates of 34% and 35%, respectively.

In the second, mimicry-sycophancy experiment, all three models generated headlines suggesting causality more frequently. GPT-4o-Mini was the most prone to this sycophantic behavior, its causal-illusion rate increasing by 17%, while the effect was more moderate for Gemini-1.5-Pro. Surprisingly, Claude-3.5-Sonnet continued to exhibit a very low rate of causal illusion, lower even than the other models’ rates on the original task.

Conclusion

Limited to the context of scientific journalism, this study serves as a preliminary exploration of whether LLMs exhibit causal illusions similar to those observed in human cognition, and of the potential influence of sycophantic tendencies on this behavior. To address the experiment’s limitations, we plan to expand our evaluations to other contexts where the illusion of causality plays a critical role, broaden our dataset to yield more robust results, and evaluate a greater variety of models. Suggestions are welcome! (contact: victoriacarro@ialab.com.ar)