Consider the following headlines:
“Morocco’s King Appoints Committee Chief to Fight Poverty and Inequality.”
“Government Officials Have Manipulated Stock Prices to Hide Scandals.”
“Hyatt Will Remove Small Bottles from Hotel Bathrooms.”
Can you tell which one is fake?
All three headlines are among 20 compiled by researchers into a new Misinformation Susceptibility Test (the MIST), designed to measure, in two minutes, how well you can tell real news from fake news. You can take the test for yourself here.
The MIST is the product of four years of work by a team of researchers co-led by Rakoen Maertens, a postdoctoral research associate at Cambridge University who studies misinformation and polarization, and Friedrich Götz, a personality researcher and assistant professor of psychology at the University of British Columbia.
Susceptibility to misinformation depends on much more than just factual knowledge or education, Maertens told me. “Lots of people share fake news, not necessarily because they actually believe [it], but rather because they belong to a social group, or they have a group identity, or attitudes towards a certain issue, and they want to express that attitude,” or because there is an established social norm to follow a certain leader, he said. Understanding what, exactly, underpins susceptibility to misinformation is critical to fighting it effectively. “If you understand how the social psychology of misinformation susceptibility works, then you can also develop better interventions for it,” he added, from better social media products and algorithms to regulatory policies.
The MIST measures four different variables: ability to distinguish real news from fake news (“veracity discernment”); ability to correctly identify real news; ability to correctly identify fake news; and general tendency toward skepticism or gullibility (“distrust/naivete” on a scale from overly skeptical to overly gullible). To understand the distrust/naivete spectrum, think back to the first three headlines. If you thought all three might be fake, for instance (though two are real), you’d probably land on the more skeptical end of the spectrum. Conversely, if you were to mark several fake headlines real, the MIST would rank you on the gullible side of the scale. This metric is especially intended to detect those who “just respond yes or no to almost everything,” Maertens said.
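For readers who want to see how four such measures could fall out of a single set of answers, here is a minimal sketch in Python. It is an illustration only, not the researchers' published scoring procedure: the `Item` class, the `score_responses` function, and the `response_bias` measure (how many more headlines a respondent marked "real" than were actually real) are hypothetical names and simplifications introduced here, and the three placeholder headlines simply mirror the two-real, one-fake example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """One headline plus its ground truth (hypothetical structure)."""
    headline: str
    is_real: bool


def score_responses(items, marked_real):
    """Score one respondent.

    items       -- list of Item
    marked_real -- list of bool, True where the respondent answered "real"

    Returns counts for the four ideas described in the article. The
    response_bias value is an assumed stand-in for distrust/naivete:
    negative means more "fake" answers than warranted (skeptical),
    positive means more "real" answers than warranted (gullible).
    """
    if len(items) != len(marked_real):
        raise ValueError("need exactly one answer per headline")

    pairs = list(zip(items, marked_real))
    real_correct = sum(1 for item, ans in pairs if item.is_real and ans)
    fake_correct = sum(1 for item, ans in pairs if not item.is_real and not ans)
    actual_real = sum(1 for item in items if item.is_real)
    answered_real = sum(1 for ans in marked_real if ans)

    return {
        "real_detection": real_correct,
        "fake_detection": fake_correct,
        "veracity_discernment": real_correct + fake_correct,
        "response_bias": answered_real - actual_real,
    }


# Placeholder headlines: two real, one fake, as in the opening example.
example_items = [
    Item("Placeholder headline A", True),
    Item("Placeholder headline B", False),
    Item("Placeholder headline C", True),
]

# A respondent who suspects everything marks all three as fake.
skeptic = score_responses(example_items, [False, False, False])
print(skeptic)
# {'real_detection': 0, 'fake_detection': 1, 'veracity_discernment': 1, 'response_bias': -2}
```

In this sketch the blanket skeptic catches the one fake headline but misses both real ones, so the discernment score stays low and the bias value comes out negative, which is the pattern the distrust/naivete measure is meant to flag.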
Among psychologists who study misinformation, some believe individual variables to be more important than others, Maertens explained. But rather than pass judgment on which mattered most, the MIST team wanted to take all four into account.
That said, among those who study misinformation, “we are converging toward discernment as the gold outcome measure of someone’s ability to detect fake news,” Maertens added. “It’s not just about detecting fake news or just trusting a trustworthy source; it’s about really being able to distinguish between them.”
New publication: The Misinformation Susceptibility Test (MIST), #AI-based (#GPT) fake news detection test
Paper (Behavior Research Methods): https://t.co/1WCko6T9kU
Press release (@Cambridge_Uni): https://t.co/A69WeRTV8x
Online quiz (#YourMIST): https://t.co/P89rQdbhAA
(1/15)
— Rakoen Maertens (@RakoenMaertens) June 29, 2023
The MIST headlines are not intended to measure factual knowledge. Rather, “we’re trying to measure someone’s skill to detect the deceiving element that is most often common in misinformation,” like manipulative language, Maertens noted.
Beyond making the MIST quick to complete, the research team wanted to craft a test that was “as simple as possible and as stable as possible,” Maertens said. The researchers chose to exclusively use headlines, instead of, say, also incorporating sources and images, in part because they realized perceptions of these would change more over time, meaning there could be “memory confounds” for those taking the test. They also chose to only include headlines that could be categorized as “real” or “fake,” instead of asking respondents to rate their reliability on a more nuanced scale. Maertens doesn’t believe misinformation is that black and white across the board, but the team opted to create these binary categories on the test for user simplicity. In separate research explicitly exploring the effects of different question framings and response modes (like scales), they also found that these more nuanced approaches make “almost no difference” to results, Maertens said.
In June, YouGov released results of a survey of 1,516 U.S. adult citizens testing their ability to identify real and fake news using the MIST.
On average, Americans correctly classify a headline about two-thirds of the time, according to this data. Intriguingly, the survey suggests that young Americans (under 45, including 18- to 29-year-olds and 30- to 44-year-olds) do a worse job distinguishing between real and fake news than older Americans (over 45, including 45- to 64-year-olds and adults 65 and older). Younger adults correctly identified 12 out of 20 headlines, compared to 15 out of 20 among older adults. The results suggest “younger Americans may be more likely than older adults to suspect misinformation where there is none and may trust news headlines less overall,” per the study. The survey responses show a slightly stronger tendency toward the former (suspecting misinformation where there is none), as younger Americans correctly identified just 6 in 10 real headlines on average (while older Americans scored 8 out of 10 on this front). YouGov also found that “Young adults self-report spending more time online than people who are 45 and older do, something that is closely tied to people’s susceptibility to news misinformation.”
Survey results suggest that younger Americans are more likely to consume news via social media, and that this correlates with greater misinformation susceptibility. Among audiences of mainstream news outlets, AP news consumers performed best at distinguishing between real and fake headlines, with an average of 16 correct headlines. On the other hand, those who get news from social media performed poorly overall, and those who get their news from Snapchat performed particularly poorly. A little more than half (53%) of Snapchat news consumers correctly identified 10 or fewer headlines (though it’s worth noting that other research indicates relatively few people get news from Snapchat).
“We didn’t know what to expect in terms of age and performance on MIST,” Carl Bialik, YouGov’s U.S. politics editor, told me in an email. Bialik contributed to and edited the survey and article sharing its findings. He noted that many older adults have more experience consuming news, while younger adults may not spend as much time consuming news from traditional sources. Personally, Bialik found the negative correlation of recreational time online with high MIST scores “somewhat surprising,” though he added “it could be that much of the recreational time is spent either with misinformation or doing things outside of news entirely.”
Bialik was also surprised to see that in comparisons of MIST results by political party, “the partisan splits on percentages who identify each headline as real or fake were modest for most headlines” (Democrats and independents performed better on average than Republicans, but agreed, for the most part, on whether headlines were real or fake).
Maertens did not conduct the study, but contributed to its analysis and data interpretation. He acknowledged that the age distinction suggesting worse misinformation resilience among younger Americans is a striking finding, because “many people believe, and lots of studies on misinformation find, that older people are more susceptible, so it is surprising that here we find the opposite.” But he said more research needs to be done about why there is conflicting evidence about this question.
Beyond Bialik’s observation about older Americans having more familiarity with mainstream news, Maertens suggested that since the MIST only includes headlines, it might yield different results if it included visuals and sources. It’s possible that older respondents could fare worse evaluating attributes like photoshopped images.
Beyond the YouGov survey, other forthcoming studies using the MIST examine the relationship between misinformation susceptibility and Twitter behavior, as well as vaccine uptake in the United Kingdom.
One of Maertens’ big-picture findings from the past four years of research: A variable called “actively open-minded thinking” is the best predictor of misinformation resilience. The variable combines three different attributes: intellectual humility (can people consider whether they might be wrong?); openness to opposing viewpoints and different perspectives, such as willingness to listen to people “who are from another social group that you don’t appreciate”; and a willingness and ability to update your own beliefs based on new information.
More research is needed to determine which of these components is most important to evaluating and resisting misinformation, he said. But to Maertens, the promise of this variable as a predictor indicates that critical thinking, healthy skepticism, and open-mindedness are all necessary to detect misinformation, and that these qualities are not opposed to one another.
Early findings using the MIST — especially related to open-minded thinking — should be a reminder to newsrooms not to shy away from or simplify complexity in stories, according to Maertens. “One way to increase trust in the long term is actually being a bit better in intellectual humility, and communicating uncertainty, and communicating evidence in a way that doesn’t make too extreme statements,” he said. “Everyone can improve their reporting by actually daring to communicate uncertainty, or communicate how reliable evidence is.” What’s more, news organizations can “set…a social norm” of open-mindedness and nuance with their storytelling choices, he added.
Moving forward, Maertens plans to do more research about the relationship between “actively open-minded thinking” and misinformation resilience; specifically, he hopes to “test some ways to rank content to increase open-minded thinking.” He encouraged news organizations, where possible, to get involved in studies that examine whether the way they communicate has a positive impact on those with low MIST scores.
Bialik said that the YouGov team is considering doing further research about how trust in media organizations correlates with misinformation susceptibility, whether partisan leans of headlines affect ability to discern between real and fake headlines, and whether the same research approach could be applied to other media beyond headlines (like deep-faked images and videos).
“In the end, we need a multi-layered defense system” against misinformation, Maertens added, built off an understanding of psychology and technology. And journalists “need to be able to use GPT…[because] you need to be aware how easy it is to create misinformation with GPT.”
When GPT-2 was first released, Maertens began experimenting with its misinformation generation capabilities “as a joke.”
“I was like, what if I just ask it, ‘please generate 1,000 fake news headlines?’” he said. The software, which the research team accessed through Google Colaboratory, did not yet have some of the safeguards now in place, and complied.
“It was quite shocking to see how easy it was to generate this with GPT-2,” Maertens said. But initially, the headlines were “of mixed quality,” so the team tried feeding in some example fake news items — including conspiracy theories, appeals to emotion, and impersonation — and asked the technology to generate headlines in a similar style. The researchers had to do “a lot of item cleaning” of those 1,000 headlines — some were “completely unusable,” Maertens noted, and GPT even generated some real news items it presented as fake — and an expert committee selected the headlines used. Ultimately, though, all of the fake headlines that appear in each MIST version are AI-generated. (So the test is also a measure of how well you can detect AI-generated headlines, Maertens said. Researchers selected the real headlines in the test using the Media Bias/Fact Check database, drawing from outlets marked “least biased” and scoring “very high” on factual reporting, including Pew Research Center and the AP.)
“It didn’t save us an extortionate amount of time,” Maertens said, and he did not think using the tech was essential. That said, he sees it as a very useful brainstorming and creative tool, and thinks in particular, should the MIST need an annual update, GPT could expedite that process. (Maertens and some of his colleagues wrote another paper explaining how to use AI to generate items for psychological questionnaires.)
Separate from its potential as a research tool, seeing GPT generate so many misleading headlines so fast was sobering for Maertens. “The thing was, there were a lot of good headlines that were generated that we had to double-check — that we were really convinced were true, but that were actually false,” he said.
And this was using a model that is, now, already obsolete: Maertens said GPT-4 can generate headlines that appear even more credible than those produced by previous versions and predicted the rise of news websites wholly generated with AI.
“You could spawn thousands of fake news websites in minutes,” he said. With this technology hooked up to social media bot accounts, the onslaught of “high-quality fake news” will amount to a kind of “misinformation waterboarding,” he reflected (a phrase that has haunted me since our interview). Reckoning with that, in his view, will require a full-court press: changing how we consume news and rank content; updating technology, algorithms, and platforms; and significantly stepping up and revamping media literacy teaching.
In a twist after the MIST was released, Maertens learned from a forum that someone actually asked ChatGPT to take the test itself. It scored 20 out of 20, he said — and, what’s more, was able to explain why it answered the way it did for each headline. “So, I actually learned almost more about my own test through ChatGPT explaining why something was fake and something was real,” Maertens said. This, to him, was more fascinating than GPT being able to generate the headlines in the first place.