The growing stream of reporting on and data about fake news, misinformation, partisan content, and news literacy is hard to keep up with. This weekly roundup offers the highlights of what you might have missed.
April Fool’s. On Monday morning, be prepared for this year’s assortment of April Fool’s Day pranks across the web, the day companies’ (and publishers’) frustrated standups get their chance to shine. (Here’s a collection of 215 different April 1 pranks from 2018 alone; this year’s will be collected here. This year, Microsoft is taking a firm stand against them.)
But aside from some momentary confusion before people have their coffee, April Fool’s pranks also allow researchers to compare what false narratives look like when intended to be pranks versus when they’re intended as disinformation. That’s what Edward Dearden and Alistair Baron of Lancaster University have done in a soon-to-be-released paper titled “Fool’s Errand: Looking at April Fools Hoaxes as Disinformation through the Lens of Deception and Humour.” From the press release:
They have discovered that there are similarities in the written structure of humorous April Fools hoaxes — the spoof articles published by media outlets every April 1st — and malicious fake news stories. The researchers have compiled a novel dataset, or corpus, of more than 500 April Fools articles sourced from more than 370 websites and written over 14 years.
“April Fools hoaxes are very useful because they provide us with a verifiable body of deceptive texts that give us an opportunity to find out about the linguistic techniques used when an author writes something fictitious disguised as a factual account,” said Edward Dearden from Lancaster University, and lead author of the research. “By looking at the language used in April Fools and comparing them with fake news stories we can get a better picture of the kinds of language used by authors of disinformation.”
From analyzing those inputs, the researchers were able to build a sorting algorithm that tried to identify whether a given story was real news, fake news, or an April Fool’s hoax or prank: “The classifier achieved a 75 percent accuracy at identifying April Fools articles and 72 percent for identifying fake news stories. When the classifier was trained on April Fools hoaxes and set the task of identifying fake news it recorded an accuracy of more than 65 percent.”
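The paper doesn't detail the model or features behind those numbers, but the cross-domain setup — train a classifier on one kind of deceptive text, then test it on another — can be sketched with a toy bag-of-words Naive Bayes classifier. All the training examples and class names below are invented for illustration; the researchers' actual corpus and method are described in their paper.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    return text.lower().split()

class NaiveBayes:
    """Tiny multinomial Naive Bayes with Laplace smoothing (illustrative only)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> token counts
        self.doc_counts = Counter()              # label -> number of docs
        self.vocab = set()

    def train(self, docs):
        for text, label in docs:
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.doc_counts[label] += 1
            self.vocab.update(tokens)

    def predict(self, text):
        tokens = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, float("-inf")
        for label in self.doc_counts:
            # log prior + Laplace-smoothed log likelihood of each token
            score = math.log(self.doc_counts[label] / total_docs)
            total_words = sum(self.word_counts[label].values())
            for tok in tokens:
                count = self.word_counts[label][tok] + 1
                score += math.log(count / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Invented toy data: breathless hoax wording vs. sober news wording.
training = [
    ("we are thrilled to announce our amazing new flying car", "hoax"),
    ("officials said on tuesday that the council approved the budget", "genuine"),
]

clf = NaiveBayes()
clf.train(training)
print(clf.predict("we announce an amazing new invisible phone"))  # -> hoax
```

The real study trained on April Fools hoaxes and evaluated on fake news, which is the same trick as swapping in a test set drawn from a different deceptive genre here.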
So what are the points of comparison they found? Compared to real news stories, April Fool’s stories:
Are generally shorter in length
Use more unique words
Use longer sentences
Are easier to read
Refer to vague events in the future
Contain more references to the present
Are less interested in past events
Contain fewer proper nouns
Use more first-person pronouns
Compared to real news stories, fake news stories:
Are shorter in length
Are easier to read
Use simplistic language
Contain fewer punctuation marks
Contain more proper nouns
Are generally less formal — use more first names such as ‘Hillary’ and contain more profanity and spelling mistakes
Contain very few dates
Use more first-person pronouns
That last bit “goes against traditional thinking in deception detection, which suggests liars use fewer first-person pronouns,” they say.
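Most of the cues in those two lists are surface statistics that need no deep language understanding. Here is a minimal sketch of computing a few of them; the pronoun list and the capitalized-word proxy for proper nouns are my own simplifications, not the paper's feature set.

```python
import re

FIRST_PERSON = {"i", "we", "me", "us", "my", "our", "mine", "ours"}

def stylometric_features(text):
    """Compute a handful of the surface cues discussed above (a rough sketch)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    lower = [w.lower() for w in words]
    return {
        "num_words": len(words),
        "avg_sentence_len": len(words) / len(sentences),
        # lexical diversity: share of unique words
        "type_token_ratio": len(set(lower)) / len(words),
        "first_person_rate": sum(w in FIRST_PERSON for w in lower) / len(words),
        # crude proper-noun proxy: capitalized words after the first word
        "proper_noun_count": sum(
            1 for i, w in enumerate(words) if i > 0 and w[0].isupper()
        ),
    }

feats = stylometric_features("We announced our plan. We think it is great.")
print(feats["first_person_rate"])  # 3 of the 9 words are first-person pronouns
```

A real system would pair features like these with labeled examples and a classifier, as in the experiment described above.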
Dearden is a PhD student studying natural language processing of online misinformation; you can see a talk he gave earlier this month on “alternative fakes” (including the April Fool’s research and analysis of postings in Flat Earth Society forums) here.
Speaker: @eddearden
Title: "Alternative fakes" https://t.co/L0irdjxXM5— FORGE: Forensic Linguistics RG, Lancaster Uni (@FORGE_LU) March 13, 2019
Roma in France were attacked after the re-emergence of a longtime online hoax. From Agence France-Presse:
Police arrested 20 people after attacks on Roma on Monday night in suburbs north-east of Paris that followed false rumors spread on messaging apps and social media warning of kidnappings.
“We are calling today on the interior ministry…for immediate protection by way of round-the-clock police presence,” Anina Ciuciu, a spokesperson for the Voice of Roma group, told RMC radio.
She said the reports were “a revival of the medieval stereotype” of Roma in which “Gypsies are likened to thieves and child-catchers.”
The attacks appear to have been sparked by the re-emergence of a longstanding online hoax in which people warn of a white van being used in attempted kidnappings of young women or children.
#FakeNews | 🗨️ A white van is driving between the towns of Nanterre and Colombes to abduct young women.
❌ After this rumor was shared on social networks, two people were unjustly accused and then lynched.
Stop sharing this false information. pic.twitter.com/gQVz6wLNcZ— Préfecture de police (@prefpolice) March 25, 2019
CNN notes that “Under French law, you can be fined €45,000 to €135,000 ($51,000 to $153,000) for circulating fake news on the internet.”
Libération’s CheckNews.fr writes about the “indestructible” rumor of somebody in a white van abducting children, which has been popping up online in various countries since at least 2009 but has been around in some form for much longer than that.
Spotting the next misinformation campaign early. On Wednesday, Oxford Internet Institute director Philip N. Howard wrote an op-ed in The New York Times calling for a change in tech companies’ policies around advertising: requiring them to “put all of their ads in a publicly accessible archive.”
Social media firms want to regulate themselves, and Google has threatened to withdraw all political ads in Canada if it finds transparency rules too onerous. Facebook offers political ad archives in a few countries, and searching by hand is laborious. Independent researchers can investigate trends computationally, but Facebook, Twitter and Google are doing more and more to restrict access. There is negligible access to Instagram, where huge volumes of Russian-origin misinformation now flows. Banning political ads or creating partial ad archives in some countries won’t strengthen the world’s democracies. Ad bans give incumbent politicians an unfair advantage, and establishing partial ad archives gives political ad buyers an incentive to not declare their ads as political…
The solution is to have all technology companies put all ads, all the time, into public archives.
Existing efforts in this area have focused on explicitly political actors, which leaves out “foreign governments and shady lobbyists” who won’t disclose themselves as such. To really see what’s happening, he argues, we need all ads, past and current, in one giant searchable archive.
A comprehensive archive of all ads would create a record of all such misinformation campaigns that could be used to prevent them in the future. The people and groups behind these ads aren’t going to volunteer the details about them on their own. A fully searchable public archive, maintained by an independent ad council financed by a fraction of ad revenues, will give democracy a healthy shot of algorithmic transparency.
Then on Thursday, Facebook took a big step in that direction by broadening the kinds of ads it will include in its public archive:
Today, we’re updating the Ad Library to make it easier to learn about all ads and the Pages that run them. This includes:
- Bringing in all active ads any Page is running — not just ads related to politics or issues — creating one place to see ads on Facebook. This was previously only visible on a Page in the Info and Ads section.
- Adding more information about Pages in the Ad Library, including:
- Page creation date, previous Page merges, and name changes.
- Primary country location of people who manage a Page, provided it has a large audience, or runs ads related to politics or issues in select countries.
- Advertiser spend information for ads related to politics or issues where the Ad Library Report is currently available. This includes all-time spend and spend over the last week, which was previously only available in the Ad Library Report.
- Improved search in the Ad Library. Past user searches will be saved for people logged in to Facebook, and people can now search by Page, not just keywords.
- The ability to report ads from within the Ad Library.
This move also works to resolve disputes news organizations had had with Facebook over their own ads sometimes being labeled “political advertising” because they promote stories about political topics. Now that all ads will be in the archive, publishers’ ads “will no longer be labeled as related to politics or issues.”
It’s progress, but there’s still room for improvement — most pressingly, making public to whom ads are targeted, not just their contents:
You can now search Facebook's archive of political ads programmatically; they opened up the API.
It's a good start, though there's still a lot more that we need to be able to see — like the targeting choices that are the subject of yday's HUD suit.https://t.co/W8N6TmQSRm
— Jeremy B. Merrill (@jeremybmerrill) March 29, 2019
5/ That now-blocked project was how we collected 100,000+ political ads. Perhaps unsurprisingly, more than 70 percent of all the political ads we saw were highly targeted by ideology: https://t.co/LSh71l3X0f
— ProPublica (@ProPublica) March 29, 2019
Facebook bans white supremacism and white nationalism. It’s a change in policy for Facebook, which had previously banned white supremacism but not white nationalism/separatism. Facebook announced the new policy this week. From Motherboard:
“We’ve had conversations with more than 20 members of civil society, academics, in some cases these were civil rights organizations, experts in race relations from around the world,” Brian Fishman, policy director of counterterrorism at Facebook, told us in a phone call. “We decided that the overlap between white nationalism, [white] separatism, and white supremacy is so extensive we really can’t make a meaningful distinction between them. And that’s because the language and the rhetoric that is used and the ideology that it represents overlaps to a degree that it is not a meaningful distinction”…
Fishman said that users who search for or try to post white nationalism, white separatism, or white supremacist content will begin getting a popup that will redirect to the website for Life After Hate, a nonprofit founded by ex-white supremacists that is dedicated to getting people to leave hate groups.
“Exactly how this news stub went mega-viral is a mystery no one has quite solved.” Will Oremus looks at “Suspected Human Trafficker, Child Predator May Be in Our Area” — Facebook’s most-shared story of 2019, and one we wrote about briefly here — and how it got so popular.
While Facebook couldn’t confirm exactly what aspects of its algorithm helped the story on its way, Savage’s crime brief appears to have ticked nearly every box that the social network is trying to prioritize. First, it was shared by a local news organization, making it more likely that people in the Waco area would see it at the top of their feeds. Second, it generated large numbers of comments, which Facebook counts as “meaningful interactions.” Finally, its sharing was driven heavily by individual Facebook users, rather than by professional publishers with large followings, which means that it would be helped along by the company’s focus on surfacing posts from “friends and family first.”
But the wild card may have been the story’s headline. While it was clear from reading the story that it was about Waco and Central Texas, the headline just said the predator was in “our area.” Anyone who read the headline without reading the story might reasonably have thought the story was about their area, even if they were far from Texas. When I mentioned that possibility to Savage, he agreed, and said he’s usually more careful to localize his stories in the headline as well as the text.