When it comes to automating the process of spotting breaking news, solving one problem can create several more.
Reuters discovered this firsthand over the past two years as it built Reuters News Tracer, a custom tool designed to monitor Twitter for major breaking news events as they emerge. While reporters curate their own lists of sources to get rapid alerts on stories they’re already looking for, the Reuters tool is designed to solve a different problem: detecting breaking news events while early reports are still coming in.
The development of the tool, which Reuters is speaking about publicly today for the first time, emerged out of “an existential question for the news agency,” said Reg Chua, Reuters’ executive editor of data and innovation. “A large part of our DNA is built on the notion of being first, so we wanted to figure out how to build systems that would give us an edge on tracking this stuff at speed and at scale. You can throw a million humans at this stuff, but it wouldn’t solve the problem,” he said.
Once the tool identifies what it thinks are emerging stories, it clusters relevant tweets into events, generating information and metadata about what each story might be. Tweets that mention “explosions” and “bombs,” for example, would be clustered into a single story about a potential terrorist attack.
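The clustering step described above can be illustrated with a minimal sketch. Reuters has not published how News Tracer groups tweets, so everything here is an assumption made for illustration: the `TOPIC_KEYWORDS` table, the topic labels, and the `cluster_by_topic` function are hypothetical, and a real system would almost certainly use something more sophisticated than keyword matching.

```python
import re
from collections import defaultdict

# Hypothetical keyword table; not taken from Reuters News Tracer.
TOPIC_KEYWORDS = {
    "possible attack": {"explosion", "explosions", "bomb", "bombs", "blast"},
    "earthquake": {"earthquake", "quake", "tremor"},
}


def tokenize(text):
    """Lowercase the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def cluster_by_topic(tweets):
    """Group tweets under the first topic whose keywords they mention.

    Tweets that match no topic are left out of the result.
    """
    clusters = defaultdict(list)
    for tweet in tweets:
        words = tokenize(tweet)
        for topic, keywords in TOPIC_KEYWORDS.items():
            if words & keywords:
                clusters[topic].append(tweet)
                break
    return dict(clusters)


tweets = [
    "Loud explosions near the station",
    "Reports of bombs downtown",
    "Felt a quake just now",
    "Nice lunch today",
]
clusters = cluster_by_topic(tweets)
```

In this toy example, the two tweets mentioning “explosions” and “bombs” land in the same cluster, while the unrelated tweet is dropped.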
But detection is only the first, and probably easiest, problem to solve. Another challenge was figuring out how to identify which events are actually interesting, newsworthy, and not spam. Added to that is the problem of separating assertions of opinion (“I think it’s terrible that this event happened”) from assertions of fact (“This event happened”) and automating the process of verifying whether reports are actually true.
The verification challenge was the most interesting and most valuable problem to solve, Chua said. Pulling from academic research on the verification of social media reports, Reuters designed its algorithm to assign verification scores to tweets based on 40 factors, including whether the report is from a verified account, how many people follow those who reported the news, whether the tweets contain links and images, and, in some cases, the structure of the tweets themselves. “Amazingly enough, a tweet that is entirely in capital letters is less likely to be true,” Chua said.
The factors, when combined, give each story cluster a score. If a story meets a set verification threshold, Reuters has enough confidence to tweet out its own breaking news alert reporting the event, and reporters will then report on the story themselves. That score will also change over time, as more reports come in and increase or decrease the verification ranking. The process, which drew on historical datasets and insights from Reuters reporters, essentially reverse-engineered the process of spotting and verifying unconfirmed reports, combining the skills of reporters with the speed of an algorithm. Reuters essentially taught its algorithm to think like a reporter.
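To make the scoring idea concrete, here is a rough sketch of how a handful of the factors mentioned above could be combined into a cluster score and checked against a threshold. It is not Reuters' algorithm: the article names only a few of the 40 factors, and the weights, the 10,000-follower cutoff, the averaging step, and the `ALERT_THRESHOLD` value below are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Tweet:
    text: str
    verified_account: bool
    followers: int
    has_link: bool
    has_image: bool


# Hypothetical threshold; the real value is not public.
ALERT_THRESHOLD = 0.5


def tweet_score(tweet):
    """Combine a few signals into a score between 0 and 1 (made-up weights)."""
    score = 0.0
    if tweet.verified_account:
        score += 0.4
    if tweet.followers >= 10_000:
        score += 0.2
    if tweet.has_link:
        score += 0.1
    if tweet.has_image:
        score += 0.2
    # Penalize tweets written entirely in capital letters.
    letters = [c for c in tweet.text if c.isalpha()]
    if letters and all(c.isupper() for c in letters):
        score -= 0.3
    return max(0.0, min(1.0, score))


def cluster_score(tweets):
    """Average the tweet scores; recomputed whenever new reports arrive."""
    if not tweets:
        return 0.0
    return sum(tweet_score(t) for t in tweets) / len(tweets)


def should_alert(tweets):
    """True when the cluster score meets the (assumed) alert threshold."""
    return cluster_score(tweets) >= ALERT_THRESHOLD


credible = Tweet("Explosion reported at the airport", True, 50_000, True, True)
shouty = Tweet("HUGE EXPLOSION!!!", False, 120, False, False)
```

Because the cluster score is recomputed from whatever tweets are currently in the cluster, it rises or falls as new reports come in, which mirrors the behavior described above: here `should_alert([credible])` passes the threshold, while adding the all-caps tweet pulls the average back below it.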
The nature of Reuters News Tracer means that the tool works best for what Chua calls “witnessable events,” such as bombings and natural disasters. These are events often reported by many people simultaneously, often accompanied by photos and videos, which boost their legitimacy in the eyes of the tool’s algorithm. The tool has already paid off in significant ways for Reuters, giving the organization, for example, an eight-minute head start on reporting on the Brussels bombings earlier this year and a 15-minute jump on sending out a news alert on the Chelsea bombing in New York in September.
This kind of speed is a big deal, not only to people whose lives could be saved by getting rapid notifications of nearby events, but also to Thomson Reuters’ financial clients, “who really prize breaking news,” said Chua.
But while speed is a key element of the mission of Reuters News Tracer, the tool also serves to extend the reach of the organization overall, particularly because Reuters is unable to have reporters on the ground in every place where news might break.
“With the proliferation of smartphones and social media, it means that there are a lot more witnesses to a lot more events,” said Chua. “We can’t be at everything. Our tool helps shift some of the burden of witnessing and lets journalists do much more of the high value-added work.”