Shortly after 9:30 a.m. on March 12, 2014, two apartment buildings in East Harlem exploded when a water main collapsed into a gas line. Eight people were killed and dozens more were injured.
Journalists rushed to the scene to cover the tragedy, but four newsrooms (The New York Times, BuzzFeed, Gothamist, and The New York World) had another tool to help them cover the explosions: CityBeat, a program designed to algorithmically search geotagged social media posts to find news stories in New York City. CityBeat was built by researchers at Rutgers and at Cornell Tech, Cornell’s applied sciences outpost in New York City, and the four outlets were testing it at the time.
Social media posts about the building collapses appeared on CityBeat, but by the time there were enough posts to register in its algorithm, the news organizations themselves already knew about the explosion and had reporters and photographers on the scene.
“[The Harlem Fire] did show up, but it was half an hour later…at that point we’re not using Instagram,” one of the journalists interviewed by the researchers said in their paper on the project.
CityBeat, the participants said, was most useful in covering planned events: conferences, concerts, or even PR stunts, such as when a man in a bear suit was spotted walking around Manhattan. The tool was less effective for covering real-time breaking news stories.
“That’s one of the things that we talked about in the limitations and understanding the biases of the information,” Raz Schwartz, one of the study’s authors, told me. “Social media data might not be the best way to find these breaking events.”
Schwartz now works on the user experience research team at Facebook, but conducted the study as part of his postdoctoral research at Cornell along with Cornell professor Mor Naaman and Rannie Teodoro from Rutgers. The research was funded by the Brown Institute for Media Innovation at Columbia, and Schwartz presented the paper last week at a conference in Oxford, England.
Though the researchers have moved on to other topics, CityBeat is still live. The site was designed to be shown on big screens in newsrooms and has three main components. There’s the Detected Events List, a compilation of events the algorithm has discovered in the past 24 hours using Instagram data. There’s also the Event Window, which shows specific events and their location within New York. The third element is a sidebar showing statistics on the rate of tweets, popular hashtags, and more.
To detect news events occurring around New York, the CityBeat algorithm examines geotagged Instagram data. If it notices a number of photos posted from one location, it’ll create a Candidate Event, which includes all the photos taken from that location that caused the alert. Once a Candidate Event is created, it’s automatically sent to Amazon Mechanical Turk workers to ensure that it’s actually a newsworthy event and not, say, a lot of people posting pictures of themselves visiting the Empire State Building. But this approach was “problematic,” the authors wrote in the paper.
“In many instances Amazon Mechanical Turk workers would get confused by the number of different photos that appeared and would classify actual events as noise,” the study says.
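The detection step described above (many geotagged photos from one place in a short span becoming a Candidate Event) can be sketched roughly as follows. This is only an illustration under stated assumptions, not CityBeat's actual algorithm: the grid size, the 30-minute window, the five-post threshold, and every name in the code (`Post`, `grid_cell`, `candidate_events`) are hypothetical.

```python
import math
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Post:
    """A geotagged photo post (hypothetical structure, not Instagram's API)."""
    post_id: str
    lat: float
    lon: float
    taken_at: datetime


def grid_cell(lat, lon, cell_deg=0.002):
    """Snap coordinates to a coarse grid cell (assumed size: 0.002 degrees)."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))


def candidate_events(posts, window=timedelta(minutes=30), min_posts=5,
                     cell_deg=0.002):
    """Return one candidate event per grid cell that has at least
    `min_posts` posts falling within a single `window` of time."""
    by_cell = defaultdict(list)
    for post in posts:
        by_cell[grid_cell(post.lat, post.lon, cell_deg)].append(post)

    events = []
    for cell, cell_posts in by_cell.items():
        cell_posts.sort(key=lambda p: p.taken_at)
        start = 0
        for end in range(len(cell_posts)):
            # Slide the window start forward until it spans at most `window`.
            while cell_posts[end].taken_at - cell_posts[start].taken_at > window:
                start += 1
            if end - start + 1 >= min_posts:
                # Keep the photos that triggered the alert, as the article describes.
                events.append({"cell": cell, "posts": cell_posts[start:end + 1]})
                break  # this sketch reports at most one candidate per cell
    return events
```

A fixed threshold like this would also flag a steady stream of tourist photos from the Empire State Building, which is the kind of false positive the human review step was meant to filter out.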
Algorithms and bots have become more commonplace in news as of late. The Associated Press and the Los Angeles Times now use bots to write certain stories; apps such as SmartNews use algorithms to sort through millions of URLs to display stories for their users; and of course there’s Facebook, which can alter publishers’ fortunes with a tweak of its News Feed recipe.
And while the newsrooms that tested the platform weren’t sold on its utility for covering breaking news, Schwartz said he believes there are lessons to be learned from the CityBeat experiment about what role algorithms can play in covering breaking news.
“This is something that we see everywhere,” said Schwartz, referring to the increased use of editorial algorithms. “It’s growing and growing, and we have to understand what it means. We have to understand what happens when we give algorithms the reins of news selection and news making.”