When Sarah Ryley, an investigative reporter at the New York Daily News, filed a records request with the city’s police department early last year, she didn’t expect to hear back for a while, nor did she expect the eventual data to be very useful. Ryley, working with ProPublica, was investigating how often the city evicted residents over alleged violations of its nuisance abatement law, which gave it the power to shut down businesses and residences that were being used for illegal purposes.
Many months passed before Ryley heard back from the city, whose data didn’t help her investigation, so she decided to start the digging herself. She pored through thousands of pages of New York State Supreme Court filings, entering details of over 1,100 cases, one by one, into a spreadsheet. The process took Ryley and her research partners weeks to complete, but by the end of it the team had compiled a dataset that was one of a kind.
It wasn’t long before many lawyers and housing rights advocates emailed Ryley in an effort to get their hands on the data themselves. “What we had was something really valuable and unique,” she said. “There’s no way you could have gotten it all from just filing document requests.”
ProPublica saw the same potential. In May, the organization uploaded the 287 KB file to the ProPublica Data Store, a repository that lets outside parties purchase and download cleaned-up, processed versions of the datasets that it uses in its investigations. ProPublica launched the store two years ago, eyeing the product as a new, experimental way to turn its data into revenue. While the organization initially downplayed the Data Store’s revenue potential, dataset sales brought in $30,000 in the first five months and have totaled over $200,000 since launch. Revenue so far this year has topped the last two years combined; its free datasets have been downloaded more than 4,500 times.
The Data Store’s success, despite being a “highly under-resourced experiment,” has inspired ProPublica to expand on the model, said Celeste LeCompte, ProPublica’s director of business development (and a former Nieman Fellow). Building on its first dataset sales partnership with the New York Daily News, ProPublica says that it’s also now working with Investigative Reporters and Editors (IRE) to manage sales of some of the datasets produced by the National Institute for Computer-Assisted Reporting (NICAR). It’s also working on similar arrangements with a handful of public radio stations, though the exact details of those deals haven’t yet been finalized.
“This is something that I’ve wanted to do with Data Store since I started here,” said LeCompte, who has spent most of her first year at ProPublica focused on the effort. “So many news organizations have these kinds of datasets, but don’t have the resources to build out a marketing channel or sales support system to do anything with them. We want to offer that support.”
With a more ambitious Data Store, which is also getting a redesign, ProPublica will continue to court media organizations that are sitting on big datasets they’d like to sell. Not all of the datasets are good fits, however: The best candidates, LeCompte said, are those that news organizations have used in their own investigations. It also helps if the datasets are built from multiple sources and are ones that journalists have “done a ton of work on,” which would encourage potential customers to buy them rather than attempt to replicate them.
For partner organizations, the most immediate benefit to working with ProPublica is the potential new revenue dataset sales can generate. Selling valuable proprietary datasets is a relatively low-effort way for investigation-heavy news organizations to bring in new incremental revenue off assets that would have sat dormant otherwise. Datasets sold on Data Store can run from as little as $200 up to $10,000 or more, depending on the work involved in producing the data and whether the customer is from another news organization, academia, or a large company. For the cheaper datasets, ProPublica can take as much as a 50 percent cut of the sales, though that, too, varies based on the extensiveness of each dataset.
“For a lot of these, no one is going to make the largest amount of money, but it’s not something anyone is going to sniff at when you’re in the economic environment that publishers are in right now,” LeCompte said.
To date, ProPublica has uploaded roughly 50 datasets focused on, for example, how much money pharmaceutical companies have paid to doctors, Medicare prescription data for hepatitis C drug spending, and partial disability compensation for injured workers. Nonprofit organizations, law firms, and big companies are often willing to pay top dollar for these kinds of datasets, largely because the data they offer is often either difficult to obtain or the result of many hours of manual data entry. Cleaned-up, well-formatted healthcare data from the likes of the Centers for Medicare & Medicaid Services and the Centers for Disease Control and Prevention, while publicly available, is still valuable to a wide variety of organizations in the sector, including hospitals, pharmaceutical companies, independent consultants, and even tech companies. Yelp, for example, inked a deal with ProPublica last year to use the organization’s Surgeon Score data to include ER wait times, nursing home fines, and noise levels on the site’s hospital listings. ProPublica has even managed to sign subscription deals with certain clients, giving it some recurring revenue.
Beyond generating new revenue, the Data Store is also an effort to bolster the impact and shelf life of the data produced by news organizations, whose service missions drive them to get their data into as many hands as possible. “If organizations are making internal and external decisions based on the data we produce, that’s a really radical form of journalism,” said LeCompte.
On the other hand, news organizations fueled by the desire to get their investigations in front of the maximum possible number of people might balk at pricing datasets so high that people can’t access them. Serving both the revenue and mission goals is a tricky balance, which is why the store prices datasets depending on who is buying them. Nonprofit and news organizations, more sensitive to price, will always pay less than academic institutions and big companies, which tend to have bigger wallets.
“Getting people comfortable with asking for the value that goes into the work involved in creating these is going to be a learning curve for a lot of newsrooms,” said LeCompte.