So many pioneering works of digital journalism no longer exist online, or exist only as a shadow of their former selves.
The Guardian’s 2009 coverage of the MP expenses scandal, for instance, which included a massive crowdsourcing effort and hundreds of thousands of documents (a project we wrote up at Nieman Lab): The stories that anchored that coverage are nowhere to be found on theguardian.com.

A lavish online multimedia experience from the now-defunct Rocky Mountain News, built around a Pulitzer Prize-nominated series exploring the legacy of a deadly 1961 bus-train collision in Colorado: Rescued from internet limbo, thanks to the efforts of the former Rocky Mountain News staffer who reported the series. (Former Nieman Labber Adrienne LaFrance explored the path to resurrection and warned of the ephemerality of the internet in this 2015 piece for The Atlantic. “The Crossing” project is now at its own, stable URL, though someone still needs to maintain it.)
You get the picture.
“We were told the internet was forever. That was kind of a lie,” Meredith Broussard, currently a professor at the Arthur L. Carter Journalism Institute at New York University, told me. “I’ve been writing for the web since 1996 or so, and all of my early work is gone. It only exists in paper files in archival boxes somewhere in my apartment. Unless somebody is maintaining internet sites, they go away — and somebody needs to be paying the bill for the server.”
Broussard and colleague Katherine Boss, the librarian for journalism, media, culture, and communication at NYU, are developing a workflow and building tools to help organizations preserve their big data journalism projects effectively and efficiently, as well as putting together a scholarly archive of data journalism projects.
“News apps can’t be preserved the same way you preserve a static webpage,” Broussard said. The Internet Archive’s Wayback Machine is dependable for finding a snapshot in time, but a searcher needs to know the time frame of what they’re looking for, and snapshots don’t really capture a complicated, database-driven project or any site with a lot of dynamic links. “The way to capture these is from the backend. You can grab the whole database — all of the images from the server side, and so forth. We’re looking to build server-side tools that will allow for automated, large-scale, long-term archiving of data journalism projects.”
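To make the backend-capture idea concrete, here’s a minimal sketch of what such a snapshot could look like. The project name, directory layout, and the assumption of a PostgreSQL database with pg_dump available are all hypothetical illustrations, not details of the tools NYU is building.

```python
import datetime
import pathlib
import shutil
import subprocess

PROJECT = "election-tracker"  # hypothetical news app
ASSETS = pathlib.Path("/srv/apps/election-tracker/static")  # assumed layout
OUT = pathlib.Path(f"archive/{PROJECT}-{datetime.date.today()}")

OUT.mkdir(parents=True, exist_ok=True)

# 1. Snapshot the database (assumes PostgreSQL, with pg_dump on the PATH).
subprocess.run(
    ["pg_dump", "--format=custom", "--file", str(OUT / "db.dump"), PROJECT],
    check=True,
)

# 2. Copy the server-side assets -- the images, CSS, and JS the app serves.
shutil.copytree(ASSETS, OUT / "static")

# 3. Leave a manifest so a future archivist knows what this snapshot is.
(OUT / "MANIFEST.txt").write_text(
    f"{PROJECT}, captured server-side on {datetime.date.today()}\n"
)
```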
Boss and Broussard’s first move has been surveying developers and journalists on the tech used to make and store their news apps. The preliminary survey returned a range of technologies, frameworks, and platforms: Flask, Django, Ruby, Node.js, d3, AWS, Heroku, and on and on.
“Nobody’s yet collected data on what we’re trying to learn. Where are all the projects being stored? How are they built — are they pulling from external APIs? We’re trying to ask the right questions, the ones digital archivists will want answers to, in order to build these sorts of tools,” Boss said. “There are great projects that are really just being lost — there is currently no way to archive or preserve them. It’s not really technically possible right now. We’re trying to develop new workflows, and not just within libraries, that could be used by anyone.” (There are efforts to catalog and tag all the interactives out there, such as the Interactive News Depot, but those, too, need to be painstakingly updated and maintained.)
It’s an issue Boss and Broussard have been following closely for a while — Broussard even wrote a response in The Atlantic to LaFrance’s story about “The Crossing.” The data journalist community has been concerned with preserving interactive projects and news apps for years, and the Journalism Digital News Archive, part of the Reynolds Journalism Institute, has been convening researchers and journalists to tackle the problem. A recent Twitter thread, featuring some steadfast champions of news app preservation efforts, offers a taste of what some forward-thinking organizations are considering:
1/? “The Wall” from @USATODAY is one of those large-scale immersive projects that USAT is so good at pulling off … https://t.co/YcESpOnM4n — Anthony DeBarros (@anthonydb) September 21, 2017

3/? Congrats to all involved. Truly an accomplishment, and I can’t wait to dig into all the pieces. However … — Anthony DeBarros (@anthonydb) September 21, 2017

5/? The piece is “Ghost Factories.” Won multiple awards for @alisonannyoung and our team … Eventually, I found it, somewhat broken … — Anthony DeBarros (@anthonydb) September 21, 2017

7/? So, is the timer of obsolescence already ticking on The Wall? Will the glory of its achievement be forgotten in five years? — Anthony DeBarros (@anthonydb) September 21, 2017
(USA Today is aware of and working on the issue, it seems.)
IMHO we need publishing tools that self-archive. We've taken steps to do that with the bigbuilder tool behind https://t.co/9YbR6EnPTe — Ben Welsh (@palewire) September 21, 2017

It's pathetic companies so proud of their print archives and industrial scale paper distribution don't take this seriously. — Ben Welsh (@palewire) September 21, 2017
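The “self-archiving” approach Welsh describes is, at heart, baking: render every page of a dynamic app to flat HTML at publish time, so the story outlives its database and framework. Here’s a minimal sketch of that pattern using a small, hypothetical Flask app; it illustrates the general idea, not bigbuilder itself.

```python
import pathlib
from flask import Flask

app = Flask(__name__)

@app.route("/")
def index():
    return "<h1>Our big interactive story</h1>"

@app.route("/about/")
def about():
    return "<p>How we reported this story.</p>"

def bake(app, outdir="baked"):
    """Fetch every simple route with the test client and write it to disk."""
    client = app.test_client()
    for rule in app.url_map.iter_rules():
        if rule.arguments:  # skip parameterized routes in this sketch
            continue
        html = client.get(rule.rule).get_data(as_text=True)
        path = pathlib.Path(outdir) / rule.rule.strip("/") / "index.html"
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(html)

if __name__ == "__main__":
    bake(app)  # leaves a framework-free copy any web server can host
```

The baked output needs no database, no framework, and no maintenance beyond a dumb file server, which is exactly what makes it archivable.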
“But in the broader population, there isn’t much awareness that this is a problem,” Boss said. “And in libraries, this is something librarians have only recently started thinking about. It’s on the cutting edge of research in digital archiving.”
In librarian/archivist nerd-speak, Boss explained, there’s “migration,” and then there’s “emulation.” Migration is the traditional work we might associate with libraries: digitizing print materials and microfilm, moving VHS to DVD, then DVD to Blu-ray, then Blu-ray to streaming media. That process doesn’t make sense for digital “objects” like news apps, which depend on many different types of software and therefore have too many moving parts to migrate. What if, a hundred years out, we’re not even browsing the internet on computers, or at least not the computers we’re familiar with now? What’s needed is a way to capture a data journalism project from the server side and then “emulate” that whole environment on whatever future device is being used to view the project.
“We’re borrowing best practices from other fields, though there are also issues unique to data journalism,” Broussard said. She pointed to efforts in the scientific community like ReproZip, which can help with reproducing and archiving digital scientific experiments, and to the similar challenges facing contemporary art conservation: how to archive something like an interactive video installation or a piece of performance art.
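ReproZip works by tracing an experiment as it runs and packing up everything it touches so the environment can be re-created elsewhere. As a much simpler illustration of that “capture the environment so you can emulate it later” idea, a script might at least pin down the stack a project was built on. This toy sketch is not ReproZip, just the general shape of the problem.

```python
import json
import platform
import sys
from importlib import metadata

# Record the stack this project runs on: OS, interpreter, and every
# installed package pinned to its exact version.
snapshot = {
    "python": sys.version,
    "os": platform.platform(),
    "packages": sorted(
        f"{dist.metadata['Name']}=={dist.version}"
        for dist in metadata.distributions()
    ),
}

# A future archivist (or emulator) can rebuild the environment from this.
with open("environment-snapshot.json", "w") as f:
    json.dump(snapshot, f, indent=2)
```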
“I like to talk about it as reading today’s news on tomorrow’s computer,” she said.