Nieman Foundation at Harvard
Feb. 22, 2024, 12:27 p.m.

With The New York Times suing Microsoft and OpenAI for copyright infringement (a case the Times might well win, AI writer and researcher Timothy B. Lee and Cornell professor James Grimmelmann argued this week), it’s a good time to take a look at how news sites in general are responding to tech companies’ use of their content. A report out Thursday from the Reuters Institute for the Study of Journalism finds that nearly half (48%) of the top news publishers across 10 countries were blocking OpenAI from crawling their sites as of the end of 2023.

The websites of legacy print publications (like The New York Times and Der Spiegel) were more likely to block AI crawlers than TV and radio broadcasters or digital-born news sites — 57% of them were doing so, according to Richard Fletcher’s research.

News websites were less likely to block Google’s AI crawler than OpenAI’s, with a little less than a quarter doing so, but “almost every website (97%) that decided to block Google’s AI crawler was also blocking OpenAI’s crawlers.” From the report:

The proportion of top online news websites blocking OpenAI ranged from 79% in the US, to just 20% in Mexico and Poland. For Google, the proportion blocking their AI crawler ranged from 60% in Germany to 7% in Poland and Spain. In general, outlets in the Global North were more likely to be blocking than those in the Global South. (Interestingly, the figures are aligned with attempts to index countries in terms of AI capabilities and preparedness, such as those published by Tortoise and Oxford Insights, both of which rank the US first.)

In every country apart from Germany, where the figure was 60% for both, more top news websites blocked OpenAI’s crawlers than Google’s. Moreover, almost every website that blocked Google AI also blocked OpenAI (97%). This could be because ChatGPT is more prominent and widely used than Bard/Gemini, or it could be because the OpenAI crawler was released first. But it is also possible that publishers are more cautious about blocking Google in case it affects their prominence in search results — even though there are separate crawlers for search and AI.
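For readers curious what "blocking" a crawler can look like in practice, here is a minimal sketch. It assumes the block is expressed in a site's robots.txt file, which the text above does not spell out. The sample file is hypothetical; `GPTBot` and `Google-Extended` are the user-agent tokens OpenAI and Google publish for their AI crawlers, and `Googlebot` is Google's search crawler. The helper name `is_blocked` and the `example.com` URL are illustrative only.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a news site that opts out of AI crawling
# while leaving ordinary search crawling untouched.
SAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""


def is_blocked(robots_txt: str, user_agent: str,
               url: str = "https://example.com/") -> bool:
    """Return True if robots_txt disallows user_agent from fetching url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("GPTBot", "Google-Extended", "Googlebot"):
        status = "blocked" if is_blocked(SAMPLE_ROBOTS_TXT, agent) else "allowed"
        print(f"{agent}: {status}")
```

In this sketch the two AI crawlers are disallowed while the search crawler falls through to the catch-all rule and stays allowed, which is the separation between search and AI crawlers that the report describes. Note that robots.txt is a request to crawlers, not an enforcement mechanism.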

You can read the research here.
