June 3, 2024, 3:32 p.m.
Reporting & Production

What’s in a byline? For Hoodline’s AI-generated local news, everything — and nothing

None of the AI writers seems to have a specific beat, except possibly for what can best be described as “police exploits,” which they all cover with gusto.

When Hoodline, a company that runs hyperlocal news sites for cities across the country, first launched, it sounded promising. The site’s approach to combining data with granular, on-the-ground reporting got the attention of my former Nieman Lab colleagues back in 2015, when it was focused on San Francisco. We wrote about it again in 2018, when it reinvented itself as a largely automated local news wire. Back then, as Shan Wang wrote for us, the people who ran Hoodline stressed that “robots are definitely not going to take over human journalists’ work.”

That seems to have changed. In the past few weeks, Bloomberg, CNN, and the San Francisco Chronicle published stories about how Hoodline is leaning heavily on AI to generate its stories. “Some of the stories produced by Hoodline closely resemble those from other outlets,” Chase DiFeliciantonio wrote in the Chronicle, though the CEO of Hoodline argued those stories were doing much the same as human news aggregators. If you go to the Hoodline website now, you’ll find a small “AI” badge next to practically every byline; in my research for this story, I found just two active bylines — Cheryl L. Guerrero and Steven Bracco — that seem to belong to real people, at least based on the existence of social media pages and personal websites. The dozens of others are AI.

Hoodline is by no means the only publisher to use AI to pump out content; CNET, Sports Illustrated, and plenty of others have used fake writers to fill their sites. Hoodline, meanwhile, only started adding the AI badges after another San Francisco outlet, Gazetteer SF, asked about the company’s use of AI.

What stands out to me about Hoodline is the names.

The front page of Hoodline San Francisco is filled with articles by “Leticia Ruiz,” “Nina Singh-Hudson,” [1] “Eileen Vargas,” “Eric Tanaka,” and “Tony Ng.” A diverse-sounding bunch, much like the population of San Francisco itself! A shame, then, that they’re not real. Until recently, Bloomberg noted, each of those bylines was accompanied by a headshot and bio. (Singh-Hudson’s bio said she was a “long-time writer and a Bay Area native.” I suppose the code that generated that bio very well might have been written nearby.)

As Maggie Harrison Dupré reported for Futurism, Sports Illustrated and The Street have both also generated names and photos of people of color who never existed. At Sports Illustrated, for example, the name “Sora Tanaka” appeared next to a photo from an AI headshot marketplace under the listing “joyful asian young-adult female with long brown hair and brown eyes.”

Zack Chen, the CEO of Impress3, the company that owns both Hoodline and the local news site SFist, told me via email that Hoodline’s AI personas were generated at random by AI, and that their beats and cities were also randomly assigned. The fact that they give the impression of being people of color, he told Bloomberg, was unintentional. And, Chen stressed to me, the AI is not operating on its own: “We have a team of dozens of (human) journalist researchers who are involved with information gathering, fact checking, source identification, and background research, among other things,” he wrote. Those researchers pitch stories, which are then analyzed by an AI “for viability and compatibility with our readership in that beat” before a human editor provides a final sign-off.

After spending some time looking through many of those stories, however, I have questions.

I sifted through each Hoodline city site, jotting down the names of writers and clicking on stories to see if they were written by AI. Each city has somewhere between three and six personas bylining stories, and I immediately started wondering about the supposed randomness of the names: On the Hoodline site for Boston, where 13.9% of residents reported Irish ancestry in the 2022 census, “Leticia Ruiz” and “Eric Tanaka” give way to “Will O’Brien” and “Sam Cavanaugh.” Memphis, Tennessee, has six AI personas: “Alicia Freeman,” “Andre Washington,” “Bob Norris,” “Sofia Garcia-Jones,” “Caleb Powell,” and “Elena Nguyen.” On the Washington, D.C. site, one of the personas — “Mike Johnson” — shares a name with the speaker of the House of Representatives.

(After publication, Chen wrote “We instructed [the tool generating names] to be randomized, though we did add details to the AI tool generating the personas of the purpose of the generation. In other words, it was prompted to randomly select a name and persona for an individual who would be reporting on – in this case – Boston. If there is a bias, it is the opposite of what we intended.”)

None of the AI writers seems to have a specific beat, except possibly for what can best be described as “police exploits,” which they all cover with gusto. Hoodline’s sites are filled with stories about both arrests and police PR events, which makes me think Hoodline’s AI tools are leaning heavily on the press releases of local police departments.

Take a look, for example, at Hoodline Dallas, which published a story, bylined “Nate Simmons,” about a sergeant’s retirement on June 2. That same day, the same persona generated a story about law enforcement officials being praised for their work on an “unspecified case.” (That case was, in fact, specified; the details were in the image posted by the Tarrant County DA, which whatever AI tool Hoodline is using apparently couldn’t parse.) Many stories are simply about police chasing or arresting suspects, often accompanied by mugshots pulled from law enforcement social media posts or press releases. There’s little, if any, original reporting; in perhaps the saddest example of Hoodline’s aggregation I’ve found yet, a recent story about the closure of a restaurant in San Francisco linked to a Hoodline story from 2014, back when the site first launched.

If there are humans involved before the AI gets to work, it doesn’t seem as if they’re checking the work that the AI produces very carefully. I found errors and what seem to be straight-up hallucinations in multiple articles. One recent story about a community event the Boston Police Department held with a group of senior citizens says the BPD’s press release “underscored an ongoing effort to foster a sense of partnership between law enforcement and residents,” but the four-sentence press release it linked to made no mention of any such ongoing efforts. A story about an arrest described the suspect’s heart “racing to the beat of a fugitive drum — only this drum was a ghost gun, with no traceable past, its loaded magazine its immediate concern.” Literary qualities aside, the AI seemingly fixated on a small detail in the press release, which stated the suspect was “breathing heavily,” and ran with it.

(After publication, Chen argued the mistake in the first story was “a punctuation error” and not a hallucination. Hoodline has corrected the error. With regards to the second story, he wrote that the article “used significant storytelling methods that are – perhaps – uncommon for news stories, but it is not by definition inaccurate. We test out many things to see what improves readership and user experience and will continue to do so ongoing.”)

Even if we ignore the factual errors for a moment, I couldn’t understand why Hoodline chose to give its AI personas human-sounding names at all. Why not, for example, simply byline each story with something like “Hoodline San Francisco”? Chen told me that’s because he envisions the personas evolving into short-form news. “Like an AI news anchor,” he wrote. “These inherently lend themselves to having a persona…It would not make sense for an AI news anchor to be named ‘Hoodline San Francisco.'”

There’s something particularly insidious about this. The news business is already overwhelmingly white, but you wouldn’t know it from looking at AI bylines or the videos Chen envisions. The simple fact of that AI persona’s existence — the presentation of a person of color delivering the news — will, in itself, be a distortion of the truth, a hallucination of a more diverse news corps. AI-generated writing, and fake bylines in particular, is a facsimile of a facsimile, and that will only become truer as more media companies strike deals with companies like OpenAI, opening the door to easily aggregated content with no original reporting: an ouroboros of press releases.

Despite all the issues, Hoodline is at least transparent about using AI: along with the badges next to names, the site has an AI disclaimer, even though the way it claims to use AI (“assisting in the background,” with a “human-centric approach”) is clearly false. That’s better than Sports Illustrated and CNET managed. Chen also claims the hope is that using AI to generate stories will help build revenue to hire human journalists who can take on more complex stories, though I wouldn’t hold my breath.

When Chen told me human journalists were involved in Hoodline’s editorial process, I asked if there was a masthead — standard practice at most publications — where I could find the names of some of those people, rather than clicking on stories to see which bylines did not contain the AI badge, as I did to find Guerrero and Bracco. “We do not have such a masthead at this point,” Chen wrote. “But it is a good idea to build that out in the near future.”

This story has been updated with additional comment from Hoodline.

Collage of AI bylines from Hoodline made by Laura Hazard Owen, a real human and editor of Nieman Lab.

[1] “Gonna be thinking about Nina all day,” a colleague said to me in Slack. “Is she biracial or did she marry someone outside of her (presumably) Punjabi culture?”
Neel Dhanesha is a staff writer at Nieman Lab. You can reach Neel via email (neel_dhanesha@harvard.edu), Twitter (@neel_dhan), or Signal (@neel.58).