Back in March, The Markup broke some news about a small-business advice chatbot being piloted by the New York City government. The bot — available at chat.nyc.gov — was meant to be an AI-powered, one-stop shop for entrepreneurs looking to set up a business. The city said the bot could tell visitors, in plain English, what permits they might need to start a company, for example, or to avoid fines and violations when they did.
It was heralded by the mayor’s office as the future of public services, but we found that the chatbot regularly lied to constituents, and its advice, if followed, would have caused people to break the law.
When asked, the NYC bot told The Markup that bosses could take workers’ tips (they can’t), that landlords could discriminate against tenants who wanted to use housing vouchers (wrong again), and that business-owners could refuse to take cash (businesses must take cash). Again and again, we found the bot messing up on city laws and regulations.
The story broke through nationally, becoming a cautionary tale about government agencies rushing to adopt AI without the proper safeguards in place. Mayor Eric Adams soon acknowledged at a press conference that the bot was providing false information, but defended the tool, saying it would improve over time. (The bot now seems to be willing to respond to fewer questions in general.)
One person who saw that story was Jonathan Soma, a professor in data journalism at Columbia University. Soma recently gave a series of video talks on how journalists can use AI to power their reporting.
One topic he touched on in the series: chatbots. Using the NYC chatbot as a starting point, he demonstrates in a video how to build a similar AI-powered chatbot that could scan and respond to questions based on uploaded documents.
In the video, it works — he specifically asks his own bot some of the same questions we asked the city’s bot, and he got accurate responses when asking about rules for landlords, for example. He also shows how a bot could respond based on just about anything else a person uploads, like a long text of Hungarian folktales.
Soma says chatbots may not be great for high-stakes legal advice, although he’s optimistic about its role in expanding data journalism: “When these bots hit the scene, everyone was terrified that it was going to replace the act of writing for journalists. But it turns out that even though it seems to be the best at writing, that is actually the task that AI is worst at: assembling information, turning it into something coherent.”
We caught up with Soma to talk more about how he built his own bot on the fly, and what responsible use of AI — especially for journalists — might look like. This interview has been edited for brevity and clarity.
But terrifying. You have a tool that is pitched as a useful tool to help people live their lives in a way that is most effective to obeying the law. Then it’s pitching them lies under the banner of, “This is sourced, this is true. You can trust this.” That’s the big problem with chatbots — they will cite sources, they will say facts that are not facts. Then maybe there’s a little asterisk down at the bottom that says,”Sorry, some things here might not be true, double check all of them.” Which is just absurd.
It is very easy to get a small proof of concept going. No matter what tool they used to generate this, I know exactly how it worked, there’s no mystery there. The mystery in these chatbots is how you can get them to be accurate. And it’s still a big problem that all of these chatbots have, and clearly this one as well.
It was very fun, again, as a proof of concept. But if someone was going to sit me down and make me defend a PhD thesis based on a Hungarian folktales chatbot, I don’t think I would pass.
I did that by running some code that assigned all the numbers and cut all the paragraphs and sent everything out to GPT to get that answer. There are probably hundreds of tools that do this. It’s very, very common. A lot of slight variations on the approach, but generally speaking, you’re finding relevant documents and sending them to the AI along with the questions.
But there were so many basic, basic questions that were answered incorrectly, I can’t imagine the documents were actually loaded. There must just have been gross malfeasance in terms of setting this chatbot up. Because it’s not hard to have it be right most of the time. The issue was getting it to be right 100 percent of the time. And from what I can tell, this city-based chatbot was almost never right, which is hard to do. It’s hard to be that wrong.
It is 100% guaranteed that, at some point, there’s going to be some sort of mistake in that chain, and there’s going to be some sort of error introduced, and you’re going to get a wrong answer. There’s almost no way to fact-check it. There are ways to make it slightly better and slightly better and slightly better and slightly better. But getting to 100%, I would say, is almost impossible.
(After publication of The Markup’s story, several disclaimers were added to the bot, including one cautioning users to “not use its responses as legal or professional advice.”)
When you’re using AI tools, you just need to test all of the time. It’s very easy to get into a conversation with AI and think, “Wow, this is amazing. I’m getting an incredible poem about my cats. I’m getting a recipe about this thing or the other thing. I’m getting feedback or edits on something I have written,” and it’s very easy for you to say, “Yes, I will accept this. No, I won’t accept that edit. Yes, this poem is adorable, or no, write another one about how the cat is mean bean instead of the cat is nice,” or whatever. So you just need to make sure that sort of capacity for errors is built in. Either you’re reviewing everything manually, or it’s a task where instead of doing one review of a piece of legislation, instead you’re looking at 100,000 pieces of legislation, or a million pieces of legislation, something that you as a human being couldn’t do. And you say, “OK, AI, you go do it, I understand that maybe 3 percent of the time, you’re gonna give me some incorrect information. But because I’m doing something at scale, maybe it’s okay that there’s errors here and there. All I want is the general gist to be true, or the general gist to be accurate.” But you can’t do that when you’re building a chatbot and every single conversation has to be meaningful, and has to be accurate for the person who is using it.
But journalists as a whole are very used to talking to people who are very confident, making statements that need to be fact-checked, that need to be cross-referenced. The step that we need to make as a society is moving from, “This came from a machine, it must be correct,” to, if I’m talking to a friend of mine who says something crazy, “I need to double check that, I need to cross reference it to make sure that it is accurate.”
This article was originally published on The Markup and was republished under the Creative Commons Attribution-NonCommercial-NoDerivatives license.