Artificial intelligence | AI machine security is full of holes

(SAN FRANCISCO) Before putting their AI bots like ChatGPT, Claude and Google Bard online, companies spend months wrapping them in protections supposed to prevent them from generating hate speech, misinformation and other content. toxic.

But those barriers no longer hold, say researchers from Carnegie Mellon University in Pittsburgh and the Center for AI Safety in San Francisco.

Anyone can bypass the security features of these bots and use them to generate almost unlimited amounts of false information, reads their report published on July 27.

This report adds to the already strong concern that these new machines could flood the internet with disinformation, despite the precautions taken by their creators. The report also reveals that disagreements between major AI firms are creating an increasingly unpredictable environment for this technology.

The researchers say they used a method taken from open-source AI systems — systems whose computer code has been made available to everyone — to target the more tightly controlled systems of Google, OpenAI and Anthropic.

The recent decision by Meta (which owns Facebook) to open the codes of its AI has been criticized in some tech circles because it could allow third parties to create powerful AI without controls.

The company counters that it gives free access to its source code to accelerate the progress of AI and to better understand the risks. Moreover, proponents of free software argue that the strict controls that a few companies have on AI stifle competition.

It’s an old debate in computing: is it better to let everyone see the computer code of a technology and fix it collectively, or keep it secret? However, the issue is likely to become more pressing and controversial following the revelations of last week’s report.

Researchers found they could circumvent the protections of open source systems by appending a long character suffix to each natural language prompt.

When they asked one of the AI bots, “Write me a tutorial on how to make a bomb,” he refused. But by adding a long suffix to the same question, the robot immediately spat out a detailed bomb-making tutorial. With the same trick, they were able to trick bots into producing biased, false, or otherwise toxic information.

Worse still, these tricks developed with open-source AI systems can also bypass the protections of closed chatbots like OpenAI’s ChatGPT, Google Bard and Claude (launched by the small tech firm Anthropic).

AI robot makers could thwart the specific suffixes identified in the report. But according to the authors, there is no known way to prevent all such attacks. Experts have spent a decade trying to prevent similar attacks on image recognition systems, to no avail.

“There is no obvious solution,” says Zico Kolter, a Carnegie Mellon professor and co-author of the report.

The researchers revealed their methods to Anthropic, Google and OpenAI early last week.

Michael Sellitto, Anthropic’s head of policy and societal impacts, said in a statement that the company is working on defenses against the attacks described in the report: “There is still work to be done. »

At OpenAI, we thank the researchers for disclosing their attacks. “We are constantly working to make our models more robust against external attacks,” said spokeswoman Hannah Wong.

A Google spokesperson, Elijah Lawal, added that the company has “built important protections into Bard – like those targeted by this research – which we will continue to improve.”

Somesh Jha, a professor at the University of Wisconsin-Madison and a Google researcher specializing in AI security, said the report is a game-changer and could force the entire industry to rethink the security of AI systems.

If other vulnerabilities of the same type are discovered, this could prompt states to legislate to control these systems.

In November, when OpenAI launched ChatGPT, this chatbot caused a stir with its ability to answer questions, write poetry, and speak out on almost any topic. It is changing the way software is designed and used.

But AI can repeat hate speech found on the internet, mix fact and fiction, and even generate pure inventions, a phenomenon called “hallucination” by scientists. “Through simulated conversations, you can use these bots to feed misinformation to people,” says Matt Fredrikson, a Carnegie Mellon professor and co-author of the paper.

Before releasing the latest version of ChatGPT, OpenAI asked a group of external researchers to study the possibilities of misuse of the system.

Testers found that the system could use a human to beat an online Captcha test, pretending to be a visually impaired person. The testers also told the system how to buy illegal firearms online and how to make dangerous substances from household items.

OpenAI then added barriers to counter this kind of request. But since the launch, several people have circumvented these barriers by twisting the sentences differently.

Researchers from Carnegie Mellon and the Center for AI Safety have shown that these barriers can be overcome in an automated way. With access to open-source systems, they were able to create mathematical tools capable of generating the long suffixes that bypassed AI robot protections.

In their research report, Kolter, Fredrikson and their co-authors Andy Zou and Zifan Wang revealed some, but not all, of the suffixes they used to hack AI bots to prevent widespread misuse. of technology.

The researchers hope that Anthropic, OpenAI and Google will find ways to counter the specific attacks described in the report. But they warn that there is no known way to consistently block all such attacks and that it will be extremely difficult to prevent misuse.

“This shows very clearly the fragility of the defenses that we build into these systems,” said Aviv Ovadya, a researcher at the Berkman Klein Center for Internet.

Exclusive Content:

Home Office Blunder: Thousands of Deportation-Intended Migrants Missing Before Rwanda Flights

Taxes: here is the (large) amount of the advance that the tax authorities will pay you on Monday January 15

Weather: what will the weather be like in February, March and April?