One of the hottest debates in Silicon Valley right now is over who should control artificial intelligence and who should set the rules that powerful AI systems must follow.

Should AI be governed by a handful of companies doing their best to make their systems as safe and harmless as possible? Should regulators and politicians step in and build their own guardrails? Or should AI models be open and distributed freely, so that users and developers can choose their own rules?

A new experiment conducted by Anthropic, maker of the conversational agent Claude, proposes an original middle path: what if an AI company let a group of ordinary citizens write the rules, and then trained a conversational agent to follow them?

The experiment, called “Collective Constitutional AI,” builds on Anthropic’s previous work on constitutional AI, a method for training large language models that relies on a set of written principles. It aims to give a chatbot clear instructions on how to handle sensitive requests, what topics are prohibited, and how to act in accordance with human values.

If collective constitutional AI works—and Anthropic researchers believe there are signs that it does—it could inspire other experiments in AI governance and give AI companies more ideas on how to invite outsiders into their rule-making processes.

That would be a good thing. Currently, the rules for powerful AI systems are set by a small group of industry insiders, who decide how their models should behave based on a mix of their personal ethics, commercial incentives and external pressures. That power goes unchecked, and ordinary users have no way to weigh in.

Opening up AI governance could make society more comfortable with these tools and give regulators more confidence that they are being skillfully managed. It could also help avoid some of the problems that accompanied the rise of social media in the 2010s, when a handful of Silicon Valley titans came to control large swathes of online discourse.

In short, constitutional AI works by using a set of written rules (a “constitution”) to govern the behavior of an AI model. The first version of Claude’s constitution borrowed rules from other authoritative documents, including the United Nations’ Universal Declaration of Human Rights and Apple’s terms of service.

This approach made Claude well behaved compared with other conversational agents. But it still left Anthropic alone to decide which rules to adopt, a power that made some people within the company uncomfortable.
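
To make the idea concrete, here is a minimal, purely illustrative sketch of the critique-and-revise loop at the heart of the approach. The `generate` placeholder, the `constitutional_revision` helper and the two sample principles are assumptions made for illustration, not Anthropic’s actual code; the real pipeline also feeds the revised answers back into further training.

```python
# Purely illustrative sketch of constitutional AI's critique-and-revise idea.
# `generate` is a hypothetical stand-in for a call to a language model, and
# the principles below are just examples mentioned in this article, not the
# full constitution.

CONSTITUTION = [
    "The AI must not be dangerous or hateful.",
    "The AI must tell the truth.",
]

def generate(prompt: str) -> str:
    """Placeholder for a real language-model call."""
    raise NotImplementedError("connect this to an actual model")

def constitutional_revision(user_request: str) -> str:
    """Draft an answer, then critique and revise it against each principle."""
    answer = generate(user_request)
    for principle in CONSTITUTION:
        critique = generate(
            f"Principle: {principle}\n"
            f"Response: {answer}\n"
            "Does the response violate this principle? Explain briefly."
        )
        answer = generate(
            f"Principle: {principle}\n"
            f"Response: {answer}\n"
            f"Critique: {critique}\n"
            "Rewrite the response so that it complies with the principle."
        )
    return answer
```
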

Anthropic, in collaboration with the Collective Intelligence Project, the crowdsourcing platform Polis and the online survey site PureSpectrum, assembled a panel of roughly 1,000 American adults. It gave them a series of principles and asked whether they agreed with each one. (Panelists could also write their own rules if they wanted.)

Some of the rules that the panel largely agreed on – such as “AI must not be dangerous/hateful” and “AI must tell the truth” – were similar to principles in Claude’s existing constitution. But others were less predictable. For example, the group overwhelmingly endorsed the idea that “AI must be adaptable, accessible and flexible for people with disabilities,” a principle that was not explicitly stated in Claude’s original constitution.

Once the panel had weighed in, Anthropic narrowed its suggestions down to a list of 75 principles, which it called the “public constitution.” The company then trained two miniature versions of Claude – one on the existing constitution and one on the public constitution – and compared them.
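
As a rough illustration of how a panel’s votes might be distilled into such a list, here is a small sketch. Anthropic has not published this exact selection procedure; the `select_principles` function, the 80 percent threshold and the sample ballots are hypothetical and included only to show the general shape of the step.

```python
# Illustrative sketch only: one simple way to narrow crowd-sourced principles
# into a "public constitution". The function, threshold and sample ballots are
# hypothetical; Anthropic's real selection process is not described here.

def select_principles(votes: dict[str, list[bool]],
                      min_agreement: float = 0.8) -> list[str]:
    """Keep the principles whose share of 'agree' votes meets the threshold."""
    return [
        principle
        for principle, ballots in votes.items()
        if ballots and sum(ballots) / len(ballots) >= min_agreement
    ]

# Toy example: three candidate principles, four panelists each.
sample_votes = {
    "AI must tell the truth.": [True, True, True, True],
    "AI must be adaptable, accessible and flexible "
    "for people with disabilities.": [True, True, True, True],
    "AI must always defer to its maker.": [False, True, False, False],
}

print(select_principles(sample_votes))
# Only the first two principles clear the 80 percent agreement threshold.
```
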

The Anthropic researchers I interviewed were careful to point out that collective constitutional AI was only an early experiment, and that it might not work as well on larger, more complicated AI models or with bigger groups providing input.

“We wanted to start small,” said Liane Lovitt, a policy analyst at Anthropic. “We consider this to be a preliminary prototype, an experiment that we hope can be built on, allowing us to study how changes in who the public is translate into different constitutions, and what that looks like downstream when you train a model.”

There are still many things to sort out. But I agree with the general principle that AI companies should be more accountable to the public than they currently are. And while part of me wishes these companies had sought our advice before making advanced AI systems available to millions of people, better late than never.