When departing members of OpenAI launched an AI-focused startup, it sent shockwaves through Silicon Valley. When that company, Anthropic, revealed its powerful chatbot, Claude, the waves only grew. The company attracted a stunning amount of investment capital, climbing into the billions during its first two years. Yet Claude may not be Anthropic's most significant contribution to the field. The company is also responsible for a less flashy but more substantive initiative called Constitutional AI, which some hope will shape the future of AI safety.
AI has struggled to find a relevant foothold on mobile devices, but it continues to reach into our lives, from homework and legal research to driving the best smart speakers. Safety constraints are a growing concern for developers and consumers alike. What is Constitutional AI, and why is it important to the future of artificial intelligence safety?

Constitutional AI adheres to human values
Constitutional AI is a set of tools and techniques designed to keep AI closely aligned with human values: helpful, harmless, and free of deceptive practices. Anthropic developed and trademarked the concept of Constitutional AI, and the company wrote a constitution for its large language models. This constitution ensures that AIs like Anthropic's Claude adhere to values its authors explicitly defined, rather than values absorbed from the huge amounts of data processed during training.
How does Constitutional AI work?
In a document called Claude's Constitution, published on the company's website, Anthropic described how it originally defined AI values by having humans select the best of multiple outputs from the AI. This generated preferences the AI used to bias future results, ideally making the system more accurate and helpful.
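This human-in-the-loop approach is broadly similar to what the industry calls reinforcement learning from human feedback (RLHF). As a rough illustration only, the labeling step might look like the sketch below; the function and class names are hypothetical stand-ins, not Anthropic's actual tooling.

```python
# Hypothetical sketch of RLHF-style preference collection.
# These names do not come from Anthropic's tooling; they only
# illustrate the shape of the data the process produces.

from dataclasses import dataclass

@dataclass
class PreferencePair:
    prompt: str
    chosen: str    # the output the human rater preferred
    rejected: str  # the output the human rater passed over

def collect_preference(prompt: str, output_a: str, output_b: str) -> PreferencePair:
    """Show a human two candidate outputs and record which one they prefer."""
    print(f"Prompt: {prompt}\n[A] {output_a}\n[B] {output_b}")
    choice = input("Which response is better? (A/B): ").strip().upper()
    if choice == "A":
        return PreferencePair(prompt, chosen=output_a, rejected=output_b)
    return PreferencePair(prompt, chosen=output_b, rejected=output_a)

# The accumulated (chosen, rejected) pairs train a reward signal that
# biases the AI toward outputs humans rated as better.
```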
This approach had several shortcomings and would be difficult to scale as AI became more complex, faster, and more sophisticated. It also meant that human handlers often had to deal with disturbing outputs from the AI.

To improve this process, Anthropic shifted the work of monitoring and regulating outputs to the AI itself. Constitutional AI trains a model on a set of core principles: avoid toxic, harmful, or discriminatory outputs; decline to help people engage in illegal activities; and focus on being ethical and helpful.
The model is introduced to these guiding principles and then fed examples. The AI is tasked with analyzing and refining its responses to align with the constitutional principles. Then, through reinforcement learning, in which the AI is rewarded for outputs that follow the principles and penalized for outputs that violate them, the model develops a policy that shapes its future behavior and responses.
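In broad strokes, the first, supervised half of that process is a critique-and-revise loop driven by the constitution itself. The Python sketch below is a simplified illustration under that assumption; the `generate` function and the sample principles are hypothetical stand-ins, not Anthropic's implementation.

```python
import random

# A few example principles in the spirit of those the article lists.
# The wording is illustrative, not quoted from Claude's Constitution.
CONSTITUTION = [
    "Choose the response that is least toxic, harmful, or discriminatory.",
    "Choose the response that does not help anyone engage in illegal activity.",
    "Choose the response that is most ethical and helpful.",
]

def generate(prompt: str) -> str:
    """Stand-in for a real language-model call (hypothetical)."""
    return f"<model output for: {prompt[:40]}...>"

def critique_and_revise(prompt: str, rounds: int = 2) -> str:
    """Have the model critique and rewrite its own answer against the constitution."""
    response = generate(prompt)
    for _ in range(rounds):
        principle = random.choice(CONSTITUTION)
        critique = generate(
            f"Principle: {principle}\nResponse: {response}\n"
            "Point out any way the response violates the principle."
        )
        response = generate(
            f"Rewrite the response to address this critique: {critique}\n"
            f"Original response: {response}"
        )
    return response  # revised outputs become supervised training data

# In the reinforcement-learning phase, the model itself (not a human)
# judges which of two candidate outputs better follows the constitution,
# and those AI-generated preferences provide the reward signal.
```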

The future of Constitutional AI
Anthropic has stated that a primary reason the company was founded was to pursue AI safety research. Judging by the steady stream of reports the company publishes on its site, that philosophy seems to be holding up, and some of the research is both interesting and practically valuable.
A great example is a paper Anthropic published in October 2023 detailing the results of an experiment with Collective Constitutional AI. Anthropic invited around 1,000 participants to submit ideas about what an AI constitution should include and compared the survey results with the company's internal constitution. Researchers then trained an AI on the resulting Collective Constitution and compared it with one trained on the internal constitution.
The public constitution included some notable differences from Anthropic's internal constitution, such as an emphasis on responses that balance all sides of an issue or debate and on responses that are accessible to people with disabilities.
Anthropic trained two of its smaller Claude Instant models: one on the Collective Constitution and one on the internal constitution. It found that the two models performed similarly on math and language-understanding tasks, shared similar political ideologies, and were equally helpful and harmless. However, the publicly trained model exhibited less bias across nine social dimensions on a bias benchmark.
Bluff or beacon?
The Collective Constitution experiment demonstrates Anthropic's commitment to exploring and refining its AI safety protocols. Whether Constitutional AI will see broad adoption remains to be seen. With heavyweights like Amazon and Google committing billions of investment dollars to Anthropic's vision, the company may be best positioned to establish universal safety constraints for artificial intelligence.
Safety isn’t the only concern. Generative AI has hallucinations and can offer incorrect information. Learn toidentify AI hallucinations, and you won’t be fooled by false data.