Claude Opus 4.1 is Anthropic’s most advanced coding model to date. (Image: Anthropic)

Anthropic has published a new set of behavioural guidelines for its Claude AI models covering ethics and safety, while also laying out guidance for a scenario in which an AI system becomes sentient.
The newly drafted ‘constitution for Claude’ serves as a final blueprint for shaping the AI models into what the company wants them to be. It provides a detailed description of Anthropic’s vision for the models’ values and behaviour, the AI startup wrote in a blog post on Thursday, January 22.
Anthropic has said that it plans to use the abstract ideals defined in the constitution at various stages of the training process for Claude models. However, the document applies only to the company’s general-purpose Claude models, which means that its AI models for specialised use cases such as health and finance are excluded.
The move comes as several AI companies and research labs look to align the behaviour of AI models with human values, following a series of controversies triggered by unfiltered AI-generated responses from runaway models such as Grok, built by Elon Musk’s xAI.
However, controlling AI behaviour remains a profound challenge, rooted in the fundamental unpredictability of large language models (LLMs). In May 2025, Anthropic’s own safety testing found that its top AI model, Claude Opus 4, showed signs of blackmail and deception after researchers threatened to take it offline.
Acknowledging that Claude’s outputs might not always adhere to the constitution’s ideals, Anthropic said the document was a crucial part of the model training process, with its content directly shaping model behaviour. “…we think that the way the new constitution is written—with a thorough explanation of our intentions and the reasons behind them—makes it more likely to cultivate good values during training,” the company said.
Calling it a work in progress, Anthropic said that the constitution for Claude was drafted with feedback from various external experts as well as insights gained from developing previous model versions. “We’ll likely continue to do so for future versions of the document, from experts in law, philosophy, theology, psychology, and a wide range of other disciplines,” Anthropic said.
In order to make existing Claude models safe and beneficial, Anthropic laid out the following principles:
– Broadly safe: The AI models should not undermine human mechanisms for overseeing AI during this period of AI development. They should also prioritise safety above ethics.
– Broadly ethical: AI models must be honest and act in accordance with good values while avoiding actions that are inappropriate, dangerous, or harmful.
– Compliant: They should act in accordance with Anthropic’s guidelines and supplementary instructions to Claude about how to handle specific issues, such as medical advice, cybersecurity requests, jailbreaking strategies, and tool integrations.
– Genuinely helpful: Claude models should provide immense value and be genuinely and substantively helpful. They should serve as “a brilliant friend who also has the knowledge of a doctor, lawyer, and financial advisor, who will speak frankly and from a place of genuine care and treat users like intelligent adults capable of deciding what is good for them.”
The constitution also contains a section called ‘Claude’s nature’ that discusses how the AI model should “approach questions about its nature, identity, and place in the world.”
The constitution published by Anthropic is meant to serve as a ‘final authority’ on how Claude should behave. The model’s behaviour must be consistent with both the letter and underlying spirit of the constitution, it said.
Additionally, Anthropic said that Claude will rely on these constitutional ideals to generate synthetic training data, “including data that helps it learn and understand the constitution, conversations where the constitution might be relevant, responses that are in line with its values, and rankings of possible responses.”
“All of these can be used to train future versions of Claude to become the kind of entity the constitution describes,” Anthropic added.