Journalism of Courage
Advertisement
Premium

How Grok praising Adolf Hitler reveals a deeper AI problem

Elon Musk’s AI chatbot Grok has riled users again by embarking on a wide-ranging, hateful rant. But underlying the controversy are questions of not just how AI chatbots work but also what shapes their behaviour and to what extent it can be aligned with human values.

9 min read
From a technical standpoint, controlling AI behaviour remains a profound challenge, rooted in the fundamental unpredictability of LLMs.From a technical standpoint, controlling AI behaviour remains a profound challenge, rooted in the fundamental unpredictability of LLMs. (Express photo)

Billionaire businessman Elon Musk’s company xAI apologised on Saturday (July 12) for the “horrific behaviour” of its AI chatbot Grok lately. Incorporated into the social media platform X since 2023, several users recently managed to goad Grok into posting antisemitic and abusive tirades, including praise for Adolf Hitler.

Grok suggested last week that Hitler would be best-placed to combat supposed “anti-white hatred” as he would “spot the pattern and handle it decisively.” It also said that people with Jewish surnames were responsible for “extreme anti-white activism”, among other responses that users criticised as a troubling consequence of Musk’s push to develop “anti-woke” AI, free from liberal values.

“After careful investigation, we discovered the root cause was an update to a code path upstream of the @grok bot. We have removed that deprecated code and refactored the entire system to prevent further abuse,” the company said, adding that the underlying large language model (LLM) powering Grok was not affected.

This is far from the first time that Grok has gone haywire. A few months ago, it offered unsolicited information about claims of a genocide of white people in South Africa, which many see as a conspiracy theory. Its behaviour has previously stirred controversy in India over responses containing expletives, Hindi slang, and misogynistic slurs, eventually drawing the attention of the IT Ministry in March this year.

From a technical standpoint, controlling AI behaviour remains a profound challenge, rooted in the fundamental unpredictability of LLMs.

Why is it hard to guarantee AI behaviour?

First, this isn’t just a Grok problem. A clear trend of inconsistency is noticeable when it comes to AI-generated outputs. Google’s Gemini has faced user backlash for generating historically inaccurate images, while X users trended ‘#ShameonMetaAI’ after it was used to generate jokes on Hindu gods.

Given the power and growing influence of AI, most people have come to think of LLMs as possessing a form of human-like intelligence due to their ability to generate complex, coherent, and conversational language. However, a now-famous academic paper published in 2021 (“On the dangers of stochastic parrots: Can language models be too big?”) equates the current crop of generative AI tools to stochastic systems.

Story continues below this ad

At their core, these systems are built on “synthetic text extruding machines”, according to Emily M Bender, a professor of linguistics at the University of Washington and co-author of the paper.

“These systems aren’t “saying” anything per se, but rather outputting sequences of words based on their training data,” Bender told The Indian Express.

The probabilistic nature of LLMs also explains why users rarely receive the exact same output twice. Even if the input token is the same, the model could sample a slightly different output token from a probability distribution at any step. Once a different token is sampled, then the subsequent tokens will also be generated based on this new sequence. This will essentially lead the model down an entirely different path of text generation.

There are two major sources of uncontrollability of system output. The first has to do with system design. According to Bender, today’s chatbots achieve great apparent fluency by mimicking word use patterns in very large datasets. In simpler words, LLMs are what they eat.

Story continues below this ad

There are bound to be some abhorrent patterns in the training data if the collection of these datasets during the pre-training stage “wasn’t done with care”, she said. Or, if it was done under the illusion that including all the bigotry found on the internet is somehow a more “objective” or “unbiased” approach,” the researcher added.

None of the companies developing LLMs, including xAI, actually release the data on which these models are trained. But what data is being used by chatbots when answering questions in real-time also matters.

For instance, Grok has explicitly been instructed to use data on X to answer questions. This is one of the reasons why it reflects the opinions of users on the platform, particularly of Elon Musk. Users found that Grok 4 appears to consult social media posts from Musk’s X account when answering questions about controversial topics such as abortion, immigration laws, the Israel-Palestine conflict, etc.

The second source of uncontrollability of system output is user context.

Story continues below this ad

During the training stage, a model’s parameters are adjusted so that it learns to perform tasks well. Once the training is complete and the model is deployed, these parameters are fixed. The only type of learning that the model performs after being deployed is in-context learning.

“Even if the system is constrained to output only a few fixed phrases, users can always set up a context, deliberately or not, that will make the output harmful,” Bender said. It is also why LLMs cannot reliably be used as information-access systems, she added.

How can AI outputs be controlled once a model is deployed?

On Saturday, xAI said it removed a set of hard-coded instructions that caused Grok to veer off course. Despite the changes, users said the newly released Grok 4 version of the chatbot continues to return antisemitic propaganda.

In traditional software, fixing a bug might involve writing code or rolling out an updated module. However, LLMs are not typically modified once deployed. Instead, most adjustments that are needed to settle down a controversy likely happen at the surface level.

Story continues below this ad

“The deeper it is in the pipeline, the deepest being pre-training data, then post-training and query time, the more sticky and harder it is to change the model’s behaviour,” Nirant Kasliwal, an AI/ML engineer and founder of software development company ScaledFocus, told The Indian Express. “In the case of Grok, as it is deployed right now, my best guess is that they really don’t make any changes to the base model at all,” he said.

Here are a few band-aid solutions to realign an AI chatbot’s behaviour in the wake of user backlash.

Hard-coded conversations: AI developers can explicitly program LLMs to provide predetermined responses to user questions such as “What model are you?” or “Who built you?”. By adding a few hundred such conversations in the training data and fine-tuning it, the model can be expected to parrot these responses after being deployed. But bad actors can still bypass these protections by asking questions in a certain way, which is also known as jailbreaking.

Brute-force blocking: In 2015, Google integrated its Photos app with image recognition software that identified black people as gorillas. The search giant sought to “fix” this issue by preventing the software from recognising any actual gorillas. But taking such an action would essentially strip away the generative qualities of an AI model that make it unique.

Story continues below this ad

Modifying system prompts: Developers or prompt engineers can exert some control over a model’s personality and knowledge base by inputting system prompts that will remain constant across multiple user interactions. These prompts can be modified under the hood to inject more diversity in the training data. However, these built-in guardrails can be over-ridden regardless of the system prompts in place.

Changing RL rewards: Reinforcement learning (RL) comes under the umbrella of LLM post-training. It encourages the model towards solutions that lead to correct answers in a reward-based system. If the problem does not have a concrete answer, the model is subjected to reinforcement learning from human feedback (RLHF). This essentially involves humans giving the AI model feedback on which answers are good and bad. While changing the reward function can significantly impact model behaviour, there have been instances of AI systems finding loopholes to game the process.

Red teaming: In order to boost the safety and security of AI models, tech companies can run more red-teaming exercises in response to threat actors exploiting vulnerabilities in LLMs and using them to generate harmful content. Red-teaming essentially involves attacking the model or prompting it with well-known jailbreaks to reveal its weaknesses.

To be sure, none of these approaches can guarantee that an AI model behaves perfectly. The challenge of controlling and aligning AI behaviour with human values has drawn renewed focus this year from researchers and startups alike.

Story continues below this ad

New research has further shown that narrowly fine-tuning an AI model can lead to misaligned behaviour. This may help explain why Grok, in particular, is prone to these blowups as it is designed to be “maximum truth-seeking” AI.

Tags:
  • artificial intelligence Elon Musk Explained Sci-Tech Express Explained Express Premium
Edition
Install the Express App for
a better experience
Featured
Trending Topics
News
Multimedia
Follow Us
Express PremiumThe pose and the page: Inside the world of performative reading
X