
OpenAI’s low-cost, open-weight AI models are here. But are they truly ‘open’?

The move marks a shift in OpenAI’s strategy, which in recent years has focused on building proprietary, closed AI models. The rethink was largely prompted by the DeepSeek mania earlier this year.

The last time OpenAI introduced an open-weight model was in 2019, when it launched GPT-2. (Photo: Reuters)

OpenAI has released new open-weight models for the first time in six years. The two new AI language models — gpt-oss-120B and gpt-oss-20B — can run on personal devices such as a laptop, and be fine-tuned for specific purposes.

The launch of open-weight AI models comes after multiple delays owing to safety concerns. In a blog post on Tuesday, OpenAI said, “We’re excited to provide these best-in-class open models to empower everyone — from individual developers to large enterprises to governments — to run and customize AI on their own infrastructure.”

The last time that OpenAI introduced an open-weight model was in 2019, when it launched GPT-2. Since then, the company has largely focused on building proprietary, closed foundational AI models. The latest change in its strategy appears to have been triggered by the release of Chinese startup DeepSeek’s cost-effective, open-weight R1 model earlier this year.


OpenAI is also expected to release the highly anticipated GPT-5 model this week.

What do we know about the two new AI models?

While the larger gpt-oss-120B model is capable of running on a single 80GB GPU (Graphics Processing Unit), its lightweight version, the gpt-oss-20B, can be deployed on a laptop or any other edge device with 16GB of memory.

Both models have been released under the permissive Apache 2.0 license. This means that developers can freely download the weights of these models from Hugging Face and host them in their own environments. Microsoft will also be deploying a GPU-optimised version of the gpt-oss-20B model on Windows devices.

The gpt-oss-120B model has a total of 117 billion parameters, activating 5.1 billion parameters per input token. The gpt-oss-20B has a total of 21 billion parameters, with 3.6 billion parameters activated per token.
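In concrete terms, only a small fraction of each model’s weights does work on any given token. A quick back-of-the-envelope check using the figures above:

```python
# Active-parameter fraction per token, using the figures reported above
# (totals and active counts in billions of parameters).
models = {
    "gpt-oss-120B": {"total_b": 117.0, "active_b": 5.1},
    "gpt-oss-20B": {"total_b": 21.0, "active_b": 3.6},
}

for name, p in models.items():
    fraction = p["active_b"] / p["total_b"]
    print(f"{name}: {fraction:.1%} of parameters active per token")
# gpt-oss-120B: 4.4% of parameters active per token
# gpt-oss-20B: 17.1% of parameters active per token
```

The larger model is the sparser of the two: roughly 4 per cent of its parameters fire per token, versus about 17 per cent for the smaller one.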


The parameter count of an AI model indicates its size and roughly corresponds to its problem-solving ability: models with more parameters generally perform better than those with fewer.

However, OpenAI said that it made the transformer-based gpt-oss models more efficient by using a technique known as mixture-of-experts (MoE). DeepSeek’s models also use an MoE architecture, which makes AI models more energy efficient and reduces computation costs by activating only a small fraction of their parameters for any given task.
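The mechanics can be sketched in a few lines: a learned router scores a set of expert networks for each token, and only the top-scoring experts actually run. The sizes and top-k value below are illustrative, not gpt-oss’s real configuration:

```python
import numpy as np

rng = np.random.default_rng(0)

n_experts, top_k, d = 8, 2, 16  # illustrative sizes, not the real gpt-oss config
router_w = rng.normal(size=(d, n_experts))  # router maps a token to expert scores
experts = [rng.normal(size=(d, d)) for _ in range(n_experts)]  # one weight matrix per expert

def moe_layer(token):
    scores = token @ router_w                  # score every expert for this token
    top = np.argsort(scores)[-top_k:]          # keep only the top-k experts
    weights = np.exp(scores[top]) / np.exp(scores[top]).sum()  # softmax over the chosen few
    # Only top_k of the n_experts weight matrices are ever multiplied:
    return sum(w * (token @ experts[i]) for w, i in zip(weights, top))

out = moe_layer(rng.normal(size=d))
print(out.shape)  # same output shape as a dense layer, at a fraction of the compute
```

Because the non-selected experts are never touched, the cost per token scales with the active parameters (5.1B for gpt-oss-120B), not the total.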

To improve the inference and memory efficiency of the models, OpenAI said it used a technique known as grouped multi-query attention. This differs slightly from the multi-head latent attention technique introduced in DeepSeek’s V2 model.
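In grouped(-multi-)query attention, several query heads share a single key/value head, which shrinks the key-value cache a model must keep in memory while generating text. A shape-level sketch, with illustrative head counts:

```python
import numpy as np

rng = np.random.default_rng(1)

seq, d_head = 10, 8
n_q_heads, n_kv_heads = 8, 2   # illustrative: every 4 query heads share one KV head
group = n_q_heads // n_kv_heads

q = rng.normal(size=(n_q_heads, seq, d_head))
k = rng.normal(size=(n_kv_heads, seq, d_head))  # KV cache is 4x smaller than full MHA
v = rng.normal(size=(n_kv_heads, seq, d_head))

outputs = []
for h in range(n_q_heads):
    kv = h // group                                  # map query head to its shared KV head
    scores = q[h] @ k[kv].T / np.sqrt(d_head)        # scaled dot-product attention
    attn = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)
    outputs.append(attn @ v[kv])

out = np.stack(outputs)   # (n_q_heads, seq, d_head): same output shape as standard MHA
print(out.shape)
```

The output is shaped exactly as in standard multi-head attention; the saving is entirely in the keys and values that must be cached per token.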

Both the gpt-oss models natively support a maximum context window of 128,000 tokens.


How do the gpt-oss models compare to OpenAI’s frontier models?

The gpt-oss-120B model matched the performance of o4-mini, OpenAI’s most advanced frontier AI model, on reasoning, general problem-solving, competition coding, and tool calling. It also outperformed o4-mini on health-related queries and competition mathematics.

The smaller gpt-oss-20B model matched or exceeded o3-mini on the same benchmarks, outperforming it on competition mathematics and health.

Notably, both gpt-oss models tend to hallucinate more than o3 and o4-mini. “This is expected, as smaller models have less world knowledge than larger frontier models and tend to hallucinate more,” OpenAI said in a previous white paper.

How have these models been trained?

The two gpt-oss models were trained using reinforcement learning (RL) and other techniques also employed by OpenAI to develop its advanced reasoning models such as o3.


As for the data used to train these models, OpenAI only disclosed that it fed the models a mostly English, text-only dataset, with a focus on STEM, coding, and general knowledge. Harmful data related to Chemical, Biological, Radiological, and Nuclear (CBRN) topics was filtered out of the dataset.

Training data has become one of the most contentious issues in the AI industry. It is also one of the distinguishing elements between open-source and open-weight AI models.

According to the non-profit Open Source Initiative (OSI), a truly ‘open-source’ AI model is one that provides access to the source code, model architecture, weights, training procedures, and training data under a license that allows developers to freely download, modify, and distribute the model.

However, most AI companies are wary of disclosing training data details as they risk facing copyright infringement lawsuits and other legal action.


As a result, companies like OpenAI have taken a middle-ground approach to AI development, making only the weights of their AI models publicly available. The weights of an AI model are analogous to the knobs on a DJ’s mixing desk: they are continuously adjusted during training until the model’s outputs become consistent with the patterns in its training data.

Developers who are able to access these weights can fine-tune the models without having to retrain them on new data.
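Conceptually, fine-tuning means nudging those published weights with a few gradient steps on new data, rather than re-learning them from scratch. A toy illustration on a single linear layer (everything here is illustrative, not how a real LLM is fine-tuned in practice):

```python
import numpy as np

rng = np.random.default_rng(2)

w = rng.normal(size=3)              # stand-in for downloaded, pretrained weights
x = rng.normal(size=(50, 3))        # small "fine-tuning" dataset
y = x @ np.array([1.0, -2.0, 0.5])  # target behaviour we want the model to adopt

lr = 0.1
for _ in range(200):                # gradient steps start FROM the pretrained w,
    grad = 2 * x.T @ (x @ w - y) / len(x)  # gradient of mean squared error
    w -= lr * grad                  # not from a random initialisation

print(np.round(w, 2))               # weights have drifted toward the new task
```

Because the starting point is the released weights rather than random noise, the adjustment needs far less data and compute than the original training run.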

In the post-training stage, the gpt-oss models were run through supervised fine-tuning and RL cycles similar to the development process for o4-mini. “During post-training, we used deliberative alignment and the instruction hierarchy to teach the model to refuse unsafe prompts and defend against prompt injections,” the company said.

What steps has OpenAI taken to make these models safe?

One of the main risks posed by open-weight AI models is their potential for malicious use. Since the key components of such models are freely accessible, external developers can easily build versions without the native safeguards that prevent a model from generating hateful jokes or misbehaving in other ways.


However, OpenAI claimed that its open-weight AI models performed on par with its other frontier models on internal safety benchmarks. To prevent bad actors from fine-tuning the gpt-oss models for malicious purposes, OpenAI said it carried out an extra round of safety testing using an adversarially fine-tuned version of gpt-oss-120B.

“We directly assessed these risks by fine-tuning the model on specialised biology and cybersecurity data, creating a domain-specific non-refusing version for each domain the way an attacker might,” the company said.

Despite this ‘robust fine-tuning’, OpenAI said the gpt-oss models did not reach high-risk levels for misuse under its Preparedness Framework. The results of this safety evaluation were also reviewed by three independent experts, whose recommendations were later adopted by the company.

OpenAI has further announced $500,000 in prize money as part of a Red Teaming Challenge to help identify safety issues in its open-weight models.
