When prompted with a complex question, the S1 model breaks it into multiple steps to analyse before responding.

In January, the world witnessed Chinese AI startup DeepSeek set off a revolution with its cost-efficient, state-of-the-art AI models. The company unveiled two models, DeepSeek-V3 and DeepSeek-R1, that rivalled the performance of frontier models from OpenAI and Google, and that too at a fraction of the cost incurred by big tech. DeepSeek has paved the way for more prudent innovation in AI. Now, a new model has sparked curiosity in the AI community. Researchers at Stanford and the University of Washington have trained a reasoning model named S1 for a meagre $50 (around Rs 4,400) in cloud compute credits.
According to the research paper, S1-32B is an open-source language model focused on reasoning tasks. What sets it apart from other AI models is ‘test-time scaling,’ a technique that allows it to improve its responses by dynamically allocating additional computational resources at inference time. Reportedly, S1 competes directly with OpenAI’s o1 reasoning model: it generates answers to prompts by thinking through related questions, which also allows it to check its own responses. This method differs from the traditional approach, which relies solely on the compute spent training large language models beforehand.
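The ‘thinking through’ behaviour described above can be sketched as a budget-forcing loop, the technique the S1 researchers use for test-time scaling: if the model tries to stop reasoning too early, a controller appends a cue such as “Wait” to push it to keep going. The sketch below is a minimal, hypothetical illustration; the `generate` stub and the token thresholds stand in for a real LLM call.

```python
# Minimal sketch of budget-forcing-style test-time scaling.
# The generate() stub and thresholds are hypothetical stand-ins;
# a real implementation would call an actual language model.

def budget_forced_generate(generate, prompt,
                           min_thinking_tokens=100,
                           max_thinking_tokens=500):
    """Keep the model 'thinking' until a token budget is met.

    `generate` is assumed to return (text, n_tokens, finished) for a
    given prompt; appending "Wait" nudges the model to re-examine its
    reasoning instead of stopping early.
    """
    thinking, used = "", 0
    while used < max_thinking_tokens:
        text, n, finished = generate(prompt + thinking)
        thinking += text
        used += n
        if finished:
            if used >= min_thinking_tokens:
                break                 # enough reasoning: let it stop
            thinking += "\nWait"      # stopped too early: force more thought

    return thinking

# Toy stand-in for a model: emits a fixed 40-token chunk and tries to
# stop after every chunk.
def toy_generate(prompt):
    return (" step", 40, True)

trace = budget_forced_generate(toy_generate, "Q: ...",
                               min_thinking_tokens=100,
                               max_thinking_tokens=500)
print(trace)  # the controller inserts "Wait" twice before letting the model stop
```

With the toy model above, the controller rejects the first two attempts to stop (40 and 80 tokens are below the minimum budget) and accepts the third, so the trace contains two forced “Wait” continuations.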
For example, if you prompt the model to estimate how much it would cost to replace all iPhones with Android tablets, it will break the question down into several steps, which could include checking how many people use iPhones today and how much it would cost to manufacture Android tablets.
The S1 model was trained on a curated, high-quality dataset named S1K, which consists of 1,000 carefully selected questions chosen for their difficulty, diversity, and quality. The dataset includes complex problems from mathematics, reasoning, and science. Another key aspect of the model’s development is supervised fine-tuning (SFT) on this small dataset: according to the research paper, SFT required only 26 minutes of training on 16 NVIDIA H100 GPUs. Despite the small dataset size, S1 achieved high reasoning accuracy by building on the knowledge already embedded in its pre-trained base model, Qwen2.5-32B-Instruct.
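The curation idea described above — filter for quality, prefer difficult problems, and spread selections across domains — can be illustrated with a toy script. The field names, quality threshold, and round-robin strategy here are illustrative assumptions, not the paper’s exact procedure.

```python
# Toy sketch of three-criterion dataset curation in the spirit of S1K:
# quality filter, then hardest-first within each domain, picked
# round-robin across domains for diversity. All fields and thresholds
# are made up for illustration.
from collections import defaultdict

def curate(pool, k):
    """Select up to k samples from a candidate pool."""
    good = [s for s in pool if s["quality"] >= 0.8]        # 1) quality
    by_domain = defaultdict(list)
    for s in good:
        by_domain[s["domain"]].append(s)
    for samples in by_domain.values():
        samples.sort(key=lambda s: -s["difficulty"])       # 2) difficulty
    picked = []
    while len(picked) < k and any(by_domain.values()):
        for domain in list(by_domain):                     # 3) diversity
            if by_domain[domain] and len(picked) < k:
                picked.append(by_domain[domain].pop(0))
    return picked

pool = [
    {"domain": "math",    "difficulty": 0.9, "quality": 0.95},
    {"domain": "math",    "difficulty": 0.5, "quality": 0.90},
    {"domain": "science", "difficulty": 0.8, "quality": 0.85},
    {"domain": "logic",   "difficulty": 0.7, "quality": 0.40},  # dropped: low quality
]
s1k = curate(pool, 2)
print([s["domain"] for s in s1k])  # one hard problem each from math and science
```

The round-robin step is what keeps a 1,000-sample budget from being dominated by a single subject, mirroring the diversity criterion the researchers describe.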
S1 is thus based on an off-the-shelf language model that was taught to reason by studying questions and answers from Google’s Gemini 2.0 Flash Thinking Experimental. The Google model exposes the thinking behind each of its answers, which allowed the developers of S1 to get by with a small amount of training data: the 1,000 curated questions with answers. They essentially taught the S1 model to imitate Gemini’s thinking process.
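Conceptually, distilling Gemini’s reasoning into training data means pairing each question with the teacher model’s thinking trace and its final answer, so the student learns to imitate the reasoning rather than just the result. A rough sketch, with made-up delimiter tags (the paper’s actual formatting may differ):

```python
# Illustrative sketch of distillation-style data preparation: each
# training example pairs a question with a teacher model's reasoning
# trace and final answer. The <think> tags are hypothetical delimiters,
# not necessarily the format S1 uses.

def to_sft_example(question, reasoning, answer):
    """Format one (question, teacher reasoning, answer) triple as a
    single supervised fine-tuning string with explicit thinking
    delimiters."""
    return (
        f"Question: {question}\n"
        f"<think>\n{reasoning}\n</think>\n"
        f"Answer: {answer}"
    )

example = to_sft_example(
    "What is 17 * 24?",
    "17 * 24 = 17 * 20 + 17 * 4 = 340 + 68 = 408.",
    "408",
)
print(example)
```

Fine-tuning on strings like this teaches the student model to emit its own delimited reasoning before committing to an answer, which is also what makes budget forcing possible at inference time.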
When it comes to performance, S1 was evaluated on three reasoning benchmarks: AIME24, MATH500, and GPQA Diamond. In these tests, the model showed significant improvements in accuracy and outperformed OpenAI’s closed-source o1-preview model, exceeding it by up to 27 per cent on math competition problems. While earlier reasoning models needed reinforcement learning and massive datasets, S1-32B showed that training on just 1,000 well-chosen samples can produce a competitive reasoning model.
The S1 model underscores the importance of transparency and open-source contributions in AI development. With S1’s development process now publicly available, the researchers hope to spur more collaboration and innovation in the field. They also acknowledged the limitations of test-time scaling, suggesting that alternative budget-forcing methods and reinforcement learning techniques be explored to further enhance reasoning capabilities.
In short, S1 is a breakthrough model that brings together efficient training, innovative test-time scaling, and open-source principles.