
OpenAI DevDay 2024: Four new features to make AI more affordable and accessible

With the new features, OpenAI seems to be focussing on empowering the developer community as well as smaller organisations that cannot afford larger computational capabilities.

Unlike the previous year, OpenAI’s latest developer conference did not feature any big ticket product launches. (Express Image/OpenAI)

AI powerhouse OpenAI recently concluded the second edition of OpenAI DevDay, the company’s developer conference. While OpenAI DevDay 2023 showcased a plethora of offerings, including GPT-4 Turbo, the Assistants API, and custom GPTs, this year’s conference was a more subdued event without major product launches. It did, however, showcase some incremental upgrades and how the company aims to chart its future course.

OpenAI showcased four innovations – Vision Fine-Tuning, Realtime API, Prompt Caching, and Model Distillation – at the event. These tools are aimed at helping developers build compelling applications more cheaply and easily. With OpenAI DevDay 2024, the Sam Altman-led company aims to empower developers. This also marks a shift in strategy at a time when big tech companies are getting increasingly competitive with their AI offerings.

Here’s a closer look at the innovative tools from OpenAI.

Realtime API

OpenAI introduced the Realtime API in public beta. The tool allows all paid developers to create low-latency, multimodal experiences in their apps. Much like ChatGPT’s Advanced Voice Mode, the Realtime API offers natural speech-to-speech conversations with six preset voices: alloy, echo, fable, onyx, nova, and shimmer. OpenAI also said it will introduce audio input and output in the Chat Completions API to support use cases that do not need the Realtime API’s low latency. According to the company, developers can pass any text or audio input into GPT-4o and have the model respond with text, audio, or both.

Essentially, this means developers can now add ChatGPT-style voice controls to their apps and let users hold engaging, natural conversations with them. Developers have long used voice experiences to connect with users, but they had to stitch together multiple models (typically speech recognition, a text model, and text-to-speech) to make it work. The Realtime API, coupled with audio in the Chat Completions API, makes it easier to bring voice experiences to apps: OpenAI says developers can build natural conversational experiences with a single API call.
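For a sense of what that flow looks like in practice, here is a minimal sketch of a Realtime API session in Python. It assumes the WebSocket endpoint, beta header, and event names from OpenAI’s launch documentation (response.create, response.text.delta, response.done), requests a text-only response for brevity, and omits audio streaming and error handling.

```python
import asyncio
import json
import os

import websockets  # pip install websockets

URL = "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview"
HEADERS = {
    "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
    "OpenAI-Beta": "realtime=v1",  # beta header used at launch
}

async def main():
    # extra_headers works on websockets <= 12; newer releases renamed
    # the parameter to additional_headers.
    async with websockets.connect(URL, extra_headers=HEADERS) as ws:
        # Ask the model for a response; text-only keeps the sketch short,
        # though the API's headline feature is speech-to-speech audio.
        await ws.send(json.dumps({
            "type": "response.create",
            "response": {"modalities": ["text"],
                         "instructions": "Greet the user briefly."},
        }))
        async for raw in ws:
            event = json.loads(raw)
            if event["type"] == "response.text.delta":
                print(event["delta"], end="", flush=True)
            elif event["type"] == "response.done":
                break

asyncio.run(main())
```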

The audio capabilities of the Realtime API are powered by a new GPT-4o model, gpt-4o-realtime-preview. The company said audio support in the Chat Completions API will follow in the coming weeks with a new model named gpt-4o-audio-preview, which will allow developers to input text or audio into GPT-4o and receive responses as audio, text, or both. The Realtime API uses both text tokens and audio tokens: text input is priced at $5 per 1M tokens and text output at $20 per 1M tokens, while audio input costs $100 per 1M tokens and audio output $200 per 1M tokens.
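Those per-token rates make it easy to estimate what a voice session costs. The back-of-the-envelope sketch below uses only the prices quoted above; the token counts in the example are hypothetical.

```python
# Per-token prices quoted by OpenAI for the Realtime API:
# text $5/1M in, $20/1M out; audio $100/1M in, $200/1M out.
PRICES_PER_TOKEN = {
    "text_in": 5 / 1e6,
    "text_out": 20 / 1e6,
    "audio_in": 100 / 1e6,
    "audio_out": 200 / 1e6,
}

def session_cost(text_in, text_out, audio_in, audio_out):
    """Return the dollar cost of one session given token counts."""
    return (text_in * PRICES_PER_TOKEN["text_in"]
            + text_out * PRICES_PER_TOKEN["text_out"]
            + audio_in * PRICES_PER_TOKEN["audio_in"]
            + audio_out * PRICES_PER_TOKEN["audio_out"])

# Example: a short voice exchange (illustrative token counts).
print(f"${session_cost(200, 150, 5000, 8000):.4f}")  # -> $2.1040
```

As the example shows, audio tokens dominate the bill; the text tokens in a typical exchange cost fractions of a cent.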

Vision fine-tuning

Vision fine-tuning was one of the more significant updates at the event. OpenAI announced vision fine-tuning for GPT-4o, its most capable large language model. With this feature, developers can customise the model’s ability to comprehend images alongside text. In the long run, the update could benefit areas such as autonomous vehicles, visual search, and medical imaging.

According to OpenAI, vision fine-tuning follows a process similar to fine-tuning on text: developers prepare image datasets in the proper format and then upload them to OpenAI’s platform. The company said the feature can improve GPT-4o’s performance on vision tasks with as few as 100 images, and can drive even higher performance with larger volumes of text and image data.
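As a rough illustration of that workflow, the sketch below prepares a one-example training file and submits a fine-tuning job through the official openai Python SDK. The chat-style JSONL layout with image_url content parts follows the format GPT-4o uses for images; the prompt, label, image URL, and model snapshot are illustrative assumptions, not details from the article.

```python
import json

from openai import OpenAI  # pip install openai

# One training example: a user turn mixing text and an image, plus the
# desired assistant answer. A real dataset would have 100+ such lines.
example = {
    "messages": [
        {"role": "user", "content": [
            {"type": "text", "text": "How many lanes does this road have?"},
            {"type": "image_url",
             "image_url": {"url": "https://example.com/road.jpg"}},
        ]},
        {"role": "assistant", "content": "4"},
    ]
}

# Each line of the JSONL training file is one JSON object like the above.
with open("train.jsonl", "w") as f:
    f.write(json.dumps(example) + "\n")

client = OpenAI()
training_file = client.files.create(file=open("train.jsonl", "rb"),
                                    purpose="fine-tune")
client.fine_tuning.jobs.create(training_file=training_file.id,
                               model="gpt-4o-2024-08-06")  # assumed snapshot
```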

OpenAI cited Grab, a Southeast Asian food delivery and rideshare company, which has used the technology to improve its mapping services. With just 100 examples, the company reportedly achieved a 20 per cent improvement in lane-count accuracy and a 13 per cent boost in speed limit sign localisation. The potential for vision fine-tuning is immense, and it could have a major impact on AI-powered services.

Prompt caching


Prompt Caching was another main highlight of OpenAI DevDay 2024. The new feature is aimed at reducing costs and latency for developers. Many developers reuse the same context across multiple API calls while building AI apps, which adds to both cost and latency. “Today, we’re introducing Prompt Caching, allowing developers to reduce costs and latency. By reusing recently seen input tokens, developers can get a 50 per cent discount and faster prompt processing times,” OpenAI said in its official post.
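Because caching applies automatically to recently seen input tokens, there is no cache API to call; what developers can do is structure requests so the long, static part of the prompt leads and only the variable part changes. The sketch below assumes OpenAI’s documented behaviour of matching on a shared prompt prefix above a minimum length; the system prompt and questions are hypothetical.

```python
from openai import OpenAI

client = OpenAI()

STATIC_PREFIX = (
    "You are a support assistant for ExampleCo. "  # imagine several thousand
    "Policy document: ..."                         # tokens of fixed context
)

def answer(question: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            # An identical leading span across calls is what makes the
            # prefix cacheable; only the user turn varies per request.
            {"role": "system", "content": STATIC_PREFIX},
            {"role": "user", "content": question},
        ],
    )
    return resp.choices[0].message.content

# Repeated calls share the cached prefix, so only the short user turn
# is billed at the full input rate.
for q in ["Where is my order?", "How do I reset my password?"]:
    print(answer(q))
```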

Prompt Caching applies to the latest versions of GPT-4o, GPT-4o mini, o1-preview and o1-mini, as well as fine-tuned versions of these models. OpenAI said cached prompts are offered at a discount compared to uncached prompts, and has shared detailed pricing for the feature on its official website.

The company said that, as with all API services, Prompt Caching is subject to its Enterprise privacy commitments. According to OpenAI, Prompt Caching lets developers scale their applications in production while balancing performance, cost, and latency.

Model distillation

According to OpenAI, Model Distillation offers developers an integrated workflow to manage the entire distillation pipeline from within the OpenAI platform. The feature lets developers use the outputs of frontier models such as o1-preview and GPT-4o to fine-tune smaller, cost-efficient models like GPT-4o mini. This could help smaller organisations leverage the capabilities of advanced models without staggering computational costs.

So far, model distillation has been a multi-step, error-prone process that required developers to manually perform multiple operations across numerous disconnected tools. Owing to its iterative nature, developers needed to run each step repeatedly, making the task complex and painstaking. The new Model Distillation workflow simplifies fine-tuning smaller, cost-efficient models on the outputs of larger models such as GPT-4o and o1-preview: developers can build high-quality datasets from real-world examples and distil larger models into smaller versions. Model Distillation is available to all developers.
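In practice, the integrated workflow revolves around capturing a frontier model’s answers and reusing them as training data. The Python sketch below assumes the Stored Completions mechanism OpenAI announced alongside Model Distillation (the store and metadata parameters on chat completions); the prompts, tag, and student model are hypothetical.

```python
from openai import OpenAI

client = OpenAI()

# 1. Log the teacher model's (GPT-4o) outputs on real-world inputs.
for prompt in ["Summarise this ticket: ...", "Classify this review: ..."]:
    client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
        store=True,                       # keep the completion on-platform
        metadata={"distillation": "v1"},  # tag it for later filtering
    )

# 2. In the platform dashboard, filter stored completions by the tag,
#    export them as a fine-tuning dataset, and start a job on the
#    smaller student model, e.g.:
# client.fine_tuning.jobs.create(training_file=file_id, model="gpt-4o-mini")
```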

The announcements at the latest OpenAI DevDay mark a clear strategic shift towards features aimed at the developer ecosystem. With them, OpenAI appears focussed on making its products cost-effective, supporting developers, and putting the spotlight on model efficiency, which in turn could reduce the resource intensity and environmental impact of its AI.
