The new model finally gets text right in images. (Express image)
OpenAI, the artificial intelligence powerhouse behind ChatGPT, has reportedly been testing a new version of its generative AI imaging model over the past few months. Early samples leaked by YouTuber MattVidPro suggest that the model performs significantly better than other image generators, including Midjourney, which is widely considered the leading model in terms of realism.
In the video, the YouTuber states that the model blows anything we’ve seen before out of the water: “Midjourney cannot compete at this level – I don’t even think Midjourney version six would be able to compete at this level.”
A photorealistic sample. (Image: MattVidPro/YouTube)
The video then proceeds to present a series of examples demonstrating the AI’s capabilities. From generating ridiculous yet realistic images of animals made of cheese to reproducing famous logos like ‘Snickers’ and ‘Subway’ with superb accuracy, the AI’s prowess is on full display.
The model is also versatile with art, producing it in a range of styles including classic paintings and retro advertisements. It also supports image generation in multiple aspect ratios, including 16:9.
Most striking, though, is its ability to get text right in its images. Most generative image models available today have a poor 'understanding' of text and often produce words with distorted letters when asked to render them. This shortcoming limits how far users can rely on these models for logos and marketing material. OpenAI's purported upcoming model could solve this problem.
A perfectly generated GTA 5 PS4 CD box. (Image: MattVidPro/YouTube)
At this point, it's unclear how OpenAI plans to make the alleged model publicly available or what the company plans to call it. The YouTuber speculates that it is likely an upgrade to the present Dall-E 2 model and may therefore launch as Dall-E 3. The unreleased model is currently being tested on an invite-only basis, and Matt says that only around 400 people worldwide have access.
When it does launch to the public, it will likely have undergone significant changes. MattVidPro warns that the model is not ready for public use yet, as it lacks the safety measures implemented on Dall-E 2. It can generate inappropriate content, such as nudity, which could violate OpenAI's ethical standards.
Nevertheless, the video gives a glimpse of what the next version of Dall-E could offer. Dall-E 2 has been lagging behind other models like Midjourney in terms of realism, but this new model could help close the gap.