This is an archive article published on September 29, 2023
ChatGPT gets image recognition: 6 wild things people are using it for
From breaking down complex diagrams with remarkable skill to producing code from images alone, here are some examples of how people are using ChatGPT's new vision feature.
Written by Zohaib Ahmed
New Delhi | Updated: October 1, 2023 08:39 AM IST
OpenAI's ChatGPT is now more human-like than ever. (Image: Zohaib Ahmed/The Indian Express)
When ChatGPT first came out, people were flabbergasted at its remarkably human-like understanding of queries and the way it responded to them. The AI chatbot became an overnight sensation and was all over social media. In fact, global Google searches for the term “artificial intelligence” reached an all-time high, demonstrating the intense consumer interest in the technology.
But people move on quickly. Just when the hype seemed to be fizzling out, OpenAI dropped a couple of update bombs, introducing the ability to ‘see,’ ‘hear,’ and browse the web. The vision feature is particularly impressive, as ChatGPT can now analyse images with a level of detail that almost seems beyond human capabilities.
Naturally, people started to talk about ChatGPT again, and below we have compiled some of the best examples of how people have used the new image recognition feature.
Understanding complex diagrams
Diagrams are used to better represent complex information, but what happens when the diagrams themselves are too convoluted? ChatGPT’s new image capabilities come to the rescue, breaking them down in language that even a toddler could grasp. For instance, one Twitter user was able to get the AI chatbot to explain an image packed with a flow diagram made up of hundreds of elements.
ChatGPT image recognition vs "Crazy Pentagon PowerPoint Slides:"
It works the other way around too. If you need additional context or notes for a simple diagram/flowchart – or simply want to figure out what it’s even about – ChatGPT does an excellent job at it as well.
One Twitter user uploaded a screengrab from the movie Gladiator, asking ChatGPT to identify its source and what the person in the scene is saying. The chatbot answered as if it had actually watched the movie, not only responding to the original query but also topping it off with additional context.
It remains to be seen if the feature works for random shots from movies as well or if it’s limited to popular scenes. But regardless, the tool can come in super handy for reverse image searches, especially when combined with its ability to browse the web.
Interpreting memes and concepts
You either get it or you don’t. Understanding viral memes is sometimes impossible if you are missing the context. Or maybe the post is just too nonsensical or cliche for you to find the humour. If you can’t for the life of you figure out why a meme has received hundreds of thousands of likes, ChatGPT can help.
Yes, tools like Google Lens and Microsoft’s Visual Lens exist, but things can sometimes get lost in translation. ChatGPT can serve as a handy substitute when an attempt to translate text on a hoarding, road sign, shop board, or anywhere else returns gibberish.
Yesterday I uploaded this photo of an article about a friend of mine into ChatGPT-4, and it was able to create a flawless English translation of the article from Italian.
But perhaps the most impressive application of the feature is its ability to figure out the code for websites and other projects – from screenshots alone – and replicate it accordingly. For example, a user uploaded a screenshot of a SaaS dashboard and ChatGPT produced the complete code for it. Upon checking whether the code worked, the developer was astonished to see that it had indeed got most things right.
I gave ChatGPT a screenshot of a SaaS dashboard and it wrote the code for it.
Of course, it hasn’t even been a full week since ChatGPT gained the ability to see, hear, and speak, so it’s fair to assume that these use cases only scratch the surface. People are continuing to experiment with different types of inputs, and there is probably a host of cool new applications waiting to be discovered.
Zohaib is a tech enthusiast and a journalist who covers the latest trends and innovations at The Indian Express's Tech Desk. A graduate in Computer Applications, he firmly believes that technology exists to serve us and not the other way around. He is fascinated by artificial intelligence and all kinds of gizmos, and enjoys writing about how they impact our lives and society. After a day's work, he winds down by putting on the latest sci-fi flick. • Experience: 3 years • Education: Bachelor in Computer Applications • Previous experience: Android Police, Gizmochina