As I waited for my turn to enter the demo zone to experience Project Astra, Google’s voice-operated AI assistant, at the company’s annual developer conference, I saw Google co-founder Sergey Brin enter the booth and exit exactly 10 minutes later. Brin came twice to check out the demo, and I wondered what was going through his mind as he got a demo of Project Astra, a multimodal AI agent designed to help you in everyday life.
For me, the biggest highlight of this year’s Google I/O was Project Astra. During my brief experience with it at the company’s annual developer conference in Mountain View, California, I could see where Google’s prototype AI assistant could take a technology that is being talked about everywhere.
Google co-founder Sergey Brin appeared at this year’s Google I/O developer conference. (Image credit: Anuj Bhatia/Indian Express)
Google DeepMind CEO Demis Hassabis describes Astra as “a universal agent helpful in everyday life.” Just imagine a super-charged Google Assistant with the smarts of Gemini built in and image-recognition capabilities like those of Google Lens. That is exactly what Project Astra is. Simply put, Astra is a “multimodal” AI assistant, powered by an upgraded version of Google’s Gemini Ultra model: it has been trained on audio, images, video, and text, and can generate output in all those formats. It can make sense of your surroundings using the device’s camera and takes audio, photo, and video input to respond to your queries and follow-up questions.
Google allowed four journalists at a time into a highly controlled demo zone to experience Astra, and I was among the few to get an in-person look at it for the first time at Google I/O. As I entered the demo area, where a huge screen with a camera had been set up, two researchers from the Project Astra team at DeepMind gave us a preview of how the voice-operated assistant works. They walked us through four modes: Storyteller, Pictionary, Free-Form, and Alliteration. I tried different modes to see how accurate Astra’s responses were and whether it could hold a conversation the way humans do, as Google promised in a pre-recorded demonstration during the keynote.
During the demo, the Google team placed some soft toys in front of the camera, and the assistant transcribed the speaker’s words and created a story based on the objects. One of the Google team members then placed another object in front of the camera, and the AI assistant continued the story, this time adding details to the scenario Gemini had created. It felt magical, as that one additional object became a new character in the story.
I then chose the Pictionary mode, which was meant to showcase the assistant’s ability to interpret drawings and guess the object being depicted. It didn’t matter whether you had limited artistic skills; Gemini would correctly identify and name whatever was drawn.
After trying the different modes, what impressed me most was that the interaction with the assistant felt natural and engaging, something I have never experienced while using Google Assistant or Siri. More importantly, Astra’s capabilities go beyond what we have seen in existing AI assistants. The Google researchers told me that Astra has built-in “memory,” meaning that after it scans objects, it can still “remember” where specific items were placed. For now, Astra’s memory is limited to a short window, but if it is expanded in the future, the possibilities are endless. If the AI assistant could remember where I left my phone on the table last night before going to bed, it would be totally insane.
Fundamentally, Google’s Project Astra does the same thing as AI devices such as Meta’s Ray-Ban glasses, the Rabbit R1, and the Humane AI Pin: it uses a camera to analyse your surroundings and provide information about what you are looking at. However, those devices’ response times are typically slow, and they also lack functionality. I was surprised, then, by how quick and snappy Astra felt during the demo. The Rabbit R1 should have been an app, and Google’s Project Astra proves why.
Visitors are waiting in the queue to get a demo of Google’s Project Astra. (Image credit: Anuj Bhatia/Indian Express)
But I can already imagine Google finding a way to bring Astra to some new type of wearable in the future. In fact, Google has already teased the AI assistant running on a pair of glasses. The possibilities are endless with Project Astra if Google gets it right. Maybe Google Glass will make a comeback, this time with an AI twist.
For now, though, Project Astra is still in a “research preview”, but Google already plans to bring some of the advanced AI assistant’s capabilities to products like the Gemini app later this year.
My biggest takeaway from Project Astra is that we are moving towards more evolved versions of AI chatbots like ChatGPT and Gemini, with a layer of visuals and audio added on top. No matter how one chooses to pitch Project Astra, it is built on the concept of real-time, camera-based AI that identifies an object and spins a fictional story around it. That said, none of Astra’s capabilities make it think like a human or sound like one. After all, humans interact with the physical world differently from AI chatbots, which rely on language-centric AI models and learn from troves of data available on the web.
The writer is attending I/O 2024 in Mountain View, California, at the invitation of Google India.