This is an archive article published on May 15, 2024

At Google’s developer conference in California, AI breathes new life into familiar platforms

At Google I/O 2024, AI integration took centre stage, transforming products like Google Assistant and introducing Veo, the company’s advanced generative AI model.

Google CEO Sundar Pichai delivering the keynote at I/O (Image credit: Anuj Bhatia/The Indian Express)

Google kicked off its annual developer conference Tuesday in Mountain View, California, with a more aggressive artificial intelligence strategy incorporated into its most popular products. Alphabet Chief Executive Officer Sundar Pichai had a strong message for the thousands of developers and coders he addressed at a packed Shoreline Amphitheater, just down the road from Google’s headquarters: the company is still innovating at breakneck speed, putting AI at the forefront of nearly every service and product it ships.

“We see so much opportunity for creators or developers or startups or everyone helping to (advance) those opportunities – is what Gemini is all about,” said Pichai.

Project Astra is Google’s next-generation AI assistant

Project Astra is focused on creating a universal AI agent (Image credit: Anuj Bhatia/The Indian Express)

As part of its suite of announcements during its high-profile I/O event, Google unveiled a brand-new personal AI assistant that could be the successor to Google Assistant. Called “Project Astra” and powered by Gemini AI, it can see the world around you and answer questions about it. During a pre-recorded demo of a live session with Project Astra, a person held up an Android smartphone, kept the camera’s live feed open, and asked questions; Astra correctly identified what it was looking at and gave accurate spoken responses. Project Astra is like Google Assistant but a lot faster and more intuitive. Google DeepMind CEO Demis Hassabis describes Project Astra as an effort to create a “universal AI agent helpful in everyday life.” Hassabis calls Astra a “multimodal” AI assistant, meaning it can respond to various inputs, such as text, images, audio, and video, making it work more like a human. The company plans to roll out parts of Project Astra later this year through the Gemini app. Project Astra is Google’s answer to OpenAI’s new GPT-4o model, which can also speak and view the world through the user’s smartphone camera.


Veo can create high-definition realistic videos with a prompt

Google Veo is a video generator that can output in 1080p. (Image Source: Google)

Perhaps the biggest announcement came in the form of Veo, Google’s new generative AI model that rivals OpenAI’s Sora, expanding beyond text and images to offer video generation for the first time. The new model lets a user type out a desired scene and turns it into a 1080p clip that, Google says, can run well beyond a minute in different cinematic and visual styles. “Veo has an advanced understanding of natural language and visual semantics and can generate video that closely represents the user’s creative vision — accurately rendering details in longer prompts and capturing tone,” the company said, describing the capabilities of the new video-generation AI model. Veo can understand cinematic terms like “timelapse” or “aerial shots of a landscape” and can create footage of people, animals, and objects moving realistically throughout shots. Google says Veo is already available to select creators as a private preview inside VideoFX, and users can sign up to join the waitlist. The company also promises to bring some of what Veo can do to YouTube Shorts and other products in the future.

Video could be the next big thing in generative AI. While companies like Google and OpenAI say these AI tools create more opportunities for people in creative industries, the recent strikes that hit Hollywood opened up a new battle over artificial intelligence and ethics. But industry watchers believe new text-to-video generation tools pose serious misinformation concerns as major political elections are underway in many parts of the world. According to data from Clarity, a deepfake detection firm, 900 percent more deepfakes have been created and published this year compared to last year.

Google also said it is launching Imagen 3, its new text-to-image AI model that produces photorealistic, lifelike images and can be used for tasks such as creating personalized birthday messages or adding visual elements to presentations.

Google search gets helpful with AI

Google SGE is powered by Gemini. (Image Credit: Anuj Bhatia/Indian Express)

Google has also begun integrating artificial intelligence more deeply into the core search experience. The “Search Generative Experience” – as Google dubs the feature – has been available for nearly a year in the US, but only to users who signed up via Google Labs. The company has also been experimenting with “AI Overviews” for a set of queries where it thinks generative AI can be especially helpful in pulling information from a range of web pages. Since last year, Google has offered a Search Labs section where searchers can opt in to see and use SGE results. Google is now opening AI Overviews to everyone in the US, with more countries to follow soon. The company says it continues to improve AI Overviews, and a new customised Gemini model tailored specifically for Google Search will add multi-step reasoning capabilities. While Google says AI Overviews were already good with complex queries, they are now going to be even more helpful with complex questions. Instead of breaking a question into multiple searches, users can ask their most complex questions in a single search – for example, “find the best yoga or pilates studios in Boston and show details on their intro offers and walking time from Beacon Hill.”


Google is also beefing up Search’s ability to handle queries related to planning. For example, if you ask Google to “create a 3-day meal plan for a group that’s easy to prepare,” the search results will show a wide range of recipes from across the web. You can also customise your meal plan or export it to Docs and Gmail. As with meals, Google also lets you plan trips. Both meal and trip planning are available in Search Labs in English in the US. Google says it plans to extend these planning capabilities in Search to cover parties, date nights, and workouts.

Gemini Advanced gets more features

Gemini 1.5 Pro is now available to Advanced users (Google/Express Photo)

Google used its developer conference to add a myriad of new features and capabilities to Gemini, its answer to OpenAI’s ChatGPT that can answer questions in text form and generate pictures in response to text prompts. As part of the I/O announcements, Google said it is bringing Gemini 1.5 Pro to Gemini Advanced, the top tier of Google’s AI, which costs $20 per month – the same price OpenAI charges for its upgraded ChatGPT Plus. The new model’s 1 million-token context window allows users to upload large PDFs, code repositories, and lengthy videos as prompts. Gemini Advanced is also getting a new planning experience that lets subscribers get a customised itinerary built from their flight timings, meal preferences, and information about local museums pulled from Gmail, Search, and Maps with a simple prompt. And there’s more. Google will also allow Gemini Advanced subscribers to customise the AI chatbot by creating Gems, such as a writing guide, a gym buddy, a coding partner, or a chef. Just describe the Gem and it will respond, keeping your specific needs in mind. For example, you can tell a Gem to come up with a daily running plan that charges you up in the morning and keeps you motivated.

Google has announced that it is adding new AI capabilities right inside the Gmail mobile app, as well as bringing the power of Gemini to its Workspace apps, including Gmail, Drive, Slides, Docs, and more, with an AI-powered sidebar.

Faster AI models with longer context

Google is introducing a new Gemini model called 1.5 Flash, which it says is optimised for high-volume, high-frequency tasks at scale and is more cost-efficient. It also supports large context windows, which let a model process and understand extremely long documents, books, scripts, or codebases that would otherwise need to be broken up and processed separately. Meanwhile, Gemini Nano, the smaller version of Google’s Gemini AI model for on-device AI, currently shipping on some Android phones, will expand beyond text inputs to include sight, sound, and spoken language.


Android 15 is official, but barely got mentioned

In a two-hour presentation centred almost entirely on AI, Google spent the least amount of stage time on Android 15. The newest version of Android is in beta now and is expected in the fall. Instead, the company’s focus was on three breakthroughs coming to Android this year: better search on your device, Gemini becoming your AI assistant, and on-device AI unlocking new experiences. Android’s near-absence on stage shows how Google’s priorities have changed, even though the dominant mobile operating system still has a giant role to play in bringing Google’s services and offerings to billions of people every day.


AI focus is an answer to rising pressure from competition

Throughout the keynote presentation led by Pichai and his team, Google tried to establish how bullish it is on the prospect of its more advanced artificial intelligence tools now embedded in its products. The internet giant’s strategy is to make investors and developers more confident about Google’s play in the digital universe, not just in the past but in the future as well. However, the rise of OpenAI, the Microsoft-backed developer behind ChatGPT, threatens to eat into Google’s dominance. The Sam Altman-run company recently debuted a new model, GPT-4o, which it claims is “much faster,” with improved capabilities in text, video, and audio. OpenAI also said it eventually plans to allow users to video chat with ChatGPT. OpenAI chose to announce GPT-4o ahead of Google’s developer conference, making trade pundits more confident about the startup’s ability to compete with a behemoth like Google in the AI arms race and to create more hit products like ChatGPT. In fact, many believe OpenAI is gearing up to add a search function to ChatGPT, which would put the rival chatbot in direct competition with Google Search.

But Google won’t easily let OpenAI, Microsoft, or even Apple, which is expected to reveal its AI strategy next month at its own developer conference, win Big Tech’s high-stakes AI race. At I/O, Google showed how generative AI is ready to make meaningful improvements for users in areas that matter to them, from Search to Gmail to videos, spanning sectors from creative work to business. The company also sought to highlight its research emphasis on technologies that may not be ready yet but could have a huge impact with the right use cases.

“Users and developers trust Google, not only as a source of information but also a source of truth, especially compared with LLM apps like ChatGPT,” said Nikhil Lai, senior analyst at Forrester. “Google’s search business is not vulnerable to Microsoft’s growing stature in AI because Google mitigates counterparty risk between publishers and consumers more effectively than any other company,” he added. “Generative AI will revolutionise how people synthesize information, but Google’s ability to authenticate sources of information will sustain its search engine’s utility and dominance.”


The writer is in San Francisco covering I/O at the invitation of Google India.

Anuj Bhatia is a personal technology writer at indianexpress.com who has been covering smartphones, personal computers, gaming, apps, and lifestyle tech actively since 2011. He specialises in writing longer-form feature articles and explainers on trending tech topics. His unique interests encompass delving into vintage tech, retro gaming and composing in-depth narratives on the intersection of history, technology, and popular culture. He covers major international tech conferences and product launches from the world's biggest and most valuable tech brands including Apple, Google and others. At the same time, he also extensively covers indie, home-grown tech startups. Prior to joining The Indian Express in late 2016, he served as a senior tech writer at My Mobile magazine and previously held roles as a reviewer and tech writer at Gizbot. Anuj holds a postgraduate degree from Banaras Hindu University. You can find Anuj on LinkedIn. Email: anuj.bhatia@indianexpress.com
