This is an archive article published on May 22, 2024

What are AI agents, which power OpenAI’s GPT-4o and Google’s Project Astra?

Known as ‘AI agents’, GPT-4o and Project Astra have been touted as far superior to conventional voice assistants such as Alexa, Siri, and Google Assistant. The launch of these models marks a new phase in AI — the transition from chatbots to multimodal interactive AI agents.

AI agents perceive their environment via sensors, process the information using algorithms or AI models, and then take actions. (Representational image/ FreePik)

The recently launched GPT-4o by OpenAI and Project Astra by Google have one thing in common: both can process the real world through audio and visual inputs and provide intelligent responses and assistance. In other words, the new AI models can hold real-time conversations with a user.

What are AI agents?

AI agents are sophisticated AI systems that can engage in real-time, multimodal (text, image, or voice) interactions with humans. Unlike conventional language models, which work solely on text-based inputs and outputs, AI agents can process and respond to a wide variety of inputs, including voice, images, and even input from their surroundings.

“You’re not typing into a text box, waiting for a response and then reading the output. You’re actually interacting with the AI through voice just as you would a human,” according to Google CEO Sundar Pichai.

From the demonstrations by OpenAI and Google, one can say that AI agents are nimble when it comes to adapting to new situations, which makes them versatile across a wide range of tasks.

AI agents perceive their environment via sensors, process the information using algorithms or AI models, and then take actions. Currently, they are used in fields such as gaming, robotics, virtual assistants, and autonomous vehicles.
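
The perceive, process, act cycle described above can be illustrated with a short sketch. This is a toy example, not how GPT-4o or Project Astra work: the `Percept` class, the `decide` rule, the thermometer readings, and the action names are all hypothetical, and a simple threshold stands in for the AI model.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Percept:
    """One reading from a hypothetical sensor, such as a thermometer or camera."""
    source: str
    value: float


def decide(percept: Percept) -> str:
    """Stand-in for the 'algorithm or AI model' step: a fixed threshold rule."""
    if percept.source == "thermometer" and percept.value > 30.0:
        return "turn_on_fan"
    return "do_nothing"


def run_agent(readings: List[Percept], policy: Callable[[Percept], str]) -> List[str]:
    """Perceive each reading, process it with the policy, and record the action."""
    actions = []
    for percept in readings:        # perceive: take in one sensor reading
        action = policy(percept)    # process: let the policy choose an action
        actions.append(action)      # act: this sketch only logs the chosen action
    return actions


readings = [Percept("thermometer", 24.5), Percept("thermometer", 33.0)]
print(run_agent(readings, decide))  # prints ['do_nothing', 'turn_on_fan']
```

In a real agent, the policy would be a trained model handling audio or video rather than a single number, but the loop has the same shape.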

How are they different from large language models?

While large language models (LLMs) like GPT-3 and GPT-4 can only generate human-like text, AI agents make interactions more natural and immersive with the help of voice, vision, and environmental sensors. Unlike LLMs, AI agents are designed for instantaneous, real-time conversations, with responses much like a human's.

LLMs lack contextual awareness, while AI agents can understand and learn from the context of interactions, allowing them to provide more relevant and personalised responses. Also, language models do not have any autonomy since they only generate text output. AI agents, however, can perform complex tasks autonomously such as coding, data analysis, etc. When integrated with robotic systems, AI agents can even perform physical actions.

What are the potential uses of AI agents?

AI agents can serve as intelligent and highly capable assistants. They can handle an array of tasks, from offering personalised recommendations to scheduling appointments. Reports suggest that AI agents can be ideal for customer service, as they can offer seamless, natural interactions and resolve queries instantly without the need for human intervention.

In the field of education and training, AI agents can act as personal tutors, adapt to a student’s learning style, and may even offer a tailored set of instructions. In healthcare, they could assist medical professionals by providing real-time analysis and diagnostic support, and even by monitoring patients.

Are there any risks and challenges?

While AI agents show immense potential for the future, they are not without risks. Privacy and security are key areas of concern as AI agents gain access to more personal data and environmental information. Just like any AI model, AI agents can carry forward biases from their training data or algorithms, leading to harmful outcomes. As these systems become more common, appropriate regulations and governance frameworks should be laid out to ensure their responsible deployment.

Bijin Jose, an Assistant Editor at Indian Express Online in New Delhi, is a technology journalist with a portfolio spanning various prestigious publications. Starting as a citizen journalist with The Times of India in 2013, he transitioned through roles at India Today Digital and The Economic Times, before finding his niche at The Indian Express. With a BA in English from Maharaja Sayajirao University, Vadodara, and an MA in English Literature, Bijin's expertise extends from crime reporting to cultural features. With a keen interest in closely covering developments in artificial intelligence, Bijin provides nuanced perspectives on its implications for society and beyond.

 
