Microsoft unveils Fara-7B, a lightweight AI model that can use a PC from a single screenshot

Unlike bulky multi-model stacks that depend on massive cloud compute, Fara-7B can run directly on devices, studying screenshots and performing actions such as clicking, typing and navigation.

Microsoft has introduced Fara-7B, a compact 7-billion-parameter AI model built to operate computers. (Image: Microsoft)

Microsoft has unveiled Fara-7B, a small language model (SLM) developed to use a computer the way humans do. The new model extends Microsoft’s line of SLMs, which includes the Phi family. Microsoft claims that the compact model is surprisingly powerful for its size and has dubbed it the company’s first agentic SLM designed for computer use.

What is Fara-7B?

Fara-7B is a Computer Use Agent (CUA) model. Unlike traditional models that generate text-based responses, it takes actions on a computer: typing, clicking, and navigating the web. AI agents that do this already exist; what sets Fara apart is that Microsoft has packed these capabilities into a compact 7-billion-parameter model that does not rely on a massive cloud setup and can run on the user’s own device.

Most CUA systems require large cloud servers, numerous subsystems, and vast amounts of compute just to interpret what is on the screen. Fara-7B is a single model: it does not need multiple models working behind the scenes to function.

“With only 7 billion parameters, Fara-7B achieves state-of-the-art performance within its size class and is competitive with larger, more resource-intensive agentic systems that depend on prompting multiple large models. Fara-7B’s small size now makes it possible to run CUA models directly on devices. This results in reduced latency and improved privacy, as user data remains local,” Microsoft said in its official release.
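The single-model design described above, in which one model reads a screenshot and picks the next action, can be pictured as a simple loop. The sketch below is only an illustration of that idea, not Microsoft's implementation: `Action`, `run_agent`, and the three callables are hypothetical names, and a scripted stand-in plays the role of the model so the example runs without any weights.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Action:
    kind: str           # "click", "type", "navigate", or "done"
    argument: str = ""  # e.g. a URL, the text to type, or a target description


def run_agent(
    task: str,
    take_screenshot: Callable[[], bytes],
    predict_action: Callable[[str, bytes, List[Action]], Action],
    execute: Callable[[Action], None],
    max_steps: int = 10,
) -> List[Action]:
    """Observe the screen, ask one model for the next action, act, repeat."""
    history: List[Action] = []
    for _ in range(max_steps):
        screenshot = take_screenshot()                      # observe
        action = predict_action(task, screenshot, history)  # decide (single model call)
        history.append(action)
        if action.kind == "done":
            break
        execute(action)                                     # act
    return history


def demo():
    # Scripted stand-in for the model (hypothetical); a real agent would
    # run inference on the screenshot here.
    script = iter([
        Action("navigate", "https://example.com"),
        Action("click", "search box"),
        Action("type", "laptops"),
        Action("done"),
    ])
    executed: List[Action] = []
    history = run_agent(
        "search for laptops",
        take_screenshot=lambda: b"fake-screenshot-bytes",
        predict_action=lambda task, shot, past: next(script),
        execute=executed.append,
    )
    return history, executed
```

Calling `demo()` returns the full action history plus the subset of actions that were actually executed; the final "done" action ends the loop without being executed.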

The model’s biggest selling point is its simplicity: it looks at a screenshot and decides what to do next, which makes it cheaper and easier to deploy. To train it, Microsoft built a large synthetic data pipeline known as FaraGen, in which AI agents perform tasks on real websites across 70,000 domains. The system creates realistic multi-step sessions that imitate human behaviour, including retries, mistakes, scrolling, and searching.

Every session is reviewed by three separate AI judges to ensure steps make sense and outputs match what’s visible on the page. After filtering, Microsoft retained 145,630 verified sessions containing over 1 million individual actions to train the model.
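The filtering step can be sketched as keeping a session only when every judge approves it. This is a toy illustration under stated assumptions: the article does not say how the three judges' verdicts are combined, so unanimous approval is assumed here, and the judge functions and session fields are hypothetical stand-ins for what would in practice be AI models.

```python
from typing import Any, Callable, Dict, List

Session = Dict[str, Any]
Judge = Callable[[Session], bool]


def filter_sessions(sessions: List[Session], judges: List[Judge]) -> List[Session]:
    """Keep only sessions that every judge approves (assumed rule: unanimity)."""
    return [s for s in sessions if all(judge(s) for judge in judges)]


# Hypothetical rule-based stand-ins for the three AI judges.
def steps_make_sense(session: Session) -> bool:
    return len(session["actions"]) > 0


def output_matches_page(session: Session) -> bool:
    return session["output"] in session["page_text"]


def session_finished(session: Session) -> bool:
    return bool(session["actions"]) and session["actions"][-1] == "done"


def demo() -> List[str]:
    sessions = [
        {"id": "a", "actions": ["navigate", "click", "done"],
         "output": "Price: $10", "page_text": "Item A Price: $10"},
        {"id": "b", "actions": [],
         "output": "", "page_text": "Item B"},
        {"id": "c", "actions": ["navigate", "done"],
         "output": "Price: $99", "page_text": "Item C Price: $12"},
    ]
    judges = [steps_make_sense, output_matches_page, session_finished]
    return [s["id"] for s in filter_sessions(sessions, judges)]
```

In this toy data, only session "a" survives: "b" contains no actions, and the output of "c" does not appear on the page.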

On performance

Fara-7B uses around 124,000 input tokens and only 1,100 output tokens per task. Microsoft estimates that a full task costs around 2.5 cents, compared with roughly 30 cents for large-scale agents built on GPT-4 or o3 reasoning models.
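A quick back-of-the-envelope check puts these figures in perspective. The token counts and per-task costs below are the ones quoted above; the per-token price is a hypothetical placeholder chosen purely for illustration, since the article does not state the rate behind Microsoft's estimate.

```python
def task_cost_usd(
    input_tokens: int,
    output_tokens: int,
    usd_per_million_input: float,
    usd_per_million_output: float,
) -> float:
    """Cost of one task given token counts and per-million-token prices."""
    return (
        input_tokens * usd_per_million_input
        + output_tokens * usd_per_million_output
    ) / 1_000_000


# Figures quoted in the article.
FARA_INPUT_TOKENS = 124_000
FARA_OUTPUT_TOKENS = 1_100
FARA_COST_CENTS = 2.5
LARGE_AGENT_COST_CENTS = 30.0

# How many times cheaper a Fara-7B task is, per the quoted estimates.
cost_ratio = LARGE_AGENT_COST_CENTS / FARA_COST_CENTS

# Share of a task's tokens that are input (mostly screenshots and context).
input_share = FARA_INPUT_TOKENS / (FARA_INPUT_TOKENS + FARA_OUTPUT_TOKENS)

# ASSUMPTION: a flat, hypothetical price per million tokens, not from the article.
HYPOTHETICAL_USD_PER_MILLION = 0.20
example_cost = task_cost_usd(
    FARA_INPUT_TOKENS,
    FARA_OUTPUT_TOKENS,
    HYPOTHETICAL_USD_PER_MILLION,
    HYPOTHETICAL_USD_PER_MILLION,
)

print(f"cost ratio: {cost_ratio:.0f}x")
print(f"input share of tokens: {input_share:.1%}")
print(f"example cost at hypothetical price: ${example_cost:.4f}")
```

By the quoted estimates a Fara-7B task is about 12 times cheaper, and more than 99 per cent of the tokens in a task are input, so the input-token price dominates the cost.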

Benchmark results are strong for a lightweight model: 73.5 per cent on WebVoyager, 34.1 per cent on Online-Mind2Web, 26.2 per cent on DeepShop, and 38.4 per cent on WebTailBench. The last benchmark is important, as it focuses on real-world tasks like job applications and real estate searches.

Fara-7B is currently available on Microsoft Foundry and Hugging Face under an MIT license and is integrated with Magentic-UI, a research prototype from Microsoft Research AI Frontiers. The tech giant is also releasing a quantised, silicon-optimised version for Copilot+ PCs running Windows 11, allowing users to install and test the model locally. The pre-optimised package can be downloaded and run directly in community environments.

The model is open-weight, and with this Microsoft hopes to lower the barrier for developers experimenting with and advancing CUA technology, specifically for automating everyday web tasks.

 
