OpenAI training its next-gen AI models on real-world tasks by contractors: Report

It is part of OpenAI’s efforts to compare the performance of its AI models against an established human baseline for various tasks.

OpenAI has also instructed contractors to delete proprietary and personally identifiable information before uploading the training data. (Source: Freepik)

OpenAI is looking to improve the real-world usefulness of its next-generation AI models by training them on data drawn from everyday tasks.

The ChatGPT maker has partnered with training data company Handshake AI to collect data from third-party contractors based on the real work they did in their previous and current job roles, according to a report by Wired.

The data collection is part of OpenAI’s efforts to compare the performance of its AI models against an established human baseline for various tasks. It comes at a time when several AI companies, including Anthropic and Google, are enlisting large teams of contractors to generate high-quality training data for developing AI models and AI agents capable of automating enterprise work.

Several tech industry leaders have warned of a white-collar ‘bloodbath’ due to the impact of AI on low-level tasks and entry-level roles, even as tech companies such as OpenAI continue to pursue artificial general intelligence (AGI) – a hypothetical AI system that outperforms humans at most economically valuable tasks.

What are OpenAI’s contractors tasked with?

OpenAI has directed contractors to upload data on real-world tasks with two components: the request from a person’s manager or colleague asking them to do a task (task request) and the work produced in response to that request (task deliverable).

In an internal presentation, OpenAI reportedly asked contractors to upload examples of real, on-the-job work that they have completed in the past or present, such as “a concrete output (not a summary of the file, but the actual file), e.g., Word doc, PDF, Powerpoint, Excel, image, repo.”

The Microsoft-backed AI startup has also instructed contractors to use a specialised ‘ChatGPT Superstar Scrubbing’ tool to delete proprietary and personally identifiable information before uploading the training data.

“We’ve hired folks across occupations to help collect real-world tasks modeled off those you’ve done in your full-time jobs, so we can measure how well AI models perform on those tasks. Take existing pieces of long-term or complex work (hours or days+) that you’ve done in your occupation and turn each into a task,” OpenAI was quoted as saying in an internal document seen by Wired.

“Remove or anonymise any: personal information, proprietary or confidential data, material nonpublic information (e.g., internal strategy, unreleased product details),” it added.

The generative AI boom has created a lucrative sub-industry comprising third-party contracting firms such as Handshake AI, Surge, Mercor, and Scale AI that hire and manage networks of data contractors to generate higher-quality training data in order to improve AI models.

 
