OpenAI is looking to improve the real-world usefulness of its next-generation AI models by training them on data drawn from everyday tasks.
The ChatGPT maker has partnered with training data company Handshake AI to collect data from third-party contractors based on real work they have done in their previous and current jobs, according to a report by Wired.
The data collection is part of OpenAI’s efforts to compare the performance of its AI models against an established human baseline for various tasks. It comes at a time when several AI companies, including Anthropic and Google, are enlisting large teams of contractors to generate high-quality training data that can be used to develop AI models and AI agents capable of automating enterprise work.
Several tech industry leaders have warned of a white-collar ‘bloodbath’ due to the impact of AI on low-level tasks and entry-level roles, even as tech companies such as OpenAI continue to pursue artificial general intelligence (AGI) – a hypothetical AI system that outperforms humans at most economically valuable tasks.
OpenAI has directed contractors to upload data on real-world tasks with two components: the request from a person’s manager or colleague asking them to do a task (task request) and the work produced in response to that request (task deliverable).
In an internal presentation, OpenAI reportedly asked contractors to upload examples of real, on-the-job work from their past or current roles, such as “a concrete output (not a summary of the file, but the actual file), e.g., Word doc, PDF, Powerpoint, Excel, image, repo.”
The Microsoft-backed AI startup has also instructed contractors to use a specialised ‘ChatGPT Superstar Scrubbing’ tool to delete proprietary and personally identifiable information before uploading the training data.
“We’ve hired folks across occupations to help collect real-world tasks modeled off those you’ve done in your full-time jobs, so we can measure how well AI models perform on those tasks. Take existing pieces of long-term or complex work (hours or days+) that you’ve done in your occupation and turn each into a task,” OpenAI was quoted as saying in an internal document seen by Wired.
“Remove or anonymise any: personal information, proprietary or confidential data, material nonpublic information (e.g., internal strategy, unreleased product details),” it added.
The generative AI boom has created a lucrative sub-industry of third-party contracting firms, such as Handshake AI, Surge, Mercor, and Scale AI, that hire and manage networks of data contractors to generate higher-quality training data for improving AI models.