
OpenAI released a new benchmark Thursday that examines how its AI models perform compared to human professionals across different industries and jobs. The test, called GDPval, is OpenAI's preliminary attempt to understand how close its systems are to performing economically valuable work – a key part of the organization's founding mission to develop artificial general intelligence, or AGI.
OpenAI says its GPT-5 model and Anthropic's Claude Opus 4.1 are "already approaching the quality of work produced by industry experts."
That's not to say OpenAI's models will start replacing workers in their jobs anytime soon. Despite some CEOs' predictions that AI will take people's jobs in just a few years, OpenAI acknowledges that GDPval today covers only a small fraction of what people actually do in their jobs. Still, it's one of the company's latest attempts to measure AI's progress toward that milestone.
GDPval is based on nine industries that contribute the most to the country's gross domestic product, including healthcare, finance, manufacturing, and government. The benchmark measures AI model performance across 44 occupations within those industries, from software engineers to journalists.
For the first edition of the benchmark, GDPval-v0, OpenAI asked experienced professionals to compare AI-generated reports with reports produced by other professionals, then pick the best one. For example, an investment banker was prompted to create a competitive landscape analysis for the last-mile delivery industry, and that report was compared against an AI-generated version. OpenAI then calculated a "win rate" for each AI model against the human reports across all 44 occupations.
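The win-rate tallying described above can be sketched in a few lines of Python. This is only an illustrative sketch: the verdict labels, sample data, and the "wins and ties" convention are assumptions for the example, not OpenAI's actual grading code.

```python
from collections import defaultdict

def win_rates(judgments):
    """Compute a per-occupation 'win rate' for an AI model from blinded
    pairwise expert judgments.

    judgments: list of (occupation, verdict) tuples, where verdict is one of
    'ai_win', 'tie', or 'human_win' (hypothetical labels for this sketch).
    """
    counts = defaultdict(lambda: {"ai_win": 0, "tie": 0, "human_win": 0})
    for occupation, verdict in judgments:
        counts[occupation][verdict] += 1
    rates = {}
    for occupation, c in counts.items():
        total = sum(c.values())
        # Count wins and ties together, matching the "wins and ties"
        # framing the article uses for GPT-4o's 13.7% score.
        rates[occupation] = (c["ai_win"] + c["tie"]) / total
    return rates

# Toy example: two judgments each for two of the 44 occupations.
sample = [
    ("investment banker", "ai_win"),
    ("investment banker", "human_win"),
    ("journalist", "tie"),
    ("journalist", "human_win"),
]
print(win_rates(sample))
# {'investment banker': 0.5, 'journalist': 0.5}
```

An overall score like GPT-5-high's 40.6% would then be an aggregate of these per-occupation rates across all 44 occupations.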
For GPT-5-high, a souped-up version of GPT-5 with additional compute, the company says the model was rated as better than industry experts 40.6% of the time.
OpenAI also tested Anthropic's Claude Opus 4.1 model, which was rated as better than or equivalent to industry experts 49% of the time. OpenAI says it believes Claude scored so well partly because of its tendency to create pleasing graphics, rather than strictly better performance.

It's worth noting that most working professionals do far more than submit research reports to their bosses, which is all GDPval-v0 tests. OpenAI recognizes this and says it plans to build more robust versions of the test in the future that account for more industries and interactive workflows.
Nevertheless, the company sees the progress on GDPval as significant.
In an interview with TechCrunch, OpenAI chief economist Dr. Aaron Chatterji said the GDPval results suggest that people in these jobs can now use AI models to free up time for more meaningful work.
"[Because] the models are getting better at some of these tasks," Chatterji said, "people in these jobs can now use the models to offload some of their work and, with the growing capabilities, potentially do higher-quality things."
Tejal Patwardhan, a research lead at OpenAI, told TechCrunch that she was encouraged by the GDPval results. OpenAI's GPT-4o model, released roughly 15 months ago, scored just 13.7% (wins and ties against people). Now GPT-5's score is nearly triple that, a trend Patwardhan expects to continue.
Silicon Valley uses a wide range of benchmarks to measure AI models' progress and determine how sophisticated a given model is. Among the most popular are AIME 2025 (a test of competitive math problems) and GPQA Diamond (a test of PhD-level science questions). However, several AI models are nearing saturation on some of these benchmarks, and many AI researchers have called for better tests that measure AI's skills on real-world tasks. Benchmarks like GDPval could become increasingly important in that conversation, because OpenAI is making the case that its AI models create value across a wide range of industries.