OpenAI has introduced GDPval, a new benchmark designed to compare AI model output with the work of human professionals across 44 occupations in nine key industries, including finance and healthcare. In the initial results, OpenAI's GPT-5-high matched or surpassed human experts in 40.6% of tasks and Anthropic's Claude Opus 4.1 did so in 49%, a significant leap from GPT-4o's 13.7% just 15 months earlier. GDPval-v0 currently covers only report generation, but the findings point to accelerating progress toward expert-level quality in specific professional tasks, with potential for substantial productivity gains and a shift toward higher-value human work as AI capabilities advance.
OpenAI has introduced a new benchmark, GDPval, to quantify AI model performance against human experts across 44 occupations in key economic sectors. The initial results indicate that foundation models are rapidly approaching expert-level quality in specific tasks: Anthropic's Claude Opus 4.1 was rated on par with or better than human professionals in 49% of tested scenarios, and OpenAI's own GPT-5-high in 40.6%. This marks a substantial capability leap from OpenAI's GPT-4o, which scored only 13.7% just 15 months earlier, underscoring the accelerating pace of AI development. OpenAI notes that the current benchmark is limited to report generation and suggests that Claude's higher score may be partly due to superior presentation. Even so, the trend points toward significant potential for productivity augmentation in white-collar jobs. The creation of economically focused benchmarks like GDPval reflects a maturing evaluation landscape for AI, one that shifts focus from academic tests to real-world, value-generating applications.
Overall Sentiment: Positive
Sentiment Score: 0.40
Ticker Sentiment