
$78.99K
1
3

This market will resolve to "Yes" if the Humanity’s Last Exam leaderboard lists any OpenAI GPT model with a score of at least the specified score by June 30, 2026, 11:59 PM ET. Otherwise, this market will resolve to "No". The resolution source will be the official Humanity’s Last Exam leaderboard https://scale.com/leaderboard/humanitys_last_exam.
Prediction markets currently give an OpenAI GPT model an 88% chance of scoring at least 35% on a test called "Humanity's Last Exam" by the end of June. In simpler terms, traders see it as very likely, roughly a 9 in 10 chance, that an advanced AI will achieve this benchmark soon. This represents high confidence that a notable performance milestone will be reached.
The high confidence stems from a few key factors. First, "Humanity's Last Exam" is a public benchmark organized by the Center for AI Safety together with the AI company Scale AI. It is designed to be an extremely difficult test of reasoning and knowledge, meant to gauge when an AI might match broad human-level understanding. A 35% score, while far from a perfect grade, is seen as a significant technical hurdle.
Second, OpenAI has a consistent record of rapidly improving its models' performance on complex benchmarks with each new release. The market is essentially betting that this trend will continue over the next four months.
Finally, the specific threshold of 35% may be viewed by traders as an achievable near-term target. It is a challenging but plausible step up from current public capabilities, making it a logical focus for the next generation of models like a potential GPT-5.
The definitive deadline is June 30, 2026. The prediction will be settled based on the official leaderboard snapshot at that time.
The main event to watch for is an official announcement or release from OpenAI of a new flagship model, such as GPT-5 or a comparable successor. A release in the spring or early summer would provide the opportunity for the model to be tested and its score published on the leaderboard before the deadline. Silence from OpenAI or a delay in a major release would be the primary factors that could lower the current high probability.
Prediction markets are generally effective at aggregating technical community sentiment on specific, near-term tech milestones. For events like this with a clear yes/no outcome and a public verification source, their track record is decent. However, the reliability here depends heavily on insider knowledge of AI lab timelines, which is imperfect. The 88% probability reflects strong belief, but it is not a guarantee. It could shift quickly based on leaks or official communications about development schedules.
Prediction markets assign an 88% probability that an OpenAI GPT model will achieve a score of at least 35% on the Humanity's Last Exam benchmark by June 30, 2026. This price indicates overwhelming confidence in a positive outcome. With $79,000 in total volume across three related markets, liquidity is thin, suggesting this is a niche but strongly held view among specialized traders.
The high probability reflects a specific belief in rapid AI capability scaling. The Humanity's Last Exam, hosted by Scale AI, is a notoriously difficult benchmark designed to test advanced reasoning, knowledge, and coding skills. A 35% score, while low in absolute terms, would represent a significant leap for current frontier models. The market's pricing suggests traders expect the next 2-3 generations of OpenAI's GPT models to make discontinuous progress on complex, multi-domain tasks. This optimism is likely anchored in the performance jumps observed between GPT-3, GPT-4, and GPT-4o, coupled with OpenAI's consistent pipeline of model releases.
The primary risk to the current consensus is a plateau in AI capabilities. If OpenAI's next major model, such as a hypothetical GPT-5, fails to show marked improvement on reasoning-heavy benchmarks, these odds would fall sharply. There is still time before the June 2026 resolution date for technical hurdles or a shift in OpenAI's research focus to emerge. Conversely, a surprise early release of a model that scores highly on similar benchmarks could push probabilities even higher, potentially toward 95%, well before the deadline. Traders will closely watch any interim leaderboard updates or official research previews from OpenAI.
AI-generated analysis based on market data. Not financial advice.
This prediction market topic concerns whether any OpenAI GPT model will achieve a specified score on the 'Humanity's Last Exam' benchmark by June 30, 2026. The benchmark, hosted by the AI company Scale AI, is designed to test artificial intelligence systems on a comprehensive set of tasks intended to measure general reasoning and problem-solving abilities approaching or exceeding human-level performance. The resolution depends on the official leaderboard maintained by Scale AI, making it a verifiable, objective metric for tracking progress in AI capability. The specific score threshold is not publicly defined in the topic description, indicating that the market creator has set a particular performance target that participants must forecast. Interest in this market stems from its function as a proxy for evaluating the rapid advancement of large language models, particularly those developed by OpenAI, which are often at the forefront of public and industry attention. The June 2026 deadline provides a medium-term timeline for assessing whether current trends in model scaling and architectural improvements will translate into measurable gains on a demanding, integrated evaluation. The outcome may influence perceptions of AI timelines and the commercial and strategic positioning of leading AI labs.
The pursuit of benchmarks to measure AI progress has a long history, dating back to tests like the Turing Test proposed in 1950. In the modern era, the rise of large language models has led to a proliferation of standardized evaluations. Benchmarks such as the Massive Multitask Language Understanding (MMLU) exam, introduced in 2020, became a standard for measuring broad knowledge. OpenAI's GPT-4 achieved a score of approximately 86.4% on MMLU in 2023, a significant leap from previous models. The creation of more difficult, multidisciplinary benchmarks followed, including GPQA (Graduate-Level Google-Proof Q&A) and the ARC-AGI challenge, reflecting the community's search for exams that current models cannot easily pass. Scale AI entered this space with its SEAL leaderboards, hosting evaluations across coding and other domains. Humanity's Last Exam, announced in late 2024 by the Center for AI Safety and Scale AI, represents a newer generation of benchmark that attempts to consolidate many reasoning domains into a single, challenging test. Its name and marketing suggest it is positioned as a potential milestone on the path to artificial general intelligence (AGI), making performance on it a closely watched metric.
The result of this prediction market offers a quantified, crowd-sourced forecast on a specific milestone in AI capability. A 'Yes' resolution would signal broad confidence that OpenAI's models will reach a notable threshold of general reasoning by mid-2026, which could influence investment, regulatory discussions, and public perception of AI advancement. It provides a tangible, time-bound event around which to organize expectations, moving beyond vague predictions. For researchers and companies, the benchmark score itself, regardless of the market outcome, serves as a valuable datapoint for comparing model architectures and training approaches. It helps identify which techniques yield gains on integrated, cross-domain reasoning tasks. The market's focus on a single lab's model also highlights the competitive dynamics in AI. Success or failure could affect perceptions of OpenAI's technical lead relative to competitors like Anthropic, Google DeepMind, or Meta, potentially impacting talent recruitment, partnership opportunities, and customer adoption.
As of early 2025, the Humanity's Last Exam leaderboard is active and accepting submissions. The initial baseline scores for various open-source and proprietary models have been published, establishing a performance range. No OpenAI GPT model (including GPT-4 and GPT-4 Turbo) has yet achieved the unspecified target score referenced in this prediction market topic. The leaderboard shows incremental improvements from different research teams, but a significant gap remains between current results and what is implied by the benchmark's aspirational name. OpenAI has not publicly announced a specific model intended to target this benchmark, though its general research roadmap suggests ongoing work on more capable systems.
Humanity's Last Exam is a comprehensive AI benchmarking suite created by Scale AI. It tests models on a wide array of reasoning, knowledge, and problem-solving tasks across multiple disciplines, designed to be exceptionally challenging for current AI systems. Its goal is to measure progress toward more general, human-like intelligence.
The specific score threshold is set by the creator of this individual prediction market and is not published in the general topic description. Participants must forecast whether any OpenAI model will meet or exceed that undisclosed number on the official leaderboard by the deadline.
Scale AI calculates the score based on model performance across hundreds of curated tasks within the benchmark. The exact weighting and composition of tasks are detailed in the benchmark's technical documentation. The final score is typically presented as a percentage or normalized value on the public leaderboard.
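As a hedged illustration of the aggregation described above, the sketch below averages per-task credit into a single percentage. This is not Scale AI's actual scoring code; the uniform-weighting default, partial-credit values, and function name are assumptions for illustration only.

```python
def weighted_score(results, weights=None):
    """Aggregate per-task credit (each value in [0, 1]) into a
    percentage score. Uniform weights are assumed by default;
    a real benchmark may weight tasks by difficulty or domain."""
    if weights is None:
        weights = [1.0] * len(results)
    total = sum(weights)
    return 100.0 * sum(r * w for r, w in zip(results, weights)) / total

# e.g. 3 tasks fully correct, 1 half-credit, 4 incorrect, out of 8
print(round(weighted_score([1, 1, 1, 0.5, 0, 0, 0, 0]), 2))  # 43.75
```

Under uniform weighting this reduces to plain accuracy, which is how most public leaderboards present a headline number.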
Yes, versions of GPT-4 have almost certainly been evaluated, as the leaderboard includes results for major proprietary models. The exact scores for GPT-4 variants are listed on the Scale AI leaderboard, and they currently fall below the market's target threshold.
The market only considers the leaderboard state as of 11:59 PM Eastern Time on June 30, 2026. Any model released or scored after that deadline does not count for resolution. The leaderboard snapshot at that exact time is definitive.
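The resolution rule above (leaderboard state at the deadline, any qualifying OpenAI GPT entry at or above the threshold) can be sketched as a small check. The entry format, the 35% threshold, and the fixed UTC-4 offset are assumptions for illustration, not the market's actual settlement code.

```python
from datetime import datetime, timezone, timedelta

# Market deadline: June 30, 2026, 11:59 PM ET (assumed UTC-4 in summer)
ET = timezone(timedelta(hours=-4))
DEADLINE = datetime(2026, 6, 30, 23, 59, tzinfo=ET)
THRESHOLD = 35.0  # score threshold from the market question

def resolves_yes(leaderboard, snapshot_time):
    """Return True if an OpenAI GPT entry meets the threshold in a
    leaderboard snapshot taken no later than the deadline."""
    if snapshot_time > DEADLINE:
        return False  # only the state at the deadline counts
    return any(
        entry["org"] == "OpenAI"
        and entry["model"].startswith("GPT")
        and entry["score"] >= THRESHOLD
        for entry in leaderboard
    )

# Hypothetical snapshot with a score below the threshold
snapshot = [{"org": "OpenAI", "model": "GPT-4 Turbo", "score": 7.1}]
print(resolves_yes(snapshot, DEADLINE))  # False
```

The key design point mirrors the market rules: a qualifying score published even one minute after the deadline does not count.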
Educational content is AI-generated and sourced from Wikipedia. It should not be considered financial advice.
3 markets tracked

| Market | Platform | Price |
|---|---|---|
| — | Poly | 88% |
| — | Poly | 82% |
| — | Poly | 55% |



Add this market to your website
<iframe src="https://predictpedia.com/embed/mlk4wr" width="400" height="160" frameborder="0" style="border-radius: 8px; max-width: 100%;" title="OpenAI GPT score on Humanity’s Last Exam by June 30?"></iframe>