
$235.34K total volume
Trader mode: Actionable analysis for identifying opportunities and edge
This market will resolve to "Yes" if the Humanity’s Last Exam leaderboard lists any Google Gemini 3 model with a score of at least the specified score by March 31, 2026, 11:59 PM ET. Otherwise, this market will resolve to "No". The resolution source will be the official Humanity’s Last Exam leaderboard https://scale.com/leaderboard/humanitys_last_exam.
Prediction markets estimate a near-certain probability that a Google Gemini model will score at least 40% on a benchmark called "Humanity's Last Exam" by March 31, 2026. The current price translates to a roughly 99% chance, meaning traders collectively see this outcome as almost guaranteed.
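As a quick illustration of the price-to-probability conversion the paragraph above relies on, the sketch below treats a Yes share in a $1-settled binary market as costing its price in cents; the 99-cent figure mirrors the quoted market price, and fees and time value are ignored.

```python
# Minimal sketch: how a prediction-market share price maps to an implied
# probability and expected value, assuming a standard $1-settled binary market.

def implied_probability(price_cents: float) -> float:
    """A Yes share that pays $1.00 on resolution and costs `price_cents`
    implies a probability of price/100 (ignoring fees and time value)."""
    return price_cents / 100.0

def expected_profit_per_share(price_cents: float, p_yes: float) -> float:
    """Expected profit of buying one Yes share at `price_cents`,
    given your own probability estimate `p_yes`."""
    cost = price_cents / 100.0
    return p_yes * 1.00 - cost

price = 99.0                          # current Yes price in cents
print(implied_probability(price))     # 0.99, i.e. the market's ~99% estimate
# A buyer only profits if their true probability exceeds the price:
print(expected_profit_per_share(price, p_yes=0.995))  # +0.005 per share
```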
The high confidence stems from two main factors. First, the benchmark itself, Humanity's Last Exam, is designed to test an AI's ability to reason across many fields, from mathematics and the sciences to the humanities. Traders view a 40% score as a relatively low bar for a top-tier model. Second, Google's Gemini models are already publicly available and competing directly with other advanced AI from companies like OpenAI and Anthropic. The market is essentially betting that Google will release a Gemini 3 model, run it through this public benchmark as is standard industry practice, and clear the 40% threshold. It would be a major surprise if it did not.
The resolution deadline is March 31, 2026, 11:59 PM ET. The main event to watch is the official leaderboard at scale.com. Any time before the deadline, Google or a third party could submit a Gemini 3 model's results. A sudden drop in market probability might occur if the deadline approaches with no qualifying submission, but traders currently see that as very unlikely. The steady 99% odds suggest they expect confirmation well before the final date.
For straightforward, technical outcomes like this, prediction markets have a solid track record. The question has a clear, objective resolution source (the public leaderboard), which reduces ambiguity. Markets are generally good at aggregating information on whether a company will perform a specific, expected action. The main limitation here is time: over a horizon of a year or more, unforeseen events could theoretically disrupt Google's plans, but the market is pricing that risk as minimal.
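One standard way to quantify a "solid track record" is the Brier score, the mean squared error between forecast probabilities and realized 0/1 outcomes (0 is perfect; always guessing 50% scores 0.25). The sketch below uses made-up placeholder forecasts, not real market history, purely to show the calculation:

```python
# Hedged illustration: scoring probabilistic forecasts with the Brier score.
# The forecasts and outcomes below are hypothetical placeholders.

def brier_score(forecasts: list[float], outcomes: list[int]) -> float:
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)

# Hypothetical sample: five markets priced near-certain, all resolving Yes.
forecasts = [0.99, 0.97, 0.95, 0.99, 0.98]
outcomes = [1, 1, 1, 1, 1]
print(brier_score(forecasts, outcomes))  # ~0.0008, i.e. well calibrated
```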
Prediction markets assign a 99% probability that a Google Gemini model will score at least 40% on the Humanity's Last Exam benchmark by March 31, 2026. This price indicates near-certainty. With roughly $235,000 in total volume across related markets, traders are heavily betting on this outcome. The market is effectively pricing this threshold as a baseline expectation, not a significant milestone.
The extreme confidence stems from rapid AI progress and the specific, relatively low performance bar. A 40% score is a modest target for a frontier model expected within a two-year timeframe. Google's Gemini 2.0 models already demonstrate strong reasoning capabilities, and the anticipated Gemini 3 generation is projected to make substantial gains. The benchmark itself, Humanity's Last Exam, is designed to test advanced reasoning and knowledge integration. Market logic suggests that even incremental yearly improvements from current AI systems will easily surpass a 40% threshold. Traders are betting on the continuity of scaling laws and research momentum, not a breakthrough.
The 99% price leaves little room for movement, but a major disruption could theoretically shift it. A significant, prolonged slowdown in Google's AI development pipeline or an unexpected redefinition of the benchmark's scoring methodology before the deadline could introduce doubt. The more meaningful speculation is happening in markets for higher score thresholds (e.g., 60% or 80%), where prices are lower and more volatile. Those markets better capture the real uncertainty about the pace of capability gains. For this 40% market, only a catastrophic failure in Google's AI division before 2026 would likely alter the consensus.
AI-generated analysis based on market data. Not financial advice.
This prediction market focuses on whether Google's Gemini artificial intelligence models will achieve a specified performance score on the 'Humanity's Last Exam' benchmark by March 31, 2026. The benchmark, created by the Center for AI Safety in collaboration with Scale AI, is designed to test AI systems on a broad set of expert-written questions intended to probe the frontier of model knowledge and reasoning. The resolution depends on the official leaderboard at scale.com/leaderboard/humanitys_last_exam listing any Google Gemini 3 model meeting or exceeding the target score. The market specifically tracks the Gemini 3 generation, Google's next major iteration of its multimodal AI models following Gemini 1.5 and Gemini 2.0. Interest in this market stems from the competitive race in AI development between major technology companies, where benchmark performance is a key public metric of progress. Google's Gemini family directly competes with OpenAI's GPT models, Anthropic's Claude, and Meta's Llama series. Humanity's Last Exam has gained attention as a potentially more holistic assessment than narrower tests, covering mathematics, the sciences, and the humanities. Observers use these leaderboards to gauge which organizations are leading in AI capabilities, which influences investment decisions, partnership formations, and public perception of technological leadership. The March 2026 deadline allows time for multiple development cycles and model releases from Google DeepMind.
The competition to demonstrate AI superiority through public benchmarks began accelerating with the introduction of the GLUE benchmark for natural language understanding in 2018, followed by SuperGLUE in 2019. In June 2020, OpenAI's GPT-3 demonstrated unprecedented few-shot learning capabilities, setting a new standard for large language models. Google responded by introducing the Pathways architecture vision in 2021, which evolved into the PaLM model family, with PaLM 2 launching in May 2023. The current benchmark era intensified with the November 2022 release of ChatGPT, which popularized AI capabilities beyond research circles. Google launched the first Gemini 1.0 models in December 2023, positioning the family as a multimodal competitor to GPT-4, with the Gemini 1.5 Pro update following in February 2024, featuring a 1 million token context window. The Center for AI Safety and Scale AI announced Humanity's Last Exam in late 2024 and published it in January 2025, in response to frontier models saturating existing benchmarks like MMLU. The benchmark's name reflects its ambition to test expert-level knowledge across disciplines, though critics note all benchmarks have limitations. Historically, Google has consistently improved its model performance across evaluation cycles, with Gemini Ultra outperforming GPT-4 on some metrics according to Google's December 2023 technical report.
Benchmark performance directly influences the competitive landscape of the AI industry, which is projected to add trillions to the global economy. Companies that lead in public benchmarks often attract top research talent, secure enterprise contracts, and see increased valuation. For Google, strong performance on comprehensive benchmarks like Humanity's Last Exam could help regain perceived leadership after OpenAI's ChatGPT captured public attention. The results affect investor confidence in Alphabet's stock, as AI capability is now a core valuation metric for major technology firms. Beyond corporate competition, benchmark achievements shape regulatory discussions about AI safety and capability. Policymakers in the EU, US, and China monitor these metrics when considering AI regulation thresholds. Performance claims also influence adoption decisions across industries from healthcare to finance, where organizations select AI providers based partly on published benchmark results. If AI systems approach human-level performance on broad evaluations, it could accelerate automation in knowledge work sectors, affecting employment patterns and requiring new economic policies.
As of early 2025, Google's most capable publicly available models are in the Gemini 2.0 series, with Gemini 3 expected to follow. The Humanity's Last Exam leaderboard shows entries from various organizations, but no Gemini 3 entries exist yet because the model has not been released. Google has not announced specific timeline details for Gemini 3, though typical development cycles suggest a 2025-2026 timeframe. The benchmark itself continues to evolve, with its maintainers periodically updating questions and evaluation methods. Increased scrutiny of comprehensive AI testing following the UK AI Safety Summit in November 2023 may also affect how benchmarks like Humanity's Last Exam are weighted by observers.
Humanity's Last Exam is a comprehensive AI evaluation created by the Center for AI Safety in collaboration with Scale AI that tests models across multiple academic domains, including mathematics, the sciences, and the humanities. It aims to provide a more holistic assessment of AI capabilities than single-domain benchmarks.
Google has not announced an official release date for Gemini 3. Based on previous development cycles, with Gemini 1.0 launching in December 2023, Gemini 1.5 in February 2024, and Gemini 2.0 in December 2024, industry observers expect Gemini 3 to arrive in 2025 or 2026.
According to Google's February 2024 technical report, Gemini 1.5 Pro performs comparably to GPT-4 on many benchmarks, with advantages in long-context processing but slightly lower scores on some reasoning tasks. Direct comparisons are complicated by different evaluation methodologies.
The market's resolution rules specify a minimum score. Resolution depends on whether any Gemini 3 model reaches or exceeds that threshold on the official Humanity's Last Exam leaderboard by March 31, 2026.
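For readers who want to monitor resolution themselves, here is a hypothetical sketch of the check described above. The URL comes from the market rules, but the page's markup (and whether it renders without JavaScript) is an assumption, so treat this as pseudocode for the resolution logic rather than a working scraper:

```python
# Hypothetical resolution check. The leaderboard URL is from the market
# rules; the page structure is an assumption, and the score-extraction
# regex is a placeholder for whatever the real markup requires.
import re
import urllib.request

LEADERBOARD_URL = "https://scale.com/leaderboard/humanitys_last_exam"
THRESHOLD = 40.0          # assumed target score for this market
MODEL_PATTERN = re.compile(r"Gemini\s*3", re.IGNORECASE)

def market_resolves_yes(html: str) -> bool:
    """Resolution logic per the rules above: Yes if any Gemini 3 entry
    appears with a score at or above the threshold."""
    for line in html.splitlines():
        if MODEL_PATTERN.search(line):
            scores = [float(s) for s in re.findall(r"\b(\d{1,2}\.\d)\b", line)]
            if any(s >= THRESHOLD for s in scores):
                return True
    return False

html = urllib.request.urlopen(LEADERBOARD_URL).read().decode("utf-8", "ignore")
print("Resolves Yes" if market_resolves_yes(html) else "No qualifying entry yet")
```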
Scale AI maintains the leaderboard together with the benchmark's co-creator, the Center for AI Safety. Models are evaluated under standardized protocols, and a private held-out question set helps guard against models being tuned to the public questions.
Educational content is AI-generated and sourced from Wikipedia. It should not be considered financial advice.
5 markets tracked

| Market | Platform | Price |
|---|---|---|
| (name unavailable) | Poly | 99% |
| (name unavailable) | Poly | 81% |
| (name unavailable) | Poly | 53% |
| (name unavailable) | Poly | 27% |
| (name unavailable) | Poly | 17% |
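The market names in this table did not survive extraction; only the 40% threshold at 99% is confirmed by the analysis above. Assuming, purely for illustration, that the five rows correspond to ascending thresholds of 40%, 50%, 60%, 70%, and 80%, the differences between adjacent "at least X" prices yield an implied distribution over where the top Gemini 3 score will land. A minimal sketch:

```python
# Sketch: converting cumulative "score >= t" prices into an implied score
# distribution. The threshold-to-row mapping below is an assumption; only
# the 40% row at 99% is confirmed by the analysis text.

prices = {40: 0.99, 50: 0.81, 60: 0.53, 70: 0.27, 80: 0.17}  # P(score >= t)

thresholds = sorted(prices)
print(f"< {thresholds[0]}%: {1 - prices[thresholds[0]]:.2f}")
for lo, hi in zip(thresholds, thresholds[1:]):
    # P(lo <= score < hi) = P(score >= lo) - P(score >= hi)
    print(f"{lo}-{hi}%: {prices[lo] - prices[hi]:.2f}")
print(f">= {thresholds[-1]}%: {prices[thresholds[-1]]:.2f}")
```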





Add this market to your website
```html
<iframe src="https://predictpedia.com/embed/LCemyd" width="400" height="160" frameborder="0" style="border-radius: 8px; max-width: 100%;" title="Google Gemini score on Humanity’s Last Exam by June 30?"></iframe>
```