Market Impact: 0.15

Acing this new AI exam — which its creators say is the toughest in the world — might point to the first signs of AGI

GOOGL, GOOG
Artificial Intelligence, Technology & Innovation

Researchers from the Center for AI Safety and Scale AI published Humanity’s Last Exam (HLE), a PhD-level benchmark of 2,500 non-searchable questions across 100+ subjects intended to measure expert-level AI reasoning. As of Feb. 12, 2026, Google’s Gemini 3 Deep Think leads with a score of 48.4% (humans score ~90%), while earlier top performers included OpenAI’s o1 at 8.3%. The study stresses that high HLE accuracy would indicate expert-level performance on closed-ended questions, not autonomous research capability or AGI. The test’s strict vetting and resistance to memorization make it a new standard for assessing large models’ progress, but the authors warn that results are not definitive evidence of general intelligence.

Analysis

Market structure: The jump in HLE scores (Gemini at 48.4% versus prior single-digit results) reinforces the scale advantages of hyperscalers and vertically integrated chip/software vendors (GOOGL, NVDA, AMZN, MSFT). Expect 6–18 months of higher infrastructure spending: demand for training leading models could lift GPU/cloud billings by ~20–50% YoY for early adopters, compressing margins for smaller AI service providers and legacy software vendors.

Risk assessment: Key tail risks are regulatory clampdowns or liability events that could remove 5–20% of addressable monetization in the near term, and a sudden capex surge that inflates costs (GPU spot shortages). Immediate market moves (days) will be muted; the short term (weeks to months) will see M&A and speculation; over the long term (1–3 years), winners that control both model and stack can compound revenue at a 10–30% CAGR, but only with sustained access to talent and datasets.

Trade implications: Favor equities tied to the compute and software stack; expect NVDA and GOOGL to exhibit asymmetric upside but also event-driven volatility. Use relative-value and option structures to capture fast model-improvement headlines (buy LEAPS or call spreads; hedge with index tail puts) and rotate out of small-cap AI consultancies that lack IP or proprietary models.

Contrarian angles: Consensus equates HLE progress with near-term monetization. That is likely overstated: a monetization lag of 6–18 months and rising price competition can compress margins. The market may underprice regulatory and legal friction and overprice immediate cash flows from improved benchmarks. A historical parallel: the deep-learning breakthroughs of 2012 led to a multi-year maturation, not instant profit realization.
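To make the 10–30% CAGR range cited in the analysis above concrete, the following minimal Python sketch compounds a revenue base over the stated 1–3 year horizon. The function name `compound_multiple` is hypothetical, and the calculation assumes a constant annual growth rate, which real revenue rarely follows.

```python
def compound_multiple(cagr: float, years: int) -> float:
    """Return the revenue multiple after growing at a constant `cagr` for `years`."""
    return (1.0 + cagr) ** years


# Endpoints of the ranges quoted in the analysis: 10-30% CAGR over 1-3 years.
for cagr in (0.10, 0.30):
    for years in (1, 3):
        multiple = compound_multiple(cagr, years)
        print(f"CAGR {cagr:.0%} over {years} year(s): {multiple:.2f}x starting revenue")
```

At the low end (10% for 3 years) revenue grows by roughly a third; at the high end (30% for 3 years) it a little more than doubles, which shows how wide the quoted range is.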

Market Sentiment

Overall Sentiment: neutral

Sentiment Score: 0.05

Ticker Sentiment

GOOG: 0.12
GOOGL: 0.15

Key Decisions for Investors

  • Establish a long position of 2–3% of the portfolio in GOOGL via stock or 12-month LEAPS calls ~10–15% OTM to capture search/ads and enterprise AI monetization; add on pullbacks >5%, target +30–50% in 9–18 months, stop-loss at -15%.
  • Establish a long position of 3–4% in NVDA via stock or 6-month 20% OTM call spreads (size 1–2% notional) to play continued GPU scarcity and demand; enter within 2–6 weeks or on a ≤5% pullback, take profits at +40%, and reassess if revenue guidance misses by >10%.
  • Implement a pair trade: long GOOGL 2% / short AMD 1.5% to express model-stack leadership versus chip-sector cyclicality; expect relative outperformance over 3–6 months, and close if AMD outperforms GOOGL by >10% in 30 days.
  • Reduce exposure to small-cap AI services/consulting names by 50% versus benchmark within 30 days and redeploy the proceeds into semiconductors (NVDA) and cloud leaders (GOOGL, AMZN), which have stronger IP, data, and scale to monetize HLE-like advances.
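The sizing, target, and stop figures in the list above translate into simple arithmetic. The sketch below works through the low end of the GOOGL stock position (2% weight, +30% target, -15% stop). The portfolio value and entry price are hypothetical placeholders, not real quotes, and the `TradePlan` class is an illustrative helper; it ignores option pricing, slippage, and price gaps through the stop.

```python
from dataclasses import dataclass


@dataclass
class TradePlan:
    portfolio_value: float  # hypothetical total portfolio value
    weight: float           # position size as a fraction of the portfolio
    entry_price: float      # hypothetical entry price, not a real quote
    target_pct: float       # profit target as a fraction, e.g. 0.30 for +30%
    stop_pct: float         # stop-loss as a positive fraction, e.g. 0.15 for -15%

    @property
    def notional(self) -> float:
        """Currency amount allocated to the position."""
        return self.portfolio_value * self.weight

    @property
    def target_price(self) -> float:
        return self.entry_price * (1.0 + self.target_pct)

    @property
    def stop_price(self) -> float:
        return self.entry_price * (1.0 - self.stop_pct)

    @property
    def portfolio_loss_at_stop(self) -> float:
        """Fraction of the whole portfolio lost if the stop is hit exactly."""
        return self.weight * self.stop_pct

    @property
    def reward_to_risk(self) -> float:
        """Ratio of the target gain to the stop loss."""
        return self.target_pct / self.stop_pct


# Assumed inputs: a 1,000,000 portfolio and an entry price of 100.
plan = TradePlan(
    portfolio_value=1_000_000,
    weight=0.02,
    entry_price=100.0,
    target_pct=0.30,
    stop_pct=0.15,
)

print(f"Notional: {plan.notional:,.0f}")
print(f"Target price: {plan.target_price:.2f}, stop price: {plan.stop_price:.2f}")
print(f"Portfolio loss at stop: {plan.portfolio_loss_at_stop:.2%}")
print(f"Reward-to-risk: {plan.reward_to_risk:.1f}")
```

Under these assumptions, a 2% position stopped out at -15% costs about 0.3% of the portfolio, against a potential gain of about 0.6% at the +30% target.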