AI’s ability to see ‘mirages’ shows how alien machine brains really are

Anthropic accidentally exposed details of a new model (“Mythos/Capybara”) through an unsecured draft blog post, and then leaked the code of the agentic harness behind Claude Code. The leaks prompted government briefings and heightened cybersecurity concerns. Separately, Stanford researchers found that multimodal models display “mirage reasoning”: they can diagnose medical images without seeing them, reaching roughly 70–80% of the benchmark performance they achieve when the images are included. A fine-tuned Qwen-2.5, trained without any images, even outperformed human radiologists by about 10%. These findings raise serious doubts about benchmark validity and real-world clinical safety. Expect heightened regulatory and security scrutiny across AI firms, along with potential reputational and operational damage for the companies exposed by these lapses.

Analysis

Opaque internal model reasoning and weak multimodal grounding create a governance vacuum that will reallocate capital within tech over the next 6–24 months. Expect a persistent premium for firms that can credibly demonstrate model interpretability, secure model supply chains, and certified evaluation pipelines. Absent that, enterprise customers will demand insurance and audit services that add 5–15% to their procurement costs.

Second-order winners are audit and security service providers, plus data-center builders and their upstream suppliers (power, racks, networking). Demand for isolated, auditable deployments rises even as public-cloud experimentation slows, and this should lift hyperscaler-capex adjacencies for 12–36 months. Losers are fast-to-market AI software vendors that rely on opaque benchmarking. Healthcare and other regulated verticals face the greatest adoption drag, potentially shaving 3–7% off projected total addressable market (TAM) realization over the next 1–2 years as procurement cycles lengthen.

Key catalysts and tail risks: a high-profile misuse or clinical misdiagnosis litigated in the next 3–12 months could trigger accelerated legislation and procurement freezes. Publication of robust, image-grounded benchmarks, on the other hand, could re-rate companies that genuinely invest in multimodal validation. Conversely, productized on-prem solutions that pass third-party audits within 6–12 months could speed up renewals and limit downside for cloud incumbents.

Contrarian angle: the market’s reflex to punish large consumer-facing tech for AI governance missteps underappreciates near-term commercial demand for hardened, private deployments. Valuation resets that favor pure-play cloud infrastructure and cybersecurity may be overdone if the penalized firms can demonstrate certified stacks quickly.

AllMind