New York Times sues Perplexity AI for 'illegal' copying of content

The New York Times has sued Perplexity AI in the U.S. District Court for the Southern District of New York. The complaint alleges that the startup copied, distributed and displayed millions of NYT articles (including paywalled content) without permission to power its generative-AI products, and that those products produced fabricated 'hallucinations' falsely attributed to the paper. The NYT, which had sent a cease-and-desist more than a year ago, is seeking damages, injunctive relief and other equitable remedies. Perplexity, a San Francisco-based startup valued at about $20 billion, denies unlawful scraping, saying it indexes web pages and provides citations, and characterizes the suits as an unsuccessful tactic by publishers. The case, alongside similar litigation from Reddit, Chicago Tribune, Britannica, Dow Jones and the New York Post, raises enforcement and business risk for generative-AI firms and could lead to injunctions or material damages affecting Perplexity and investor sentiment. NYT shares rose about 1.8% on the news.

Analysis

Market structure: The NYT litigation signals a shift from free web-scraped training data toward paid licensing. Winners are legacy publishers (NYT) and large cloud/licensing incumbents (AMZN/AWS) that can transact; losers are challenger LLM-first startups (Perplexity and similar private players) facing higher content-acquisition costs and legal risk. The supply of high-quality, verifiable journalism will become scarcer and more price-inelastic for model builders, raising the marginal cost per useful token and favoring deep-pocketed players. Equity volatility in smaller AI names should rise 20–50% relative to mega-cap tech.
Risk assessment: Tail risks include injunctions requiring models to remove specific content, or statutory rulings expanding copyright remedies. Either would be a low-probability but value-destroying outcome for exposed startups over 6–24 months. Near term (days to weeks), headline risk dominates; mid term (3–12 months), legal discovery and settlements set licensing precedents; long term (1–3 years), the litigation could institutionalize data-licensing markets and compress returns for scraping-reliant entrants. Hidden dependency: many models rely on circumvention of robots.txt and publisher meta-tags. Catalysts include federal legislation, high-profile settlements (NYT/Reddit), or precedent-setting injunctions.
Trade implications: Tactical trades should favor selective long exposure to publishers and cloud providers, with hedges against small-cap AI players. Consider modest long NYT (2–4%) and AMZN (1–3%) exposure via cash or call spreads with 6–12 month expiries, and protective put exposure on RDDT-sized or similarly exposed equities for 3 months to insure against adverse rulings. Sector rotation: reduce allocation to speculative AI apps and increase allocation to media, cloud infra, and enterprise AI customers where contractual data use is clearer; re-evaluate positions at each major docket milestone.
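To make the call-spread suggestion above concrete, the following is a minimal sketch of the per-share profit and loss at expiry for a bull call spread (long a lower-strike call, short a higher-strike call). The function name, strikes, and net debit are hypothetical illustration values, not figures from this analysis or quotes for NYT or AMZN.

```python
def call_spread_payoff(spot_at_expiry, long_strike, short_strike, net_debit):
    """Per-share P&L at expiry of a bull call spread.

    All inputs are hypothetical illustration values, not market quotes.
    """
    if short_strike <= long_strike:
        raise ValueError("short_strike must be above long_strike")
    # Intrinsic value of the purchased (lower-strike) call.
    long_leg = max(spot_at_expiry - long_strike, 0.0)
    # Intrinsic value owed on the sold (higher-strike) call.
    short_leg = max(spot_at_expiry - short_strike, 0.0)
    return long_leg - short_leg - net_debit


if __name__ == "__main__":
    # Assumed example: buy the 50 call, sell the 60 call, pay 3.00 net.
    for spot in (40.0, 50.0, 55.0, 60.0, 70.0):
        print(spot, call_spread_payoff(spot, 50.0, 60.0, 3.0))
```

Under these assumed inputs the loss is capped at the net debit paid and the gain is capped at the strike width minus the debit, which is why a spread suits a "modest" exposure better than an outright call.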
Contrarian angle: The market underappreciates publishers' leverage. Legal action can convert ad/subscription churn into recurring licensing revenue (analogous to the shift from Napster to licensed music) and raise barriers to entry, so the consensus view that litigation is merely a “tactic” likely understates that leverage. Overreaction risk: pricing in a permanent ban on web data is extreme. The more probable outcome is negotiated licensing and attribution rules that create predictable revenue streams, benefiting NYT and AWS while penalizing scraping-first startups.

AllMind
