The 1M-Token Context Window Changed Everything — Except How People Use AI.

Anthropic, Google, and OpenAI all offer million-token context windows. The technology is here. But the median prompt is still under 500 tokens. The bottleneck moved from model capability to user behavior, and nobody is building for that.

By Daniel Osei, Fintech & Payments · Mar 20, 2026 · 12 min read

The context window race is over. Everyone won. Nobody cares.

Anthropic's Claude models handle 200,000 tokens, and Claude Sonnet 4 offers a 1 million-token window in beta. Google's Gemini 1.5 Pro hit 1 million tokens in early 2024, then expanded to 2 million. OpenAI's GPT-4o supports 128,000 tokens, and GPT-4.1 accepts roughly 1 million through the API. Every major frontier lab has cleared the 100K-token threshold, and the 1 million-token bar, once considered a moonshot, is now a marketing bullet point.

The capability unlock was real. In 2022, passing a 500-page PDF to a language model was impossible. Today, you can drop an entire technical documentation library into a single prompt. You can feed a model three years of earnings calls and ask it to identify strategic pivots. You can upload a full codebase and ask for a refactor. The "just dump everything in" use case that researchers dreamed about is technically available to anyone with an API key.

And the median prompt is still around 400 tokens. A few sentences. A brief question. A task that could have been handled by the 2022 models.

The context window wasn't the bottleneck. It was never the bottleneck. And the AI labs, in their race to one-up each other on context length, quietly solved a problem that almost no one had.

The Numbers Nobody Is Talking About

Usage data across major AI platforms tells a consistent and uncomfortable story. According to analysis published by Andreessen Horowitz in late 2025, the median prompt length across consumer AI applications sits between 350 and 500 tokens — roughly 250 to 400 words. The 90th percentile prompt is under 2,000 tokens. The 99th percentile barely touches 10,000.

That means 99% of real-world prompts fill about 1% or less of a 1 million-token context window: 10,000 tokens out of 1,000,000.

Enterprise usage shifts the curve but not dramatically. Internal data shared by three enterprise AI platform vendors at the 2025 AI Engineering Summit showed that even among power users — developers, analysts, legal teams doing document review — the median prompt length was under 8,000 tokens. The longest documented regular workflow, a legal discovery use case at a Fortune 500 company, averaged 42,000 tokens per session.

Impressive. Still only about 4% of a 1 million-token window.

Context Window Size	Available Since	Median Real-World Prompt Length	% of Window Used (Median)
4,096 tokens	2022 (GPT-3.5)	~350 tokens	8.5%
32,000 tokens	2023 (GPT-4 early)	~400 tokens	1.3%
128,000 tokens	2024 (GPT-4o)	~420 tokens	0.33%
200,000 tokens	2024 (Claude 3)	~450 tokens	0.23%
1,000,000 tokens	2024-2025 (Gemini 1.5)	~480 tokens	0.05%
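The last column is plain division: median prompt length over window size. A quick check in Python using the table's own figures (the row labels are shorthand for the "Available Since" column, and `utilization_pct` is just an illustrative helper name) also reproduces the 1% and 4% figures from the prose and the growth ratio behind the "250x" figure that comes up later:

```python
# Figures copied from the table: (label, context window, median prompt), in tokens.
ROWS = [
    ("GPT-3.5", 4_096, 350),
    ("GPT-4 early", 32_000, 400),
    ("GPT-4o", 128_000, 420),
    ("Claude 3", 200_000, 450),
    ("Gemini 1.5", 1_000_000, 480),
]


def utilization_pct(prompt_tokens: int, window_tokens: int) -> float:
    """Percentage of the context window that a prompt occupies."""
    return 100 * prompt_tokens / window_tokens


if __name__ == "__main__":
    for label, window, median in ROWS:
        print(f"{label:<12}{window:>11,} tokens {utilization_pct(median, window):6.2f}%")
    # The same arithmetic behind the prose figures:
    print(utilization_pct(10_000, 1_000_000))  # 99th-percentile prompt vs. a 1M window → 1.0
    print(utilization_pct(42_000, 1_000_000))  # legal-discovery sessions vs. a 1M window → 4.2
    print(1_000_000 / 4_096)                    # window growth since 2022 → 244.140625
```

The printed percentages agree with the table's rounded values, and 1,000,000 / 4,096 is where a growth figure of roughly 250x comes from.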

The pattern is stark. As context windows expand, utilization rates collapse. Users are not putting less into their prompts; the median actually creeps up, from roughly 350 to 480 tokens. The window is simply growing far faster than user behavior is changing. The labs are building a highway. Users are still driving the same distance.

The "Whole Codebase" Use Case: Real, Niche, and Misrepresented

The canonical pitch for million-token context is the developer workflow. Drop your entire codebase into the context. Ask for a comprehensive refactor. Get architecture recommendations that account for every file, every dependency, every edge case. No more piecemeal "here is this function, what do you think?" prompting. Full-system awareness in a single conversation.

This use case is real. Developers who have tried it describe it as transformative. Google's internal data shared at I/O 2025 showed that developers using long-context Gemini for codebase analysis reported 40% faster onboarding to new repositories and a measurable reduction in bugs introduced during refactoring.

But "real" and "widely adopted" are different claims. The whole-codebase workflow requires users to think about their work differently — to conceptualize an entire codebase as a single artifact that can be handed to a model, rather than a series of discrete problems to solve file by file. That conceptual shift is not automatic. It is not intuitive for most developers. And no one is teaching it.
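The mechanical half of that shift is small, which is part of the point. A minimal sketch, assuming a local checkout: walk the repository, keep files whose extensions are on an allow-list, and concatenate them into one prompt with a path header per file, so the model receives the codebase as a single artifact. The allow-list, skip-list, and header format are illustrative assumptions, and the 4-characters-per-token estimate is a rough rule of thumb for English text, not a real tokenizer.

```python
from pathlib import Path

# Illustrative allow-list and skip-list; both are assumptions to adjust per repository.
SOURCE_SUFFIXES = {".py", ".ts", ".go", ".rs", ".java", ".md", ".toml", ".yaml"}
SKIP_DIRS = {".git", "node_modules", "__pycache__", ".venv", "dist", "build"}


def pack_repository(root: str) -> str:
    """Concatenate a repository's source files into one prompt-ready string."""
    root_path = Path(root)
    parts = []
    for path in sorted(root_path.rglob("*")):
        rel = path.relative_to(root_path)
        # Skip anything that lives under an excluded directory.
        if any(part in SKIP_DIRS for part in rel.parts):
            continue
        if path.is_file() and path.suffix in SOURCE_SUFFIXES:
            text = path.read_text(encoding="utf-8", errors="replace")
            # A path header per file lets the model refer to files by name.
            parts.append(f"=== FILE: {rel.as_posix()} ===\n{text}")
    return "\n\n".join(parts)


def rough_token_estimate(text: str) -> int:
    """Very rough token count: about 4 characters per token for English text."""
    return len(text) // 4
```

Calling `pack_repository(".")` from a repository root returns the packed text, and `rough_token_estimate` gives a ballpark for whether it fits the target window. For real budgeting, the provider's own token-counting tools are the better source.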

A survey of 1,200 professional developers conducted by Stack Overflow in Q4 2025 found that only 11% had ever submitted a prompt longer than 50,000 tokens in a work context. Of those, 68% described the workflow as "something I figured out myself" rather than something a tool or platform guided them toward. The capability is available. The on-ramp does not exist.

This is the pattern that repeats across long-context use cases. Legal professionals using AI for document review could feed entire case files into a single context. Most feed individual documents. Financial analysts could provide a decade of filings in one prompt. Most provide a quarter at a time. The tools can handle the full workload. The users never learned they could hand it over.

The Bottleneck Moved and Nobody Noticed

In 2022, the constraint on AI usefulness was genuine. Models hallucinated excessively, context windows were cramped, and retrieval-augmented generation was a clunky workaround for a real architectural limitation. The capability ceiling was low and clearly visible.

The labs fixed it. GPT-4, then Claude 2 and 3, then Gemini, pushed the capability ceiling dramatically upward. Within three years, context windows grew roughly 250x, from 4,096 tokens to 1 million. Hallucination rates on factual tasks dropped substantially. The models got genuinely, measurably better.

But when the capability ceiling rose, a new bottleneck appeared: user mental models. Most people using AI tools still interact with them the way they interacted with search engines in 2010. Atomic queries. Short questions. Expecting an answer to the specific thing they asked, not a synthesis of everything they could have provided.

This is not a criticism of users. It is a product failure. The companies shipping AI tools have obsessively optimized for model capability while doing almost nothing to teach users to think in long-context workflows. There is no onboarding sequence that says "here is how to structure a 100,000-token project brief." There is no template library for multi-document synthesis prompts. There is no in-product guidance that says "you could drop your entire financial model in here."

The UX of AI products in 2026 is essentially a text box and a send button, the same interface that worked when context windows were 4,000 tokens. The interface has not evolved to reflect that the underlying context capacity has grown roughly 250x.

Anthropic, Google, and OpenAI have published extensive technical documentation on long-context best practices. That documentation lives in developer blogs and research papers. It does not live inside the products themselves, where the vast majority of users, most of them non-technical, are trying to figure out what to do with this thing.

What Actually Needs to Be Built

The opportunity is not another model with a longer context window. The opportunity is tooling and UX patterns that translate long-context capability into long-context behavior.

Workflow templates. Not prompt templates — workflow templates. Structured guides that walk users through the process of collecting, organizing, and submitting the full context for a complex task. "Analyzing a contract negotiation? Here's how to structure the deal history, the parties' stated positions, and the current draft into a single context that gives Claude everything it needs."

Context builders. A layer above the chat interface that helps users assemble documents, data, and background information into a coherent context window. Something between a file uploader and a knowledge management tool. The user specifies what they are working on, the tool helps them gather the relevant material, and the assembled context goes to the model as a single structured prompt.
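A minimal sketch of the assembly step, using the contract-negotiation example from the workflow-template paragraph. `ContextBuilder`, the XML-style tags, the 200,000-token default budget, and the 4-characters-per-token estimate are all illustrative assumptions, not any product's API. Wrapping each document in labeled tags and placing the question after the material is one common way to structure long prompts, not a requirement of any model.

```python
from dataclasses import dataclass, field
from html import escape


def estimate_tokens(text: str) -> int:
    """Rough heuristic: about 4 characters per token for English text."""
    return len(text) // 4


@dataclass
class ContextBuilder:
    """Hypothetical helper: collects labeled material for one task into one prompt."""
    task: str
    window_tokens: int = 200_000  # assumed budget for the target model
    documents: list[tuple[str, str]] = field(default_factory=list)

    def add(self, title: str, content: str) -> "ContextBuilder":
        self.documents.append((title, content))
        return self  # allow chained .add() calls

    def build(self, question: str) -> str:
        sections = [f"<task>{self.task}</task>"]
        for i, (title, content) in enumerate(self.documents, start=1):
            sections.append(
                f'<document index="{i}" title="{escape(title)}">\n{content}\n</document>'
            )
        # The question goes last, after all of the supporting material.
        sections.append(f"<question>{question}</question>")
        prompt = "\n\n".join(sections)
        if estimate_tokens(prompt) > self.window_tokens:
            raise ValueError("assembled context exceeds the target window")
        return prompt


if __name__ == "__main__":
    prompt = (
        ContextBuilder(task="Prepare for the contract renegotiation")
        .add("Deal history", "(paste deal history here)")
        .add("Parties' stated positions", "(paste positions here)")
        .add("Current draft", "(paste current draft here)")
        .build("Which clauses should we push back on, and why?")
    )
    print(prompt)
```

The design choice worth noting is the budget check at build time: the user finds out the context is too large while assembling it, not after the model silently truncates or rejects it.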

Progressive disclosure of capability. Most AI products show users a blank text box with infinite possibility — which is paralyzing. The products that drive long-context adoption will start with constrained, opinionated workflows that demonstrate the value of full-context reasoning, then expand user autonomy as habits form. The same logic that makes good onboarding for any product applies here.

Feedback loops that surface context gaps. If a user asks a question that would be better answered with more context, the model should say so, specifically. Not "I don't have enough information" (useless), but "This analysis would be more accurate if you included your Q3 forecast — can you add it?" This is technically straightforward today. Almost no product does it.
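The product-side half of this can be small. A minimal sketch, assuming a workflow template declares the inputs a task needs: compare those inputs with what the user attached and turn each gap into a specific request instead of a generic "I don't have enough information." The template, its keys, and matching by key are hypothetical; a real product might instead ask the model itself which inputs are missing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RequiredInput:
    key: str          # identifier matched against the user's attached material
    description: str  # human-readable name used in the follow-up request


# Hypothetical template for a forecast review workflow.
FORECAST_REVIEW = [
    RequiredInput("q3_forecast", "your Q3 forecast"),
    RequiredInput("financial_model", "your current financial model"),
    RequiredInput("board_deck", "the latest board presentation"),
]


def context_gaps(template: list[RequiredInput], attached: set[str]) -> list[str]:
    """Return one specific follow-up request per required input that is missing."""
    return [
        f"This analysis would be more accurate if you included {item.description}. Can you add it?"
        for item in template
        if item.key not in attached
    ]


if __name__ == "__main__":
    # The user attached only the financial model, so two requests are produced.
    for request in context_gaps(FORECAST_REVIEW, {"financial_model"}):
        print(request)
```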

Capability	UX Tooling Status (2026)	User Adoption Rate
128K+ context windows	Text box, no guidance	~11% of developers have sent >50K tokens
Multi-document synthesis	Manual copy-paste	~15% regularly use
Codebase-level analysis	CLI tools only (developers)	~8% of devs
Full-session memory integration	API feature, no consumer UX	<5% consumer users
Structured long-context templates	Largely absent	N/A

The table above is a product roadmap masquerading as a gap analysis. Every row is an unbuilt thing that would drive meaningful adoption of capabilities that already exist.

The Real Race Has Not Started Yet

The context window race is a solved problem. One million tokens is available. Two million is available. The architectural work is largely done, and while labs will continue expanding limits, the marginal value of going from 1 million to 2 million tokens is low when users are not using the first 990,000.

The race that matters now is behavioral. Which company can actually change how people think about working with AI? Which product will be first to move the median prompt from 400 tokens to 4,000? Which team will build the onboarding sequence that gives a mid-market CFO the intuition to hand Claude a year's worth of board presentations before asking a strategic question?

This is a harder problem than making the context window bigger. Model capability scales with compute. Behavior change scales with trust, education, and product design — all of which are slower, messier, and less legible than a benchmark score.

The labs that win the next phase of AI adoption will not be the ones with the longest context windows. They will be the ones that built the products, templates, and UX patterns that taught users how to actually use them. Right now, nobody in the industry is treating this as their primary problem. The capability teams are celebrated; the behavior change teams do not exist.

The million-token context window is sitting there, mostly empty, waiting for someone to build the product that fills it.