METR Chart Reveals Rapid Doubling of AI Task-Completion Abilities
METR's latest chart shows leading AI models now double their task-completion span every 3.5 months.
Why it matters: Legal tech and in-house counsel now face faster-changing benchmarks for AI’s practical capabilities, directly impacting technology procurement, risk assessment, and deployment in legal workflows. Understanding these shifts is critical for aligning investments with AI’s real-world business value and regulatory compliance.
- METR defines the "time horizon" as the longest-duration tasks an AI can complete as reliably as a human expert.
- Since 2024, frontier models’ time horizons have doubled roughly every 3.5 months, twice the 7-month doubling pace seen from 2019 to 2024.
- Anthropic's Claude Opus 4.6 currently leads, completing tasks lasting up to 14.5 hours with 50% reliability as of February 2026.
- Analysts warn that the metric tracks technical task-completion, not full operational or business value.
San Francisco Bay Area nonprofit METR is influencing how legal and tech professionals benchmark AI progress. Its widely cited chart tracks how long—measured in hours—today’s frontier AI models can perform set tasks to expert standards, with "time horizon" meaning the longest continuous task an AI can reliably complete at or above the level of a skilled professional.
- The key metric helps legal tech buyers anticipate when AI tools might meaningfully automate tasks such as document review, contract analysis, or due diligence.
- Under METR’s measures, major systems’ time horizons doubled every seven months from 2019 to 2024. The pace quickened sharply in 2024: models now double their time horizons roughly every 3.5 months, according to a LessWrong analysis and METR’s publicly shared data.
- Anthropic’s Claude Opus 4.6 currently tops the chart, handling tasks of up to 14.5 hours at a 50% task-completion rate as of February 2026, and maintaining over 80% reliability on 1-hour tasks.
For legal teams, the metric clarifies how quickly AI’s technical limits are advancing, informing upgrade and deployment cycles for AI-powered document review or research platforms.
However, the reliability of METR’s metric is debated. The chart focuses on isolated, "agentic" tasks: multi-step assignments performed without further human input. Some argue this underweights practical or cross-domain value. Critiques from analysts at The Biggish and a paper by Haosen Ge et al. posted to arXiv caution that not all measures of AI progress grow at exponential rates, and that a charted "time horizon" may not always reflect business or regulatory realities.
Bottom line for legal professionals: METR’s chart signals rapidly advancing task-level AI, but translating these advances into operational value, compliance, and risk mitigation demands a broader perspective than technical metrics alone can provide.
By the numbers:
- 3.5 months — The interval at which leading AI models’ task-completion time horizons have doubled since 2024.
- 14.5 hours — Maximum task duration achieved at 50% reliability by Claude Opus 4.6 as of Feb 2026.
- 2x acceleration — The post-2024 doubling pace is twice as fast as the 7-month pace seen from 2019 to 2024 (3.5 months vs. 7 months per doubling).
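As a rough illustration (not a forecast), the doubling arithmetic above can be sketched in a few lines of Python. The starting horizon of 14.5 hours and the 3.5-month doubling interval come from the figures reported here; any extrapolation assumes the current pace simply continues, which METR's own caveats do not guarantee.

```python
def horizon_after(months: float, start_hours: float = 14.5,
                  doubling_months: float = 3.5) -> float:
    """Extrapolate a task time horizon that doubles every `doubling_months`,
    starting from `start_hours`. Illustrative only: assumes the charted
    doubling pace continues unchanged."""
    return start_hours * 2 ** (months / doubling_months)

print(round(horizon_after(0), 1))    # today's charted horizon: 14.5 hours
print(round(horizon_after(3.5), 1))  # one doubling later: 29.0 hours
print(round(horizon_after(12), 1))   # ~156 hours, if the pace held for a year
```

The one-year figure shows why the doubling interval matters more than any single data point: at a 7-month pace the same year yields only about a 3.3x gain, versus roughly 10.8x at 3.5 months.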
Yes, but: The chart measures technical task-completion, not wider business or regulatory utility—a gap noted by both industry analysts and academics.