Neural Network World

Independent AI News & Analysis

METR and Epoch AI’s MirrorCode proves AI can complete weeks-long coding tasks

Neural Network World Editorial Team, April 12, 2026
Concept illustration of MirrorCode, a benchmark testing whether AI agents can autonomously rebuild software from specifications and test suites.

On April 10, METR and Epoch AI published preliminary results from MirrorCode, a benchmark that tests whether AI agents can autonomously reimplement existing software from specifications and test suites alone, without access to the original source code. In the headline result, Claude Opus 4.6 successfully rebuilt gotree, a bioinformatics toolkit of approximately 16,000 lines of Go code spanning more than 40 commands. Four independent researchers estimated that a skilled software engineer working without AI assistance would need between two and seventeen weeks to complete the same task.

Why It Matters

MirrorCode addresses a core limitation of existing coding benchmarks, which top out at tasks that can be completed in minutes or hours. Its design forces an AI agent to plan, write, test, and iterate across an entire software project rather than patch a single bug or complete a single function. The preliminary results push the known frontier of autonomous AI coding well past the 12-hour task horizon that METR had previously established for Claude Opus 4.6 based on its standard bug-fixing evaluations. The researchers also reported that performance kept improving with inference scaling on larger projects, suggesting that adding compute at inference time, rather than new training, can extend the horizon further. That finding has direct implications for the economics of deploying frontier models.

What’s Next

METR and Epoch AI plan to release the full benchmark, with additional target programs and more model comparisons, in the coming weeks. The initial results suggest that top-tier models may already be partially saturating the benchmark, which would require the teams to design harder tasks to keep measuring progress. The practical implication for the software industry is significant: if AI can reliably reimplement a 16,000-line production codebase, the cost of duplicating, porting, or modernizing existing software would drop substantially. That would affect legacy migration budgets, competitive dynamics in open source, and the economics of software outsourcing at scale.

The benchmark also surfaces a subtler legal risk. Reimplementing software from a specification is generally lawful, but AI-assisted reimplementation at this speed may prompt new litigation over trade secret protections, particularly if models were trained on proprietary codebases before producing what are presented as clean-room rewrites. That question is likely to reach regulators before the technology achieves broad enterprise adoption.

Sources: Epoch AI · METR

Copyright © 2026 Neural Network World. All rights reserved.