SignalFeed

Vibe Coding Created a $2.4 Trillion Technical Debt Bubble

41% of code is now AI-generated. Code churn is up. Refactoring has collapsed. Security failures are endemic. And the junior developers who would normally clean this up aren't being hired. Inside the maintenance crisis nobody wants to talk about.


On February 2, 2025, Andrej Karpathy posted a description of a new way to write software. He called it vibe coding. The instructions were simple: "fully give in to the vibes, embrace exponentials, forget the code even exists." Accept what the AI gives you. Don't read it too carefully. Move fast. Ship.

Fourteen months later, vibe coding is Collins English Dictionary's Word of the Year for 2025. GitHub Copilot has over 20 million users. Cursor hit $2 billion in annual recurring revenue. Claude Code reached $2.5 billion in annualized billings. And 41% of all code written in 2025 was AI-generated, according to ShiftMag's analysis of industry data.

The vibe is strong. The code is everywhere. And it is rotting from the inside.

CAST Software estimates that technical debt in the United States alone costs $2.41 trillion per year and would require $1.52 trillion to remediate. Forrester projects that 75% of technology leaders will face severe technical debt by 2026. These numbers predate the full impact of AI-generated code at scale. The actual bill will be higher.

This article is about what happens when an industry optimizes for code generation speed while simultaneously dismantling the systems -- junior developer pipelines, code review practices, refactoring culture -- that keep codebases maintainable.

The Scale of AI-Generated Code

The numbers from the companies building AI coding tools and the companies using them tell a consistent story: AI code generation has reached production scale faster than any development methodology in history.

Microsoft CEO Satya Nadella said at LlamaCon that 20-30% of Microsoft's code is now AI-written. Google CEO Sundar Pichai confirmed that 25% of Google's code is AI-assisted. Garry Tan told TechCrunch that 25% of Y Combinator's Winter 2025 batch had codebases that were 95% or more AI-generated.

The tooling market reflects this adoption. Copilot holds 42% market share with 20 million users. Cursor went from zero to $2 billion ARR in under two years. The competitive dynamics are clear: if your developers aren't using AI tools, your competitors' developers are.

But adoption speed is not the same thing as adoption quality. And the data on quality tells a very different story.

Consider what "95% AI-generated" actually means in practice. These are not codebases where AI assisted a developer who understood the architecture. These are codebases where a founder described what they wanted, an AI produced the code, and the founder shipped it -- often without reading it. The code compiles. It runs. But no human being fully understands how it works. That is not a theoretical concern. It is the operational reality for a quarter of the latest YC batch and a growing share of startups outside the accelerator.

The speed of adoption is itself a risk factor. When a new technology is adopted gradually, organizations develop institutional knowledge about its failure modes. They build guardrails. They share lessons learned. When adoption happens at this pace -- from novelty to 41% of all code in under three years -- the failure modes are discovered in production, not in testing.

The Defect Multiplier

CodeRabbit's analysis of pull request data found that AI-authored pull requests average 10.83 issues per PR, compared to 6.45 for human-authored PRs. That is a 1.7x defect multiplier. AI code is not slightly buggier. It is substantially buggier.

The security picture is worse. Veracode's study found that 45% of AI-generated code samples failed security tests. Java code failed at a 72% rate. XSS vulnerabilities are 2.74x more likely in AI-generated code than in human-written code. Aikido Security reported that 1 in 5 organizations have already suffered security incidents traceable to AI-generated code.

The problem is structural, not incidental. AI coding tools are trained to produce code that looks correct and compiles. They are not trained to produce code that is maintainable, secure, or architecturally sound. The difference matters enormously when that code goes into production and stays there for years.

Cortex's 2026 engineering metrics report quantifies the downstream effects:

Metric | Change
PRs per author | Up 20%
Incidents per PR | Up 23.5%
Change failure rate | Up 30%

More code is being written. That code triggers more incidents. And a larger share of changes fail outright. This is not a productivity gain. It is a throughput-incident trade-off that most engineering organizations have not yet accounted for.

The numbers tell a story of an industry that confused output with outcomes. A developer who merges 20% more PRs while causing 23.5% more incidents and 30% more failures is not more productive. They are more active. The distinction matters because it determines how organizations should measure engineering performance. If you reward PR volume, you will get more PRs. You will also get more bugs, more incidents, and more 3 AM pages to the on-call engineer.
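The two Cortex percentages compound, which is easy to miss when they are read one at a time. A minimal sketch of that arithmetic follows. It assumes the per-author and per-PR changes describe the same population and can simply be multiplied, which the report as cited here does not state.

```python
# Assumption: both Cortex changes apply to the same developers and compound.
PRS_PER_AUTHOR_CHANGE = 0.20     # PRs per author, up 20%
INCIDENTS_PER_PR_CHANGE = 0.235  # incidents per PR, up 23.5%


def incidents_per_author_change(pr_change: float, incident_change: float) -> float:
    """Relative change in incidents per author when both rates shift together."""
    return (1 + pr_change) * (1 + incident_change) - 1


change = incidents_per_author_change(PRS_PER_AUTHOR_CHANGE, INCIDENTS_PER_PR_CHANGE)
print(f"Incidents per author: {change:+.1%}")
```

Under that assumption, each author generates roughly 48% more incidents in exchange for 20% more merged PRs.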

The Productivity Illusion

The most damaging data point in the AI coding debate comes from METR's randomized controlled trial, published in mid-2025. The study used experienced open-source contributors working on their own repositories -- developers who knew their codebases intimately. The finding: developers using AI tools were actually 19% slower on real-world tasks.

But here is the critical part: those same developers believed they were 20% faster.

That is a perception gap of nearly 40 percentage points. Developers felt more productive while being measurably less productive. The psychological experience of generating code faster -- watching lines appear on screen at machine speed -- created a subjective sense of acceleration that the actual task completion data contradicted.

This perception gap explains why AI coding tools spread so quickly despite mixed results. The tools feel good. The experience of describing what you want and watching code appear is genuinely satisfying. It feels like the future. The problem is that writing code was never the bottleneck. Understanding requirements, debugging, reviewing, refactoring, and maintaining code -- those are the bottlenecks. AI tools accelerate the easy part while leaving the hard parts untouched or making them harder.

Faros AI's engineering data confirms this pattern at scale. Teams with high AI adoption merged 98% more pull requests. But review times increased 91%. PR sizes grew 154%. And bugs per developer increased 9%. The teams were shipping more code, but the code required more review, contained more bugs, and was harder to understand.

Senior engineers are absorbing the cost. Industry data shows that senior engineers now spend an average of 4.3 minutes reviewing AI-generated code compared to 1.2 minutes for human-written code -- 3.6x longer. The AI generates code in seconds. A senior engineer spends minutes verifying it. The net time savings, if any, are marginal. And that is before accounting for the bugs that slip through review.

The Faros AI data is particularly revealing because it separates the generation story from the delivery story. Teams with high AI adoption merged 98% more PRs -- nearly double the output. That sounds transformative. But those PRs were 154% larger, took 91% longer to review, and contained 9% more bugs per developer. The pipeline moved more volume. It also moved more risk. The organizations celebrating the throughput increase have not yet reckoned with the quality decrease that came with it.
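One way to read the Faros AI figures together: if both the number of PRs and the size of each PR grew, the amount of changed code reaching reviewers grew by their product. This is a rough sketch that assumes the two percentages describe the same teams and can be multiplied. The source does not say that they can.

```python
# Assumption: both Faros AI changes describe the same teams and compound.
pr_count_multiplier = 1 + 0.98  # 98% more PRs merged
pr_size_multiplier = 1 + 1.54   # PRs 154% larger

# Changed code reaching review, relative to the pre-adoption baseline.
review_volume_multiplier = pr_count_multiplier * pr_size_multiplier
print(f"Code under review: {review_volume_multiplier:.1f}x baseline")
```

Under that assumption, reviewers face roughly five times as much changed code, while review time grew by 91%.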

This creates a perverse incentive structure. The developer who generates ten PRs with AI looks more productive than the developer who writes three PRs by hand and refactors two existing modules. The first developer shipped more code. The second developer shipped better code. Most engineering metrics -- and most performance reviews -- reward the first developer.

The Refactoring Collapse

GitClear's longitudinal study of code quality metrics tracks what happened to codebases between 2020 and 2024 as AI coding tools went from novelty to default:

Metric | 2020 | 2024 | Change
Code churn rate | 5.5% | 7.9% | +44%
Refactoring as share of changes | 25% | <10% | -60%+
Duplicate code blocks | Baseline | 10x baseline | +900%
Copy/paste vs. moved code | Moved dominated | Copy/paste dominated | Inverted

These numbers describe a specific failure mode. Code churn -- the percentage of code that is rewritten or deleted within two weeks of being written -- rose 44%, from 5.5% to 7.9%. This means more code is being thrown away shortly after it is created. Developers are generating code, finding it doesn't work, and generating more code rather than debugging the original.
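Churn, as defined above, can be computed from line-level history. The sketch below assumes a hypothetical record format: one entry per added line, holding the day it was written and the day it was rewritten or deleted, if ever. It illustrates the definition only and is not GitClear's methodology.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

CHURN_WINDOW_DAYS = 14  # "within two weeks of being written"


@dataclass
class LineRecord:
    """Hypothetical record for one added line of code."""
    written: date
    removed: Optional[date] = None  # rewritten or deleted; None if still present


def churn_rate(lines: list[LineRecord]) -> float:
    """Share of added lines rewritten or deleted within the churn window."""
    if not lines:
        return 0.0
    churned = sum(
        1
        for ln in lines
        if ln.removed is not None
        and (ln.removed - ln.written).days <= CHURN_WINDOW_DAYS
    )
    return churned / len(lines)


sample = [
    LineRecord(date(2024, 3, 1), date(2024, 3, 5)),   # removed after 4 days: churn
    LineRecord(date(2024, 3, 1), date(2024, 4, 20)),  # removed outside the window
    LineRecord(date(2024, 3, 2)),                     # still present
    LineRecord(date(2024, 3, 3)),                     # still present
]
print(f"Churn rate: {churn_rate(sample):.0%}")
```

Only the first sample line counts as churn, so the sample rate is one line in four.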

Simultaneously, refactoring collapsed from 25% of code changes to under 10%. Developers are not cleaning up existing code. They are not restructuring it for maintainability. They are generating new code on top of messy foundations. The AI tools make it faster to write new code than to understand and improve existing code, so that is what developers do.

The result is that copy/paste code exceeded moved code for the first time ever in the dataset. Duplicate code blocks are 10x higher than they were two years prior. This is the opposite of software craftsmanship. It is code as landfill -- pile more on top and hope the foundation holds.

The 10x increase in duplicate code blocks is especially dangerous because duplication is a multiplier for every other problem. A security vulnerability in duplicated code must be patched in every copy. A logic error in duplicated code produces identical failures in every location. And because AI tools are statistically likely to reproduce similar patterns for similar prompts, the duplication is often not random -- it is systematic. The same flawed pattern appears across multiple files, modules, and services. When that pattern eventually needs to be fixed, the remediation cost scales linearly with the number of copies.

This is how technical debt compounds. The initial cost of a duplicated code block is near zero -- the AI generated it in seconds. The maintenance cost of that block, multiplied by ten copies, multiplied by every future change that touches it, multiplied by every bug it introduces, grows without bound. And because refactoring has collapsed, nobody is consolidating those copies. They just keep accumulating.
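Duplication of this kind can be surfaced mechanically. The following is a minimal sketch, assuming that a "duplicate block" means an identical run of a few non-blank lines after whitespace is stripped. Real duplicate detectors, including whatever GitClear uses, are more sophisticated. The file names and contents are made up for illustration.

```python
from collections import defaultdict


def find_duplicate_blocks(files: dict[str, str], window: int = 3) -> dict:
    """Map each repeated run of `window` normalized lines to where it occurs.

    Locations are (path, index), where index counts non-blank lines only.
    """
    seen = defaultdict(list)
    for path, text in files.items():
        # Normalize: strip surrounding whitespace and drop blank lines.
        lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
        for i in range(len(lines) - window + 1):
            seen[tuple(lines[i:i + window])].append((path, i))
    return {block: locs for block, locs in seen.items() if len(locs) > 1}


# Hypothetical files that share one three-line block.
files = {
    "a.py": "x = load()\nx = clean(x)\nsave(x)\n",
    "b.py": "y = 1\nx = load()\nx = clean(x)\nsave(x)\n",
}
duplicates = find_duplicate_blocks(files)
for block, locations in duplicates.items():
    print(f"{len(locations)} copies at {locations}")
```

Each extra location in the result is one more place where a future fix has to be applied, which is the linear remediation cost described above.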

The Review Crisis

When code volume doubles but code quality declines, the pressure falls on code review. And code review is breaking.

Cursor's acquisition of Graphite, a code review startup, for over $290 million signals how severe the problem has become. Cursor's CEO stated explicitly that "code review is taking up a growing share of developer time." A company built on generating code faster spent nearly $300 million to address the review bottleneck its own product helped create.

The math is straightforward. If AI tools double the volume of code produced and all of that code takes 3.6x longer to review, the total review burden increases roughly 7x (2 x 3.6 = 7.2). That is an upper bound, since not every line is AI-generated, but even a fraction of it outruns reality: no engineering organization scaled its review capacity 7x. Most didn't increase it at all. The result is one of two outcomes: either reviews become superficial (rubber-stamping), or they become a bottleneck that slows deployment.
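The review-burden figure depends on how much of the code is actually AI-generated. Here is a small sketch of the calculation with that share made explicit. The function name and the 50% scenario are illustrative assumptions, not figures from any of the cited reports.

```python
def review_burden_multiplier(
    volume_multiplier: float, ai_share: float, ai_review_ratio: float
) -> float:
    """Total review time relative to a baseline of all human-written code.

    volume_multiplier: total code volume relative to the baseline
    ai_share: fraction of that volume that is AI-generated (0.0 to 1.0)
    ai_review_ratio: review time for AI code relative to human-written code
    """
    blended_ratio = ai_share * ai_review_ratio + (1 - ai_share)
    return volume_multiplier * blended_ratio


# Upper bound from the text: volume doubles and every line is AI-generated.
upper_bound = review_burden_multiplier(2.0, 1.0, 3.6)

# Illustrative assumption: only the added half of the code is AI-generated.
half_ai = review_burden_multiplier(2.0, 0.5, 3.6)

print(f"All AI-generated: {upper_bound:.1f}x")
print(f"Half AI-generated: {half_ai:.1f}x")
```

Even in the milder scenario, the review load more than quadruples, which is still far beyond what a team with flat review capacity can absorb.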

Both outcomes are visible in the data. The Faros AI report showing 91% longer review times suggests bottleneck. The Cortex data showing 23.5% more incidents per PR suggests rubber-stamping. Different organizations are failing in different ways, but they are failing.

The review crisis also exposes a fundamental asymmetry in AI-assisted development. Generating code with AI is fun. It is fast. It feels productive. Reviewing AI-generated code is tedious, slow, and mentally exhausting. The developer who generates a 500-line PR in ten minutes with an AI tool has outsourced the cognitive load to the reviewer, who must now spend 20+ minutes verifying logic they did not write, in patterns they did not choose, implementing approaches they might not agree with. The generator gets the dopamine hit of shipping. The reviewer gets the burden of ensuring it works. Over time, this asymmetry degrades the willingness and ability of teams to maintain rigorous review standards.

The Stack Overflow 2025 Developer Survey reflects the growing skepticism. Only 29% of developers trust AI-generated code, down 11 percentage points from the previous year. And 45.2% of developers say debugging AI-generated code is more time-consuming than debugging human-written code. The people closest to the problem -- the developers who use these tools daily -- are losing confidence in the output.

The Amazon Kiro Incident

The review crisis has already produced catastrophic failures. The Amazon Kiro incident demonstrated what happens when AI-generated code operates without adequate human oversight. An AI coding agent deleted and recreated an entire production environment, causing a 13-hour AWS outage.

This was not a subtle bug. It was not an edge case. An AI agent, operating with production access and insufficient guardrails, destroyed a running system and then attempted to rebuild it from scratch. The incident crystallized a fear that many senior engineers had been articulating quietly: AI coding tools don't just write buggy code. Given sufficient access, they can execute catastrophic actions with the same confidence they bring to writing a utility function.

The incident response revealed that the AI agent had not been operating outside its permissions. It had been granted access to production infrastructure as part of its workflow. The failure was not in the AI's capabilities but in the organizational decision to give an AI agent the authority to make destructive changes without human approval at each step.
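What "human approval at each step" could look like in code is sketched below. It is a hypothetical guardrail, not a description of how Kiro or AWS works. The action names, the `run_agent_action` function, and the `approved_by` parameter are all invented for illustration.

```python
from typing import Callable, Optional

# Hypothetical action names; a real agent framework would define its own.
DESTRUCTIVE_ACTIONS = {"delete_environment", "recreate_environment", "drop_database"}


class ApprovalRequired(Exception):
    """Raised when a destructive action is attempted without human sign-off."""


def run_agent_action(
    action: str, execute: Callable[[], str], approved_by: Optional[str] = None
) -> str:
    """Run an agent-requested action, blocking destructive ones until approved."""
    if action in DESTRUCTIVE_ACTIONS and not approved_by:
        raise ApprovalRequired(f"'{action}' needs human approval before it runs")
    return execute()


# Read-only work proceeds; destructive work stops until someone signs off.
print(run_agent_action("read_logs", lambda: "logs fetched"))
try:
    run_agent_action("delete_environment", lambda: "environment deleted")
except ApprovalRequired as err:
    print(f"Blocked: {err}")
print(run_agent_action("delete_environment", lambda: "environment deleted",
                       approved_by="on-call lead"))
```

The design choice is that permission to act and permission to destroy are separate grants: the agent keeps its access, but the destructive subset requires a named human each time.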

The Kiro incident is not an isolated case. It is the logical endpoint of vibe coding culture applied to infrastructure. If the ethos is "forget the code even exists," then the extension is "forget the infrastructure even exists." Let the AI manage deployments the same way it manages code generation -- autonomously, at speed, without deep human understanding of what it is doing. The Kiro incident demonstrated that this approach works until it doesn't, and when it doesn't, the failure is not a bug in a feature. It is a complete system outage.

Aikido Security's finding that 1 in 5 organizations have suffered security incidents from AI-generated code suggests the Kiro incident is the visible tip of a much larger iceberg. Most AI-related incidents are not 13-hour public outages. They are quiet vulnerabilities sitting in production code, waiting to be exploited. They are data leaks that haven't been discovered yet. They are authentication bypasses in code that no human reviewed carefully because the AI generated it and it passed the tests.

The Junior Developer Pipeline Crisis

The most consequential long-term effect of AI coding tools is not the code they produce. It is the developers they are replacing.

Junior developer hiring is down 67% since 2022. US programmer employment fell 27.5% between 2023 and 2025. 54% of engineering leaders plan to hire fewer junior developers because of AI capabilities. A Harvard study found that junior developer employment drops 9-10% within six quarters of AI tool adoption at a company.

The logic seems rational in the short term. If AI tools can generate the boilerplate and CRUD operations that junior developers used to write, why hire junior developers? The cost savings are immediate and measurable.

But the logic breaks down over a five-to-ten-year horizon. Junior developers do not just write simple code. They learn. They absorb institutional knowledge. They develop the judgment that distinguishes a senior engineer from a prompt jockey. They learn to read code, not just write it. They learn to debug, to refactor, to make architectural decisions, to evaluate trade-offs.

Every senior engineer in the industry today was once a junior developer who wrote bad code, got it reviewed, learned from the feedback, and got better. That pipeline is being shut off. And nobody has a credible plan for what replaces it.

The assumption is that AI tools will mature and become reliable enough that deep code understanding becomes unnecessary. This is a bet that AI capabilities will advance faster than the complexity of the systems those AI tools are helping build. Given that AI tools are simultaneously increasing codebase complexity (more code, more duplication, less refactoring) while being asked to manage that complexity, this is a bet against compounding effects.

The arithmetic of the pipeline crisis is straightforward. A typical senior engineer takes 7-10 years to develop. That development happens through a progression: writing simple code, having it reviewed, learning from mistakes, taking on more complex tasks, mentoring the next cohort of juniors, and eventually making architectural decisions that affect entire systems. Each stage requires the previous stage. You cannot skip from prompt engineering to system architecture without the intermediate years of learning how code actually behaves in production.

If junior hiring has dropped 67% since 2022 and stays depressed, the industry will face a senior engineer shortage starting around 2029-2032. AI tools will be more capable by then. But the question is not whether AI can write code. The question is whether AI can make the judgment calls that senior engineers make: which trade-offs to accept, which abstractions to choose, which shortcuts create acceptable risk and which create catastrophic risk. Those judgment calls are learned through years of watching code succeed and fail. No training dataset substitutes for that experience.
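The shortage window follows directly from the two figures in the text: the year the hiring decline is dated from and the 7-10 years it takes to develop a senior engineer. A short sketch of that projection, treating both inputs as the article's estimates rather than measured constants:

```python
# Figures from the text: decline dated from 2022, 7-10 years to reach seniority.
HIRING_DROP_YEAR = 2022
YEARS_TO_SENIOR = (7, 10)


def shortage_window(start_year: int, years_to_senior: tuple[int, int]) -> tuple[int, int]:
    """Years in which the thinned junior cohort would have become senior."""
    low, high = years_to_senior
    return start_year + low, start_year + high


window = shortage_window(HIRING_DROP_YEAR, YEARS_TO_SENIOR)
print(f"Projected senior shortage begins between {window[0]} and {window[1]}")
```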

The Debt Arithmetic

The financial case for AI coding tools rests on a productivity claim: developers produce more with AI assistance, which means fewer developers are needed, which means lower costs. But the data suggests the actual equation is different.

The visible savings: Fewer junior developers hired. Faster initial code generation. More PRs merged per developer.

The hidden costs: 1.7x more defects per PR. 3.6x longer review times. 23.5% more incidents per PR. 30% higher change failure rates. 91% longer review cycles. XSS vulnerabilities at 2.74x the human baseline. Code churn up 44%. Refactoring down 60%.

CAST Software's $2.41 trillion annual technical debt cost was calculated before AI-generated code reached 41% of all code written. If AI-generated code carries 1.7x the defect rate and refactoring has declined by 60%, the compounding effect on technical debt is not linear. It is exponential. Every piece of unrefactored, duplicated, buggy AI code becomes the foundation on which more AI code is generated. The AI tools train on the codebase. The codebase gets worse. The AI output gets worse. The cycle accelerates.

The $2.41 trillion figure is almost certainly an undercount of where we are headed.

There is a second-order financial effect that the industry has not priced in: the cost of AI-generated code in regulated environments. Financial services, healthcare, defense, and government software all face compliance requirements that demand code auditability, traceability, and explainability. When a regulator asks "why was this code written this way," the answer cannot be "an AI generated it and nobody read it carefully." The compliance cost of auditing AI-generated codebases -- tracing each decision, verifying each security control, documenting each architectural choice -- will be substantial. Organizations that adopted vibe coding for speed may find that the compliance remediation costs exceed the development savings by an order of magnitude.

What Vibe Coding Gets Right -- And Why It Still Fails

Intellectual honesty requires acknowledging what vibe coding gets right. For prototypes, proof-of-concept demos, hackathon projects, and throwaway scripts, AI code generation is genuinely transformative. The ability to describe a feature in natural language and see working code in seconds is a real capability that did not exist two years ago.

The 25% of YC W25 companies with 95% AI-generated codebases are not irrational. They are making a calculated bet: get to market fast, validate the idea, and deal with code quality later. For a startup with 18 months of runway, shipping a prototype this week matters more than code maintainability in year three.

The problem is that "later" is arriving faster than expected. Those 95% AI-generated codebases will need to be maintained. They will need security audits. They will need to scale. They will need to be understood by new engineers who join the team. And they were not written to be understood. They were written to compile.

Karpathy's original framing -- "forget the code even exists" -- is precisely the mindset that produces unmaintainable software. Code exists. It runs on servers. It processes user data. It handles financial transactions. It fails at 3 AM. Forgetting it exists does not make it disappear. It makes the inevitable reckoning harder.

The YC data illustrates the tension perfectly. A startup with a 95% AI-generated codebase that achieves product-market fit will eventually need to scale that codebase. Scaling requires understanding. Understanding requires readable, well-structured, documented code. If the codebase was generated by an AI and accepted without review, the scaling effort may require a near-complete rewrite -- which, ironically, the startup will likely attempt to do with the same AI tools that produced the unmaintainable code in the first place. The cycle of generating, discovering problems, and regenerating is code churn at the organizational level. GitClear's data suggests it is already happening at the commit level.

The Path Forward

The technical debt bubble created by vibe coding will not pop in a single dramatic event. It will manifest as a slow increase in incidents, a gradual decline in deployment velocity, a steady rise in the percentage of engineering time spent on maintenance versus new features. The organizations that recognize this pattern early will adapt. The ones that don't will discover that the code they generated in months takes years to fix.

Five adjustments that the data supports:

1. Separate generation from integration. Use AI tools for drafting code. Do not use them for committing code. Every AI-generated change should pass through human review with the same rigor applied to human-written code -- more rigor, given the 1.7x defect rate.

2. Reinvest in refactoring. The collapse from 25% to under 10% refactoring is a leading indicator of future incidents. Engineering organizations should set explicit refactoring budgets -- minimum percentages of sprint capacity allocated to improving existing code rather than generating new code.

3. Keep hiring junior developers. The short-term cost savings from eliminating junior roles are real. The long-term cost of having no pipeline for developing senior engineering judgment is catastrophic. Organizations that stop hiring juniors today will face a senior talent shortage within five years that no AI tool can fill.

4. Treat review capacity as infrastructure. If code volume doubles, review capacity must scale proportionally. This means dedicated reviewers, automated quality gates, and tooling that flags AI-generated code for additional scrutiny. Cursor's $290 million Graphite acquisition suggests the market agrees.

5. Measure what matters. PRs merged per developer is a vanity metric. The metrics that predict long-term codebase health are: code churn rate, refactoring percentage, duplicate code ratio, mean time to recovery, and change failure rate. Organizations that optimize for generation speed while ignoring these indicators are optimizing for future failure.
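Two of the metrics named in item 5, change failure rate and mean time to recovery, can be computed directly from deployment records. The sketch below assumes a hypothetical `Deployment` record. Real data would come from a CI/CD or incident-tracking system whose schema will differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Deployment:
    """Hypothetical deployment record."""
    failed: bool
    minutes_to_recover: Optional[float] = None  # set only for failed deployments


def change_failure_rate(deploys: list[Deployment]) -> float:
    """Fraction of deployments that caused a failure in production."""
    if not deploys:
        return 0.0
    return sum(1 for d in deploys if d.failed) / len(deploys)


def mean_time_to_recovery(deploys: list[Deployment]) -> Optional[float]:
    """Average recovery time in minutes across failed deployments, if any."""
    times = [
        d.minutes_to_recover
        for d in deploys
        if d.failed and d.minutes_to_recover is not None
    ]
    return sum(times) / len(times) if times else None


history = [
    Deployment(failed=False),
    Deployment(failed=True, minutes_to_recover=30),
    Deployment(failed=False),
    Deployment(failed=True, minutes_to_recover=90),
    Deployment(failed=False),
]
print(f"Change failure rate: {change_failure_rate(history):.0%}")
print(f"Mean time to recovery: {mean_time_to_recovery(history)} minutes")
```

Tracking these alongside PR counts makes the trade-off visible: a rise in merged PRs that comes with a rising failure rate is not a gain.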

The $2.4 Trillion Question

The AI coding tool market is projected to exceed $5 billion in annual revenue by the end of 2026. Technical debt in the United States alone already costs $2.41 trillion per year, a figure that predates AI-generated code at scale and is rising. The ratio is approximately 480:1 -- for every dollar spent on AI code generation tools, the industry is already carrying $480 in annual technical debt costs.
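The ratio can be checked in a few lines. Note that it compares a US-only cost estimate with a market projection whose geographic scope is not stated, so it is best read as an order of magnitude rather than a precise figure.

```python
TECH_DEBT_COST_USD = 2_410_000_000_000  # CAST estimate, US only, per year
AI_TOOL_REVENUE_USD = 5_000_000_000     # projected annual market revenue, end of 2026

# Dollars of annual technical debt cost per dollar of AI coding tool revenue.
debt_per_tool_dollar = TECH_DEBT_COST_USD / AI_TOOL_REVENUE_USD
print(f"Ratio: about {debt_per_tool_dollar:.0f}:1")
```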

That ratio will narrow as the tools improve. The question is whether it narrows fast enough. Because right now, 41% of all new code carries a 1.7x defect multiplier and a 2.74x XSS vulnerability rate, and it is being deposited into codebases where refactoring has collapsed by 60% and the junior developers who would have cleaned it up aren't being hired.

Andrej Karpathy told developers to forget the code exists. The code did not forget it exists. It is running in production right now, accumulating defects, duplicating itself, and waiting for someone to maintain it. The vibes were great. The bill is coming.

Frequently Asked Questions

What is vibe coding?

Vibe coding is a term coined by AI researcher Andrej Karpathy on February 2, 2025, describing a development approach where programmers use AI tools to generate code based on natural language prompts while paying minimal attention to the underlying code itself. Karpathy described it as: 'fully give in to the vibes, embrace exponentials, forget the code even exists.' The term was named Collins English Dictionary Word of the Year for 2025. In practice, vibe coding means accepting AI-generated output without deeply understanding or reviewing it, prioritizing speed of output over code comprehension.

How much code is AI-generated in 2025 and 2026?

Multiple sources confirm that AI-generated code has reached significant scale. ShiftMag reported that 41% of all code written in 2025 was AI-generated. Microsoft CEO Satya Nadella stated at LlamaCon that 20-30% of Microsoft's code is AI-written. Google CEO Sundar Pichai confirmed 25% of Google's code is AI-assisted. Garry Tan reported that 25% of the Y Combinator Winter 2025 batch had codebases that were 95% or more AI-generated. GitHub Copilot has over 20 million users with 42% market share, Cursor reached $2 billion in annual recurring revenue, and Claude Code hit $2.5 billion in annualized billings.

Does AI-generated code have more bugs than human-written code?

Yes, multiple studies confirm higher defect rates in AI-generated code. CodeRabbit found that AI-authored pull requests average 10.83 issues compared to 6.45 for human-authored PRs, making AI code 1.7x more bug-prone. Veracode found that 45% of AI-generated code samples failed security tests, with Java code failing at a 72% rate. XSS vulnerabilities are 2.74x more likely in AI-generated code. Faros AI found that teams with high AI adoption saw bugs per developer increase 9%, and Cortex reported incidents per pull request up 23.5% and change failure rates up 30%.

What is the METR productivity study on AI coding tools?

METR (Model Evaluation and Threat Research) conducted a randomized controlled trial in 2025 that produced a striking finding: developers using AI coding tools were actually 19% slower on real-world tasks, but believed they were 20% faster. This represents a gap of nearly 40 percentage points between actual and perceived performance. The study used experienced open-source contributors working on their own repositories, controlling for familiarity and expertise. The result suggests that the perceived productivity gains from AI coding tools may be substantially overstated, driven by the psychological experience of generating code faster rather than the actual time to complete working features.

How is vibe coding affecting junior developer hiring?

Junior developer hiring has declined sharply since AI coding tools became widespread: it is down 67% since 2022. US programmer employment fell 27.5% between 2023 and 2025. A survey found that 54% of engineering leaders plan to hire fewer junior developers due to AI capabilities. A Harvard study found that junior developer employment drops 9-10% within six quarters of AI tool adoption at a company. This creates a long-term pipeline crisis: if companies stop hiring juniors, they lose the training ground that produces the senior engineers needed to oversee and correct AI-generated code.