
I Audited 8 Months of AI-Generated Code. Here's What Nobody Talks About.
April 11, 2026
AI-generated code passes linters, tests, and code reviews. That was the problem. After auditing 8 months of commits across 3 production projects, the issues I found were not bugs but structural patterns that no automated tool catches.
## The Audit Setup
I reviewed roughly 4,200 commits across three TypeScript codebases where teams adopted AI coding tools (Copilot, Cursor, Claude Code) between mid-2025 and early 2026. All three projects had CI pipelines, linting, test suites, and mandatory PR reviews. On paper, the process was solid.
I was not looking for bugs. I was looking for what changed about the shape of the code after AI adoption: how file counts grew, where abstractions landed, what the test suites actually verified. The answers were consistent across all three projects.
This is not an anti-AI post. I use AI tools daily and wrote about reducing their costs. But the gap between "AI code that works" and "AI code that ages well" is wider than most teams realize.
## Pattern 1: Copy-Paste Architecture
The most visible pattern was duplication. Not copy-pasted files or obvious clones. Subtle structural duplication where the same logic existed in three or four places, each slightly different, because the AI generated a fresh solution every time instead of reusing what already existed.
GitClear's 2025 research across 211 million lines of code confirms this is industry-wide. Code duplication grew from 8.3% of changed lines in 2021 to 12.3% by 2024, while refactoring collapsed from 25% of code changes to under 10%. For the first time in the data's history, developers paste more code than they move or reuse.
In one of the projects I audited, I found 11 separate date-formatting utility functions spread across the codebase. Each one worked, and none of them knew about the others. The AI never searched for an existing function before writing a new one, because if the existing utility is not in the context window, it does not exist.
```typescript
// utils/format-date.ts
export const formatDate = (d: Date) =>
  d.toLocaleDateString('en-US', { month: 'short', day: 'numeric', year: 'numeric' });

// components/invoice/helpers.ts
function prettyDate(date: string) {
  const d = new Date(date);
  return `${d.getMonth() + 1}/${d.getDate()}/${d.getFullYear()}`;
}

// lib/notifications/format.ts
const toReadable = (ts: number) =>
  new Intl.DateTimeFormat('en', { dateStyle: 'medium' }).format(new Date(ts));
```
Each function was generated in a different session, for a different feature, by the same AI tool. None triggered a lint warning, and all had tests that passed. The problem is invisible to every quality gate in a standard CI pipeline.
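The consolidation itself is mechanical once the duplication is visible. Here is a sketch of what the three variants above could collapse into; the function name, accepted input types, and fail-loud validation are my own choices for illustration, not code from the audited projects:

```typescript
// Hypothetical consolidated formatter: one entry point for Date objects,
// ISO strings, and epoch-millisecond timestamps.
export function formatDate(input: Date | string | number): string {
  const d = input instanceof Date ? input : new Date(input);
  // Fail loudly instead of letting "Invalid Date" leak into the UI.
  if (Number.isNaN(d.getTime())) {
    throw new TypeError(`formatDate: unparseable date input: ${String(input)}`);
  }
  return new Intl.DateTimeFormat('en-US', { dateStyle: 'medium' }).format(d);
}
```

Each call site then imports one function, and a future locale or format change happens in one place instead of eleven.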
## Pattern 2: The Confidence Problem
AI-generated code looks like it was written by a senior developer: clean variable names, consistent formatting, proper TypeScript generics. This is the trap. The code reads well, so reviewers trust it faster than they should.
I kept finding implementations that were technically correct but architecturally wrong. One example: a 200-line React component that fetched data, transformed it, cached it, and rendered it, all in one file, when the project already had established patterns for each of those concerns. The AI did not violate any rule; it just ignored every convention the team had built over months.
CodeRabbit's 2025 analysis of 470 open-source pull requests found that AI co-authored code contains 1.7x more issues than human-written code, with 1.64x more maintainability errors, 1.75x more logic issues, and 2.74x more XSS vulnerabilities. These are not syntax problems. They are judgment failures dressed in clean syntax.
[Chart: AI vs Human Code, issue multiplier by category]
The confidence problem hits hardest in code review. When a junior developer submits messy code, reviewers slow down. When AI submits polished code, reviewers pattern-match against "looks professional" and approve faster.
I talked to senior developers who are all-in on prompt-style coding. Even the most experienced ones admit they review AI code less critically than human code.
## Pattern 3: Test Coverage Without Test Quality
All three projects had test coverage above 80%, two above 90%. The coverage numbers made everyone feel safe. They should not have.
When AI generates implementation code and then generates the tests for that code, the tests are almost always tautological. They verify what the code does, not what it should do. The test becomes a mirror of the implementation instead of an independent specification of correct behavior.
```typescript
// The AI wrote this function
function calculateDiscount(price: number, tier: string): number {
  if (tier === 'gold') return price * 0.8;
  if (tier === 'silver') return price * 0.9;
  return price;
}

// Then the AI wrote this test
it('applies gold discount', () => {
  expect(calculateDiscount(100, 'gold')).toBe(80);
});

it('applies silver discount', () => {
  expect(calculateDiscount(100, 'silver')).toBe(90);
});

it('returns original price for unknown tier', () => {
  expect(calculateDiscount(100, 'bronze')).toBe(100);
});
```
Those tests will always pass, and they will never catch a real bug. What happens when price is negative, when null sneaks in, or when the discount rates change and the test values were supposed to come from a config? The AI tested the happy path it just wrote, not the edge cases that matter in production.
GitClear's data shows that 7.9% of all newly AI-added code was revised within two weeks of its initial commit, up from 5.5% in 2020. That churn tells you the tests are not catching problems before merge. The bugs show up later, in staging or production, and by then the test suite has already blessed the code.
- Tests verified the implementation's behavior, not the spec's requirements
- Edge cases (nulls, negatives, empty strings, concurrent access) were consistently absent
- Integration tests between AI-generated modules were rare, often zero
- Mocks were over-specified, tightly coupled to internal implementation details
- Test descriptions read like documentation, but the assertions were shallow
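A spec-first version of the discount example makes the difference concrete. This is a sketch, not the audited code; the config shape, the validation rule, and the unknown-tier fallback are assumptions about what a human-written spec might demand:

```typescript
// Rates live in config, so tests can assert behavior against the spec
// instead of hard-coding the implementation's magic numbers.
const DISCOUNT_RATES: Record<string, number> = { gold: 0.2, silver: 0.1 };

function calculateDiscount(price: number, tier: string): number {
  // Spec: reject invalid prices at the boundary, do not pass them through.
  if (!Number.isFinite(price) || price < 0) {
    throw new RangeError(`price must be a non-negative finite number, got ${price}`);
  }
  // Spec: unknown tiers get no discount, deliberately and visibly.
  const rate = DISCOUNT_RATES[tier] ?? 0;
  return price * (1 - rate);
}
```

The tests that matter assert the invariants: a discounted price never exceeds the original, never goes negative, and invalid input is rejected. None of those assertions can be produced by mirroring the implementation back at itself.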
## What Actually Helps
I am not going to tell you to stop using AI coding tools. That ship sailed. The teams that came out best from my audit were the ones that changed how they used the tools, not whether they used them.
- Write tests first, by hand. Let the AI generate the implementation, but write the test yourself. This forces you to think about what the code should do before the AI decides what it will do. TDD is more valuable now than it was five years ago.
- Add an abstraction review gate. Before merging any AI-generated PR, ask one question: does this duplicate logic that already exists elsewhere? Tools like jscpd or custom lint rules can automate part of this, but a human eye catches the structural duplicates that token-level scanners miss.
- Treat AI code as junior code. Review it like you would review a new hire's first PR. Read every line. Question every architectural decision. The clean formatting is irrelevant. Focus on whether it follows the project's existing patterns.
- Shrink the context window problem. Feed the AI your project's conventions explicitly. A `CONVENTIONS.md` file, architectural decision records, or even a list of existing utility functions in the system prompt goes a long way. I covered practical approaches in reducing token usage.
- Track duplication as a metric. If your code duplication percentage is climbing quarter over quarter, your AI workflow has a leak. GitClear, SonarQube, or even a simple `jscpd` report in CI will surface the trend before it becomes a rewrite.
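To make the "feed conventions explicitly" advice concrete, here is a hypothetical excerpt of the kind of `CONVENTIONS.md` file that can ride along in the system prompt. The specific rules and paths are illustrative, not taken from the audited projects:

```markdown
# Project Conventions (excerpt)

## Before writing new code
- Search `src/utils/` and `src/lib/` for an existing helper first.
- Date formatting goes through `utils/format-date.ts`; never create a new formatter.

## Architecture
- Data fetching lives in hooks under `src/hooks/`; components only render.
- Shared types live in `src/types/`; do not redeclare them inline.

## Testing
- Tests assert behavior from the spec, including invalid input.
- Do not generate tests by reading the implementation they verify.
```

The file earns its keep precisely because it is short enough to fit in every prompt.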
TL;DR: AI-generated code's biggest risk is not bugs. It is structural decay that passes every automated check. Duplication replaces abstraction, confidence replaces scrutiny, and test coverage replaces test quality. The fix is not less AI. It is treating AI output with the same skepticism you would give any untrusted contributor.
The teams I audited that caught these problems early had one thing in common: they never stopped treating AI as a tool that needs supervision. The teams that struggled were the ones that let vibe coding become the production workflow. Eight months is enough time for copy-paste architecture to become the architecture.