Series: AI

AI Metrics - June 12, 2026

AI Metrics

Many software development metrics have always been susceptible to misuse. Source Lines of Code is a fun example of a metric that too easily becomes the target, falling victim to Goodhart’s Law:

When a measure becomes a target, it ceases to be a good measure

At its worst, lines of code as a measurement becomes performative and a hindrance. A few months ago, while scrolling LinkedIn (a poor choice, I know), I saw many people boasting about how many lines of code they wrote each day with AI. Checking again this week, I see fewer of those posts, but perhaps more boasting about autonomous AI agents building entire products. I suspect most of those products will never see the light of day, but at least the boasting is getting closer to the house being built than to the amount of lumber used. I know LinkedIn anecdotes are almost meaningless as a source, but it’s one of the windows I have.

In the late winter and early spring of this year, several companies announced, through blog posts or other channels, that their developers were writing 100% of their code with AI. The report most interesting to me was Spotify’s, where the example given was a developer working entirely from their phone during their commute. Spotify’s example stands out because they later clarified that they had started the journey well before AI:

What if we used automation to make changes across hundreds or even thousands of software components at once? That idea became Fleet Management, and the underlying system we built to execute it is called Fleetshift. Fleet Management has been running at Spotify for several years now.

Deploying quickly and often isn’t new. The DORA metrics have been around for several years and have proven to be mostly useful measurements:

Deployment Frequency - How often an organization successfully releases to production

Lead Time for Changes - The amount of time it takes a commit to get into production

Change Failure Rate - The percentage of deployments causing a failure in production

Time to Restore Service - How long it takes an organization to recover from a failure in production
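The four definitions above boil down to simple arithmetic over deployment records. As a minimal sketch, assuming a hypothetical `Deployment` record that stores the commit time, the deploy time, whether the deploy caused a production failure, and when service was restored, the metrics could be computed like this. The record shape, the `dora_metrics` helper, the use of the median, and the assumption that a failure begins at deploy time are all my own choices for illustration, not part of any official DORA tooling.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class Deployment:
    """One production deployment (hypothetical record shape)."""
    commit_time: datetime                     # when the change was committed
    deploy_time: datetime                     # when it reached production
    failed: bool = False                      # did it cause a production failure?
    restored_time: Optional[datetime] = None  # when service was restored, if it failed


def dora_metrics(deploys: list[Deployment], window_days: int) -> dict:
    """Compute the four DORA metrics over a window of `window_days` days."""
    if not deploys:
        raise ValueError("need at least one deployment")

    # Deployment Frequency: production deployments per day in the window.
    frequency = len(deploys) / window_days

    # Lead Time for Changes: median commit-to-production time.
    lead_time = median(d.deploy_time - d.commit_time for d in deploys)

    # Change Failure Rate: share of deployments that caused a production failure.
    failures = [d for d in deploys if d.failed]
    failure_rate = len(failures) / len(deploys)

    # Time to Restore Service: median time from the failing deploy to recovery.
    # Assumption: the failure starts at deploy time.
    restore_times = [
        d.restored_time - d.deploy_time for d in failures if d.restored_time
    ]
    time_to_restore = median(restore_times) if restore_times else None

    return {
        "deployment_frequency_per_day": frequency,
        "lead_time": lead_time,
        "change_failure_rate": failure_rate,
        "time_to_restore": time_to_restore,
    }


if __name__ == "__main__":
    # Hypothetical two-day history: four deploys, one of which failed.
    t0 = datetime(2026, 6, 1, 9, 0)
    history = [
        Deployment(t0, t0 + timedelta(hours=2)),
        Deployment(t0, t0 + timedelta(hours=4), failed=True,
                   restored_time=t0 + timedelta(hours=5)),
        Deployment(t0, t0 + timedelta(hours=6)),
        Deployment(t0, t0 + timedelta(hours=8)),
    ]
    print(dora_metrics(history, window_days=2))
```

For that made-up history, the sketch reports 2.0 deployments per day, a 5-hour median lead time, a 25% change failure rate, and a 1-hour time to restore. Looking at all four numbers together is the point: a team could raise the first one simply by shipping more often while the third one quietly gets worse.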

Teams and organizations that do well on these metrics probably have a large amount of automation to minimize human error and latency in their deployment process. Confidence in their CI/CD set them up for success even before AI agents loosened the limitation of typing speed.

On the other hand, I feel bold enough to say that teams with unreliable verification and deployment before AI agents won’t see nearly the gains from AI agents they might hope for. Their limiting factor, confidence in software changes, might be offset by a high tolerance for change failure, so those teams might still see a large gain from AI agents if the only measurement is the number of changes.

Published June 12, 2026.

Previous: Bevy gamedev part 1: ships and physics

OLIVERCODING