Most new software starts off easy.
Small teams can add swaths of functionality at light speed. Focus is trained on the horizon. Everything is beautiful.
After a while, there’s a pile of bugs that’s been growing. But new features are still being requested. If you’ve been working on enhancements and new features constantly, you might not see an end in sight.
Eventually, it feels like you can’t keep up. Outside stakeholders will start to notice the number of bugs. Trust in the product will start to waver. This means trust in your team will start to waver.
Now it will feel like an ever-growing scope of features and bugs. Day-to-day work may begin to seem like an impossible task.
At this point, test coverage is usually insufficient. Writing tests is almost a job in itself. Tests inevitably lag behind when building new features.
Most teams will look for options to get things back under control.
You might focus on acceptance criteria so devs understand exactly what to build. This will result in more time spent writing tickets. Longer ticket descriptions. Fewer surprises during code review. But not much improvement in outcomes.
The problem is that nobody is writing bugs on purpose. If acceptance criteria could fix it, code reviews would have fixed it already.
You might plan tests upfront to improve the quality of test coverage. This can get you some vanity metrics, but quality will still suffer.
The problem is you are lacking tests because the code is hard to test. More tests are not better tests. Better tests demand refactoring, and refactoring carries risk. Especially when the existing code is unreliable.
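Here's a rough sketch of what "hard to test" tends to look like, and the kind of refactor that better tests demand. The function names and the late-fee rule are made up for illustration, not taken from any real codebase.

```python
from datetime import datetime, timezone


# Hard to test: the result depends on the real clock, and the function
# prints instead of returning, so a test can't pin down either one.
def print_late_fee_notice(due: datetime, daily_fee: float) -> None:
    days_late = (datetime.now(timezone.utc) - due).days
    if days_late > 0:
        print(f"Late fee: {days_late * daily_fee:.2f}")


# Easier to test: the same rule, but every input is an argument and the
# result is a return value. No clock, no console.
def late_fee(due: datetime, now: datetime, daily_fee: float) -> float:
    days_late = (now - due).days
    return max(days_late, 0) * daily_fee


# Usage: the caller supplies the time, so a test can supply any time it likes.
due_date = datetime(2024, 1, 1, tzinfo=timezone.utc)
checked_at = datetime(2024, 1, 4, tzinfo=timezone.utc)
fee = late_fee(due_date, checked_at, daily_fee=2.5)
```

Notice that the second version only exists after someone changed the first. That change is the refactoring risk: every caller of the old function has to be touched, usually without tests to catch a mistake.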
You might implement a uniform testing methodology, like TDD. But this will suffer from the same complications as upfront test planning.
Enforcing minimum test coverage falls short in the same way, too.
Adding a common code review checklist will be tempting. The problem will be applying the checklist effectively. One-size-fits-all approaches won’t capture the unique shape of your tech debt.
You might consider changing tooling to something that reduces the possibility of errors. It’s a huge change and it’d be a hard sell to the team. But you’re getting warmer…
Eventually you’ll want to ask for a hold on all new features until you can fix all the existing bugs. Enough firefighting will make everyone want to stabilize the system.
But the bugs you observe are effects, not causes. And solving the causes needs a deep dive into the foundations you’ve been building on top of for some time now.
If you’ve been here a while, you’ve left tracks all over this codebase. You might become the go-to person when things go wrong.
You may be constantly stuck in “emergency” huddles with devs who aren’t your direct reports. Their managers will push them to you because “who else can do it?”
Some folks will dream about quitting. People will lose sleep at night stressing over the bugs they trip over, or the fires they get roped into fighting.
Teams lose weeks on bugs they can’t reproduce. They don’t have enough logging. They didn’t write any of the code that they’re trying to diagnose. There’s little to no documentation at all.
There is a single answer to these problems, and it’s not what you’d think.
You need functional programming.