“We need to rewrite this system from scratch”
I heard that statement a few years ago at a startup I was working with. I’m sure you’ve heard that before. Perhaps you have said it yourself. This statement always has excellent reasons behind it.
Rewriting something that is live and works is a scary prospect. It’s a lot like rebuilding an airplane while it’s in flight. Software engineers invariably introduce tech debt as changes and additions are made to the system.
At some point the tech debt seems unsurmountable. The team feels like it’s spending more time battling the system architecture rather than releasing value. The symptom is true. The reason, however, always varies and it’s important to get to the root cause.
I wanted to understand the issue and its severity and got them together.
The first session
The first session was to really understand the WHY behind it. I am a huge fan of the 5 Whys system. It tries to find the root cause of a problem by repeatedly asking the question “Why?”
During the first session, I listened a lot and tried to understand their concerns. The session itself was done in a way one would do end of sprint retrospectives, with everybody writing stuff down on post-its and later explaining their post-it notes.
I asked a bunch of qualifying questions when the reasons seemed vague and towards the end I was able to validate the following,
There were areas of the codebase that the team was uncomfortable with. Changes to those areas had historically caused issues elsewhere — from a broken build to severe data corruption downstream.
This was due to a few factors,
- Knowledge attrition — The business had been restructured a few months ago and with that, a lot of the knowledge had left the business.
- Poor ownership — Some parts of the code had no current owner. Teams owned part of the codebase and there were gaps of no ownership
- No testing — My biggest bugbear, the code lacked testing of any kind.
The good news is that all of this was fixable.
The second session
I set up the second session the following week. Before the session, I worked with a few team members on how we could best structure the teams so that there was complete ownership of code.
During the second session, I acknowledged that most people were new to the codebase and that there were areas that were poorly understood. I introduced the new team structure which created owners for all parts of the code.
On a parallel track, service design workshops were set up to understand what the current system did and what was the expected outcome for the business. This would then feed back into the development teams to help them fill in that knowledge gap.
The most important thing however was to improve test coverage. This would do two things,
- Bring confidence in our development and release process and
- Most importantly, improve knowledge within the teams through writing tests.
Until we achieved a test coverage we were happy with, every sprint would have 20% time set aside for each developer to add test coverage in their areas. I also set up a series of sessions to introduce TDD and Testing for people who weren’t familiar with it.
The third session
In this session, I wanted to set the stage for the team to work on improving the existing platform. Rather than build a parallel platform which would mean having two platforms to support and the feature disparity between the two, I believe the right way is to chip away at the current platform and replace existing bits with newer ones.
The service design workshops identified a bunch of areas that needed improvement or were bottlenecks. This was both from a business and a tech perspective. We created a prioritised list based on impact v/s difficulty and this would feed our backlog to release the most business value in the coming weeks.
We also identified areas which were working well and wouldn’t need touching, at least not for now. It turned out to be a large percentage, something that did surprise us.
Code does not get rusty.
Unlike physical things, code does not undergo wear and tear. It does not need maintenance to keep it running smoothly. It executes in exactly the same way every time it is run — irrespective of whether it’s the first run or the billionth one. In fact, with improvements in microprocessors and compilers, that code executes quicker today than the day it was written.
So dear reader, if you are thinking of rewriting an existing system from scratch, do have a good think. Focus on areas that are broken, and for the rest, “if it ain’t broke, don’t fix it!”