
Michał Szopa for Brainhub

Originally published at brainhub.eu

Solving the Tech Debt Puzzle: Strategies that boost business

In the world of software development, it's almost inevitable that you will encounter projects struggling with technical debt and poorly written code. Such projects often pose quite a challenge, requiring a strategic approach to salvage them and evolve them into robust, maintainable systems.

This article delves into the challenge of dealing with legacy code and technical debt, describing lessons learned from a real-world project we worked on at Brainhub. As software engineers, we understand that working with legacy systems is an integral part of our craft, and it's a journey that can be both challenging and rewarding.

In the article, I share the story of a project plagued by technical debt, complex APIs, performance bottlenecks, and scalability concerns. We'll explore some strategies, tactics, and practical steps we took to breathe new life into the project, all while continuing to deliver business value.

The challenges we encountered

  1. High coupling: The backend, a critical component of our application, was implemented almost entirely within just two massive source files, each containing thousands of lines of code. On the front end, our Angular application suffered from enormous components loaded with complex logic, making it a formidable challenge to maintain.
  2. Poor test coverage: The project was riddled with issues in the realm of testing. The backend had only a single unit test, and the frontend didn't fare much better, with a handful of unit tests that offered little value in ensuring the quality of the code.
  3. Lack of a proper local environment setup: The absence of a well-defined local development environment, isolated from the other system components, was a huge problem, making it nearly impossible to work independently from the rest of the system.
  4. No observability and poor CI/CD: The project lacked essential practices for ensuring code quality and deployment efficiency. There was no observability to gain insights into system behaviour, and the CI/CD pipeline ran no tests against the system.
  5. Lack of project documentation: The project suffered from poor-quality documentation. This absence made system discovery challenging, forcing the team to resort to manual testing and reverse engineering of the code to understand how various system components and functionalities operated.

The real challenge: Where to start?

While our project was undeniably full of technical issues, the most challenging task we faced was deciding where to begin the journey of making things better.
Addressing the issues mentioned above was a vital step, but attempting to tackle them all at once would have been impossible. So, where did we start, and how did we navigate this labyrinth of challenges?

Our first step was to conduct a comprehensive tech audit of the project. This audit wasn't just about identifying the technical problems; it was about making the client aware of the pressing issues of tech debt and conveying its broader implications.

The tech audit allowed us to highlight the problems and risks lurking within our codebase. We prioritised these issues based on their importance and potential impact on the project's success. More importantly, we used this audit as a platform to engage the client in a conversation about the significance of addressing tech debt - we conveyed that it wasn't just a matter of improving code quality but an essential investment in the project's future. We couldn't afford too much technical debt.
[Diagram: the cost difference between low and high tech debt projects]

However, it's worth noting that while the tech audit was a great tool for raising awareness, it wasn't a silver bullet - the problems identified during the audit were too broad to be transformed directly into actionable tasks or epics, and they were often too detached from business priorities, making them hard to plan.

The essential takeaway from this phase was clear: while the tech audit served as a crucial catalyst for change, it didn't provide an immediate roadmap for resolution. To make significant progress, we needed to connect these technical challenges with actual business value. Only by aligning our efforts with the project's primary business goals could we address tech debt effectively within the constraints of tight project deadlines.

Establishing a Testing Strategy

We all agree that any healthy project should have adequate tests that it can rely on. In our mission to improve the technical quality of the project, we also had to choose the right strategy and approach to testing so that we could focus on further development of the system with peace of mind.

Traditionally, the testing pyramid preaches the importance of a robust foundation of unit tests, followed by integration and end-to-end tests. However, in our specific case, adhering strictly to the testing pyramid's principles presented a unique challenge - our codebase had reached a state where writing comprehensive unit tests would have been like applying a fresh coat of paint to a crumbling wall: it would conceal the problems rather than fix them, and maintaining such tests would become a nightmare.

Instead, we adopted an alternative approach known as the Testing Trophy (popularised by Kent C. Dodds) and followed its guiding principle:

Write tests. Not too many. Mostly integration.

[Diagram: the Testing Trophy]

Our approach was simple: focus on covering the main end-user flows. By creating proper high-level tests with Playwright (which made the task quite enjoyable for us), primarily E2E and integration tests, we ensured that every change we made stayed true to the primary purpose of our application. These tests became sentinels guarding against regressions and unexpected behaviour, and they delivered very high value in a relatively short time. Of course, they were not the fastest and far from perfect, but from then on we felt much more confident in what we were doing.
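
To give a flavour of what these tests looked like, here is a minimal sketch of a Playwright end-to-end test covering a single critical user flow. The URL, labels, and assertions are hypothetical placeholders rather than our actual application:

import { test, expect } from '@playwright/test';

// A high-level test exercising the whole stack: UI, API, and database.
test('a registered user can log in and reach the dashboard', async ({ page }) => {
  await page.goto('https://app.example.com/login');

  await page.getByLabel('Email').fill('user@example.com');
  await page.getByLabel('Password').fill('correct-horse-battery-staple');
  await page.getByRole('button', { name: 'Log in' }).click();

  // Assert on observable behaviour, not implementation details,
  // so the test survives internal refactoring.
  await expect(page).toHaveURL(/\/dashboard/);
  await expect(page.getByRole('heading', { name: 'Dashboard' })).toBeVisible();
});

Tests written at this level keep passing no matter how the code behind the flow is reorganised, which is precisely what made them so valuable during refactoring.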

Refactoring: Purpose Over Perfection

Once we had a certain level of security and confidence assured through testing, we could start thinking about making concrete changes.

When confronted with a codebase riddled with tech debt and suboptimal code quality, it's almost natural to be tempted by the siren call of a fresh start. The idea of scrapping everything and rebuilding from scratch can be enticing, but it's a path fraught with risks and challenges. Refactoring always carries risk and should be driven by a clear purpose and close alignment with business goals.

Usually, a rewrite is only practical when it's the only option left.

The reason rewrites are so risky in practice is that replacing one working system with another is rarely an overnight change. We rarely fully understand what the previous system did - many of its properties are accidental in nature - and tests alone cannot guarantee that no regressions have been introduced along the way.

Baby steps to the rescue!

Although it may sound like a much less attractive and exciting path from an engineering point of view, improving an existing system without breaking it can actually involve far more complex and interesting challenges.

Where to start?

We learned this key lesson: refactoring should have a well-defined purpose and always support the business. While code aesthetics are important, they should never be the sole driving force behind refactoring efforts. Instead, refactoring should be a strategic action that solves specific problems or achieves long-term benefits. With that in mind, we identified the following points that might help you find the areas of improvement worth considering in your own project:

  1. Pay attention to the roadmap: Opportunities for refactoring can be closely guided by the project roadmap, strategically aligning code improvements with the evolving needs and priorities of the business.
  2. Analyse the bugs: Perhaps they are related and come from the same tangled place in the codebase. If so, that might be a reason to consider a more significant change and replace that part of the system.
  3. Define the most essential parts of the application: This ensures that refactoring efforts are directed towards the areas with the most significant impact on achieving business objectives.
  4. Communication is key: Open and transparent communication within the team is crucial. Identifying and sharing technical challenges early on, and incorporating them into planning, is very important.

Case study: Let's try to put it into practice

As mentioned, we were struggling with quite a few problems. To navigate this labyrinth of issues effectively, we adopted a structured approach that leveraged testing, refactoring, and careful planning.

Using our backend as an example, we started gradually transitioning to NestJS - a robust, modular framework for building scalable applications. This transition allowed us to modernise our codebase incrementally, without disrupting existing functionality, by setting Nest up next to the existing Express application - which turned out to be as easy as adding a few lines of code:

import * as http from 'http';
import { NestFactory } from '@nestjs/core';
import { ExpressAdapter } from '@nestjs/platform-express';
import { app as ExpressApp } from './legacy-app';
import { AppModule } from './app.module';

const bootstrap = async () => {
  const port = process.env.PORT || 3000;

  // Mount the Nest application on top of the existing Express instance,
  // so legacy and new routes are served by the same HTTP server.
  const nest = await NestFactory.create(
    AppModule,
    new ExpressAdapter(ExpressApp),
  );

  await nest.init();
  http.createServer(ExpressApp).listen(port);
};

bootstrap();
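It is worth pausing on why this works: NestFactory.create accepts an ExpressAdapter wrapping the existing Express instance, so Nest registers its routes on the very same application object. A single HTTP server then serves both worlds - requests matching the new Nest routes are handled by the new code, while everything else falls through to the legacy Express handlers.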

While the framework itself was only a tool supporting our changes, our primary goal and plan for the further migration rested on the following main points:

Isolation of technical debt: By encapsulating legacy code within the existing system and developing new features in the Nest framework, we effectively isolated the legacy code, preventing it from contaminating the new codebase.

Introduction of a v2 API: We introduced a new version of our API, featuring a cleaner, more intuitive structure, implemented against its specification and covered with proper tests. This allowed us to add new features and enhancements without altering the existing API, which served as the backbone of our application, while slowly moving our API clients to the latest version (a minimal sketch of how this can look in NestJS follows this list).

Documenting business requirements: To gain clarity and alignment between the development team and stakeholders, we invested time in documenting business requirements comprehensively, creating a detailed specification that served as a blueprint for development and further testing.

Incremental improvements through minor rewrites: One of the fundamental principles of our approach was to avoid monumental rewrites. Instead, we opted for small, manageable baby steps that could be seamlessly integrated into our ongoing business tasks. This approach ensured that we only bit off what we could chew and allowed us to improve core functionalities while delivering new features continuously.
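
To make the v2 API point concrete: recent versions of NestJS ship with built-in URI versioning, which lets a v2 controller live right next to the legacy routes. Below is a minimal sketch with hypothetical names - our real controllers were, of course, more involved:

import { Controller, Get } from '@nestjs/common';

// A hypothetical v2 controller: the same resource exposed under /v2/users
// with a cleaner response shape, while the legacy endpoint stays untouched.
@Controller({ path: 'users', version: '2' })
export class UsersV2Controller {
  @Get()
  findAll() {
    return { users: [], total: 0 };
  }
}

Versioning is switched on once during bootstrap with app.enableVersioning({ type: VersioningType.URI }); from then on, the version declared on each controller determines the URL prefix it responds to.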

Example: User registration process update

I have already said a lot about the theory, so let's take a closer look at a specific example and put it all into practice.

At some point, we were asked to add account verification email functionality as part of the user registration process.
The existing registration logic had become a complex piece of code over time, as it managed user creation, database interactions, authentication, and more - all within hundreds of lines of code - so the task turned out to be quite a challenge.

To address this problem, we could have implemented the new feature the Nest way, but the hard part would still have been integrating it with the legacy endpoint. The most straightforward way forward would have been to rewrite that endpoint as well, solving all of the problems mentioned, but at the same time introducing substantial risk and requiring a great deal of work before we could even start on the primary goal of the task.

To avoid all of that, we decided to introduce an event-driven approach called Broker Topology.

In broker topology, you send events to a central broker (event bus), and all broker subscribers receive and process events asynchronously.

[Diagram: broker topology - events pushed to a central broker are processed asynchronously by subscribers]

In this approach, whenever the event bus receives an event from event creators, it passes the event to all subscribers, who can then process the data with their own logic. In this case, the only dependency of both publishers and subscribers is the Event Bus, which significantly reduces the coupling.

Below is an illustrative example of applying the event-driven approach to the described scenario. With it in place, the only change needed on the legacy side was to notify the event bus about the new user registration event - that's all:

await EventBus.getInstance().dispatch(
  new NewUserRegisteredEvent(payload),
);

The event could then be handled by all interested subscribers, each responsible for processing it in its own specific way:


import { Inject, Injectable } from '@nestjs/common';

// EventBus, EVENT_BUS_PROVIDER, ApplicationEvents and the payload type
// are application-level definitions; their imports are omitted here.
@Injectable()
export class AccountVerificationService {
  constructor(
    @Inject(EVENT_BUS_PROVIDER) private readonly eventBus: EventBus,
    // ...
  ) {
    // Subscribe to registration events; the legacy publisher knows nothing
    // about this service - the only shared dependency is the event bus.
    this.eventBus.registerHandler(
      ApplicationEvents.NewUserRegisteredEvent,
      (eventPayload: NewUserRegisteredEventPayload) =>
        this.processUserAccountVerification(eventPayload),
    );
  }

  // ...
}
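The snippets above reference an EventBus that is both injectable through Nest's DI and reachable via getInstance() from the legacy Express code. For completeness, here is a minimal sketch of what such an in-process singleton bus could look like - an illustration under simplifying assumptions, not the project's actual implementation, which had more rigorous typing and error handling:

type EventHandler<T> = (payload: T) => void | Promise<void>;

// Minimal shape of an application event; concrete events such as
// NewUserRegisteredEvent would implement this interface.
export interface ApplicationEvent<T = unknown> {
  readonly name: string;
  readonly payload: T;
}

export class EventBus {
  private static instance: EventBus;
  private readonly handlers = new Map<string, EventHandler<any>[]>();

  static getInstance(): EventBus {
    if (!EventBus.instance) {
      EventBus.instance = new EventBus();
    }
    return EventBus.instance;
  }

  registerHandler<T>(eventName: string, handler: EventHandler<T>): void {
    const existing = this.handlers.get(eventName) ?? [];
    this.handlers.set(eventName, [...existing, handler]);
  }

  async dispatch<T>(event: ApplicationEvent<T>): Promise<void> {
    // Every subscriber processes the event independently; publishers
    // depend only on the bus, never on the subscribers themselves.
    const handlers = this.handlers.get(event.name) ?? [];
    await Promise.all(handlers.map((handle) => handle(event.payload)));
  }
}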

By leveraging the event bus and embracing modularity, we achieved our goal of adding new features in a tested and maintainable manner while minimising disruption to the existing system.

Summary

Dealing with technical debt can be a long, challenging journey that demands patience and strategic thinking. After approximately ten months of dedicated effort, we took the described codebase from 0% to about 50% code coverage and migrated most of the backend code to isolated Nest modules. Moreover, we successfully implemented integration and end-to-end tests, ensuring robustness and stability: the system is checked thoroughly after every change, while we remain flexible enough to add new features and meet changing business requirements.
There is still much work to be done, but the project is in a very different place, ready for further development.
