Update: French speakers can find a talk I gave at the Paris Java User Group, inspired by this article, here
Introduction
During my various assignments, I've worked on a number of legacy software projects, each suffering from different kinds of flaws.
Of course, poor software quality (no unit tests, clean code principles not being applied...) was often a major issue, but there were also problems stemming from architectural decisions taken in the early days of the project, or even at the dawn of the enterprise system. These issues are, from my point of view, the greatest cause of pain for many projects.
As a matter of fact, improving code is quite easy, especially now that the software craftsmanship movement is spreading good practices across teams. But changing the core of a system, the constraints imposed at the very beginning of its lifecycle, is very challenging.
I'll describe several types of architectural decisions I've encountered that can become real burdens for the teams maintaining these systems.
Sharing your database with the whole company
This is probably one of the most common issues I've seen. When several applications need to use common data, why not simply share the database? After all, repetition is a bad thing in software development, right? Well, not always, and especially not when a database is involved. Venkat Subramaniam said it in a way that can't be forgotten: "A database is like a toothbrush, you should never share it". What's so wrong about sharing a database? Many things, in fact...
The first thing that comes to mind is obviously the coupling in the data model. Imagine that two applications, A and B, deal with cars. Application A is used by the team responsible for repairs, so it needs to store a lot of technical data about the mechanics, the failures, the history of interventions on the car... Application B is used to handle appointments for the technical team, so it only needs basic information to identify the car. In this case, using the same data structure for both applications makes no sense: they use very different data, so each should have its own structure. This is made even easier because a car can be easily identified, so there is no need to share common reference data.
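To make the idea concrete, here is a minimal sketch of what separate models could look like; the class and field names are invented for the example, and the only thing the two applications share is the car's identifier (say, its VIN):

```java
import java.util.List;

// Owned by application A (repairs): a rich view of the car's mechanical state.
class RepairShopCar {
    private final String vin;                   // the only shared identifier
    private String engineType;
    private List<String> interventionHistory;   // failures, repairs, parts changed...

    RepairShopCar(String vin) {
        this.vin = vin;
    }
}

// Owned by application B (appointments): just enough to identify the car.
class AppointmentCar {
    private final String vin;                   // same VIN, completely separate model
    private String displayLabel;                // e.g. "red 2012 hatchback"

    AppointmentCar(String vin, String displayLabel) {
        this.vin = vin;
        this.displayLabel = displayLabel;
    }
}
```

Each application can then evolve its own model (and its own tables) freely, as long as the shared identifier stays stable.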
The second issue also comes from this coupling of the data model. Imagine that B wants to rename the identifier of the car because another name makes more sense from a domain point of view. In that case, A would also have to be updated to handle the new column name... So, to avoid disturbing A's team, B's developers start duplicating the information in a different column, since they can't change the existing name... Of course, A's team will say they'll plan this change in the future to avoid having two columns containing the same data, but we all know it will most probably never happen...
Things get even uglier when applications are not just reading data from the same source but also modifying it! In this case, who owns the data? Who should be trusted? How can the integrity of the data be guaranteed? This is already hard when several parts of the same application modify the same information, and it becomes much worse when several applications are involved...
The last case I've seen is two applications sharing the same data structure to store information about two fairly similar business objects, with just enough differences to make it really hard to understand which data belonged to which application. In this case, both applications were using one table to model financial market executions, but with different levels of aggregation. Nothing indicated that there were two types of data in this table, so we had to look in another table (owned by the second application) to identify the rows generated by each application... Every new developer working on this table would inevitably fall into the same trap as their predecessors and use the wrong (and sensitive) data, with all the risks that entails for the company.
Building your system around a business software
Not every company can develop the systems to handle all of its business use cases. In fact, in many cases this would just be reinventing the wheel, since these use cases are common to many companies and you can easily find software on the market that already supports them.
So, buying the product is often cheaper than building it. But of course, the software you just bought can't integrate with that other piece of software you also use, so you need to develop a connector between two (most of the time proprietary) applications. You will probably build your own tools to handle specific parts of the business, and since this expensive software you've bought already has a convenient model, you'll be tempted to just use its database and add your own information to its tables...
A few years pass, dozens of developers or teams do the same, and then you're stuck: you just can't switch to another product if the vendor shuts down, if the product is no longer supported, or if a new product suits your needs better. In some cases, you can even end up with technical dependencies on an external product. If the vendor dictates which version of the language/framework/server/whatever you must use, then you don't own the architecture of your own system. If they want to sell you a new version that provides a feature you absolutely need, but that version changes the technical requirements, you'll be forced to update your whole technical stack to align with their recommendations. I've been there; this is not a forced migration you want to face often...
I've worked on a project where the vendor of the software we were using didn't want to develop new features for all their clients, because it had become too complicated for them to handle concurrent modifications and several live versions (each client having a specific version with features only they wanted). So, they decided to sell us a Software Development Kit (SDK) so that we could implement our own features. Of course, they didn't provide much documentation about how to do it, and on top of that we had to use their business entities, which we needed to decompile to understand their structure since we had neither the sources nor the documentation... The simplest feature would take days to implement, and it was barely testable, since everything was very complicated and introduced scripting languages nobody in the team knew into an already complicated stack...
Tight coupling between dozens of applications
Remember the early 2000s and the joy of using Enterprise JavaBeans (EJB) to handle remote calls between the applications of your information system. At the time, this may have looked like a good idea. Sharing your codebase with other teams to avoid duplication seemed fine too. Yes, every team was forced to deliver their applications at the same time to make sure no binary dependency was broken, but those were fun evenings, eating pizza with colleagues while waiting for the two-hour delivery process to complete, weren't they?
Well, in fact it wasn't that fun. And being unable to refactor a single class in your own codebase because someone in the company liked your code and decided to use it in their untested application isn't a pleasure either.
Once you realize the mess these early decisions caused, the effort required to decouple your application from the rest of the world is overwhelming. It literally takes years to split your project into separate components so that other applications can no longer use your core domain, your client or your cache mechanism, to remove every use of external classes that tightly couple you to other projects, to replace all EJB calls with REST APIs... But the reward for everyone involved in the project is huge: easier development and testing, a faster delivery process since there is no need to synchronize with everyone else anymore, better separation of concerns in your own code, easier dependency management, no more transitive dependency issues caused by importing a ton of other applications' dependencies into your classpath... These expensive changes are a real life saver for the team, and they would have been much cheaper to implement at the dawn of the project!
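To illustrate what replacing a remote EJB call can look like, here is a minimal sketch assuming Spring MVC; the controller, service and DTO names are invented for the example, not taken from any real project:

```java
import org.springframework.stereotype.Service;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RestController;

// Hypothetical internal service: the domain logic stays inside our application.
@Service
class RepairService {
    String statusOf(String vin) {
        return "READY_FOR_PICKUP"; // placeholder implementation for the sketch
    }
}

// The DTO is the published contract; internal domain classes never leave the application.
record RepairStatusDto(String vin, String status) { }

// Instead of letting other teams call a remote EJB (and link against our classes),
// we expose a small HTTP contract they can consume without any binary dependency.
@RestController
class CarRepairStatusController {

    private final RepairService repairService;

    CarRepairStatusController(RepairService repairService) {
        this.repairService = repairService;
    }

    @GetMapping("/cars/{vin}/repair-status")
    RepairStatusDto repairStatus(@PathVariable String vin) {
        return new RepairStatusDto(vin, repairService.statusOf(vin));
    }
}
```

Consumers then depend only on the HTTP contract, which can be versioned and evolved without synchronized releases of every application.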
Building your project over someone else's project
This problem may be the one you're least likely to face, but it can still happen, and it is the worst-case scenario, since it combines several of the previous issues. In fact, I faced it on one of the first projects I worked on in my career.
When I arrived on the project, I was told this was a total rewrite of the company's system and that the project had started just two months earlier. So, when I saw a complex web application with a full administration module, a complex business feature already implemented and a mature framework to help develop other modules, I was surprised. I quickly learned that most of this had not been developed by the team: it had been decided to reuse a framework developed by another company inside the group to avoid starting from scratch. The problem was that this framework had not been isolated from the project it was developed for. So, our team just got an archive containing all the source code of the other company's project, including their business code, which had nothing in common with our own business. Even worse, we also inherited their database schema and data...
As a newcomer to the team, it was difficult to know which code belonged to the framework, to our project or to the other company's business. The team wanted to clean up this mess, but many attempts ended in severe regressions because of dependencies between parts of the code (I can't talk about modules since there was only one!), and of course there were no automated tests at all. Moreover, we had to abandon the idea of using a different application server, because code specific to the one used by the other company was spread everywhere in the system, making the migration too expensive for our small team.
At some point, we wanted to add some nice features to the framework, but we were told this had already been done in the other company. So, we were asked to merge our version with the other company's current version... The team managed to avoid this nightmare by cherry-picking just part of the new feature, but it was still far richer and more complex than what we needed...
We managed to finish the project, but its quality was a real pain. At least 40% of the code and of the database contents were useless, and cleaning up this dead code never became a priority. I hope the team has finally had the chance to isolate their own code since I left!
All your business logic is in a rule management engine
Putting a bit of your business logic in a rule management system is a common practice. It is useful, for instance, when some of your business rules need to be updated frequently but your monolithic application's delivery process requires a long testing phase before a release candidate can be validated, making it impossible to adjust your more "volatile" rules quickly. Even though I prefer all domain rules to live in the code, I can understand that a rule management system can sometimes help.
But I've faced a case where almost ALL the business logic was located in a rule management system, with some rules even generated from an Excel file! Moreover, the rules were not supposed to change very often, since the project was basically an ETL batch. The Java project behind all this consisted only of technical details about the batch framework and raw reads and writes from the source and target systems, with absolutely no reference to the domain.
As a consequence, all the rules were written in a specific language that nobody in the team really mastered, that was hard to write (our IDEs didn't support it) and almost impossible to debug or test. When a new rule or a change to an existing one was requested, most developers just copied and pasted an existing rule, leading to files that were completely identical except for one specific change (often the field the rule applied to).
As if this weren't troubling enough, nothing in a rule gave any clue about its purpose. Rules were named Rule1, Rule2, and so on, with more than 100 of them! Each rule was basically a set of checks and assignments on hard-coded values, without a single business term. Even the name of the project didn't explain the purpose of the whole ETL.
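For contrast, here is a rough sketch of what such a rule could look like as plain, named Java code; the business terms (notional limit, execution) are invented for the example, but the point is that the intent is readable and the rule can be unit tested without any engine:

```java
import java.math.BigDecimal;

// A rule with a business name instead of "Rule42": it rejects executions whose
// notional exceeds the desk's authorised limit (hypothetical domain, for illustration).
class NotionalLimitRule {

    private final BigDecimal authorisedLimit;

    NotionalLimitRule(BigDecimal authorisedLimit) {
        this.authorisedLimit = authorisedLimit;
    }

    boolean isViolatedBy(BigDecimal executionNotional) {
        return executionNotional.compareTo(authorisedLimit) > 0;
    }
}

// A plain JUnit test documents and verifies the rule, and is debuggable in any IDE.
class NotionalLimitRuleTest {

    @org.junit.jupiter.api.Test
    void rejectsExecutionsAboveTheLimit() {
        NotionalLimitRule rule = new NotionalLimitRule(new BigDecimal("1000000"));
        org.junit.jupiter.api.Assertions.assertTrue(rule.isViolatedBy(new BigDecimal("1500000")));
    }
}
```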
Conclusion
As Uncle Bob explains in his book "Clean Architecture", when thinking about the architecture of a project, some decisions should be postponed until we really need to make a choice, that is, until we can't continue adding value to our product without deciding (choosing a database, for instance). Other decisions must be taken really early; don't wait until things get ugly. Fortunately, this kind of critical decision can easily be spotted, because they are what could be called architectural smells: when you think about them, they can only be bad ideas that will come back and haunt you at one point or another. Unfortunately, when working on legacy software, this kind of burden is often buried deep in the code, making it very expensive to eliminate.
We shouldn't be afraid. Yes, cleaning up years or even decades of mess is not an easy task, but as software professionals, we just can't let it keep rotting and killing developers' motivation, along with the trust our users put in our product and in our capacity to deliver business value to them.
Of course, each of the architectural burdens I described can be solved in many ways, so there is no silver bullet for every issue. But I'm sure that every team can come up with proposals to finally free themselves of their burden. So, let's face our issues together and start cleaning up this mess!
Top comments (9)
Ouch - yep! On a positive note, there are at least well-trodden paths now to get out of these messes, removing coupling, improving cohesion and giving teams some autonomy back:
thoughtworks.com/insights/blog/mic...
martinfowler.com/articles/break-mo...
databaserefactoring.com/
highfive.com/blog/how-spotify-buil...
Thank you for this great article. Been there done that.
My philosophy - which is frowned upon by my colleagues - is simple and counterintuitive:
Build disposable software. That's it.
Abstract only as much as is needed to allow throwing complete parts of your software out of the window.
This is not what we are taught, mostly. We are taught to build cathedrals of architectural beauty and purity which could potentially last forever.
But from my experience it doesn't pay - from a business perspective - to invest much in lasting architecture, because business changes often enough. And here disposability is an advantage.
That doesn't mean that you do not have to care at all: on the contrary!
It is more like TDD at scale: do the minimum architecture to solve the current business problem. And if done correctly - like in TDD - you can throw the current implementation out of the window and do something new.
If your architecture became a burden, you made it too heavyweight.
I think I have worked on all these architectural burdens, and created a few myself. Come to think of it, I developed my own rule management engine (also for an ETL process) where the rules were represented as SQL statements. At least I used an ISO-standard language. :)
Probably my most painful designs were the ones where I built a framework for the app. You had to plug your object into it to have plumbing automatically handled for you. That would be fine, except the next feature request was usually an uncommon case. Then I ended up having to refactor the framework to handle it. Then the objects you plug in end up depending too much on the way the framework operates, so they have to be refactored too. This experience is why I now prefer the framework-less approach to the apps I build, and functional programming in general. It can be a little more work up front, but way easier to maintain.
I would really like to see a well-thought-out path to a solution for a large shared database. My team struggles with this and we know it's a problem.
Oh, microservices? So we build out microservices to cover all the needs for accessing this data. The vast majority is read-only, since it's ERP data, so it should be easy to do that.
But wait, a year later we need to modify one of those microservices to add a field to the output of some endpoint. Who owns that microservice? We're in the same problem you described with the database, with the bonus problem of much higher latency.
How do we avoid the same problem now? I would love to consider these microservices as disposable, but it's almost impossible to think of things that way in enterprise development. There are 10 applications using that microservice now, and it definitely seems like a bad idea to add cruft for my edge case to it.
Great post and I recognize a lot of this pain. This is definitely one of those things that you run into throughout your career that can make you feel deflated. However, these are decisions that can be undone. The process is just longer and more tedious than others.
Yikes, I definitely recognize some of this. Great post.
Thank you! Unfortunately, I think most developers have faced or will face at least one of these issues in their careers.
Great post,
and I'd love to read more about how you solved these issues.
Easy if you could just restart from scratch, but I really don't think you could every time... :)
Riccardo
Unfortunately, I didn't have the opportunity to solve all these problems. The team and the management were often aware of the situation, but it was difficult to spend enough time on technical subjects when the business kept asking for new features. But in some cases, we were able to achieve a lot in terms of decoupling from other applications: removing all dependencies on other teams' classes, using REST APIs (relying on Spring's HTTP remoting mechanism to avoid a big-bang change), replacing communication via the database with REST APIs or messaging... We also started to use our own database schema, while still feeding the legacy one, so that we could progressively migrate to our own model, but this will be quite a long road!
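To give a rough idea of that last step, here is a minimal sketch of the dual-write approach; the repository and DAO names are hypothetical, and a real implementation would also have to deal with transactions and failures:

```java
// Minimal stand-ins so the sketch is self-contained; real DAOs would use JDBC or JPA.
record Car(String vin) { }

interface NewSchemaCarDao {
    void insert(Car car);
    Car findByVin(String vin);
}

interface LegacySharedCarDao {
    void insertLegacy(Car car);
}

// Facade used during the migration: the application only talks to this repository,
// which keeps the legacy shared schema fed until its last consumer has moved away.
class CarRepository {

    private final NewSchemaCarDao newSchemaDao;   // schema owned by our team
    private final LegacySharedCarDao legacyDao;   // old shared schema, still read by others

    CarRepository(NewSchemaCarDao newSchemaDao, LegacySharedCarDao legacyDao) {
        this.newSchemaDao = newSchemaDao;
        this.legacyDao = legacyDao;
    }

    void save(Car car) {
        newSchemaDao.insert(car);      // the new model is the source of truth
        legacyDao.insertLegacy(car);   // mirrored write, removed once consumers migrate
    }

    Car findByVin(String vin) {
        return newSchemaDao.findByVin(vin); // reads already come from the new schema only
    }
}
```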