We devs often have to jump from repo to repo as we implement a new feature or make a change to an app or API (hello SREs). Approaching a complex codebase that we haven't touched before (or recently) can be daunting. Having a systematic approach for getting acquainted with a codebase before rushing to introduce change gives you a more complete view of the code, helps you put the required change in context, and saves you from shaving the wrong yak.
Whether solo or pair programming, small or large codebase, open source or proprietary code, follow these steps before you start hacking away.
1. Start from the README
A README is the de facto index page of a program or codebase for users and future maintainers. In open organizations, a good README invites developers to make code changes on their own. Codebase owners should ensure the most important things a maintainer needs to know about the app are documented here, along with a quick try-it-yourself guide and one-liners to build, test, or set up the app. At a minimum, the README should serve as an index that points to more detailed documents and diagrams.
Questions to ponder: What does this codebase do? Does it have tests? Can I install it? Does it have diagrams?
2. Poke at the CI pipeline
Looking at a codebase from the perspective of the CI pipeline gives you insights into the change frequency, stability and overall health of the codebase. Confirming the codebase is in a healthy state and "ready for change" before making a code change can save you from going down rabbit-holes, troubleshooting errors unrelated to your change.
When exploring a CI pipeline, look for common failures and signs of flaky tests so you know what to expect when running the tests locally. Browsing through recent builds (and commits) can reveal patterns such as the most common type of change, the average size of a change, or major refactorings and features that have just been introduced. From the list of releases, you can tell the release cadence and when to expect your change to make it to production. Lastly, rerun the most recent job to confirm the pipeline is repeatable and produces consistent build artifacts on every run.
If the pipeline is red, pushing new revisions only adds noise and makes it harder for others to troubleshoot. Hold off on pushing your changes until the pipeline is green again.
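One rough heuristic for spotting flaky tests in build history: a test that both passed and failed on the same commit is suspect, since the code didn't change between runs. Here's a minimal sketch, assuming you've exported results into simple records. The `CiResult` shape, the commit IDs and the test names are all made up for illustration; real CI systems expose this data in their own formats.

```python
from collections import defaultdict
from typing import NamedTuple


class CiResult(NamedTuple):
    """One test outcome from one CI run (hypothetical record shape)."""
    commit: str
    test_name: str
    passed: bool


def find_flaky_tests(results: list[CiResult]) -> set[str]:
    """Return tests that both passed and failed on the same commit."""
    outcomes: dict[tuple[str, str], set[bool]] = defaultdict(set)
    for result in results:
        outcomes[(result.commit, result.test_name)].add(result.passed)
    # Two distinct outcomes for the same (commit, test) pair means flaky.
    return {test for (_, test), seen in outcomes.items() if len(seen) == 2}


history = [
    CiResult("abc123", "test_login", True),
    CiResult("abc123", "test_login", False),     # same commit, different outcome
    CiResult("abc123", "test_checkout", True),
    CiResult("def456", "test_checkout", False),  # different commit: likely a real regression
]
print(find_flaky_tests(history))  # {'test_login'}
```

A test that fails consistently after a given commit is a different story: that's a regression, and the commit log tells you who to ask about it.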
Questions to ponder: Is the pipeline green? When was the last time it ran? Does it fail often? Does it perform linting? Does it have flaky tests? Does it have e2e tests? Who was a recent contributor I could reach out to for help?
3. Run the tests locally
Running the test suite from your machine gives you a baseline for when you start hacking away on code and iterating on the new test case. Running the tests can yield some insights on the level of test coverage, testing patterns used by maintainers, potential external dependencies, and the overall maintainability of the codebase. Codebases with consistent test patterns and sensible test coverage make it safe and efficient to introduce change.
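The baseline idea boils down to a set difference: anything failing after your change that wasn't failing before is yours to look at. A tiny sketch, with made-up test names, assuming you've noted which tests failed in each run:

```python
def new_failures(baseline_failed: set[str], current_failed: set[str]) -> set[str]:
    """Tests failing now that were not failing before the change."""
    return current_failed - baseline_failed


baseline = {"test_export_pdf"}                    # already failing before you touched anything
after_change = {"test_export_pdf", "test_login"}  # failing after your change
print(new_failures(baseline, after_change))  # {'test_login'}
```

Without the baseline run, you'd likely have spent time chasing `test_export_pdf` too.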
Questions to ponder: Are the tests passing? Does it even have tests? Can the tests run on my machine? Do I have the required dependencies? Does it have external dependencies? Can the tests run with my Wi-Fi off? Can I add a new test?
4. Identify the entry-point
The entry-point of a software program determines how it is started and executed. Knowing where the entry-point is gives you an idea of how to consume and test the code you're about to change. Most apps perform some form of configuration task on startup, so any configuration required to run the app is likely read and validated near the entry-point. When introducing a new configuration option to an app, like a new environment variable, the entry-point is a good place to start.
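To make this concrete, here's a sketch of what "read and validated near the entry-point" often looks like. The variable names `APP_PORT` and `APP_LOG_LEVEL`, the defaults and the `Config` type are all hypothetical; the codebase you're exploring will have its own.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    port: int
    log_level: str


def load_config(env: dict[str, str]) -> Config:
    """Read and validate configuration; raise ValueError on bad input."""
    raw_port = env.get("APP_PORT", "8080")  # hypothetical variable and default
    try:
        port = int(raw_port)
    except ValueError:
        raise ValueError(f"APP_PORT must be an integer, got {raw_port!r}") from None
    if not 0 < port < 65536:
        raise ValueError(f"APP_PORT is out of range: {port}")

    log_level = env.get("APP_LOG_LEVEL", "info").lower()  # also hypothetical
    if log_level not in {"debug", "info", "warning", "error"}:
        raise ValueError(f"APP_LOG_LEVEL is not a known level: {log_level!r}")

    return Config(port=port, log_level=log_level)


print(load_config({"APP_PORT": "9000"}))  # Config(port=9000, log_level='info')
```

In a real app the entry-point would call something like `load_config(dict(os.environ))` and exit early on a `ValueError`. Taking the environment as a plain dict keeps the function easy to test, and a new option means one more `env.get` plus its validation in a single place.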
What the entry-point looks like depends on what the program does and how it's consumed. In HTTP-based programs like web apps or JSON APIs, the entry-point is an HTTP server that opens a port and accepts TCP connections. You'd then need a client to consume it. Search the codebase for occurrences of that port or references to HTTP resource paths. In the case of libraries, the entry-point is the set of public interfaces or methods that expose the library's functionality. Tests are a good place to start when digging into library code. For command line apps (or CLIs), the entry-point is a "command" function that's meant to be invoked from a terminal once installed.
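Your editor's search or a recursive grep is usually enough for this, but the idea is simple to sketch in code. The function below is a hypothetical helper, not part of any library: it walks a directory and reports every line matching a pattern, such as a port number or a resource path.

```python
import re
from pathlib import Path


def find_references(
    root: Path, pattern: str, suffixes: tuple[str, ...]
) -> list[tuple[Path, int, str]]:
    """Return (file, line number, line) for every line matching pattern."""
    regex = re.compile(pattern)
    hits = []
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # skip binary or unreadable files
        for number, line in enumerate(text.splitlines(), start=1):
            if regex.search(line):
                hits.append((path, number, line.strip()))
    return hits
```

Calling `find_references(Path("."), r"\b8080\b", (".py", ".yaml"))` would list every Python or YAML line that mentions port 8080 (an example value). The hit closest to server setup code is usually a short hop from the entry-point.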
In Dockerized apps, start by looking at the Dockerfile or docker-compose.yaml files. If an entry-point is not explicitly configured, one will be required when launching the container on the target platform. If running in Kubernetes, the entry-point would then be found in the command field of the Pod specification.
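As a rough illustration, here's a sketch that pulls the ENTRYPOINT and CMD instructions out of a Dockerfile. It keeps the last occurrence of each, since later instructions override earlier ones. It's deliberately naive: it ignores line continuations and multi-stage builds, and the Dockerfile content (including the `myapp` module) is invented for the example.

```python
import re

# Matches lines that begin with ENTRYPOINT or CMD (Dockerfile instructions are case-insensitive).
INSTRUCTION = re.compile(r"^\s*(ENTRYPOINT|CMD)\s+(.+)$", re.IGNORECASE)


def find_start_instructions(dockerfile_text: str) -> dict[str, str]:
    """Return the last ENTRYPOINT and CMD found, as written in the file."""
    found: dict[str, str] = {}
    for line in dockerfile_text.splitlines():
        match = INSTRUCTION.match(line)
        if match:
            found[match.group(1).upper()] = match.group(2).strip()
    return found


example = """\
FROM python:3.12-slim
COPY . /app
ENTRYPOINT ["python", "-m", "myapp"]
CMD ["--port", "8080"]
"""
print(find_start_instructions(example))
# {'ENTRYPOINT': '["python", "-m", "myapp"]', 'CMD': '["--port", "8080"]'}
```

Keep in mind that in Kubernetes a container's command field overrides the image's ENTRYPOINT and its args field overrides CMD, so what runs in production may differ from what the Dockerfile says.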
Questions to ponder: How does it run? How is it initialized? Does it need configuration? How is it executed in production?
5. Read up. Spot the patterns.
At this point, we should have a better view of the state and form of the codebase. The next step is to look at its structure and the actual business code. Every codebase has patterns set by its early or main maintainers. Depending on the size and type of change, you may need to emulate or adapt those patterns. Spotting the patterns, practices and overall structure of the code puts the required change in context and keeps you focused on introducing only the necessary modifications. For consistency's sake, practices and conventions already established across the codebase should prevail over individual preferences.
If the domain model is not too anemic, scan for types (or classes) and their publicly scoped methods.
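If the codebase happens to be Python, the standard library's `ast` module can do this scan for you. The sketch below lists each top-level class with its public methods, treating a leading underscore as "not public" (a Python convention, not something enforced). The `Invoice` class is invented for the example.

```python
import ast


def public_api(source: str) -> dict[str, list[str]]:
    """Map each top-level class name to its public method names."""
    tree = ast.parse(source)
    api: dict[str, list[str]] = {}
    for node in tree.body:
        if isinstance(node, ast.ClassDef):
            api[node.name] = [
                item.name
                for item in node.body
                if isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef))
                and not item.name.startswith("_")
            ]
    return api


sample = '''
class Invoice:
    def __init__(self, total):
        self.total = total

    def apply_discount(self, percent):
        self.total -= self.total * percent / 100

    def _recalculate(self):
        pass
'''
print(public_api(sample))  # {'Invoice': ['apply_discount']}
```

For other languages, your IDE's outline view or generated API docs give you much the same picture.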
Questions to ponder: Where does my code change fit in all this? Can I change the code with the least impact to existing features? Can the codebase support the required change or will it require a refactoring?
At this point I'd strongly encourage you to practice TDD and write a unit test first before actually jumping into changing the code, but that's a subject for another post :)
And you? How do you approach a codebase?