Fran Iglesias

Posted on Jun 17, 2018 • Edited on Nov 22, 2018

Refactor is about knowledge and meaning

#php #goodpractices #refactor #legacy

Refactoring is more than a technical task. Refactor has business value because its main purpose is to ensure that business language and knowledge are reflected into the code.

Legacy is code without test

This quote from Michael Feathers may seem an exaggeration but it is a good definition of what it means to confront legacy code. The non existence of tests reflect our ignorance about how this code works and that ignorance is in the basement of our problems with legacy.

From time to time, we could say or listen that we don't have a clear idea of how a piece code in production is working. Even we wonder how it is possible that it could work, to start with. Although it works normally, strange and difficult to debug errors are common. Tests, if any, should provide us with the information that code is not able to communicate or, at least, they should allow us to proceed with a refactor that helps to organize things better.

But, given that not always we can count on them: is it possible to refactor without tests?

Refactor without tests

In fact, it is possible to refactor without tests, at least within certain limits and being a preliminary step to cover that code with them.

Mainly, we have the automatic refactor tools included in the IDE, which, in general, provide us with a certain number of safe refactors in the sense that they will not alter the behavior of the code. Even not having tests, we can trust those changes.

That trust is reinforced when taking into account the scope of the refactor. For example, if you change the name of a private method in a class to be more descriptive, you can be sure that this change isn't going to affect other parts of the code. The same happens if you extract methods to modularize other larger methods.

But, why to refactor code that works? What value does it provide?

We refactor mainly to reflect the knowledge we've got about the code.

When we study a piece of legacy code, normally with the goal to fix problems or add functionality, we are developing our knowledge of how it works. Sometimes, we can interpret what code does simply by reading it. Other times, we establish hypotheses about how we think it works and we try to corroborate that with the results that it generates.

At that moment we can refactor to accomplish two main goals:

Reflect our knowledge of how the code works, rewriting it in a more intelligible way.
Put the system in a testable state.

Refactor as a reflection of knowledge about the code

As we read code, we get an idea of how it works and what each of the symbols that we find in it represents. The question is: where is that knowledge stored? Only in our head?

Refactor is the way we have to put our knowledge about the code in the code itself.

Let's see a simple example. Suppose the following fragment:

$this->userRepository->save($user);
$message = new Message();
$message->setTo($user->email());
$this->mailer->send($message);

Taking the context into account, and doing some trial and error, we surely conclude that $message refers to a confirmation message reporting that the user has just been created in the system.

So, we could easily change the name of the variable to reflect that we've just learned:

$this->userRepository->save($user);
$confirmationMessage = new Message();
$confirmationMessage->setTo($user->email());
$this->mailer->send($confirmationMessage);

Or even in a more specific way:

$this->userRepository->save($user);
$userWasCreatedMessage = new Message();
$userWasCreatedMessage->setTo($user->email());
$this->mailer->send($userWasCreatedMessage);

This refactor (rename) doesn't change the functionality, but it makes that code describe what's happening with greater precision. If in the original code an effort was needed to interpret what code was doing, in the refactored version that effort is unnecessary, because code is pretty explicit about it.

In addition, having collected that knowledge in the code itself, in the future it will be easier to intervene on it, which will save time and effort.

Applied refactors

Rename

Naming things has pretty bad press in the programming world.

Change names allows us to reflect our knowledge of the system in the code itself, eliminating ambiguity, too generic names, inadequate or equivocal.

In the last section we've shown an example of refactor by renaming. Here you are another one in which we change a too generic name with another one that describes with detail what we do with the file.

public function processFile($file);

// vs

public function extractProductInformationFromFile($file);

Decades ago, when the costs of memory and storage were too much high, we have to economize with cryptic variable names with only one char length. Since long time we can, and we must, allow us expressive names for whatever programming element.

Let's imagine a for loop:

for ($i = 0; $i <= $max, $i++) {
    fwrite($file, $lines[$i]);
}

What does this code tell? Apparently it's talking about an array of lines that are written, one by one, in a file till reach a limit. Why not name it with all of its letters?

for ($line = 0; $line <= $maxLines, $line++) {
    fwrite($file, $lines[$line]);
}

Again, more explicit names allow us an easier and unique reading of the code.

A third example. Let's suposse a simple algorithm to match rivals in a sports competition:

foreach ($teams as $t1) {
    foreach ($teams as $t2) {
        if ($t1 === $t2) {
            continue;
        }
        $matches[] = [$t1, $t2];
    }
}

That was no so much ugly but it's way easier to read the following:

foreach ($teams as $local) {
    foreach ($teams as $visitor) {
        if ($local === $visitor) {
            continue;
        }
        $matches[] = [$local, $visitor];
    }
}

To sum up, change our names benefits us because:

It reflects our knowledge of the code
It gives meaning to the renamed elements

Extract constants

Every time we find a number or a text into the code arises the problem of understand what it means. This problem is a code smell called magic number.

Magic numbers (or magic values, being more generic) are arbitrary values that are introduced in code to represent something that use to be invariant.

The problem is that their meaning is far from being obvious and the choice of an specific value can be a complete mystery when we are examining code written by another person log time ago. In general, we could ask two questions:

What does this value mean?
Why was it selected?

Here we go with an example:

$this->queryBuilder()->setMaxResults(25);

25 here is a magic number that represents the maximum quantity of registers that the query must return. Being more specific, it could refer to the number of elements per API page or view.

$this->queryBuilder()->setMaxResults(MAX_ITEMS_PER_VIEW);

On the other hand, once we find the answer to this question another problem could arise: are these magic values coherently used through all the code? That's to say: if these values are used in other places of the code with the same meaning then we have a problem with multiples sources of truth that could diverge.

When we define a constant to refer to that magic value, we can use it in a coherent way in multiple places.

echo sprintf('showing %s items', MAX_ITEMS_PER_VIEW);

To sum it up: extracting constant from magic values gives meaning to those values.

Extract variables

Main motivation to extract variables is to simplify expressions, dividing them in meaningful and manageable parts.

Let's see a simple example, in which we apply some name changes:

$amount = $price - ($price * $pct / 100);

// vs

$discount = $price * $discountPercent / 100;

$amount = $price - $discount;

What we have done was to extract the parenthesis to a variable. This has several benefits:

It allows to make more meaningful the main expression.
It helps us to identify and to avoid possible problems with operators precedence.

Obviously it is a very simple example, but the more complex is the expression, the greater is the benefit of this kind of refactor.

Another example:

$object->setNewValue($oldValue * 1.3 + $margin);

//vs

$newValue = $oldValue * COEFFICIENT + $margin;
$object->setNewValue($newValue);

Extracting an expression to a variable makes it much clearer and the method usage is cleaner.

Extract methods

Isolate hidden dependencies

When we are reading legacy and find hidden dependencies extract them to a method is a good first step to get it out of the class and inject it later.

// ...
$date = new DateTime();
// ...

// vs

// ...
$date = $this->getCurrentDate();
// ...

public function getCurrentDate()
{
    return new DateTime();
}

When we extract a dependency to a method we isolate it and we reduce its instantiations to one. This is no the best of the techniques, but using this strategy we could create a testable class by means of inheritance, overriding the getCurrentDate method. But it is better to advance a little more.

Now, we could easily replace it with an injected dependency.


class MyClass
{
    private $clockService;

    public function __construct(ClockService $clockService)
    {
        $this->clockService = $clockService;
    }

    public function getCurrentDate()
    {
        return $this->clockService->currentDate();
    }
}

Simplify complex operations based on abstraction levels

Very long methods (more that 20-25 lines) could be splitted in tightly related blocks of code and, then, be separated in smaller methods with semantic unity.

Frequently, in the body of a method there are mixed concerns in different abstraction levels. This mix makes the code difficult to read because we have to constantly adapt to those level changes.

For example, in a line we get something from a repository and just after we have ten lines of operations to generate some value we need.

So, a good strategy is to group operations from the lowest level in methods with a meaningful name that explains what it does on a higher level. And, if needed, do the same with the extracted methods themselves.

This way, the body of the main method can be read more easily, given us a general idea of what's happening and letting us to go down into the details when we need.

public function doSomething()
{
   // more than 30 lines of code…
}

//vs

public function doSomething()
{
   $this->prepareThings();
   $this->getData();
   $this->processData();
   return $result;
}

It seems as the main body was the index of a book from where the different chapters are called.

Let's see the following example, in which we ensure that a value is valid:

if (! in_array($value, self::VALID_TYPES)) {
    //...
    throw new InvalidArgumentException($errorMessage);
}
//...

We can extract this section in its own method:


$this->failIfValueIsAnInvalidType($value);

//...

public function failIfValueIsAnInvalidType(string $value):void
{
    if (! in_array($value, self::VALID_TYPES)) {
        //...
        throw new InvalidArgumentException($errorMessage);
    }
}

Encapsulate complex operations and give them meaning

A particularly useful case of extract method is to apply it to conditional expressions, not only the complex ones, which meaning is not evident.

From the previous examples, we could make the conditional expression to be more explicit:

if (! in_array($value, self::VALID_TYPES)) {
    //...
    throw new InvalidArgumentException($errorMessage);
}
//...

This way:

if (! $this->isValidType($value)) {
    //...
    throw new InvalidArgumentException($errorMessage);
}
//...

public function isValidType(string $value): boolean
{
    return in_array($value, self::VALID_TYPES);
}

Extract interface

We extract interface to abstract behaviors, so we could replace or change the implementations that currently perform them, giving meaning to them.

Extracting interfaces allows us to decouple ourselves from specific implementations of dependencies, either because we want to be able to change them for others, or because we want to improve the ability to test our system, etc.

A very simplistic example: we have a MyClass class that uses SomeService as dependency.

class MyClass
{
    private $service;

    public function __construct(SomeService $service)
    {
        $this->service = $service;
    }

    public function doThis(): Class
    {
        return $this->service->doSomething();
    }
}

class SomeService
{
   public function doSomething(): Class
   {
   //...
   }
}

We extract an interface based on the way MyClass uses SomeService:


interface Service
{
   public function doSomething(): Class;
}

SomeService could have many other methods, but we are only interested in doSomething.

Now, we change SomeService to implement the just defined interface and we make that MyClass expects a dependency from the interface SomeService. (We should better create an adapter that uses SomeService, instead, but I'd rather prefer simplify te example here).

class SomeService implements Service
{
//...
}

class MyClass
{
    private $service;

    public function __construct(Service $service)
    {
        $this->service = $service;
    }
    //...
}

Now, MyClass and SomeService are decoupled and both depend uniquely on the interface Service.

If you want, you can create new implementations of the same interface:

class BetterService implements Service

Beyond well written code

In all examples, the benefit of refactoring is to better reflect our knowledge about the system and make the code explain by itself.

Refactoring goes beyond offering us a better written code, what comes as consequence of the process. Refactor is about knowledge and meaning.

Refactor makes us to put our knowledge about the system in the code itself, making sure that concepts and operations are well described and that the domain language is present at all the code levels.

By this reason, refactor is a task that also benefits the business on a very direct way: the better the business language is reflected in the code, the more knowledge we are able to find in the code, so improvements, fixings and new features will be more efficient, fast and secure.

DEV Community