Elias Nogueira

Reasons to avoid RandomStringUtils for test data generation

Which problem do we want to solve?

We will try to solve one of the worst practices in tests at any level: fixed/hard-coded data.

I want to avoid, as much as possible, any manual pre-action before I can run my tests. Because of that, I also try to avoid static files (CSV, TXT, XLS, JSON).

Here we will look at a class commonly used by Java developers, RandomStringUtils, and see why it might not be the best choice for automatic data generation.

By the way, I recommend automatic data generation in tests using the Test Data Factory approach, and you can find an example here in my blog: Test Data Factory: Why and How to Use.

The examples described here are simple, do not use the Test Data Factory, and will show you why RandomStringUtils might not be the best approach.

Example

We will automatically generate data for a Customer object which has the following criteria:

| Attribute     | Type   | Constraints                                      |
|---------------|--------|--------------------------------------------------|
| id            | int    | Not null                                         |
| name          | String | Not null and size between 2 and 50 characters    |
| profession    | String | Not null and size between 2 and 30 characters    |
| accountNumber | String | Not null and size of 18 characters               |
| address       | String | Not null and size between 2 and 50 characters    |
| phoneNumber   | String | Not null and size between 11 and 14 characters   |
| birthday      | Date   | Not null                                         |

To keep the number of tests small, the examples here only generate valid data that satisfies these constraints. In a professional environment, we would implement tests for the edge cases as well.

Think about this Customer data as an object used at any test level (unit, integration, service, UI).
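To make the constraints concrete, here is a minimal sketch that expresses them as a plain Java check. The class and method names (CustomerConstraints, sizeBetween, isValid) are hypothetical and not part of the article's projects; the sample values are the ones shown in the JavaFaker output in this article.

```java
import java.util.Date;

public class CustomerConstraints {

    // True when the value is not null and its length is within [min, max].
    public static boolean sizeBetween(String value, int min, int max) {
        return value != null && value.length() >= min && value.length() <= max;
    }

    // Checks the String and Date constraints from the table above.
    // The id is a primitive int, so it can never be null.
    public static boolean isValid(String name, String profession, String accountNumber,
                                  String address, String phoneNumber, Date birthday) {
        return sizeBetween(name, 2, 50)
                && sizeBetween(profession, 2, 30)
                && sizeBetween(accountNumber, 18, 18)
                && sizeBetween(address, 2, 50)
                && sizeBetween(phoneNumber, 11, 14)
                && birthday != null;
    }

    public static void main(String[] args) {
        boolean valid = isValid("Tena Pagac", "photographer", "NL07HUUN1518167413",
                "12672 Romaguera Tunnel", "(561) 638-5813", new Date());
        System.out.println(valid);
    }
}
```

In a real project these rules would more likely live in the application's own validation layer; the sketch only shows what "valid data" means for the examples that follow.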

What does the RandomStringUtils class do?

RandomStringUtils is a class from the Apache Commons Lang library that generates random Strings based on different conditions like:

  • length
  • letters (alphabetic)
  • alphanumeric
  • ASCII
  • numeric
  • printable characters

It’s a utility class with static methods, so you can generate a String directly without creating an instance. Super handy!

See the example below, which generates two different kinds of random data.



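import org.apache.commons.lang3.RandomStringUtils; // assumes Apache Commons Lang 3

public class RandomStringUtilsExample {

    public static void main(String[] args) {
        // returns a String with 5 digits
        // example: 82114
        String numeric = RandomStringUtils.randomNumeric(5);

        // returns an alphanumeric String of length 30, mixing upper and lower case
        // example: gQ6RB8MiwKOg9O3qnHFo7I3jilHoIy
        String alphanumeric = RandomStringUtils.randomAlphanumeric(30);

        System.out.println(numeric);
        System.out.println(alphanumeric);
    }
}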



What is the result of using RandomStringUtils class?

Let’s first take a look at the code example that uses RandomStringUtils:

  • id uses the RandomStringUtils.randomNumeric() method to generate an int value; because the method returns a String, we parse it into an Integer using Integer.valueOf(). The example asks for 9 digits, since a 10-digit value can exceed Integer.MAX_VALUE and make the parse throw a NumberFormatException
  • name, profession, accountNumber, address, and phoneNumber use RandomStringUtils.randomAlphanumeric() to generate alphanumeric data
  • birthday has a fixed date of now (today) because RandomStringUtils generates only Strings


import java.util.Date;

import org.apache.commons.lang3.RandomStringUtils; // assumes Apache Commons Lang 3
import org.junit.jupiter.api.DisplayName;
import org.junit.jupiter.api.Test;

class BasicExampleTest {

    @Test
    @DisplayName("Data validations using RandomStringUtils")
    void randomStringUtils() {
        CustomerData customerData = CustomerData.builder().
                // 9 digits always fit in an int; 10 digits may overflow it
                id(Integer.valueOf(RandomStringUtils.randomNumeric(9))).
                name(RandomStringUtils.randomAlphanumeric(50)).
                profession(RandomStringUtils.randomAlphanumeric(30)).
                accountNumber(RandomStringUtils.randomAlphanumeric(18)).
                address(RandomStringUtils.randomAlphanumeric(50)).
                phoneNumber(RandomStringUtils.randomAlphanumeric(14)).
                birthday(new Date()).
                build();
    }
}



The output of the test execution, if we print or inspect the customerData object, is:



{
  "id": 1335130963,
  "name": "GGXS19kN6kSuzHwW6T0YjJCxUaIyKKmAaUdQH51gdUAtt1TwqY",
  "profession": "0kk8HSiFgCUVfLzbD3PyR6cn8j0LH3",
  "accountNumber": "PqvekXb9ekRAJi3ypy",
  "address": "90lqP2LHnQMWtmMP8vasO3BR5dsICIL85u5sJ0yjGKWXxCkFsj",
  "phoneNumber": "OpoJ3tOE53woy9",
  "birthday": "Sep 26, 2021, 10:01:10 PM"
}



We could successfully generate the necessary data! Yay!

What does JavaFaker do?

JavaFaker (https://github.com/DiUS/java-faker) is an open-source library based on Faker to generate fake data.

It can generate some data that is not useful for us, like Avatar, Friends, etc., but it also has a good set of objects that generate data under specific conditions that match our needs.

I invite you to take a look at the GitHub repo and see the different objects to generate data.

What is the result of using JavaFaker?

The code implementation to generate data using the CustomerData class is:

  • id uses the number() method to generate a random number
  • name uses the name() method to generate a full name
  • profession uses the company() method to generate a profession
  • accountNumber uses the finance() method to generate a valid IBAN for the Netherlands
  • address uses the address() method to generate a full street address
  • phoneNumber uses the phoneNumber() method to generate a cell phone number
  • birthday uses the date() method to generate a birthday for an age between 18 and 90


import com.github.javafaker.Faker;
import org.junit.jupiter.api.DisplayName;
import org.junit.jupiter.api.Test;

class BasicExampleTest {

    @Test
    @DisplayName("Data validations using faker library")
    void faker() {
        Faker faker = new Faker();

        CustomerData customerData = CustomerData.builder().
                id((int) faker.number().randomNumber()).
                name(faker.name().name()).
                profession(faker.company().profession()).
                accountNumber(faker.finance().iban("NL")).
                address(faker.address().streetAddress()).
                phoneNumber(faker.phoneNumber().cellPhone()).
                birthday(faker.date().birthday(18, 90)).
                build();
    }
}



The output of the test execution, if we print or inspect the customerData object, is:




{
  "id": 520543,
  "name": "Tena Pagac",
  "profession": "photographer",
  "accountNumber": "NL07HUUN1518167413",
  "address": "12672 Romaguera Tunnel",
  "phoneNumber": "(561) 638-5813",
  "birthday": "Mar 5, 1982, 10:29:18 AM"
}



We could successfully generate the necessary data! But let’s now focus on the differences.

Comparing both approaches

There are two aspects I would like to consider when choosing between one approach and the other:

  • legibility for future troubleshooting (log analysis)
  • easy data creation for different criteria

We can see the main differences by comparing the generated data side by side:

[Image: side-by-side comparison of the data generated by RandomStringUtils and by JavaFaker]

Legibility for future troubleshooting (log analysis)

A regular activity for an engineer who writes code is troubleshooting: we constantly read logs and debug the application to understand current and future problems in the code.

Now imagine yourself looking at a CustomerData object whose data was filled in with the RandomStringUtils approach: it’s hard to correlate the data you have with a list of objects you might get back, or even to spot that data inside a log file.
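As a small illustration, the sketch below prints the same log line twice, once with the name and profession from the RandomStringUtils output and once with the values from the JavaFaker output shown earlier. The log format and the logLine helper are hypothetical, made up only for this comparison.

```java
public class LogLegibilityExample {

    // Hypothetical log format used only for this comparison.
    public static String logLine(String name, String profession) {
        return String.format("Customer created: name=%s, profession=%s", name, profession);
    }

    public static void main(String[] args) {
        // Values copied from the RandomStringUtils output
        System.out.println(logLine(
                "GGXS19kN6kSuzHwW6T0YjJCxUaIyKKmAaUdQH51gdUAtt1TwqY",
                "0kk8HSiFgCUVfLzbD3PyR6cn8j0LH3"));

        // Values copied from the JavaFaker output
        System.out.println(logLine("Tena Pagac", "photographer"));
    }
}
```

Searching a log for "Tena Pagac" is something you can do from memory; searching for the 50-character random name usually means copying and pasting it.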

Easy data creation for different criteria

For most of the attributes present in the CustomerData class, you can use RandomStringUtils to generate data for the different criteria. For example, you can easily set a 51-character value on the name attribute and expect a failing constraint validation using RandomStringUtils.randomAlphanumeric(51).
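A minimal sketch of that edge case follows. To keep it runnable without dependencies, it uses a JDK-only stand-in for RandomStringUtils.randomAlphanumeric(); the stand-in and the isValidName check are hypothetical and only mirror the name constraint (not null, 2 to 50 characters).

```java
import java.util.concurrent.ThreadLocalRandom;

public class NameBoundaryExample {

    private static final String ALPHANUMERIC =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";

    // JDK-only stand-in for RandomStringUtils.randomAlphanumeric(count),
    // used here only so the sketch has no external dependency.
    public static String randomAlphanumeric(int count) {
        StringBuilder sb = new StringBuilder(count);
        for (int i = 0; i < count; i++) {
            int index = ThreadLocalRandom.current().nextInt(ALPHANUMERIC.length());
            sb.append(ALPHANUMERIC.charAt(index));
        }
        return sb.toString();
    }

    // Hypothetical check for the name constraint: not null, 2 to 50 characters.
    public static boolean isValidName(String name) {
        return name != null && name.length() >= 2 && name.length() <= 50;
    }

    public static void main(String[] args) {
        // Upper bound: still valid
        System.out.println(isValidName(randomAlphanumeric(50)));
        // One character past the bound: expected to be rejected
        System.out.println(isValidName(randomAlphanumeric(51)));
    }
}
```

For length-based edge cases like this one the content of the String does not matter, which is why a random alphanumeric value is good enough.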

For more specialized data, like a phone number or a date, you need a proper library, and JavaFaker can generate both.

In this way, we can make the process easier by adopting one library.

Considerations

Of course, I put more emphasis on the JavaFaker library because it offers almost everything we need to generate data, but that does not rule out a possible need to use the RandomStringUtils class or any other class from the Apache Commons libraries.

The main consideration here is the ability to generate all the possible data you need using a single source of truth without reinventing the wheel, as well as the indirect benefits it will show during the troubleshooting process.

Examples

The avoid-random-string-utils project shows a basic example comparing RandomStringUtils vs JavaFaker.

The restassured-complete-basic-example project has a factory data class to generate all the necessary data in different conditions. It’s a good real-world example.
