Which problem do we want to solve?
We will tackle one of the worst practices used in tests at any level: fixed (hard-coded) data.
I want to avoid, as much as possible, any manual step before I can run my tests, and because of that I also try to avoid static data files (CSV, TXT, XLS, JSON).
Here we will look at a common choice among Java developers, the RandomStringUtils class, and why it might not be the best option for automatic data generation.
By the way, I recommend generating test data automatically with the Test Data Factory approach, and you can find an example on my blog: Test Data Factory: Why and How to Use.
The examples described here are simple and don't use a Test Data Factory; they show why RandomStringUtils might not be the best approach.
Example
We will automatically generate data for a Customer object that has the following constraints:
| Attribute | Type | Constraints |
|---|---|---|
| id | int | Not null |
| name | String | Not null and length between 2 and 50 characters |
| profession | String | Not null and length between 2 and 30 characters |
| accountNumber | String | Not null and length of exactly 18 characters |
| address | String | Not null and length between 2 and 50 characters |
| phoneNumber | String | Not null and length between 11 and 14 characters |
| birthday | Date | Not null |
To keep the number of tests small, the focus here is on generating valid data that satisfies the constraints. In a professional environment, we would also implement tests for the edge cases.
Think of this Customer data as an object used at any test level (unit, integration, service, UI).
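To make the table concrete, here is a minimal, JDK-only sketch of its constraints as code. It assumes Java 16+ and models CustomerData as a record with a hypothetical `violations()` method; the real CustomerData class used in this post is builder-based and not shown, so treat this only as an illustration of the rules.

```java
import java.util.ArrayList;
import java.util.Date;
import java.util.List;

// Hypothetical stand-in for CustomerData: only the table's constraints are modelled.
record CustomerData(Integer id, String name, String profession, String accountNumber,
                    String address, String phoneNumber, Date birthday) {

    // Returns one message per constraint from the table that this object breaks.
    List<String> violations() {
        List<String> errors = new ArrayList<>();
        if (id == null) errors.add("id must not be null");
        checkLength(errors, "name", name, 2, 50);
        checkLength(errors, "profession", profession, 2, 30);
        checkLength(errors, "accountNumber", accountNumber, 18, 18);
        checkLength(errors, "address", address, 2, 50);
        checkLength(errors, "phoneNumber", phoneNumber, 11, 14);
        if (birthday == null) errors.add("birthday must not be null");
        return errors;
    }

    // Adds an error when the value is null or its length is outside [min, max].
    private static void checkLength(List<String> errors, String field, String value,
                                    int min, int max) {
        if (value == null) {
            errors.add(field + " must not be null");
        } else if (value.length() < min || value.length() > max) {
            errors.add(field + " length must be between " + min + " and " + max
                    + " but was " + value.length());
        }
    }
}
```

Calling `violations()` on objects produced by either generator is a cheap way to check that random data actually respects the table.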
What does the RandomStringUtils class do?
RandomStringUtils is a class from the Apache Commons Lang library that generates random Strings based on different conditions like:
- length
- letters
- numbers
- alphanumeric
- ASCII
- numeric
It’s a utility class with static methods, so you can generate any String directly without creating an instance, which is super handy!
The example below generates two different kinds of random data.
```java
import org.apache.commons.lang3.RandomStringUtils;

public class RandomStringUtilsExample {

    public static void main(String[] args) {
        // returns a String with 5 digits
        // example: 82114
        String numeric = RandomStringUtils.randomNumeric(5);

        // returns an alphanumeric String of length 30 mixing upper and lower case
        // example: gQ6RB8MiwKOg9O3qnHFo7I3jilHoIy
        String alphanumeric = RandomStringUtils.randomAlphanumeric(30);

        System.out.println(numeric);
        System.out.println(alphanumeric);
    }
}
```
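Conceptually, these methods pick characters at random from a character set. To make that concrete without the library, here is a JDK-only sketch that mimics `randomAlphanumeric(count)` by drawing uniformly from A-Z, a-z, and 0-9. The `RandomAlphanumeric` class is hypothetical and is not how Apache Commons Lang implements it.

```java
import java.security.SecureRandom;

// Hypothetical JDK-only sketch mimicking RandomStringUtils.randomAlphanumeric(count).
// Not the Apache Commons implementation; it only illustrates the idea.
final class RandomAlphanumeric {
    private static final String ALPHANUMERIC =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";
    private static final SecureRandom RANDOM = new SecureRandom();

    private RandomAlphanumeric() {}

    // Builds a String of the requested length, each character drawn uniformly at random.
    static String of(int count) {
        if (count < 0) throw new IllegalArgumentException("count must be >= 0");
        StringBuilder sb = new StringBuilder(count);
        for (int i = 0; i < count; i++) {
            sb.append(ALPHANUMERIC.charAt(RANDOM.nextInt(ALPHANUMERIC.length())));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Prints 30 random letters and digits: the right length, but no meaning.
        System.out.println(of(30));
    }
}
```

The result always has the requested length, which is all a length constraint needs, but nothing about it is meaningful to a human reader.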
What is the result of using RandomStringUtils class?
Let’s first take a look at the code example using `RandomStringUtils`:
- the `id` attribute uses the `RandomStringUtils.randomNumeric()` method to generate an `int` value; to make that possible, we parse the `String` into an `int` using `Integer.parseInt()`, and we ask for 9 digits because a 10-digit number can exceed `Integer.MAX_VALUE`
- the `name`, `profession`, `accountNumber`, `address`, and `phoneNumber` attributes use `RandomStringUtils.randomAlphanumeric()` to generate alphanumeric data
- the `birthday` attribute has a fixed date, now (today), because `RandomStringUtils` generates only `String`s

```java
import java.util.Date;

import org.apache.commons.lang3.RandomStringUtils;
import org.junit.jupiter.api.DisplayName;
import org.junit.jupiter.api.Test;

class BasicExampleTest {

    @Test
    @DisplayName("Data validations using RandomStringUtils")
    void randomStringUtils() {
        // CustomerData is the project's builder-based model class (not shown here)
        CustomerData customerData = CustomerData.builder()
                .id(Integer.parseInt(RandomStringUtils.randomNumeric(9)))
                .name(RandomStringUtils.randomAlphanumeric(50))
                .profession(RandomStringUtils.randomAlphanumeric(30))
                .accountNumber(RandomStringUtils.randomAlphanumeric(18))
                .address(RandomStringUtils.randomAlphanumeric(50))
                .phoneNumber(RandomStringUtils.randomAlphanumeric(14))
                .birthday(new Date())
                .build();
    }
}
```
If we print or inspect the `customerData` object after a test execution, the output looks like this:

```json
{
  "id": 1335130963,
  "name": "GGXS19kN6kSuzHwW6T0YjJCxUaIyKKmAaUdQH51gdUAtt1TwqY",
  "profession": "0kk8HSiFgCUVfLzbD3PyR6cn8j0LH3",
  "accountNumber": "PqvekXb9ekRAJi3ypy",
  "address": "90lqP2LHnQMWtmMP8vasO3BR5dsICIL85u5sJ0yjGKWXxCkFsj",
  "phoneNumber": "OpoJ3tOE53woy9",
  "birthday": "Sep 26, 2021, 10:01:10 PM"
}
```
We could successfully generate the necessary data! Yay!
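One caveat about the `id` attribute: `Integer.parseInt()` and `Integer.valueOf()` only accept values up to `Integer.MAX_VALUE` (2,147,483,647), so a random 10-digit numeric String fits in an `int` only about 21% of the time and throws a `NumberFormatException` otherwise. The JDK-only sketch below checks a few boundary values; the `NumericIdCheck` helper is hypothetical.

```java
// Hypothetical helper showing why the digit count passed to randomNumeric()
// matters when the result is parsed into an int.
final class NumericIdCheck {
    private NumericIdCheck() {}

    // Returns true if the numeric String fits in an int, false if parsing overflows.
    static boolean fitsInInt(String digits) {
        try {
            Integer.parseInt(digits);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(fitsInInt("999999999"));  // largest 9-digit value: true
        System.out.println(fitsInInt("1335130963")); // 10 digits, below Integer.MAX_VALUE: true
        System.out.println(fitsInInt("9999999999")); // 10 digits, above Integer.MAX_VALUE: false
    }
}
```

Any 9-digit String is at most 999,999,999, so it always parses safely.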
What does JavaFaker do?
[JavaFaker](https://github.com/DiUS/java-faker) is an open-source library based on Faker to generate fake data.
There are some nonsense data generators there, like Avatar and Friends, but there’s also a good set of objects that can generate data under certain conditions that match our needs.
I invite you to take a look at the GitHub repo and see the different objects to generate data.
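Faker-style output reads like real data because values are composed from curated lists instead of random characters. Here is a toy, JDK-only sketch of that idea; the `ToyFaker` class and its tiny word lists are hypothetical and are not JavaFaker's data or implementation. It takes a `java.util.Random`, so the same seed reproduces the same values.

```java
import java.util.List;
import java.util.Random;

// Toy illustration of the Faker idea: build readable values from curated lists.
// The lists below are hypothetical samples, not JavaFaker's data.
final class ToyFaker {
    private static final List<String> FIRST_NAMES = List.of("Tena", "Ana", "Joao", "Maria");
    private static final List<String> LAST_NAMES = List.of("Pagac", "Silva", "Jansen", "Smith");
    private static final List<String> PROFESSIONS = List.of("photographer", "engineer", "teacher");

    private final Random random;

    ToyFaker(Random random) {
        this.random = random;
    }

    // Picks one element uniformly at random.
    private String pick(List<String> values) {
        return values.get(random.nextInt(values.size()));
    }

    // Full name composed of a random first name and a random last name.
    String name() {
        return pick(FIRST_NAMES) + " " + pick(LAST_NAMES);
    }

    String profession() {
        return pick(PROFESSIONS);
    }

    public static void main(String[] args) {
        ToyFaker faker = new ToyFaker(new Random(42));
        System.out.println(faker.name() + ", " + faker.profession());
    }
}
```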
What is the result of using JavaFaker?
The code that generates the `CustomerData` object with JavaFaker works as follows:
- for the `id` attribute, the `number()` method generates a random number
- for the `name` attribute, the `name()` method generates a full name
- for the `profession` attribute, the `company()` method generates a profession
- for the `accountNumber` attribute, the `finance()` method generates a valid IBAN for the Netherlands, which is exactly 18 characters long
- for the `address` attribute, the `address()` method generates a full street address
- for the `phoneNumber` attribute, the `phoneNumber()` method generates a cell phone number
- for the `birthday` attribute, the `date()` method generates a birthday for an age between 18 and 90

```java
import com.github.javafaker.Faker;
import org.junit.jupiter.api.DisplayName;
import org.junit.jupiter.api.Test;

class BasicExampleTest {

    @Test
    @DisplayName("Data validations using faker library")
    void faker() {
        Faker faker = new Faker();

        // CustomerData is the project's builder-based model class (not shown here)
        CustomerData customerData = CustomerData.builder()
                .id((int) faker.number().randomNumber())
                .name(faker.name().name())
                .profession(faker.company().profession())
                .accountNumber(faker.finance().iban("NL"))
                .address(faker.address().streetAddress())
                .phoneNumber(faker.phoneNumber().cellPhone())
                .birthday(faker.date().birthday(18, 90))
                .build();
    }
}
```
If we print or inspect the `customerData` object after a test execution, the output looks like this:

```json
{
  "id": 520543,
  "name": "Tena Pagac",
  "profession": "photographer",
  "accountNumber": "NL07HUUN1518167413",
  "address": "12672 Romaguera Tunnel",
  "phoneNumber": "(561) 638-5813",
  "birthday": "Mar 5, 1982, 10:29:18 AM"
}
```
Again, we could successfully generate the necessary data! Now let’s focus on the differences.
Comparing both approaches
There are two aspects I would consider when choosing one approach over the other:
- legibility of future troubleshooting (log analysis)
- easy data creation for different criteria
We can see the main differences by comparing the two outputs above side by side.
Legibility of future troubleshooting (analysis)
Troubleshooting is a regular activity for any engineer who writes code: we constantly read logs and debug the application to understand current and future problems in the code.
Now imagine yourself looking at a CustomerData object filled in with the RandomStringUtils approach: it’s hard to correlate that data with a list of objects you might retrieve, or even to spot it inside a log file. A value like "Tena Pagac" from the JavaFaker run is much easier to recognize and search for.
Easy data creation for different criteria
For most of the attributes in the `CustomerData` class, you can use `RandomStringUtils` to generate data for different criteria. For example, you can easily give the `name` attribute 51 characters with `RandomStringUtils.randomAlphanumeric(51)` and expect the constraint validation to fail.
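For these negative cases, what matters is the length, not the content, so the boundary values for `name` (2 to 50 characters) can even be sketched with plain `String.repeat` (Java 11+). The `NameBoundaries` class and its `isValidName` rule are hypothetical; they only encode the constraint from the table.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical boundary-value sketch for the name constraint (not null, 2..50 characters).
final class NameBoundaries {
    static final int MIN = 2;
    static final int MAX = 50;

    private NameBoundaries() {}

    // Encodes the table's rule for the name attribute.
    static boolean isValidName(String name) {
        return name != null && name.length() >= MIN && name.length() <= MAX;
    }

    // Maps each boundary input to whether the constraint should accept it.
    static Map<String, Boolean> cases() {
        Map<String, Boolean> cases = new LinkedHashMap<>();
        cases.put("a".repeat(MIN - 1), false); // 1 char: just below the minimum
        cases.put("a".repeat(MIN), true);      // 2 chars: the minimum
        cases.put("a".repeat(MAX), true);      // 50 chars: the maximum
        cases.put("a".repeat(MAX + 1), false); // 51 chars: just above the maximum
        return cases;
    }

    public static void main(String[] args) {
        cases().forEach((input, expected) ->
                System.out.println(input.length() + " chars -> expected valid: " + expected
                        + ", actual: " + isValidName(input)));
    }
}
```

In a real test, the expected flags would come from the specification and `isValidName` would be replaced by the validator under test.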
For more specialized data, such as phone numbers and dates, you need a proper library, and JavaFaker can generate both.
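To show why a date is "specialized data", here is what a birthday for an age between 18 and 90 involves with plain `java.time`. This is a hand-rolled JDK sketch, not how JavaFaker implements `date().birthday(18, 90)`; the `BirthdaySketch` class and its methods are hypothetical.

```java
import java.time.LocalDate;
import java.time.Period;
import java.util.Random;

// Hypothetical sketch: a random birthday whose age on `today` is within [minAge, maxAge].
final class BirthdaySketch {
    private BirthdaySketch() {}

    static LocalDate randomBirthday(int minAge, int maxAge, LocalDate today, Random random) {
        if (minAge < 0 || maxAge < minAge) throw new IllegalArgumentException("invalid age range");
        // Latest allowed birthday: the person turns minAge today.
        LocalDate latest = today.minusYears(minAge);
        // Earliest allowed birthday: the person is still maxAge today (turns maxAge + 1 tomorrow).
        LocalDate earliest = today.minusYears(maxAge + 1L).plusDays(1);
        long span = latest.toEpochDay() - earliest.toEpochDay() + 1; // inclusive day count
        return earliest.plusDays(random.nextInt((int) span));
    }

    // Age in full years on the given day.
    static int ageOn(LocalDate birthday, LocalDate today) {
        return Period.between(birthday, today).getYears();
    }

    public static void main(String[] args) {
        LocalDate today = LocalDate.now();
        LocalDate birthday = randomBirthday(18, 90, today, new Random());
        System.out.println(birthday + " (age " + ageOn(birthday, today) + ")");
    }
}
```

Leap days and inclusive bounds are exactly the kind of detail a dedicated library already handles for you.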
By adopting a single library, we make the process easier.
Considerations
Of course, I put more emphasis on the JavaFaker library because it covers almost everything we need to generate data, but that does not rule out needing the RandomStringUtils class or any other class from the Apache Commons library.
The main consideration here is the ability to generate all the data you need from a single source of truth without reinventing the wheel, along with the indirect benefits this brings to the troubleshooting process.
Examples
The avoid-random-string-utils project shows a basic example comparing RandomStringUtils and JavaFaker.
The restassured-complete-basic-example project has a data factory class that generates all the necessary data under different conditions. It’s a good real-world example.