I've been coding for a couple of years now, but I've rarely touched any open-source code and played with it. Having seen many experienced developers account their successes (partially, of course) to open-source contributions, and believing that diving into that world would ultimately bump up my skills with cool tricks I wouldn't learn otherwise, I decided to give it a shot.
So without further ado, let's get right into this.
Faker Gem
Faker is a handy gem when it comes to testing your code. It generates fake data quickly and easily, and it supports a wide variety of data, such as names, emails, phone numbers and even Game of Thrones characters. Make sure to visit the repo to learn more about it.
Faker Modules
As mentioned above, faker has different Modules for the types of data that you wish to use. You can use something like Faker::Address.street_name
, or Faker::Internet.public_ip_v4_address
to get a random street name or IP address, respectively. You can find other pretty cool modules on the repo.
The Problem
I've been using the Faker::Time
module, which (as you might have guessed), generates random timestamps for you to use in your tests. This module has a between
method that takes from and to dates, and generates a date inside the range.
The use case was to generate a time between yesterday and today, and this could be achieved by:
Faker::Time.between(DateTime.now - 1, DateTime.now)
A sample output would be:
=> 2017-06-24 01:00:32 +0200
The problem is that we include DateTime.now
as the upper bound, so it could possibly generate a date that corresponds to today. However, it doesn't take into account the hours, minutes or seconds of the day that still haven't come yet. Meaning, if it's 24/6 today and it's 10:00:00 AM, it could still generate a date that is on 24/6 but at 10:30:00 AM (or some time past 10:00:00 AM in general). That was an issue because it led to randomly failing test cases. We particularly didn't want dates in the future to be generated.
The Solution (the fun part, hopefully)
This issue of randomly failing test cases troubled me a lot, so I decided to read the implementation code of the Faker::Time
module to see what was causing the problem. And so I did, and I kind of got an intuition about what could be done to work around the problem. So, I forked the repo, cloned it onto my machine, and started playing around.
As the title of the article suggests, I'm not writing this to demonstrate the solution. I'm demonstrating why tinkering with open-source projects is useful. I worked on this to solve a problem, and I ended up learning more about code refactoring in Ruby. And I decided I might make use of this to try to demonstrate the flexibility of this beautiful language.
First, let's look at the source of the problem by inspecting the code:
module Faker
class Time < Date
TIME_RANGES = {
:all => (0..23),
:day => (9..17),
:night => (18..23),
:morning => (6..11),
:afternoon => (12..17),
:evening => (17..21),
:midnight => (0..4)
}
class << self
def between(from, to, period = :all, format = nil)
time = period == :between ? rand(from..to) : date_with_random_time(super(from, to), period)
time_with_format(time, format)
end
# more code ...
private
def date_with_random_time(date, period)
::Time.local(date.year, date.month, date.day, hours(period), minutes, seconds)
end
# more code ...
def hours(period)
raise ArgumentError, 'invalid period' unless TIME_RANGES.has_key? period
sample(TIME_RANGES[period].to_a)
end
def minutes
seconds
end
def seconds
sample((0..59).to_a)
end
end
end
end
Breakdown
- The
Faker::Time.between(DateTime.now - 1, DateTime.now)
call would cause thebetween
method to call thedate_with_random_time
method. - This method generates a date whose hours, minutes and seconds are generated from the methods
hours
,minutes
andseconds
respectively. And that was precisely the problem. - It is clear that the
seconds
method (whichminutes
depends on as well) generates an array of all of the seconds in a day, and takes a sample off of it (picks one at random), regardless of whether it has actually passed or not (its effect is more prominent in minutes than in seconds). - The
hours
method does essentially the same thing, it only utilizes a hash which takes a period as the key, which is by default:all
, and this returns all of the hours of the day.
Having identified where the problem is, I'm going to demonstrate how my code evolved, showing different versions of it one at a time. This will hopefully give you an insight of how I incrementally refactored my code.
Version 1
The idea is simple. During the array generation step and before sampling, I would only let inside the arrays the hours/minutes/seconds that have passed already. Abusing the fact that everything in Ruby is an object, I could call select
on the array, and pass it a block which filters the dates that are not in the future.
So the code looks something like this:
def hours(period)
raise ArgumentError, 'invalid period' unless TIME_RANGES.has_key? period
sample(TIME_RANGES[period].to_a.select { |t| t <= DateTime.now.hour })
end
def minutes
sample((0..59).to_a.select { |t| t <= DateTime.now.min })
end
def seconds
sample((0..59).to_a.select { |t| t <= DateTime.now.sec })
end
It's clear why minutes
has its own implementation now. It has to be bound by DateTime.now.min
.
Version 2
There are two concerns regarding this first version:
- My modification has overridden the default behaviour of the gem. What if a part in our code can tolerate dates that are in the future? Plus, This gem is used across our codebase and I don't want to go around modifying people's work to accommodate the changes I made.
- Both
minutes
andseconds
methods look very similar, and it's not preferable to have duplicate code.
So now I have to keep the default behaviour for those who use it, and also make sure that the modification (that only I will use for now) wouldn't require changing people's work.
Tackling the 1st concern
I introduced a new boolean parameter to the between
method, namely bound_by_now
, which defaults to false
. Adding this extra parameter with a default value solves the issue of having other people change their implementation.
Moreover, I would separate the default behaviour from my new implementation using a fancy Ruby ternary operator.
def between(from, to, bound_by_now = false, period = :all, format = nil)
time = period == :between ? rand(from..to) : date_with_random_time(super(from, to), bound_by_now, period)
time_with_format(time, format)
end
# more code ...
private
def date_with_random_time(date, bound_by_now, period)
::Time.local(date.year, date.month, date.day, hours(bound_by_now, period), minutes(bound_by_now), seconds(bound_by_now))
end
# more code ...
def hours(bound_by_now, period)
raise ArgumentError, 'invalid period' unless TIME_RANGES.has_key? period
bound_by_now == true ? sample(TIME_RANGES[period].to_a.select { |t| t <= DateTime.now.hour }) : sample(TIME_RANGES[period].to_a)
end
def minutes(bound_by_now)
bound_by_now == true ? sample((0..59).to_a.select { |t| t <= DateTime.now.min }) : sample((0..59).to_a)
end
def seconds(bound_by_now)
bound_by_now == true ? sample((0..59).to_a.select { |t| t <= DateTime.now.sec }) : sample((0..59).to_a)
end
Tackling the 2nd concern
One way to do this is to move the logic to a separate method, namely get_min_sec
, and call it with an identifier (for now, a string) to differentiate between minutes and seconds.
def minutes(bound_by_now)
get_min_sec('min', bound_by_now)
end
def seconds(bound_by_now)
get_min_sec('sec', bound_by_now)
end
def get_min_sec(type, bound_by_now)
case type
when 'min' do limit = DateTime.now.min
when 'sec' do limit = DateTime.now.sec
end
bound_by_now == true ? sample((0..59).to_a.select { |t| t <= limit }) : sample((0..59).to_a)
end
So the final modified version looks something like:
def between(from, to, bound_by_now = false, period = :all, format = nil)
time = period == :between ? rand(from..to) : date_with_random_time(super(from, to), bound_by_now, period)
time_with_format(time, format)
end
# more code ...
private
def date_with_random_time(date, bound_by_now, period)
::Time.local(date.year, date.month, date.day, hours(bound_by_now, period), minutes(bound_by_now), seconds(bound_by_now))
end
# more code ...
def hours(bound_by_now, period)
raise ArgumentError, 'invalid period' unless TIME_RANGES.has_key? period
bound_by_now == true ? sample(TIME_RANGES[period].to_a.select { |t| t <= DateTime.now.hour }) : sample(TIME_RANGES[period].to_a)
end
def minutes(bound_by_now)
get_min_sec('min', bound_by_now)
end
def seconds(bound_by_now)
get_min_sec('sec', bound_by_now)
end
def get_min_sec(type, bound_by_now)
case type
when 'min' do limit = DateTime.now.min
when 'sec' do limit = DateTime.now.sec
end
bound_by_now == true ? sample((0..59).to_a.select { |t| t <= limit }) : sample((0..59).to_a)
end
Version 3
Obviously this separation has come at the cost of a lengthier code. I had to introduce control statements, such as the ternary operator and the case statement. But again, Ruby is very flexible, and this too can be fixed.
Removing the ternary operator
The block passed to the select
method eventually evaluates to a boolean, so why not inject bound_by_now
in their as well? Check this code below:
select { some_boolean_expression || !bound_by_now })
- If
bound_by_now
istrue
, then the block depends on the first expression (as anything ORed withfalse
is itself), which is the one that will select only the dates that have passed. - If 'bound_by_now' is
false
, then all elements of the array will be selected, since now the the whole block evaluates totrue
.
So, applying this to hours
and get_min_sec
, we get:
def hours(bound_by_now, period)
raise ArgumentError, 'invalid period' unless TIME_RANGES.has_key? period
sample(TIME_RANGES[period].to_a.select { |t| t <= DateTime.now.hour || !bound_by_now })
end
def minutes(bound_by_now)
get_min_sec('min', bound_by_now)
end
def seconds(bound_by_now)
get_min_sec('sec', bound_by_now)
end
def get_min_sec(type, bound_by_now)
case type
when 'min' do limit = DateTime.now.min
when 'sec' do limit = DateTime.now.sec
end
sample((0..59).to_a.select { |t| t <= limit || !bound_by_now })
Removing the case statement
As explained in Version 2, minutes
and seconds
pass identifiers to get_min_sec
so that it can decide on the bounding limit. But how can we remove this case statement? We can make use of 2 Ruby tricks:
-
Ruby supports symbols, which are similar to strings in some sense, except that there can only be one copy of a symbol at a time. This is a good use case of identifiers. A symbol is declared by a colon followed by a name, such as
:id
. - Objects in Ruby respond to a method called
send
, which takes as a parameter an identifier of a method, and it simply calls that method on the object (if the object responds to that method, of course). We can easily create:min
and:sec
symbols here and pass them tosend
invoked onDateTime.now
.
Now we can modify the code to look something like:
def minutes(bound_by_now)
get_min_sec(:min, bound_by_now)
end
def seconds(bound_by_now)
get_min_sec(:sec, bound_by_now)
end
def get_min_sec(type, bound_by_now)
sample((0..59).to_a.select { |t| t <= DateTime.now.send(type) || !bound_by_now })
end
So finally after refactoring both control statements, Version 3 looks as follows:
def hours(bound_by_now, period)
raise ArgumentError, 'invalid period' unless TIME_RANGES.has_key? period
sample(TIME_RANGES[period].to_a.select { |t| t <= DateTime.now.hour || !bound_by_now })
end
def minutes(bound_by_now)
get_min_sec(:min, bound_by_now)
end
def seconds(bound_by_now)
get_min_sec(:sec, bound_by_now)
end
def get_min_sec(type, bound_by_now)
sample((0..59).to_a.select { |t| t <= DateTime.now.send(type) || !bound_by_now })
end
Full Code & Testing
Now that the code is complete, let's review it one last time, and also check how we can test it.
Full Code
Here's the full modified code after Version 3:
def between(from, to, bound_by_now = false, period = :all, format = nil)
time = period == :between ? rand(from..to) : date_with_random_time(super(from, to), bound_by_now, period)
time_with_format(time, format)
end
# more code ...
private
def date_with_random_time(date, bound_by_now, period)
::Time.local(date.year, date.month, date.day, hours(bound_by_now, period), minutes(bound_by_now), seconds(bound_by_now))
end
# more code ...
def hours(bound_by_now, period)
raise ArgumentError, 'invalid period' unless TIME_RANGES.has_key? period
sample(TIME_RANGES[period].to_a.select { |t| t <= DateTime.now.hour || !bound_by_now })
end
def minutes(bound_by_now)
get_min_sec(:min, bound_by_now)
end
def seconds(bound_by_now)
get_min_sec(:sec, bound_by_now)
end
def get_min_sec(type, bound_by_now)
sample((0..59).to_a.select { |t| t <= DateTime.now.send(type) || !bound_by_now })
end
Testing
Now, in order to test the new behaviour, I could do the following:
- Create an empty array.
- Push a number of randomly generated dates using
Fake::Time.between
- Select those who are generated in the future (
> DateTime.now
) and make sure it prints out an empty array.
This can be translated into code easily:
a = []
for i in 1..500 do
a << Faker::Time.between(DateTime.now - 1, DateTime.now, true)
end
res = a.select { |t| t > Time.now }
puts res
and it would output:
=> nil
However, this too can be refactored! It can be achieved using a single line:
Array.new(500){ Faker::Time.between(DateTime.now - 1, DateTime.now, true) }.select { |t| t > Time.now }
and this would output:
=> []
Using the same code, I can omit that third true
parameter to activate the default behaviour of the gem. This would possibly print some dates in the future from the array, as it is likely to find some since I'm generating 500 dates.
Final Thoughts
Contributing to open-source projects definitely contributes to your progress as a developer. It pushes you to learn because you actually get to solve real problems. And I really hope I was able to demonstrate that by giving an example.
Finally, I am yet to become a refactoring master. So if anyone has any suggestions, please let me know!
Top comments (3)
Wouldnโt the code above also filter 13:42 when the time is 14:34 because the minute part is bigger, even though the time is actually earlier because of the hour part?
(It is weird to comment on a 2-year-old post but I am working on my Pocket list :) )
Great contribution, great writeup!
Thanks!