So, hashes, uh?
What's good about them?
- They are easy to build
- They are cheap (short-term cheap at least)
- They help interfacing with shapeless data (like JSON) in an easy way.
- They are good to build simple/cheap key-value stores in memory.
What's not so good about them?
- They are shapeless (in a bad way, you never know what you expect from them)
- They are mutable (in a bad way as well, how do you know nobody else will mutate this or nobody has already mutated it?)
- They are dummy (unlike objects, they know nothing, they have no way to interface with except for
Enumerable
) - They are impossible to type (🤷♂️)
- They need a lot of defensive code with
nil
checks. (even when they are getting better at this with things likehash.dig(:key1, :key2)
it is not enough and you always need extra checks here and there, which makes the code difficult to read and to change).
So, what should we do with them?
I propose that we all agree on a small set of guidelines to help us answer the question "Should I use a Hash
for this?". TL;DR
- Avoid using Hashes as input to methods. Try to have shaped inputs.
- Avoid using Hashes as return values. Try to have shaped return values.
- If you receive a Hash, isolate it. Build a layer to interface with instead of the
Hash
.- Get the data directly if it is a small hash.
- Build objects if it has an heterogeneous structure.
1. Avoid using hashes as input to methods.
There's no easy way to know what's inside params
here..
MyInteractor.new(
author: author,
params: params
).call
Instead use this:
MyInteractor.new(
author: author,
value: params[:value],
question_id: params[:question_id]
).call
You now have an explicit knowledge of the input for that interactor which is:
- simpler to use (you don't need a Hash now, only values which can come from any object)
- Intention revealing
- Typable
- Easier to document
Try to avoid substituting the Hash
with a T::Struct
or a plain Struct
if you don't use sorbet. You will be adding an extra layer and having very little in return. You avoid the hash, which is good, but you add an extra layer and win very little in return. That does not justify a new layer.
Instead, type the parameters you're receiving individually with Sorbet. If you don't use sorbet, just avoid the struct and pass them individually.
You might think:
But I have A TON of parameters and a
Hash
is a nice and convenient way to
make my object easier to use!
It isn't. You're sacrificing a lot in exchange for almost nothing and your code will suffer for it.
Another reason to avoid this is that it can imply security holes. There's a very simple example which is mass assignment model attributes from the controller params.
# NOTE: This is not how our code works, this is an example of how NOT to do it
current_user = current_company
.users
.find(session[:current_user])
current_access.update!(params[:user])
Can you spot the issue? It is subtle but it is there. An attacker could just send a company_id
and inject themselves on any company 😱. Not good.
This kind of issue is what made Rails end up adding the strong parameters API: https://guides.rubyonrails.org/action_controller_overview.html#strong-parameters but that isn't enough. Strong parameters are focused on avoiding mass assignment. But you massively pass params
from a controller into any object as parameters you're exposed to this kind of problem as well.
In short: Be as explicit as possible with contracts. Pass unfurled parameters as a preference and type as much as you can.
2. Avoid using hashes as return values.
If whoever uses the code you write doesn't know what to expect from your return value, they'll have a hard time using you. Make it easy for them.
If you need a dummy object, use a T::Struct
instead (or a Struct
). They are very good for transferring data, they have a defined shape and they are highly typable.
3. If you receive a Hash, isolate it.
Let's give an example, you get a Hash
from reading a YAML file, a JSON response from an API or even on a controller action.
This is, by far, the hardest to solve among all the cases because each one will need a different solution.
There are however some guidelines you can follow which help a bit:
1. Isolating hashes: Get their values
If you only need to get some keys from the Hash, e.g. reading the params
on a controller. You're good to go. Slice the keys you need and then read them.
class MyController
def create
MyCreateInteractor.new(
author: current_access,
value: create_params[:value],
description: create_params[:description]
).call.unwrap!
end
def create_params
params.slice(:value, :description)
end
end
Soon we will be able to use sorbet for this, which will give use shaped and typed params. Until that moment arrives, we need to deal with it.
2. Isolating hashes: Build a defensive layer
If the Hash
has layers with more hashes inside of it and you need to extract data from it through Enumerable
methods (find
, map
, select
...) then wrap that hash on an object and interface with that object instead.
There's an example of this on our PdfParser
library. Its configuration is a YAML file with a shape like this:
---
some_data_table:
min_x: 55
min_y: 111
max_x: 532
max_y: 167
transform:
type: table
arguments:
- name: column1
transform: text
max_x: 214
- name: column2
transform: text
max_x: 372
- name: column3
transform: money_es
max_x: 532
some_numeric_field:
min_x: 217
min_y: 81
max_x: 265
max_y: 99
transform: money_es
Interfacing with a hash like this is problematic because you don't know what to expect. Have a look at the transform
for example. On the numeric field, it only has a name. But on the table; it has a type, a list of arguments and more nested transforms.
Let's first understand the problem:
- This file represents the format for a PDF file. A format has many different fields inside (
some_numeric_field
andsome_data_table
for example) in this case. - Each field has boundaries to where it is found on the PDF
- Each field defines a transformation to be applied to it after reading the text on it. E.g.:
money_es
means that it is money formatted like1.000,00
and it has to be transformed into aBigDecimal
How did we get over this? Interfacing with a Hash
like this could have been really complex so we started by separating the parts that change from the parts that stay the same.
- All fields have boundaries
- All fields have a name
- All fields have a transform
What is the changing part? The transform can be different per field so we defined something like this:
Field = Struct.new(:name, :boundaries, :transform)
Box = Struct.new(:min_x, :min_y, :max_x, :max_y)
Transform = Struct.new(:type, :arguments) do
# @param args [Hash]
# @option args [String, Hash] :transform the name of the transform or a
# Hash describing it
# @option :transform [String] :type the name of the transform
# @option :transform [Hash] :arguments a hash with the arguments for the
# transform
def self.from_hash(args)
type, arguments =
case args[:transform]
when String
[args[:transform], nil]
else
[
args[:transform][:type],
args[:transform][:arguments]
]
end
new(type, arguments)
end
With that we build objects from reading the file into a Hash like this:
# name is the key of the YAML definition for a field and arguments is a hash
# with its content.
read_file.map do |name, arguments|
Field.new(
name,
Transform.from_hash(arguments),
Boundaries.new(
arguments[:min_x].to_i,
arguments[:min_y].to_i,
arguments[:max_x].to_i,
arguments[:max_y].to_i
)
)
end
- We only read from the hash in 1 place (when parsing the configuration).
- We have a specific initializer for the case in which we need to pass a hash around (
Transform.from_hash
) and we use acase...when
to map it into objects.
It is indeed more work but saves you from a lot of pain later on. Nobody will know this all came from a Hash, nobody will try to read a shapeless object and nobody will try to mutate it.
What are Hashes good for?
They are good as in-memory key-value stores. But even in that case, build a layer to isolate yourself from it. As an example:
We use this object to store an in-memory cache of the hierarchical relations between employees and their managers. Fetching them is costly and because of our permissions architecture, we do it a lot.
class HierarchyCache
def initialize
@cache = {}
end
# If a key has value, return it
# if it doesn't store the return value of `block` and return the value
def fetch(key, employee, &block)
value = get(key, employee)
return value unless value.nil?
set(key, employee, block.call)
end
# Get the value stored on a key
def get(key, employee)
@cache.dig(employee.id, key)
end
# Sore a value for a key
def set(key, employee, value)
new_cache_entry = { key => value }
if @cache[employee.id]
@cache[employee.id].merge!(new_cache_entry)
else
@cache[employee.id] = new_cache_entry
end
get(key, employee)
end
end
We use a hash here but we take care of not letting anybody know we do and most importantly, we don't leak how the hash is structured and we provide a programmatic API to interface it.
Summary (AKA TL;DR)
- Avoid using Hashes as input to methods. Try to have shaped inputs.
- Avoid using Hashes as return values. Try to have shaped return values.
- If you receive a Hash, isolate it. Build a layer to interface with instead of the
Hash
.- Get the data directly if it is a small hash.
- Build objects if it has a heterogeneous structure.
- Use them as long as nobody can call
[]
or any otherEnumerable
method on them.
Last, but not least, don't serialize things into hashes if you intend to use them later on. If you find yourself using []
on different layers of your code check if you can enforce any of the previous points.
Summarized even more
Don't use them to build contracts (as inputs/outputs). No problem as intermediate values (as long as you isolate them well enough).
Top comments (3)
Thanks for the writeup. Some additions:
.freeze
to hashes, especially ones in constants.gem json-schema
to specify the structure of JSON structures where Structs just won't cut it.Red flag: bad design with high probability. Typically, functions/methods should not take many parameters.
Agree 100% but... (you know, there's always a "but") Sometimes that's not something you can control or not as much as you would like to.
Something I forgot to add is that using a
Hash
for this doesn't fix the problem which is, as you pointed out, a design problem. It would only hide it and make it harder to fix.