For much of 2022 I was working on an open data classification tool built in Ruby on Rails. Although initially developed to help classify emergency call data, the tool could be used for any type of data, with some modification.
When describing a data set, the user uploads a CSV file and that file is parsed to generate some basic statistics and get the list of headers. After mapping fields from the data set to a common schema, the CSV is again parsed looking for unique values that need to be classified.
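The two parsing passes can be sketched with Ruby's standard `csv` library. This is a minimal illustration, not the tool's actual code: the sample data, the column names, and the `describe_csv` and `unique_values` helpers are all made up for the example.

```ruby
require "csv"

# Hypothetical sample standing in for an uploaded file.
SAMPLE = <<~DATA
  call_type,priority,city
  Medical,1,Burlington
  Fire,2,Burlington
  Medical,3,Winooski
DATA

# First pass: collect the headers and a basic statistic (row count).
def describe_csv(csv_text)
  table = CSV.parse(csv_text, headers: true)
  { headers: table.headers, row_count: table.size }
end

# Second pass: once a column has been mapped to the common schema,
# collect the distinct values that still need to be classified.
def unique_values(csv_text, column)
  CSV.parse(csv_text, headers: true)[column].compact.uniq
end

summary = describe_csv(SAMPLE)
puts summary[:headers].join(", ")
puts unique_values(SAMPLE, "call_type").inspect
```

Parsing the whole file into memory keeps the sketch short; for large uploads, streaming with `CSV.foreach` would likely be a better fit.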
Many open data platforms expose a public API that could be used instead of uploading a CSV. These APIs would require more information and code for each platform to get the same information as the CSV. However, this information could be acquired more quickly and with far fewer resources. Using an API would also allow for future updates to the data set.
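As a rough sketch of what the API route might look like, the helper below builds a Socrata (SODA) query URL that asks the platform for the distinct values of one column, instead of downloading and parsing a whole file. The domain, resource ID, column name, and the `socrata_unique_values_url` helper are hypothetical, and the code only builds the URL; it makes no request.

```ruby
require "uri"

# Builds a SODA query that groups by a single column, so the response
# contains one row per distinct value of that column.
# All arguments here are placeholders, not real endpoints.
def socrata_unique_values_url(domain, resource, column)
  query = URI.encode_www_form("$select" => column, "$group" => column)
  URI::HTTPS.build(host: domain, path: "/resource/#{resource}.json", query: query)
end

url = socrata_unique_values_url("data.example.gov", "abcd-1234", "call_type")
puts url
```

Letting the platform do the grouping is what makes this cheaper than the CSV route: only the distinct values cross the wire.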
My experience with Rails before this project was minimal. I could visualize how I'd implement multiple data source types in general, but I needed to learn how to do it "the Rails way." I'm not certain I accomplished that completely, but I did learn some things about Rails models and associations along the way.
What is STI?
Single Table Inheritance (STI)[1] is a design pattern in Ruby on Rails that allows you to use a single database table to store multiple types of objects that share some common attributes. This is accomplished by adding a type column to the table that's used to store the class name of each object.
For example, you might have a Person class that has a name attribute and a type attribute. You could then create two subclasses of Person: Employee and Student. In the database, all the Employee and Student objects would be stored in the same table as Person objects. The type column would be used to differentiate between the different types of objects.
class Person < ApplicationRecord; end
class Employee < Person; end
class Student < Person; end
Subclasses can share any number of attributes (as long as each shared column keeps the same data type) as well as have their own attributes. Every attribute becomes a column on the one table, so a row leaves the columns that belong to other subclasses empty. That can make the table difficult to scale if you have many subclasses with differing attributes. This is important to consider when deciding to implement STI over MTI (Multiple Table Inheritance).
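To make the mechanism concrete, here is a plain-Ruby sketch of the idea, with no Active Record involved. The rows, the `employer` and `school` attributes, and the `instantiate` helper are invented for illustration; Rails does the equivalent work for you when it loads a record.

```ruby
# Plain-Ruby stand-ins for the models above.
class Person
  attr_reader :name

  def initialize(attrs)
    @name = attrs["name"]
  end
end

class Employee < Person
  attr_reader :employer

  def initialize(attrs)
    super
    @employer = attrs["employer"]
  end
end

class Student < Person
  attr_reader :school

  def initialize(attrs)
    super
    @school = attrs["school"]
  end
end

# One shared "people" table: every subclass attribute is a column, so each
# row leaves the columns it does not use as nil.
PEOPLE = [
  { "id" => 1, "type" => "Employee", "name" => "Ada",   "employer" => "Acme", "school" => nil },
  { "id" => 2, "type" => "Student",  "name" => "Grace", "employer" => nil,    "school" => "State U" },
  { "id" => 3, "type" => nil,        "name" => "Linus", "employer" => nil,    "school" => nil }
].freeze

# The type column names the class to build; a blank type falls back to the
# base class in this sketch.
def instantiate(row)
  klass = row["type"] ? Object.const_get(row["type"]) : Person
  klass.new(row)
end

PEOPLE.each { |row| puts instantiate(row).class }
```

The nil entries in each row are the cost mentioned above: the more the subclasses diverge, the more of the table is empty.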
What are associations?
Associations are a way to define relationships between Active Record models. These relationships allow you to specify how one model is related to another, and how the models should interact with each other.
There are several types of associations that you can use in Rails:
- belongs_to: used for relationships where the current model will store the reference to a related model. For example, a Profile model that belongs_to a User.
- has_one[2]: used for one-to-one relationships where the related model includes a reference to the current model. For example, a User model that has_one Profile.
- has_many[2]: used for one-to-many relationships where the related models include a reference to the current model. For example, a User model that has_many Notifications.
- has_and_belongs_to_many: used for many-to-many relationships and uses a junction table to store the references. For example, an Author model that has_and_belongs_to_many Books.
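The main difference between these association types is where the reference is stored. The plain-Ruby sketch below mimics that with in-memory "tables"; it is not how Active Record is implemented, and the sample records and helper methods are hypothetical.

```ruby
# In-memory stand-ins for database tables.
User         = Struct.new(:id, :name)
Profile      = Struct.new(:id, :user_id, :bio)      # would declare belongs_to :user
Notification = Struct.new(:id, :user_id, :message)  # would declare belongs_to :user
Author       = Struct.new(:id, :name)
Book         = Struct.new(:id, :title)

USERS         = [User.new(1, "ada")].freeze
PROFILES      = [Profile.new(10, 1, "Hello")].freeze
NOTIFICATIONS = [Notification.new(100, 1, "First"), Notification.new(101, 1, "Second")].freeze
AUTHORS       = [Author.new(1, "Le Guin")].freeze
BOOKS         = [Book.new(1, "The Dispossessed"), Book.new(2, "Lavinia")].freeze
# Junction table: neither authors nor books store the reference themselves.
AUTHORS_BOOKS = [{ author_id: 1, book_id: 1 }, { author_id: 1, book_id: 2 }].freeze

# belongs_to: follow the foreign key stored on the current record.
def user_for(record)
  USERS.find { |user| user.id == record.user_id }
end

# has_one: find the single related record that points back at us.
def profile_for(user)
  PROFILES.find { |profile| profile.user_id == user.id }
end

# has_many: find every related record that points back at us.
def notifications_for(user)
  NOTIFICATIONS.select { |notification| notification.user_id == user.id }
end

# has_and_belongs_to_many: go through the junction table.
def books_for(author)
  ids = AUTHORS_BOOKS.select { |row| row[:author_id] == author.id }.map { |row| row[:book_id] }
  BOOKS.select { |book| ids.include?(book.id) }
end

puts notifications_for(USERS.first).map(&:message).inspect
```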
What are polymorphic associations?
Polymorphic associations allow a model to belong to more than one other model using the same association. This is done by adding a type column to reference the model, along with the standard id column. For example, you could have a Comment model that can belong to either a Post or a Product:
class Comment < ApplicationRecord
  # Stores the parent in commentable_type and commentable_id
  belongs_to :commentable, polymorphic: true
end
class Post < ApplicationRecord
  has_many :comments, as: :commentable
end
class Product < ApplicationRecord
  has_many :comments, as: :commentable
end
With this association, you can call .commentable on a comment to get the comment's parent, regardless of whether it is a post or a product.
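Under the hood, the lookup amounts to reading the type column to pick a table and the id column to pick a row. Here is a plain-Ruby sketch of that idea, using in-memory tables and a hypothetical `commentable` helper rather than Active Record itself:

```ruby
# In-memory stand-ins for the posts, products, and comments tables.
Post    = Struct.new(:id, :title)
Product = Struct.new(:id, :name)
Comment = Struct.new(:id, :body, :commentable_type, :commentable_id)

TABLES = {
  "Post"    => [Post.new(1, "Hello STI")],
  "Product" => [Product.new(1, "Widget")]
}.freeze

# Resolve a comment's parent: the type column selects the table and the
# id column selects the row within it.
def commentable(comment)
  TABLES.fetch(comment.commentable_type).find { |record| record.id == comment.commentable_id }
end

on_post    = Comment.new(1, "Great read", "Post", 1)
on_product = Comment.new(2, "Works well", "Product", 1)

puts commentable(on_post).title
puts commentable(on_product).name
```

Both comments share the same id value for their parent; only the type column tells the two parents apart.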
Why?
I opted to use STI to represent the data source models, which would all inherit from DataSource. To begin with, there'd be two children: CsvFile and Socrata (an open data platform). There are a few reasons for the decision[3]:
- Shared fields: the number of fields shared between data sources is likely to be high, though split between two groups: file sources and API sources.
- Database simplicity: adding a new data source does not add another table to the schema.
- Extensibility and modularity: data sources could be packaged as gems and contributed by third parties.
Polymorphic associations made this a breeze:
migration.rb
class CreateDataSources < ActiveRecord::Migration[7.0]
  def change
    create_table :data_sources do |t|
      t.string :type # STI discriminator column
      t.string :name
      t.string :api_domain
      t.string :api_resource
      t.string :api_key
      t.timestamps
    end
    # Adds data_source_type and data_source_id to data_sets
    add_reference :data_sets, :data_source, null: false, polymorphic: true
  end
end
models.rb
class DataSet < ApplicationRecord
  belongs_to :data_source, polymorphic: true, optional: false, dependent: :destroy
end
class DataSource < ApplicationRecord
  has_one :data_set, as: :data_source
end
class CsvFile < DataSource
  has_many_attached :files, dependent: :destroy
  validates :files, attached: true
end
class Socrata < DataSource
  validates :api_domain, presence: true
  validates :api_resource, presence: true
end
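And with that, we've created the relationship we wanted: a DataSet belongs to exactly one DataSource, which can be either a CsvFile or a Socrata source.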
Final thoughts
Single table inheritance lets you separate logic without repeating code or complicating the database schema. Polymorphic associations make this pattern even more powerful. However, STI can also result in large tables with lots of empty columns. If you expect your child models to differ significantly in their fields, you should consider a different implementation.
[1] Yes, I know what else STI stands for, but I'm not going to repeat "single table inheritance" seven times.
[2] These associations also have a through option that uses an additional model in the middle.
[3] If I come to regret this decision, you can expect a post titled "Refactoring your way out of STI."