DEV Community

Come with me on a journey through this website's source code by way of a bug

rhymes on October 09, 2018

This article started because I thought that researching a particular bug could be useful to understand devto's source code a little better. So I li...

Read full post

Dian Fay • Oct 10 '18

O/RM methods like destroy_all really need to be renamed destroy_slowly_and_painfully_use_sql_if_you_can, I swear. This isn't the first time I've seen something like this. Batching ids to delete is one option; if you can just run a script or prepared statement to delete reactions by the article/post id that's even better since the engine can use an index on the foreign key (although the difference may be negligible at the scales we're talking about currently). If the reaction deletion from users' feeds is in-band it probably doesn't need to be, and managing the cache of reactions to a dead article is obviously superfluous. No idea why it's saving other articles since there isn't any metrics aggregation but that's already out of band like you mentioned.

rhymes • Oct 10 '18

O/RM methods like destroy_all really need to be renamed destroy_slowly_and_painfully_use_sql_if_you_can, I swear.

Ahahah, yeah pretty much.

This isn't the first time I've seen something like this

Nope, definitely not. A lot of ORM like AR encourage their users to surrender to all possible callbacks and many of the gems used by Rails applications hook into those callbacks without the developer knowing so after a while it's not uncommon to be hit by this issue.

Batching ids to delete is one option; if you can just run a script or prepared statement to delete reactions by the article/post id that's even better since the engine can use an index on the foreign key (although the difference may be negligible at the scales we're talking about currently)

Yeah, a post is never going to have millions of reactions, at least not for a while. By the way there's no foreign key between the two tables (articles and reactions) so no index to take advantage of.

Dian Fay • Oct 10 '18

You could add an index without the proper foreign key constraint, or vice versa, but without looking at the data model my guess is that having both is probably best in this scenario.

rhymes • Oct 10 '18

You can't easily add a foreign key between reactions and articles because of the "polymorphic" nature of these relation. A reaction can be attached to a comment, to an article and other stuff I guess. So what Rails does it setup a pair of fields called reactable_id and reactable_type like this:

PracticalDeveloper_development=# \d reactions
                                          Table "public.reactions"
     Column     |            Type             | Collation | Nullable |                Default
----------------+-----------------------------+-----------+----------+---------------------------------------
 id             | integer                     |           | not null | nextval('reactions_id_seq'::regclass)
 category       | character varying           |           |          |
 created_at     | timestamp without time zone |           | not null |
 points         | double precision            |           |          | 1.0
 reactable_id   | integer                     |           |          |
 reactable_type | character varying           |           |          |
 updated_at     | timestamp without time zone |           | not null |
 user_id        | integer                     |           |          |
Indexes:
    "reactions_pkey" PRIMARY KEY, btree (id)
    "index_reactions_on_category" btree (category)
    "index_reactions_on_reactable_id" btree (reactable_id)
    "index_reactions_on_reactable_type" btree (reactable_type)
    "index_reactions_on_user_id" btree (user_id)

On top of those it mounts what it calles polymorphic associations. The gist is that every select triggered by something like article.reactions becomes a SELECT * FROM reactions WHERE reactable_type = 'Article' AND reactable_id = 1234, same for the other queries

Dian Fay • Oct 10 '18

Gross. The site runs on Postgres though iirc, so you could establish a foreign key relationship between reactions and reactables as a parent table extended by articles, posts, and so on. But I don't know if ActiveRecord would play nicely with that kind of specialized structure.

rhymes • Oct 10 '18

No you can't use Postgresql inheritance in Rails, not easily.

There's a concept of single table inheritance but it's all virtual. The various entities share the same table and are distinguished by a type column. You can find the explanation here medium.com/@dcordz/single-table-in... - I've used it once and it was used to map entities with differed only in name (sensors with a uniform API)

Dian Fay • Oct 10 '18

Another option is a complex junction table with reaction_id, article_id, post_id, and a CHECK constraint ensuring that only one reactable foreign key can have a value. But that does add enough complexity to at least give me pause.

The indexing in the current model could still be improved since reactable_id is unreliable on its own and reactable_type is low-cardinality. A single index on (reactable_type, reactable_id) would be much more useful (including for bulk delete!).

rhymes • Oct 10 '18

Yeah, I think the index on both keys is a good comprise

Dian Fay • Oct 10 '18

It might be nice longterm to merge reactables if they're similar enough. Articles are almost exactly posts without parents, after all. Ancillary information can go into a specialization table with an optional 1:1 relationship.

Ben Halpern • Oct 10 '18

This is fabulous, I’ll have some thoughts to add color in the morning.

There are a couple main themes here for me:

what makes sense in one context does not in another, which is part of why ActiveRecord callbacks are not well-loved in general.
Certain “secondary” actions, like deleting articles, remain under-cared for amidst the focus on developing actions taken more frequently, and perhaps with more on the line.

This is exactly why we open sourced. Dives like this help make the code awesome.

rhymes • Oct 10 '18

Thanks Ben! Take your time, I know it's a lot :-)

what makes sense in one context does not in another, which is part of why ActiveRecord callbacks are not well-loved in general

Yeah, count me in. I tend to move the callback logic in explicit "service objects" or something like that, like your "labor" folder. It makes testing a little bit easier as well I guess. These kind of refactorings definitely require lots of digging, testing and code breaking because you have to go counter the Rails way of trusting implicit side effects and make them explicit. The gain is that code usually becomes as side effect free as possible and more understandable for new comers because it tends to be narrated in the controller:

def destroy
  CacheBuster.bust(article)
  SearchSeeker.byebye(article)
  RelationshipCounselor.drop_or_ghost_everyone_and_everything(article)
  article.destroy!
end

something like that :D

Ben Halpern • Oct 12 '18

We currently do use this sort of pattern, and even give it its own folder app/services, but we only use it sometimes.

Right now I'd say we're sort of caught in between "hacked to get things to work" and "decent architecture" state (most code could probably be described that way).

The problem is probably that we've sort of gone half-way with some of these things.

The fact that we have the labor folder and the services folder (and the liquid_tags) and the models folders is indicative of some indecision combined with lack of taking the time to actually take the time for dedicated large-scale refactoring.

There are different schools of thoughts in terms of folder structure here (should it all go in models, with subfolders?)

If you have any opinions here, I'm all ears.

PS, we actually define our own timeout at about 9 seconds. RACK_TIMEOUT accepts some a Rails-magicky ENV variable called RACK_TIMEOUT_SERVICE_TIMEOUT. We could extend this longer, but everything we do really should finish sooner, so it would be better to fix the code IMO.

rhymes • Oct 12 '18

The fact that we have the labor folder and the services folder (and the liquid_tags) and the models folders is indicative of some indecision combined with lack of taking the time to actually take the time for dedicated large-scale refactoring.

I think it's normal :) A way to refactor could be choosing one style and slowly convert the rest to it

There are different schools of thoughts in terms of folder structure here (should it all go in models, with subfolders?)

Dunno. I've never liked the "fat model" approach because it leads to giant files with a lot of different business logic. For example, where do you put a method that operates on a user and a comment? In user? In comment? 42?

One thing that I like is having "third party services" clients in the same folder, especially if they are used in multiple part of the business logic.

I've also used presenters in the past to group the display logic and (some) query objects to hide complex queries and be able to test them in isolation.

These are a few articles on the topic of how to structure or refactor Rails code:

7 Patterns to Refactor Fat ActiveRecord Models (an intro from 2012 but with some still relevant pointers)
Presenters: I love presenters, because they make clear which part of the logic is just for the templates. A date formatter for the UI shouldn't be in a model for example. They aren't super easy to test in pure Ruby but dev.to moved on from controller tests so this shouldn't be an issue.
Sandi Metz’ rules for developers are more or less in tuned with what CodeClimate is warning about (ps. it's okay to break them if needed :D)
This is the technique I used to extract query objects

Another thing I would consider is to make more explicit what is an asynchronous job. The code base right now uses delayed_job directly with its various .delay, _without_delay and handle_asynchronously. Refactoring all of this into Active Jobs would have the effect of make jobs explicit and produce a nice app/jobs folder as a central repository for all the async stuff. This way you can test it in isolation if needed or if in the future you outgrow delayed job you know where to go to make the changes.

PS, we actually define our own timeout at about 9 seconds. RACK_TIMEOUT accepts some a Rails-magicky ENV variable called RACK_TIMEOUT_SERVICE_TIMEOUT.

Ah! Good to know :) Well, it's entirely possible that the wall is hit at 9 seconds then.

We could extend this longer, but everything we do really should finish sooner, so it would be better to fix the code IMO.

Agreed

rhymes • Oct 12 '18

I know it's a lot, but the great thing about refactoring is that it can be (mostly) done one step a time :-)