
Nir Tayeb


6 Ways You're Using ORMs Wrong and How To Fix Them

We love ORMs because they simplify and streamline working with relational databases.

ORMs provide a higher level of abstraction, allowing us to work with objects and classes instead of raw SQL queries. They automate everyday database tasks such as CRUD operations, reducing the boilerplate code we need to write. ORMs handle database connections and transactions, making managing and scaling our applications easier. With ORMs, it's possible to write database-agnostic code, quickly switch between different database systems, and focus more on application logic.

But with all these benefits and abstractions, it is easy to write poorly performing code.

For the examples, I'll use the schema from the Prisma ORM documentation, which describes a blogging platform with users, profile details, posts, and categories.

datasource db {
  provider = "postgresql"
  url      = env("DATABASE_URL")
}

generator client {
  provider = "prisma-client-js"
}

model User {
  id      Int      @id @default(autoincrement())
  email   String   @unique
  name    String?
  role    Role     @default(USER)
  posts   Post[]
  profile Profile?
}

model Profile {
  id     Int    @id @default(autoincrement())
  bio    String
  user   User   @relation(fields: [userId], references: [id])
  userId Int    @unique
}

model Post {
  id         Int        @id @default(autoincrement())
  createdAt  DateTime   @default(now())
  updatedAt  DateTime   @updatedAt
  title      String
  published  Boolean    @default(false)
  author     User       @relation(fields: [authorId], references: [id])
  authorId   Int
  categories Category[]
}

model Category {
  id    Int    @id @default(autoincrement())
  name  String
  posts Post[]
}

enum Role {
  USER
  ADMIN
}

The N+1 problem

Fetching data without eager loading

Assuming we want to list all posts written by users with the USER role, a naive approach to writing the code is:

async function getUserPosts() {
  const users = await prisma.user.findMany({ where: { role: Role.USER } });
  // One extra query per user to load that user's posts
  const postsPerUser = await Promise.all(
    users.map((user) => prisma.post.findMany({ where: { authorId: user.id } }))
  );
  return postsPerUser.flat();
}

Here is what happens against the database:

  1. Query the users table for all the users with the USER role:
SELECT * FROM users WHERE role = 'USER';
  2. Then, for each user, query the database (again) for that user's posts:
SELECT * FROM posts WHERE "authorId" = <X>;

You won't feel the performance issue when developing the application with a few test users. But once you have hundreds or thousands of users in production, this function will take "forever" from a user-experience perspective. Every round trip to your DB takes at least 10ms, not including the time the query itself needs to run and the data transfer between the database and your server.

Using eager loading, the number of queries stays constant no matter how many users there are:

async function getUserPosts() {
  const users = await prisma.user.findMany({
    where: { role: Role.USER },
    include: { posts: true },
  });
  return users.flatMap((user) => user.posts);
}

By using include here, the ORM (Prisma in my example) fetches the related posts eagerly, either with a SQL JOIN or with a single additional batched query, producing something like:

SELECT *
FROM users
INNER JOIN posts ON users.id = posts."authorId"
WHERE users.role = 'USER';

Another variation of the N+1 problem

Assuming we want to list all posts with their author's short bio, a naive approach will look like this:

const posts = await prisma.post.findMany();
const postsWithBio = await Promise.all(
  posts.map(async (post) => {
    // One extra round trip per post to fetch the author and their profile
    const author = await prisma.user.findUnique({
      where: { id: post.authorId },
      include: { profile: true },
    });
    return {
      title: post.title,
      createdAt: post.createdAt,
      authorBio: author?.profile?.bio,
    };
  })
);

Again, the issue is that the code makes a round trip to the database for each post to fetch the author and their profile. The solution in this specific case is to use a nested include, but if we have a more complex schema with more relations between the entities, we can take another approach.

First, collect all the unique author IDs, then fetch only the profiles (or bios) of those authors and map them by ID:

async function getPostWithBio() {
  const posts = await prisma.post.findMany();
  const authorIds = [...new Set(posts.map((post) => post.authorId))];
  const profiles = await prisma.profile.findMany({ where: { userId: { in: authorIds } } });
  const authorToBio = new Map(profiles.map((p) => [p.userId, p.bio]));

  return posts.map((post) => ({
    title: post.title,
    createdAt: post.createdAt,
    authorBio: authorToBio.get(post.authorId),
  }));
}

This way, only two queries go to the database.

SELECT * FROM posts;
SELECT * FROM profiles WHERE "userId" IN (1, 2, 3, 4, ...);

Remember that this is a straightforward use case; in more complex systems, the issue is widespread and looks slightly different. Your code will iterate over one entity, then go into other classes and functions that fetch more data from the DB, creating the N+1 problem without you noticing.
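
To make this concrete, here is a rough sketch of how the problem hides in layered code; getAuthorBio and renderPostCards are hypothetical helpers written against the schema above, not code from the examples:

// Hypothetical helper buried in another module - it looks innocent on its own
async function getAuthorBio(authorId: number): Promise<string | undefined> {
  const profile = await prisma.profile.findUnique({ where: { userId: authorId } });
  return profile?.bio;
}

// Somewhere else, a loop calls it once per post - one extra query per iteration
async function renderPostCards() {
  const posts = await prisma.post.findMany();
  for (const post of posts) {
    const bio = await getAuthorBio(post.authorId); // hidden N+1
    console.log(post.title, bio);
  }
}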

How to identify the N+1

The best way to identify the N+1 issue in your system is by monitoring your app in three different ways:

  1. Turn on your debug/info log and configure your ORM to log the SQL queries it generates (see the sketch after this list).
  2. Watch the database query log and look for frequent, similar queries that each fetch a single ID.
  3. Use an Application Performance Monitoring (APM) system such as New Relic, Sentry, or Azure Application Insights - these visualize all the calls to the DB in each request/transaction and show how long each one took and how much of the request was spent communicating with the DB.
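
For the first point, here is a minimal sketch of how query logging can look with Prisma (other ORMs have equivalent options):

import { PrismaClient } from '@prisma/client';

// Emit every generated SQL query as an event, including its duration
const prisma = new PrismaClient({
  log: [{ emit: 'event', level: 'query' }],
});

prisma.$on('query', (e) => {
  console.log(`${e.duration}ms  ${e.query}  params: ${e.params}`);
});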

Querying without using indexes

Not defining an index

Assuming we need to search for posts by title, a naive implementation will look like this:

const posts = await prisma.post.findMany({ where: { title: query } });

If we look at the schema, we won't see an index defined on the title field. In fact, we won't see any index explicitly defined in our schema at all (apart from primary keys, foreign-key relations, and unique constraints). Why? Because thinking about data indexing isn't intuitive for object-oriented/functional programmers.

Running the code above results in a full table scan to find matching rows. As the name suggests, a full table scan means an O(N) lookup. If we have thousands of posts, each with many properties, this query will take a lot of time and resources.

If we set an index (of the default "b-tree" type) on the field, the lookup time usually drops to O(log N).

model Post {
  id         Int        @id @default(autoincrement())
  createdAt  DateTime   @default(now())
  updatedAt  DateTime   @updatedAt
  title      String
  published  Boolean    @default(false)
  author     User       @relation(fields: [authorId], references: [id])
  authorId   Int
  categories Category[]

  @@index([title])
}

💡 If you don't run this query frequently, instead of defining an index on the title field you can add conditions to the query that use other, existing indexes. The search will then run over a subset of the data instead of a full table scan (see the sketch below).
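
For example, assuming the posts table has some other usable index (say a hypothetical @@index([authorId])), a sketch of such a query could look like this; currentAuthorId and query are placeholder variables:

// Narrow the rows with an indexed condition first, then match the title
const posts = await prisma.post.findMany({
  where: {
    authorId: currentAuthorId,   // served by the assumed index on authorId
    title: { contains: query },  // filters only that author's posts
  },
});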

💡 Remember that the cost of indexing data is lower write performance. So, if write performance matters for your use case, decide which indexes are necessary and which are merely nice to have.

The query cannot use the index (wrong index type)

Assuming we want to query posts by a string contained in the title:

const posts = await prisma.post.findMany({ where: { title: { contains: "query" } } });

This code results in a query including the "LIKE" operator:

SELECT * FROM posts WHERE title LIKE '%query%';

The default index type is B-tree, and a pattern with a leading '%' cannot be matched using a B-tree index.

The solution is to define another type of index that can serve this query efficiently; in PostgreSQL, that's a trigram GIN (or GiST) index:

@@index([title(ops: raw("gin_trgm_ops"))], type: Gin)
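
Note that the gin_trgm_ops operator class used above comes from PostgreSQL's pg_trgm extension, so it has to be enabled. On Prisma versions that support the postgresqlExtensions preview feature, the schema changes look roughly like this (alternatively, run CREATE EXTENSION pg_trgm directly on the database):

generator client {
  provider        = "prisma-client-js"
  previewFeatures = ["postgresqlExtensions"]
}

datasource db {
  provider   = "postgresql"
  url        = env("DATABASE_URL")
  extensions = [pg_trgm]
}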

For further reading, I suggest this fantastic blog post explaining how trigram-based operations and indexes can improve query performance by several orders of magnitude.

Using model objects as output

Assuming we need to implement an endpoint to fetch all users, a naive approach will look like this.

@Controller('users')
export class UsersController {
  @Get()
  async findAll(): Promise<User[]> {
    return prisma.user.findMany();
  }
}

When Nest.js and other frameworks serialize the response, they transform the object entirely to JSON, iterating over all of its properties and sub-properties. Using the model as the output can trigger lazy loading of each user's relations, meaning more queries to the DB and more unnecessary data returned with the response. It can also lead to the security risk of exposing private or confidential information.

Instead, you should clearly define the interface you return. With TypeScript, it's possible to use Pick to build a type from a subset of another type's properties, or Omit to exclude some properties.
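
As a rough sketch, the controller above could return an explicit output type instead of the raw model; UserSummary is a made-up name for illustration:

import { Controller, Get } from '@nestjs/common';
import { User } from '@prisma/client';

// Expose only the fields we actually want to return from the API
type UserSummary = Pick<User, 'id' | 'name' | 'role'>;

@Controller('users')
export class UsersController {
  @Get()
  async findAll(): Promise<UserSummary[]> {
    // select trims both the query and the response payload to these fields
    // (prisma is the shared PrismaClient instance, as in the example above)
    return prisma.user.findMany({ select: { id: true, name: true, role: true } });
  }
}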

Querying unnecessary data

Assuming we want to query all posts and print only their dates and titles:

const posts = await prisma.post.findMany();
for (const post of posts) {
  console.log(post.title, post.createdAt);
}

This code ends up with this SQL query:

SELECT * FROM posts;

It looks simple, but if we have even 1,000 posts and each post's content is around 10KB, we transfer at least 10MB of data without even using it. Instead, we should select only the fields we need.

const posts = await prisma.post.findMany({ select: { title: true, createdAt: true } });
for (const post of posts) {
  console.log(post.title, post.createdAt);
}

This ends up with a query like:

SELECT title, "createdAt" FROM posts;

In summary, using an ORM is excellent for productivity but not always for performance. Understanding what happens behind the scenes is crucial to ensuring your code performs well.


If you enjoyed the article, please forward and share it with your teammates/acquaintances and help them avoid these mistakes.
This post was originally published in my newsletter "Percentile 99th".
My subscribers get all the information first; in the future, exclusive content will be shared only over the newsletter.

Top comments (7)

Jean-Michel 🕵🏻‍♂️ Fayard

That's a great article, thanks a lot

At the same time, the article reminds me why I don't like ORMs very much. I guess your mileage may vary, especially if you use the same stack with the same ORM all the time. But I had to learn multiple ORMs in my career and I very much don't want to do that again.

What bothers me in your examples is that all the wrong code looks totally legit, and I find it bad when good code and bad code both look legit.
Think about the GOTO statement. GOTO is not evil per se; it's used all the time in the Linux kernel. The issue there is that good code and bad code that use GOTO look really similar.

We love ORMs because they simplify and streamline the working process with relational databases
ORMs provide a higher level of abstraction, allowing us to work with objects and classes instead of raw SQL queries. They automate everyday database tasks such as CRUD operations, reducing the boilerplate code we need to write. ORMs handle database connections and transactions, making managing and scaling our applications easier. With ORMs, it's possible to write database-agnostic code, quickly switch between different database systems, and focus more on application logic

I have a few counter-arguments to this

  • Actually, SQL is a great level of abstraction. It's a standard that has been very useful and stable for decades. That's quite rare in our industry. It's a very important skill to have, and I would rather learn SQL than learn an ORM.
  • The database-agnostic part is not that big a deal in practice. The reason it's possible at all is that SQL variants are quite similar in the first place. And most importantly, switching databases is not something you want to do often in the lifetime of a project. You can pick Postgres, for example, base your whole company on it, and the odds that you succeed with it are quite high.

So what's the issue with SQL, really?

In the end, the parts that suck with direct SQL are a limited number of things:

  • no auto-complete when you write the queries
  • no type safety
  • boring mapping and unmapping
  • boring duplicated information between classes and the schema definition
  • ....

An ORM is an attempt to solve this problem by being code first. Your code generates SQL.

Nowadays I very much prefer a SQL-first solution that starts with SQL and generates the code boilerplate that you indeed don't want to write manually.

You write pure SQL queries and become good at SQL, and then you have the IDEA and a magical plugin generate the part of the code that would otherwise be super boring. And auto-complete in the IDE!

cashapp.github.io/sqldelight

Mike Talbot ⭐

Absolutely, I 100% agree. ORMs strike me as a way object-oriented code developers try to forget that a database exists and has its own vagaries. Forcing you to think about your tables and their relationships rather than hiding the mechanism behind some sugar makes you much more likely to write efficient code and come to terms with what operations are necessary.

Nir Tayeb

SQL First is an interesting approach, but I doubt it will make someone code and develop products faster or help decrease the issues mentioned.

Most of the issues I mentioned would also happen with the SQL-first approach, because you need to think beforehand about what you develop and have a deeper understanding of how databases work, how you connect your code with external services, and the implications of that.

For example, with the N+1 problem, one developer will use the abstracted "get all posts query," and later, another developer will add an abstracted call to query for the "author details" in the posts loop.

The same goes for "SELECT *" or not using/understanding indexes; the SQL-first approach won't solve those either.

Jean-Michel 🕵🏻‍♂️ Fayard

Initially it doesn't change anything.
But the SQL-first approach tends to nudge you to become actually skilled at SQL - vs seeing it as an implementation detail.
You have all the resources available on SQL at your disposal - vs only the resources for your particular ORM.
And you can get good at it for the long term - vs only knowing the pitfalls specific to Hibernate, and then moving on to another ORM.

That reminds me of Joel Spolsky's law of leaky abstractions

Thomas Bnt ☕

Hello! Don't hesitate to put colors on your code blocks, like in this example, to make your code easier to understand 😎

console.log('Hello world!');

Example of how to add colors and syntax in codeblocks

Nir Tayeb

Thanks! I updated the article 🙌

Ricardo Esteves

nice article, thanks for sharing!