DEV Community

Cover image for Selecting duplicate records
Michal Bryxí
Michal Bryxí

Posted on • Edited on

Selecting duplicate records

Selecting duplicate records might be useful in cases when you want to introduce new multi column index on existing data.


duplicates = ActiveRecord::Base.connection.execute('
  SELECT items.id, items.name
    FROM items
    WHERE items.id IN (
      SELECT
        id
        FROM (
          SELECT
          id,
          ROW_NUMBER() OVER w AS rnum
          FROM items
          WINDOW w AS (
            PARTITION BY entity_id, barcode
            ORDER BY id
            )
          ) t
          WHERE t.rnum > 1
       );
    ')
Enter fullscreen mode Exit fullscreen mode

Our exported data set will have id and name, so that we can tell which rows we want to delete. And it will only show rows where entity_id and barcode would not be unique. That's an example, yours will be different.

After the cleanup, when the query above would return an empty result set we should be able to create following index without any issues:

add_index(:items, [:entity_id, :barcode])
Enter fullscreen mode Exit fullscreen mode

Inspiration taken from the article Rails: migration: Adding a unique Index and deleting Duplicates


Image generated via Midjourney prompt: Heap of items where every object is represented twice; --ar 16:9

Top comments (1)

Collapse
 
nullvoxpopuli profile image
NullVoxPopuli

A comment, by request