DEV Community

Cover image for Deleting Duplicates in SQL
Danwycliff Ndwiga
Danwycliff Ndwiga

Posted on

Deleting Duplicates in SQL

In SQL handlingn duplicate records is essential for maintaining data accuracy, optimizing query performance, and ensuring consistent results.This article explores some practicle techniques to identify and delete duplicate rows using SQL queries.

Delete Duplicates Using a Unique Identifier

Consider the above code block

DELETE FROM cars
WHERE id IN (
    SELECT MAX(id)
    FROM cars
    GROUP BY model, brand
    HAVING COUNT(1) > 1
);
Enter fullscreen mode Exit fullscreen mode

The above code selects the records from cars and in the above code we want to delete the record that are duplicate in the model and brand we use the id to identify the MAX(we can only get one max) and we delete the only max record

Delete Duplicates Using a Self-Join

In this approach, we use a self-join to identify and delete duplicate rows in the cars table, keeping only the row with the smallest id for each unique combination of model and brand

DELETE FROM cars
WHERE id IN (
    SELECT c2.id
    FROM cars c1
    JOIN cars c2 ON c1.model = c2.model
                 AND c1.brand = c2.brand
    WHERE c1.id < c2.id
);
Enter fullscreen mode Exit fullscreen mode

Delete Duplicates Using a Window Function

DELETE FROM cars
WHERE id IN (
    SELECT id
    FROM (
        SELECT id,
               ROW_NUMBER() OVER (PARTITION BY model, brand ORDER BY id) AS rn
        FROM cars
    ) AS x
    WHERE x.rn > 1
);
Enter fullscreen mode Exit fullscreen mode

In the above case in the inner subquery, the ROW_NUMBER() function assigns a unique row number to each row within each group of duplicates (defined by model and brand)

Using MIN function

delete from cars
where id not in ( select min(id)
                  from cars
                  group by model, brand);
Enter fullscreen mode Exit fullscreen mode

The inner subquery SELECT MIN(id) FROM cars GROUP BY model, brand finds the lowest id for each unique combination of model and brand. This ensures that only one record for each car model and brand pair is retained
The DELETE FROM cars WHERE id NOT IN (...) statement removes records with IDs that aren't the minimum for their model and brand group. Essentially, this keeps only the oldest record and removes duplicates.

Top comments (0)