SQL Duplicates

#sql #duplicates #removing #postgres

Disclaimer

This is a purely self serving article. I've had to do this a number of times and end up having to re-learn it all every time I've done it. SOOOOO this time I will drop some basics here as a reference for my failing memory but if it helps anybody else even better. It's not indented to be instructional or in anyway best practice so if you have issues or know of more appropriate ways please feel fee to drop things in the comments etc.

The Problem

The key problem being addressed is a number of duplicate records that have found their way into our datasets. These duplicates are matched on the master record id and the description used in the child example

id	parentId	description
1	P1	this is ok
2	P1	I'm here twice
3	P1	I'm here twice

Breaking it down

So the first thing I did was a query that will actualy help me find the rows I was looking for and the proof there are actual duplicates.

select "parentId",
       id,
       description,
       row_number()
       over (partition by "parentId", description order by "parentId", overview) as RN
from "Detail"

This produces results like these

id	parentId	description	RN
1	P1	this is ok	1
2	P1	I'm here twice	1
3	P1	I'm here twice	2

So we now have a row id/number based on the parentId and the description matching.

We then use this in the delete query to kill al the row numbers over 1

delete
from "Detail"
where id in (
    select id
    from (
             select "parentId",
                    id,
                    description,
                    row_number()
                    over (partition by "parentId", description order by "parentId", overview) as RN
             from "Detail"
         ) as F
    where RN > 1
);

This is probably a convoluted way to achieve this and I'm sure there will be more efficient way but it has worked for me in this occasion so I hope it helps others.

Running the previous query then results in

id	parentId	description	RN
1	P1	this is ok	1
2	P1	I'm here twice	1

DEV Community

SQL Duplicates

Disclaimer

The Problem

Breaking it down

Top comments (0)

Read next

Podman - Create A FHIR-Server & SQL Server

Pragma Autonomous Transaction in Oracle PL/SQL | Complete Overview

📝 30 Common PostgreSQL Interview Questions 🐘

Data Analyst Checklist