This article gives a quick way of deleting duplicate records from a given table, that doesn’t have a primary key.
Were you ever in need of something like, deleting duplicate rows when the table doesn’t have a primary key. If yes, read on!
This article
gives a quick way of deleting duplicate records from a given table, that
doesn’t have a primary key. Also, we will NOT use cursors, loops, temporary tables (global or local) but still accomplish the mission. Interesting! isn't it. Please read on!
First lets create a simple table with two columns and name the table as
‘DeleteDupsTable’.
Create Table DeleteDupsTable(BrandId int, SKU int)
Lets add a few records to the above table: A screen shot of the all the work can be found at the end of this article. Also, the SQL Queries are uploaded and is available along with this article.
Insert into DeleteDupsTable Values
(1000,Cast(Rand()*100 as int)),
(1000,Cast(Rand()*100 as int)),
(1000,Cast(Rand()*100 as int)),
(1000,Cast(Rand()*100 as int)),
(1000,Cast(Rand()*100 as int)),
(1000,Cast(Rand()*100 as int)),
(1000,Cast(Rand()*100 as int)),
(1000,Cast(Rand()*100 as int)),
(2000,Cast(Rand()*100 as int)),
(2000,Cast(Rand()*100 as int)),
(2000,Cast(Rand()*100 as int)),
(2000,Cast(Rand()*100 as int)),
(2000,Cast(Rand()*100 as int)),
(3000,Cast(Rand()*100 as int)),
(3000,Cast(Rand()*100 as int)),
(3000,Cast(Rand()*100 as int)),
(3000,Cast(Rand()*100 as int))
Notice, that
we don’t need to write the insert statement for all the rows. It’s an
enhancement in SQL Server 2008 aka table value constructor.
Go ahead run
the above SQL.
At this point
in time we have a heap created in the database. A table without any keys is
called a ‘Heap’.
Lets run the
query below to see what have we got to ensure we have necessary data before we
go ahead and purge the duplicate records.
Select * from DeleteDupsTable
Ok necessary
setup is done.
Lets get into
the details of how we delete duplicate duplicate records.
Lets verify
how many records have we got per Brand. Go ahead and run the below query. This
would list number of records per Brand.
Select Brandid,COUNT(BrandId) from DeleteDupsTable group by Brandid
Ok, we have all the data ready out for purging.
Now the most
important step. I wonder if it’s ever possible to delete duplicate records
without a primary key! :). Well, even
I don’t have any idea on how to I proceed to meet this requirement.
However, am
posting a work around. This may not be the best possible solution, But,
helps us to meet an immediate requirement short term!
Ok no more
blah blah!. Lets get into the details!
The following
query generates a dynamic ‘DELETE’ statement for every brand in the table. The criteria is the
BrandId should have more than one record (possible qualifier for duplicate
deletes :) ). The delete statement would retain the latest record (the last
record in the list will be the latest one when we issue a select * from
DeleteDupsTable).
Select 'Delete Top(' + Cast((COUNT(BrandId)-1) as VarChar) + ') from DeleteDupsTable Where BrandId='+ CAST(BrandId As Varchar)
From
DeleteDupsTable Group by
Brandid having COUNT(BrandId)>1
After
executing the above statement, the output will be a set of delete statements.
Each delete statement would delete all records excluding the latest one for every brand that has got more than one record.
Copy the
output and paste it on the a new query window and press F5.
Boooom! All
Records deleted. Mission
accomplished.
As part of
testing, I deleted my critical data. How would I get it back or at least would
like to show what records were purged.
Well! Good
question.
Lets enhance
the above query. The enhancement is, the primary requirement stats as it is which is nothing but the dynamic delete queries will delete the
records. But, at the same time it outputs the records that were purged. with no extra select statements (interesting! is it?).
Meaning, you see the data on your screen that was deleted. This way, even if
you have lost the data, you could still insert it back using the output just
generated.
Go ahead and run the below query and notice the output this query generates:
Select 'Delete Top(' + Cast((COUNT(BrandId)-1) as VarChar) + ') from DeleteDupsTable output Deleted.* Where BrandId='+ CAST(BrandId As Varchar)
From
DeleteDupsTable Group
by Brandid having
COUNT(BrandId)>1
In the above
query, notice we have output Deleted.* clause. This clause does the trick of
displaying the records that were deleted.
Notice, in the following output, each delete statement tries to delete all records excluding one for every brand. The record being retained is the latest record for each of the brands.
Output:

Hope that helps! And happy programming :)
I will get back to you with another interesting article very soon!