Deleting Duplicate Rows when the table doesn’t have a primary key.

Deeraj
Posted by in Sql Server category on for Beginner level | Views : 13452 red flag
Rating: 4 out of 5  
 1 vote(s)

This article gives a quick way of deleting duplicate records from a given table, that doesn’t have a primary key.


 Download source code for Deleting Duplicate Rows when the table doesn’t have a primary key.

 

Were you ever in need of something like, deleting duplicate rows when the table doesn’t have a primary key. If yes, read on!


This article gives a quick way of deleting duplicate records from a given table, that doesn’t have a primary key. Also, we will NOT use cursors, loops, temporary tables (global or local) but still accomplish the mission. Interesting! isn't it. Please read on!

 

First lets create a simple table with two columns and name the table as ‘DeleteDupsTable’.

 

Create Table DeleteDupsTable(BrandId int, SKU int)

 

Lets add a few records to the above table: A screen shot of the all the work can be found at the end of this article. Also, the SQL Queries are uploaded and is available along with this article.

 

Insert into DeleteDupsTable Values
(1000,Cast(Rand()*100 as int)),
(
1000,Cast(Rand()*100 as int)),
(
1000,Cast(Rand()*100 as int)),
(
1000,Cast(Rand()*100 as int)),
(
1000,Cast(Rand()*100 as int)),
(
1000,Cast(Rand()*100 as int)),
(
1000,Cast(Rand()*100 as int)),
(
1000,Cast(Rand()*100 as int)),
(
2000,Cast(Rand()*100 as int)),
(
2000,Cast(Rand()*100 as int)),
(
2000,Cast(Rand()*100 as int)),
(
2000,Cast(Rand()*100 as int)),
(
2000,Cast(Rand()*100 as int)),
(
3000,Cast(Rand()*100 as int)),
(
3000,Cast(Rand()*100 as int)),
(
3000,Cast(Rand()*100 as int)),
(
3000,Cast(Rand()*100 as int))

 

Notice, that we don’t need to write the insert statement for all the rows. It’s an enhancement in SQL Server 2008 aka table value constructor.

 

Go ahead run the above SQL.

 

At this point in time we have a heap created in the database. A table without any keys is called a ‘Heap’.

 

Lets run the query below to see what have we got to ensure we have necessary data before we go ahead and purge the duplicate records.

 

Select * from DeleteDupsTable

 

Ok necessary setup is done.

 

Lets get into the details of how we delete duplicate duplicate records.

 

Lets verify how many records have we got per Brand. Go ahead and run the below query. This would list number of records per Brand.

 

Select Brandid,COUNT(BrandId) from DeleteDupsTable group by Brandid

 

Ok, we have all the data ready out for purging.


Now the most important step. I wonder if it’s ever possible to delete duplicate records without a primary key! :). Well, even I don’t have any idea on how to I proceed to meet this requirement.

 

However, am posting a work around. This may not be the best possible solution, But, helps us to meet an immediate requirement short term!

 

Ok no more blah blah!. Lets get into the details!

 

The following query generates a dynamic ‘DELETE’ statement for every brand in the table. The criteria is the BrandId should have more than one record (possible qualifier for duplicate deletes :) ). The delete statement would retain the latest record (the last record in the list will be the latest one when we issue a select * from DeleteDupsTable).

 

 

Select 'Delete Top(' + Cast((COUNT(BrandId)-1) as VarChar) + ') from DeleteDupsTable Where BrandId='+ CAST(BrandId As Varchar)
From

DeleteDupsTable Group by
Brandid having COUNT(BrandId)>1

 

After executing the above statement, the output will be a set of delete statements. Each delete statement would delete all records excluding the latest one for every brand that has got more than one record.

 

Copy the output and paste it on the a new query window and press F5.

 

Boooom! All Records deleted. Mission accomplished.

 

As part of testing, I deleted my critical data. How would I get it back or at least would like to show what records were purged.

 

Well! Good question.

 

Lets enhance the above query. The enhancement is, the primary requirement stats as it is  which is nothing but the dynamic delete queries will delete the records. But, at the same time it outputs the records that were purged. with no extra select statements (interesting! is it?). Meaning, you see the data on your screen that was deleted. This way, even if you have lost the data, you could still insert it back using the output just generated.


Go ahead and run the below query and notice the output this query generates:


Select 'Delete Top(' + Cast((COUNT(BrandId)-1) as VarChar) + ') from DeleteDupsTable output Deleted.* Where BrandId='+ CAST(BrandId As Varchar)
From

DeleteDupsTable  Group
by Brandid having
COUNT(BrandId)>1

In the above query, notice we have output Deleted.* clause. This clause does the trick of displaying the records that were deleted.


Notice, in the following output, each delete statement tries to delete all records excluding one for every brand. The record being retained is the latest record for each of the brands.


Output:

Output

 


Hope that helps! And happy programming :)


I will get back to you with another interesting article very soon!

Page copy protected against web site content infringement by Copyscape

About the Author

Deeraj
Full Name: Deeraj Chakravarthy
Member Level: Starter
Member Status: Member
Member Since: 10/29/2007 1:00:04 AM
Country: India


Qualification: BCA, MScIS, MDSE, PMI-ACP

Login to vote for this post.

Comments or Responses

Login to post response

Comment using Facebook(Author doesn't get notification)