I don't know how it happened, but to prevent it happening again, I wanted to add a unique constraint on the 'username' column.
Before I could do that, however, I had to delete the duplicate entries.
I searched around using Google, and found a few solutions that seemed complicated, and not quite what I wanted. They seemed to involve copying data to another table, deleting dupes, deleting the original table rows, and then copying it back.
So, here is what I think is an easier way to do it, that is a bit more flexible:
- Find the rows you want to keep
- Delete the other rows.
Selecting the good rows
My table in question is 'usersettings'. It has 9 columns, but all I care about are the first two, 'id', and 'username'.
To find the rows I want to keep, I am going to select the lowest id per username. To do this, I used aggregation, with the group by clause, and the 'min' operator.
select min(id) as id from usersettings group by username;
This returns the lowest id for all unique usernames. These are the rows I want to keep.
This could easily be modified to take the highest id, using max.
I am sure there are lots of other ways to implement the criteria to select the ids that you want to keep.
We can dump this result set into a temporary table, or even just use it as a subquery expression.
To maximize database compatibility, I will just use a subquery.
Deleting the bad rows
Okay, first we will just 'select' the rows to be deleted, to make sure we are deleting the proper rows:
select * from usersettings where id not in (select min(id) as id from usersettings group by username);
Sure enough, this returns a result set containing the duplicate rows that have higher ids.
So, now that we know we are selecting the correct data, I can go ahead and delete it:
delete from usersettings where id not in (select min(id) as id from usersettings group by username);
Using a temp table
If you want to use a temporary table instead, with postgres you could do the following:
select min(id) as id into temp table goodids from usersettings group by username;
delete from usersettings where id not in (select * from goodids);
As always, you should really back up your database before deleting stuff!!!
Now that my usersettings.username only contains unique data, I am able to apply the unique constraint to it.
No comments:
Post a Comment