20091014

Coppermine duplicate records

If you´ve got duplicate records in Coppermine, but no duplicate files this article is for you. This query will find all duplicate records, and its easy to do a DELETE on the bad id's. On the least hits duplicate ofcourse, you wouldn't want to lose sweet historical data!

I use Coppermine to host a photogallery on veldtoertochtulicoten.nl and after uploading some 400 photos i found out that some where duplicate. Bugger, but easy to rectify i thought. Not! After removing one via the webinterface, i found out that the sourcefile gets deleted as well. No problem, except when your other duplicate is using the that same file!

There exist other tools, like this commandline PHP script by remdex found at the Coppermine forum. Nonetheless this was to much of a hassle so i´ve created a simple mysql query to find all duplicates:

Requirements
I wanted to know the id of the least visited duplicates, thus preserving the most popular/visited one each time. Its all done in one mysql query, but i left out the delete line. I've got Coppermine Photo Gallery 1.4.14 (stable).

The mysql query
SELECT * FROM (

SELECT a.pid, a.aid, a.filename, a.hits, a.filepath
FROM cpg14x_pictures AS a
RIGHT JOIN (

SELECT aid, filepath, filename, COUNT( filename ) AS cnt
FROM cpg14x_pictures AS b
GROUP BY filename
HAVING cnt >1
ORDER BY cnt

) AS c ON c.filename = a.filename AND c.aid = a.aid
ORDER BY a.aid, a.filename, a.hits DESC

) AS d
GROUP BY filename

If you don't understand the DELETE command (yet) you shouldn't run code this radical. I give no garanties on other databases, or coppermine installations, so validate its output before you insert the delete command and always make a backup first.

Every pass it only deletes one duplicate, so if you've got triplets or more duplicates you should run it twice or more.

PS i used phpMyAdmin, if only blogspot.com was so effective! There is no code button here, nor ident. :(

Update 20091103: Decided to put my comment inside this post to avoid blog migration issues;
How to add delete functionality:

Replace the (first) line "select * from (" with
"DELETE from cpg14x_pictures where pid IN (
select d.pid from ("

and add a ")" at the end of this query.

Remember to backup and test first without deleting!

1 comment:

  1. How to add delete functionality:

    Replace the (first) line "select * from (" with
    "DELETE from cpg14x_pictures where pid IN (
    select d.pid from ("

    and add a ")" at the end of this query.

    Remember to backup and test first without deleting!

    ReplyDelete