Finding duplicate episodes

Dec 9, 2014 at 4:36 AM
As I begin to run out of disk space, I'm on the hunt for space savings before starting to wipe parts of my precious collection...

I've noticed from time to time that hidden among my 15,000 episodes are some duplicates. I'd like to seek these out. I was hoping that since MC is so aware of what episodes I have or don't have, it might also be able to identify those duplicates.

Is that feature hiding in some corner somewhere? Or can you suggest some other tool that might be up to the task? I'm drawing a blank.

Thanks for your continued help!
Dec 9, 2014 at 4:47 AM
We don't have anything at the moment that would look for duplicate episodes. It could be possible to program something, but it might take a little bit of time.

Off the top of my head, I don't really know of any other program that would be able to do this search.

I'll see if I can whip something together... I have some thoughts on this.
Dec 9, 2014 at 6:03 AM

MC would certainly be uniquely placed to solve this sort of problem. The tool that does the job needs to be able to identify episodes while ignoring differing filenames. It also needs to span all the drives the library might be spread over.

I can't imagine I'm the only person that needs this sort of solution...

Dec 9, 2014 at 6:16 AM
Yes, I had a bit of a play, but I had to reverse my work, as another dev pushed out an update that I needed to merge.

It might not happen tonight, but I'll add a function to list duplicate episodes.
I'll match by season and episode numbers, not by episode title, and I'm thinking of a log window listing the show and which duplicate season/episode was found.
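A minimal Python sketch of that matching idea (not Media Companion's actual code, which this thread doesn't show): each episode is keyed on its show ID plus season and episode numbers, so differing filenames, titles, and drives don't matter. The dictionary keys `show_id`, `season`, `episode`, and `path`, and the sample entries, are hypothetical stand-ins for whatever MC's cache stores.

```python
from collections import defaultdict


def find_duplicate_episodes(episodes):
    """Group episodes by (show ID, season, episode) and keep groups with 2+ files.

    Filenames and titles are ignored, so copies on different drives or with
    different naming still land in the same group.
    """
    groups = defaultdict(list)
    for ep in episodes:
        key = (ep["show_id"], ep["season"], ep["episode"])
        groups[key].append(ep["path"])
    return {key: paths for key, paths in groups.items() if len(paths) > 1}


# Hypothetical library entries; in MC this data would come from its cache.
library = [
    {"show_id": "12345", "season": 1, "episode": 1, "path": "D:/TV/Show/S01E01.mkv"},
    {"show_id": "12345", "season": 1, "episode": 1, "path": "E:/New/show.1x01.avi"},
    {"show_id": "12345", "season": 1, "episode": 2, "path": "D:/TV/Show/S01E02.mkv"},
]

# Log-window style output: one heading per duplicate, then its files.
for (show_id, season, episode), paths in find_duplicate_episodes(library).items():
    print(f"{show_id} S{season:02d}E{episode:02d}:")
    for path in paths:
        print("   ", path)
```

Matching on numbers rather than titles means a file with a missing or misspelled title is still caught, as long as its season and episode were parsed correctly.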

Dec 9, 2014 at 7:40 AM
I've uploaded a test build to my Dropbox, here. To check for duplicates, go to the TV Shows menu and click Check For Duplicate Episodes.

It should give a fair indication of duplicate episodes.
Dec 9, 2014 at 12:23 PM

That sounds about perfect. Take your time :-)

Dec 9, 2014 at 4:03 PM
You can click the blue word "here" to download the test build, or click this:

Media Companion 3.610c.exe
Dec 9, 2014 at 5:33 PM

Sorry, I missed the middle email for some reason. I'll give it a shot later tonight. Quick turnaround!

Dec 10, 2014 at 2:36 AM
I'm not getting any love...

When I check for duplicates I get the error: "Object reference not set to an instance of an object. Continue?"

Clicking "Yes" didn't yield any results.

I hope I didn't muck anything up by moving my settings folder from the previous version of MC (v3.604b)...

Any ideas?

Dec 10, 2014 at 2:39 AM
I'm not sure. I didn't run it over a large collection. Also, if you have custom shows, that could cause a problem with that script.

Zip up your tvcache.xml and config.xml and put them somewhere I can download them. I'll have a look in a few hours, when I get home, and try to figure out what's happening.

It was a quick slap-together...
Dec 10, 2014 at 3:13 AM
414k - I can't just attach it?

Dec 10, 2014 at 3:28 AM
Invitation to Google Drive file sent...

Dec 10, 2014 at 4:16 AM
Yep, I see why it's crashing with that error.
You have a number of shows without a TVDB ID number, and MC uses this ID to keep episodes allocated to the correct TV series. For example:
  <tvshow NfoPath="R:\Media\Series\Robot Wars UK\tvshow.nfo">
    <title>Robot Wars UK</title>
What I would do is go to the TV Table view in Media Companion and sort by the TVDBID column.
Fix up those shows with TVDB IDs, and then do a Refresh All.

Also, take note of the notice on the Media Companion download page. You need to do a Batch Rescrape to allocate each episode's <uniqueid> tag.

Note: <uniqueid> - This new field is stored in each episode's nfo file.
If upgrading from before Media Companion 3.606b, or not already done this step, please do the following:
Users will have to use the TV Shows Batch Rescrape Wizard to update existing episode nfo files with this entry.
To do this, run the Batch Rescrape Wizard and select the episode's Rating to be rescraped for every episode in every TV show.
This will add the <uniqueid> field to the episode nfo files, updating this data for use with Trakt (when scraped into XBMC). In future versions of MC, it will also be used by the Missing Episode code.
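To see which episodes still need that rescrape, one rough check is to look for a non-empty <uniqueid> in each episode nfo. This Python sketch assumes XBMC-style nfo files with a single <episodedetails> root per file, and treats every .nfo other than tvshow.nfo as an episode nfo; both are assumptions, and the function names are hypothetical rather than anything in Media Companion.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def has_uniqueid(nfo_data):
    """Return True if an episode nfo (str or bytes) has a non-empty <uniqueid>.

    Assumes one <episodedetails> root per file, as in XBMC-style nfo files.
    """
    root = ET.fromstring(nfo_data)
    # iter() searches the whole tree, so nesting depth doesn't matter.
    for elem in root.iter("uniqueid"):
        if elem.text and elem.text.strip():
            return True
    return False


def episodes_missing_uniqueid(series_root):
    """Yield episode nfo paths under series_root that lack a <uniqueid>."""
    for nfo in Path(series_root).rglob("*.nfo"):
        if nfo.name.lower() == "tvshow.nfo":
            continue  # show-level nfo, not an episode
        try:
            # Read bytes so ElementTree honours the file's own encoding.
            if not has_uniqueid(nfo.read_bytes()):
                yield nfo
        except ET.ParseError:
            yield nfo  # unreadable nfo: worth rescraping as well


# Example (hypothetical path):
# for path in episodes_missing_uniqueid("D:/TV"):
#     print(path)
```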
Dec 10, 2014 at 4:29 AM
Those shows don't exist in TVDB. Should I put in any old number? Or perhaps just remove these shows from MC?

Dec 10, 2014 at 5:26 AM
I removed those blank ID shows for now. But I'm sure they will creep back in later. I'm not sure what to do about them. Maybe I should be creating records for them in TVDB....

Your duplicates functionality worked a treat! I was able to clean up a ton (still working on it actually). It was able to find multiple instances of duplicates and span across drives, etc. It turned out to also be a great way to eliminate those irksome sample files that always show up in XBMC. It also helped to find some shows that had been mislabeled as duplicate episodes because of poor filename formats. (A missing episode search would probably have yielded something similar except that I would have gotten a ton of extra real missing episodes as well.)

Thanks for this!

Dec 10, 2014 at 5:37 AM
No problem.

I'll put a test in place for the show ID and get the routine to skip any episode without an ID.
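A sketch of that kind of guard, in Python and under assumed data shapes: episodes whose show has no ID are set aside up front, so the duplicate check never touches a missing value (missing TVDB IDs being what triggered the "Object reference not set to an instance of an object" error above). The `show_id` key and the function name are hypothetical.

```python
def split_by_show_id(episodes):
    """Partition episodes into (usable, skipped) by whether they carry a show ID.

    Episodes are dicts with a hypothetical "show_id" key. A missing, None, or
    empty value means the show has no TVDB ID, so the entry is set aside
    instead of crashing the duplicate check.
    """
    usable, skipped = [], []
    for ep in episodes:
        (usable if ep.get("show_id") else skipped).append(ep)
    return usable, skipped
```

Returning the skipped entries, rather than silently dropping them, leaves room to list them separately so they can be fixed up later.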

What to do about those custom shows... Well, I have been meaning to get back to custom shows, but I noticed you have them in a separate directory from your other series.
Maybe set up a different profile pointing to the custom root folder.
That way you'll have them scanned, but your main profile will be for TVDB series stuff??? Just a thought.

Glad I could be of help.
Dec 10, 2014 at 5:52 AM
Good suggestion about separating the library into different folders. Having the custom shows in different folders wasn't really intentional. That just happens to be the folder that houses my recordings coming in over the air. I needed them separated out so I could clean up daily broadcasts I don't want to keep or compare them to the same shows coming in from newsgroups, etc.

Dec 10, 2014 at 6:12 AM
One small problem with the duplicate search I noticed is that it doesn't track which file it has already reported. What this means is that when you have a duplicate episode, it reports the two files when the first one is found and then reports them again (in exactly the same way) when the second file is found. So my report is twice as long as it needs to be. But this problem is much more noticeable when I have a season that is mislabelled...

I've had a few seasons now where the filename format was so far off that all 20 episodes in a season are recognized as just one episode - or an episode with 19 duplicates. Since the report repeats the entry for each file (20 x 20), I end up with a 400 entry report instead of just a 20 entry report.
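One way to avoid the n x n repetition, sketched in Python with hypothetical field names: index every file by (show ID, season, episode) once, then walk the files as before but remember which paths have already been reported. A set of n files that all parse as the same episode then produces one entry of n paths instead of n entries.

```python
from collections import defaultdict


def report_duplicates_once(episodes):
    """Return a list of duplicate sets, each set listed only once.

    `episodes` is a list of dicts with hypothetical keys
    "show_id", "season", "episode", "path".
    """
    # Index every file by (show ID, season, episode) once up front.
    index = defaultdict(list)
    for ep in episodes:
        index[(ep["show_id"], ep["season"], ep["episode"])].append(ep["path"])

    reported = set()  # paths already written to the report
    report = []
    for ep in episodes:  # still walk file by file, like the original check
        if ep["path"] in reported:
            continue  # already listed when an earlier copy was found
        matches = index[(ep["show_id"], ep["season"], ep["episode"])]
        if len(matches) > 1:
            report.append(matches)
            reported.update(matches)
    return report


# A mislabelled season: 20 files that all parsed as S01E01.
season = [
    {"show_id": "12345", "season": 1, "episode": 1, "path": f"ep{i:02d}.mkv"}
    for i in range(20)
]
report = report_duplicates_once(season)
print(len(report), sum(len(paths) for paths in report))  # sets, total lines
```

Building the index first keeps this to a couple of passes over the library, rather than rescanning all 15,000 episodes for each file.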

Dec 10, 2014 at 6:27 AM
Hehe... I figured that might happen. Well, I did warn you I'd slapped it together.

I was just having a little play in that code; it now identifies shows without a TVDB ID, episodes without the show's ID, and episodes missing the UniqueID.

I figure I need to do a report function for Media Companion... but for now, this is a stepping stone. Please take it with a grain of patience. At least it found files you need to tidy up (even if you get told 20 times... LOL).
Dec 10, 2014 at 6:36 AM

:-) I'm not complaining at all. It has served a valuable function. I figured you would just want to know. I understand the extra effort it would take to track the reported files in memory, etc. At the end of the day it may not be worth the extra effort to clean things up.

BTW, I've already cleaned up 100GB and counting. The culprit seems to be NZBDrone. I think it is bringing in better quality files without cleaning up the older file - but I also know that sometimes it does clean up...

Dec 10, 2014 at 6:49 AM
Actually, I believe this sort of function will be very handy.
I just used it myself, and it told me about 5 shows that had episodes without unique IDs. And I'd thought I'd gone through them all.

Anyway, will have to leave it there for now. Have to think how best to report all the findings...