Bug #258

closed

imdb and movieposterdb scrapers no longer get movie posters / thumbnails

Added by foresto over 10 years ago. Updated over 10 years ago.

Status: Closed
Priority: Normal
Assignee:
Category: Scraper (Music/Video Metadata)
Target version:
Start date: 11/10/2013
Due date:
% Done: 100%
Resolution: fixed
Affected Version:
Description

The imdb and movieposterdb scrapers no longer retrieve thumbnails, probably due to changes in those sites' HTML.


Files

fixthumbscrapers.patch (4.73 KB), foresto, 11/10/2013 09:34 AM
imdb.xml (11.2 KB): patched system/scrapers/video/imdb.xml, foresto, 11/10/2013 09:34 AM
imdb.xml (6.21 KB): patched system/scrapers/video/common/imdb.xml, foresto, 11/10/2013 09:35 AM
movieposterdb.xml (725 Bytes): patched system/scrapers/video/common/movieposterdb.xml, foresto, 11/10/2013 09:35 AM
imdb.xml (6.22 KB): patched system/scrapers/video/common/imdb.xml (v2), foresto, 22/10/2013 06:37 AM
fixthumbscrapers.patch (3.77 KB): patch (v2), foresto, 22/10/2013 06:37 AM
fixthumbscrapers.patch (5.14 KB): patch (v3), foresto, 29/10/2013 04:03 AM

Updated by foresto over 10 years ago

Investigation revealed that the URLs and regular expressions in both scrapers were broken. Additionally, the impawards scraper was downloading to the imdb thumbnail cache file, resulting in bad data sent to the GetIMDBThumbs regular expressions. I'm attaching a patch that fixes all these problems, along with whole copies of each patched file.
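
To illustrate the cache file conflict described above, here is a minimal Python sketch rather than the actual scraper XML. Everything in it is an assumption made for illustration: the in-memory `cache` dict, the `fetch` helper, the cache file names, the page bodies and the `IMDB_THUMB` pattern are hypothetical stand-ins, not the real scraper definitions or the real sites' markup. It only shows why two sources writing to one cache name can leave an imdb-specific regular expression with nothing to match.

```python
import re

# Hypothetical in-memory stand-in for the scraper's on-disk page cache.
cache = {}

# Made-up page bodies; the real sites' markup is not reproduced here.
PAGES = {
    "imdb": '<img class="poster" src="http://example.invalid/imdb/1.jpg">',
    "impawards": '<a href="posters/other.jpg">poster</a>',
}

# Hypothetical pattern standing in for a GetIMDBThumbs expression.
IMDB_THUMB = re.compile(r'<img class="poster" src="([^"]+)"')


def fetch(source, cache_name):
    """Store the page body for `source` under `cache_name` and return it."""
    cache[cache_name] = PAGES[source]
    return cache[cache_name]


def imdb_thumbs(cache_name):
    """Run the imdb-specific pattern over whatever the cache entry holds."""
    return IMDB_THUMB.findall(cache.get(cache_name, ""))


# Conflict: both sources write to the same cache name, so the second
# download overwrites the first before the imdb pattern runs.
fetch("imdb", "thumbs.html")
fetch("impawards", "thumbs.html")
broken = imdb_thumbs("thumbs.html")

# Fix: one cache name per source, so the imdb pattern sees imdb data.
cache.clear()
fetch("imdb", "imdb-thumbs.html")
fetch("impawards", "impawards-thumbs.html")
fixed = imdb_thumbs("imdb-thumbs.html")

print(broken)
print(fixed)
```

In this sketch the failure depends on which download happens last, which may be why the real problem showed up only some of the time.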

Actions #2

Updated by foresto over 10 years ago

Note: The movieposterdb scraper would be better if it only grabbed "Cover" and "Poster" images from that site, excluding "Logo" images. That's probably a job for a separate patch. My attached fix merely restores functionality; it doesn't make the scraper any smarter than it was before it broke.
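
The filtering idea in the note above could look roughly like the following Python sketch. It is not part of the attached patch, and it assumes each scraped image already carries a type label; the `Image` record, the `keep_artwork` helper and the sample URLs are all hypothetical.

```python
from typing import List, NamedTuple


class Image(NamedTuple):
    """Hypothetical record for one scraped image and its type label."""
    kind: str
    url: str


# Only these image types are wanted; "Logo" is deliberately left out.
WANTED_KINDS = {"cover", "poster"}


def keep_artwork(images: List[Image]) -> List[Image]:
    """Return only the images whose label is a wanted type, in input order."""
    return [img for img in images if img.kind.strip().lower() in WANTED_KINDS]


sample = [
    Image("Poster", "http://example.invalid/p1.jpg"),
    Image("Logo", "http://example.invalid/l1.png"),
    Image("Cover", "http://example.invalid/c1.jpg"),
]
kept = keep_artwork(sample)
print([img.kind for img in kept])
```

Matching on a normalized label keeps the filter independent of how the label is capitalized on the page.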

Updated by foresto over 10 years ago

I'm attaching a new version of system/scrapers/video/common/imdb.xml and a new patch to replace the ones I attached earlier. This fixes imdb Plot Summary scraping.

Actions #4

Updated by buzz over 10 years ago

  • Status changed from New to Feedback
  • Assignee set to foresto
  • Target version set to 3.3.3
  • Resolution set to fixed

Applied in r32584. Thanks for the patch. Is this based at all on upstream xbmc scraper regexps, etc.? Although upstream has somewhat different scrapers, many of the regexps are the same and can often be reused, which also keeps us slightly more in sync than going down a custom route. Hence my question.

Actions #5

Updated by buzz over 10 years ago

  • % Done changed from 0 to 100
Actions #6

Updated by foresto over 10 years ago

Is this based at all on upstream xbmc scraper regexps, etc.?

Nope.

I took a brief look at the current xbmc code, but the structure of its scraper code has changed quite a bit since xbmc4xbox forked. In the little free time I had, I didn't see what they are now doing for imdb scrapes.

My patch is therefore not based on upstream, although it probably doesn't matter at all, since it only touches regular expressions that will undoubtedly need replacing again next time imdb changes their site.

Actions #7

Updated by buzz over 10 years ago

True, and one thing is for sure: it will break! Your help is appreciated, thanks. I really dislike updating scraper XML :)

I have just updated the tmdb scraper, which is mostly based on the xbmc one. Since that one is based on an official API, it should at least keep working for some time.

We have a lot of broken scrapers currently distributed with xbmc4xbox; ones that are unlikely to be updated should probably be removed. Perhaps that is something we could get users to go through on the forum to test/check?

Actions #8

Updated by buzz over 10 years ago

  • Status changed from Feedback to Closed
  • Assignee changed from foresto to buzz
Actions #9

Updated by buzz over 10 years ago

To clarify, my comment of "it will break" was of course referring to how often IMDB changes its website around :) A new xbmc4xbox release is out. Thanks again for your help.

Actions #10

Updated by foresto over 10 years ago

buzz wrote:

We have a lot of broken scrapers currently distributed with xbmc4xbox; ones that are unlikely to be updated should probably be removed. Perhaps that is something we could get users to go through on the forum to test/check?

I don't know how many users are actively monitoring the forum to notice such a request, but I guess it wouldn't hurt to ask.

My only concern with removing broken scrapers is that I'd hate to make it excessively difficult to revive them if someone were inclined to fix some of them. I guess that's what version control is for. :)

Actions #11

Updated by foresto over 10 years ago

buzz wrote:

Applied in r32584. Thanks for the patch.

That revision doesn't show any change to movieposterdb.xml. How did you fix the conflicting cache file name (which intermittently broke imdb scraping) without updating that file?

Actions #12

Updated by foresto over 10 years ago

Oh, now I see what's going on. The impawards cache file name conflict fix was indeed included in the 3.3.3 release.

However, it looks like the latest patch I submitted here didn't include my movieposterdb.xml fixes, which therefore didn't make it into the 3.3.3 release. Sorry about that. I'll update the patch here and open a new bug report against the 3.3.3 release.

In the meantime, anyone using 3.3.3 who wants the movieposterdb scraper to work again can grab the version attached to this bug report and install it manually.
