Bug #258
Closed
imdb and movieposterdb scrapers no longer get movie posters / thumbnails
100%
Description
The imdb and movieposterdb scrapers no longer retrieve thumbnails, probably due to changes in those sites' HTML.
Files
Updated by foresto about 11 years ago
- File fixthumbscrapers.patch added
- File imdb.xml added
- File imdb.xml added
- File movieposterdb.xml added
Investigation revealed that the URLs and regular expressions in both scrapers were broken. Additionally, the impawards scraper was downloading to the imdb thumbnail cache file, resulting in bad data sent to the GetIMDBThumbs regular expressions. I'm attaching a patch that fixes all these problems, along with whole copies of each patched file.
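The cache file collision described above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the cache file names, the page bodies, and the regular expression are invented, and the real scrapers are XML definitions rather than Python code. It only shows why a second download into the same cache file leaves the imdb thumbnail expression with nothing to match.

```python
import re

# Hypothetical stand-in for the scraper cache: cache file name -> page text.
cache = {}

def download(page_body, cache_name):
    """Pretend to fetch a page and store it under the given cache file name."""
    cache[cache_name] = page_body

# Made-up page bodies; the real sites' markup is not shown in this report.
IMDB_PAGE = '<a name="poster"><img src="http://example.invalid/imdb/poster.jpg"></a>'
IMPAWARDS_PAGE = '<td><img SRC="posters/some_movie.jpg"></td>'

# Illustrative pattern only; not the expression from the attached patch.
IMDB_THUMB_RE = re.compile(r'<a name="poster"><img src="([^"]+)"')

def get_imdb_thumbs(cache_name):
    """Run the imdb thumbnail pattern against whatever the cache file holds."""
    return IMDB_THUMB_RE.findall(cache.get(cache_name, ""))

def scrape(impawards_cache_name):
    """Download both pages, then extract imdb thumbnails from the imdb cache."""
    cache.clear()
    download(IMDB_PAGE, "imdb_thumbs.html")
    download(IMPAWARDS_PAGE, impawards_cache_name)
    return get_imdb_thumbs("imdb_thumbs.html")

# Same cache name: the impawards page overwrites the imdb page.
broken = scrape("imdb_thumbs.html")
# Distinct cache name: the imdb page survives and the pattern matches.
fixed = scrape("impawards.html")
```

With the shared name, `broken` is an empty list because the pattern runs against the impawards page. With a distinct name, `fixed` contains the imdb poster URL.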
Updated by foresto about 11 years ago
Note: The movieposterdb scraper would be better if it only grabbed "Cover" and "Poster" images from that site, excluding "Logo" images. That's probably a job for a separate patch. My attached fix merely restores functionality; it doesn't make the scraper any smarter than it was before it broke.
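As a rough sketch of what that smarter filtering might look like, the Python below keeps only "Cover" and "Poster" entries and skips "Logo" entries. The markup, the `data-type` attribute, and the names `ITEM_RE`, `WANTED`, and `wanted_images` are all hypothetical. The actual movieposterdb HTML is not shown in this report, and a real fix would live in the scraper's XML regular expressions rather than in Python.

```python
import re

# Hypothetical listing markup, invented for this example.
SAMPLE = """
<div class="item" data-type="Poster"><img src="http://example.invalid/p1.jpg"></div>
<div class="item" data-type="Logo"><img src="http://example.invalid/l1.png"></div>
<div class="item" data-type="Cover"><img src="http://example.invalid/c1.jpg"></div>
"""

# Capture each image's type label together with its URL.
ITEM_RE = re.compile(r'data-type="([^"]+)"><img src="([^"]+)"')

# Image types worth keeping as thumbnails.
WANTED = {"Cover", "Poster"}

def wanted_images(html):
    """Return image URLs whose type label is in WANTED, in page order."""
    return [url for kind, url in ITEM_RE.findall(html) if kind in WANTED]

urls = wanted_images(SAMPLE)
```

Here `urls` holds the poster and cover URLs and omits the logo.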
Updated by foresto about 11 years ago
- File imdb.xml added
- File fixthumbscrapers.patch added
I'm attaching a new version of system/scrapers/video/common/imdb.xml and a new patch to replace the ones I attached earlier. This fixes imdb Plot Summary scraping.
Updated by buzz about 11 years ago
- Status changed from New to Feedback
- Assignee set to foresto
- Target version set to 3.3.3
- Resolution set to fixed
Applied in r32584. Thanks for the patch. Is this based at all on upstream xbmc scraper regexps etc? Although they have somewhat different scrapers, many of the regexps are the same and can often be reused. Doing so also keeps us slightly more in sync than going down a custom route, hence my question.
Updated by foresto about 11 years ago
buzz wrote:
Is this based at all on upstream xbmc scraper regexps etc?
Nope.
I took a brief look at the current xbmc code, but the structure of its scraper code has changed quite a bit since xbmc4xbox forked. In the little free time I had, I didn't see what they are now doing for imdb scrapes.
My patch is therefore not based on upstream, although it probably doesn't matter at all, since it only touches regular expressions that will undoubtedly need replacing again next time imdb changes their site.
Updated by buzz about 11 years ago
True, and one thing is for sure: it will break! Your help is appreciated, thanks. I really dislike updating scraper xml :)
I have just updated the tmdb scraper, which is mostly based on the xbmc one. As that is based on an official API, it should at least keep working for some time.
We have a lot of broken scrapers currently distributed with xbmc4xbox. Ones that are unlikely to be updated should probably be removed. Perhaps that is something we could get users to go through on the forum to test/check?
Updated by buzz about 11 years ago
- Status changed from Feedback to Closed
- Assignee changed from foresto to buzz
Updated by buzz about 11 years ago
To clarify, my comment of "it will break" was of course referring to how often IMDB changes their website around :) The new xbmc4xbox release is out. Thanks again for your help.
Updated by foresto about 11 years ago
buzz wrote:
We have a lot of broken scrapers currently distributed with xbmc4xbox. Ones that are unlikely to be updated should probably be removed. Perhaps that is something we could get users to go through on the forum to test/check?
I don't know how many users are actively monitoring the forum to notice such a request, but I guess it wouldn't hurt to ask.
My only concern with removing broken scrapers is that I'd hate to make it excessively difficult to revive them if someone were inclined to fix some of them. I guess that's what version control is for. :)
Updated by foresto about 11 years ago
buzz wrote:
Applied in r32584. Thanks for the patch.
That revision doesn't show any change to movieposterdb.xml. How did you fix the conflicting cache file name (which intermittently broke imdb scraping) without updating that file?
Updated by foresto about 11 years ago
- File fixthumbscrapers.patch added
Oh, now I see what's going on. The impawards cache file name conflict fix was indeed included in the 3.3.3 release.
However, it looks like the latest patch I submitted here didn't include my movieposterdb.xml fixes, which therefore didn't make it into the 3.3.3 release. Sorry about that. I'll update the patch here and open a new bug report against the 3.3.3 release.
In the meantime, anyone using 3.3.3 who wants the movieposterdb scraper to work again can grab the version attached to this bug report and install it manually.