Bug #189

closed

IMDB scraper - use HTTP compression (gzip)

Added by dandar3 over 12 years ago. Updated almost 12 years ago.

Status: Closed
Priority: Normal
Assignee:
Category: Scraper (Music/Video Metadata)
Target version:
Start date:
Due date:
% Done: 0%
Resolution: fixed
Affected Version:

Description

Hi,

In an attempt to make scraping faster, I routed the traffic from XBMC4Xbox through a Fiddler session (http://www.fiddler2.com) and found that the responses weren't compressed. Looking at some examples on the web, I found that gzip is in fact supported, and I confirmed it by quickly changing imdb.xml to add the gzip option. Apparently the support has been in there for a while (ticket 17389).

Attached are a capture from the original scraper, a capture from the modified (compressed) one, and an attempt at a diff. Compression reduces the traffic for some pages from 98K down to 22K! (See the Transformer tab in Fiddler for compression info.)
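To illustrate the idea outside of XBMC, here is a minimal Python sketch of what HTTP compression amounts to on the client side: advertise gzip in the request headers, then decompress the body when the server says it is gzip-encoded. The function names and the sample page are hypothetical and are not taken from imdb.xml or the XBMC4Xbox source; the sizes it prints are for the made-up sample only, not the figures from the captures.

```python
import gzip
from typing import Optional

def request_headers() -> dict:
    """Headers a client sends to tell the server it accepts gzip bodies."""
    return {"Accept-Encoding": "gzip"}

def decode_body(raw: bytes, content_encoding: Optional[str]) -> bytes:
    """Return the page body, decompressing it if the server gzip-encoded it."""
    if content_encoding and content_encoding.strip().lower() == "gzip":
        return gzip.decompress(raw)
    return raw  # server ignored Accept-Encoding; body is already plain

# Hypothetical, highly repetitive HTML standing in for a scraped page.
page = b"<tr><td class='cast'>some cast member</td></tr>\n" * 2000

# What a gzip-capable server would put on the wire for that page.
wire = gzip.compress(page)

# The client recovers the original bytes and transfers far fewer of them.
assert decode_body(wire, "gzip") == page
print("uncompressed bytes:", len(page))
print("compressed bytes:  ", len(wire))
```

Real pages are less repetitive than this sample, so the ratio will differ from page to page.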

There might be more to do around the "cache" option. As you can see if you open the captures in Fiddler, the scraper requests the same URL several times. I wonder whether those responses could be cached locally instead of being downloaded again, but that may be a subject for another ticket.
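The local caching idea could look roughly like the Python sketch below: keep each downloaded page in memory, keyed by URL, and only go to the network on the first request. This is an assumption about how such a cache might behave, not a description of how the scraper's "cache" option actually works; `UrlCache`, `fake_fetch`, and the example URL are hypothetical.

```python
from typing import Callable, Dict, List

class UrlCache:
    """In-memory cache that downloads each URL at most once."""

    def __init__(self, fetch: Callable[[str], bytes]) -> None:
        self._fetch = fetch                  # function that does the real download
        self._pages: Dict[str, bytes] = {}   # url -> downloaded body

    def get(self, url: str) -> bytes:
        # Only call the downloader when the URL has not been seen before.
        if url not in self._pages:
            self._pages[url] = self._fetch(url)
        return self._pages[url]

# Stand-in downloader that records every call instead of using the network.
calls: List[str] = []

def fake_fetch(url: str) -> bytes:
    calls.append(url)
    return b"<html>" + url.encode("ascii") + b"</html>"

cache = UrlCache(fake_fetch)
for _ in range(3):
    body = cache.get("http://example.invalid/title/tt0000001/")

print(len(calls))  # prints 1: the page was downloaded once and reused twice
```

A real cache would also need some expiry policy so that stale pages are eventually refetched.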

Just wondering whether we could modify the scraper to use HTTP compression, and whether there was a reason for not using it in the first place.

Thanks,
Dan


Actions #1

Updated by buzz over 12 years ago

Was the scraper quicker with your changes?

Actions #3

Updated by buzz almost 12 years ago

  • Status changed from New to Closed
  • Assignee set to dandar3
  • Target version set to 3.2
  • Resolution set to fixed