Bug #189
closed
IMDB scraper - use HTTP compression (gzip)
0%
Description
Hi,
In an attempt to make scraping faster, I passed the traffic from XBMC4Xbox through a Fiddler session (http://www.fiddler2.com) and found that the responses weren't compressed. Looking at some examples on the web, I found that gzip is in fact supported, and I confirmed it by quickly changing imdb.xml to add the gzip option. Apparently the support has been in there for a while (ticket 17389).
Attached are a capture from the original scraper, a capture from the modified (compressed) one, and an attempt at a diff. Compression reduces the traffic for some pages from 98K down to 22K! (See the Transformer tab in Fiddler for compression info.)
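To illustrate the mechanism outside the scraper, here is a minimal Python sketch of the client side of HTTP compression. It is not the scraper's code: the helper names (`build_request_headers`, `decode_body`) and the sample page are made up for illustration, and the size reduction it prints depends entirely on the made-up payload, not on real IMDB pages.

```python
import gzip


def build_request_headers():
    # Accept-Encoding tells the server that the client can handle gzip bodies.
    return {"Accept-Encoding": "gzip"}


def decode_body(body, response_headers):
    # Decompress only when the server says it actually compressed the body.
    if response_headers.get("Content-Encoding", "").lower() == "gzip":
        return gzip.decompress(body)
    return body


# Hypothetical stand-in for a scraped page: repetitive HTML compresses well.
page = b"<html>" + b"<tr><td class='title'>Some Movie</td></tr>" * 2000 + b"</html>"

# What a gzip-capable server would send back for the request headers above.
compressed = gzip.compress(page)
restored = decode_body(compressed, {"Content-Encoding": "gzip"})

print("uncompressed bytes:", len(page))
print("compressed bytes:  ", len(compressed))
```

The client only has to advertise gzip in the request and check `Content-Encoding` in the response, so a server that ignores the request header still works unchanged.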
There might be more to do around the "cache" option. As you can see if you open the captures in Fiddler, the scraper queries the same URL a few times. I wonder if that response could be cached locally instead of being downloaded again, but that might be a subject for another ticket.
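As a rough sketch of the caching idea (not how the scraper's "cache" option is actually implemented), the Python below keeps each response in memory keyed by URL, so repeated queries for the same URL download it only once. `CachingFetcher`, `fake_fetch`, and the example URL are hypothetical names used only for this illustration.

```python
class CachingFetcher:
    """Wraps a fetch function and remembers each URL's response in memory."""

    def __init__(self, fetch):
        self._fetch = fetch
        self._cache = {}
        self.downloads = 0  # number of real fetches performed

    def get(self, url):
        # Download only on the first request for a URL; reuse it afterwards.
        if url not in self._cache:
            self._cache[url] = self._fetch(url)
            self.downloads += 1
        return self._cache[url]


def fake_fetch(url):
    # Hypothetical stand-in for a real HTTP download.
    return ("<html>page for %s</html>" % url).encode("utf-8")


fetcher = CachingFetcher(fake_fetch)
for _ in range(3):
    body = fetcher.get("http://example.invalid/title/tt0000001/")

print(fetcher.downloads)  # prints 1: the URL was fetched once, then served from cache
```

A real cache would also need some expiry rule so stale pages are eventually refreshed; that is left out here.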
Could we modify the scraper to use HTTP compression? I'm also wondering whether there was a reason for not using it in the first place.
Thanks,
Dan
Files