Asked By – Joseph
I have recently been learning Python and am dipping my toes into building a web scraper. It's nothing fancy at all; its only purpose is to get data off a betting website and put that data into Excel.
Most of the issues are solvable, and I'm having a good little mess around. However, I'm hitting a massive hurdle over one issue. If a site loads a table of horses and lists current betting prices, this information is not in any source file. The clue is that the data is sometimes live, with the numbers obviously being updated from some remote server. The HTML on my PC simply has a hole where their servers push through all the interesting data that I need.
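To make the "hole" concrete: the HTML a scraper downloads is the page as it is before any JavaScript has run. The sketch below (Python 3, standard library only) counts the rows of a table body in two snippets, one as downloaded and one as it might look after the page's scripts have filled it in. Both snippets, the `odds-body` id and the helper names are hypothetical, made up for illustration.

```python
from html.parser import HTMLParser

# Hypothetical markup as downloaded: the <tbody> meant for live odds is empty.
RAW_HTML = """
<table id="odds">
  <thead><tr><th>Horse</th><th>Price</th></tr></thead>
  <tbody id="odds-body"></tbody>
</table>
"""

# Hypothetical markup after the page's JavaScript has inserted the prices.
RENDERED_HTML = """
<table id="odds">
  <thead><tr><th>Horse</th><th>Price</th></tr></thead>
  <tbody id="odds-body">
    <tr><td>Horse A</td><td>5/2</td></tr>
    <tr><td>Horse B</td><td>7/1</td></tr>
  </tbody>
</table>
"""


class TbodyRowCounter(HTMLParser):
    """Counts <tr> tags inside the <tbody> with the given id."""

    def __init__(self, tbody_id):
        super().__init__()
        self.tbody_id = tbody_id
        self.inside = False  # True while parsing inside the target <tbody>
        self.rows = 0

    def handle_starttag(self, tag, attrs):
        if tag == "tbody" and dict(attrs).get("id") == self.tbody_id:
            self.inside = True
        elif tag == "tr" and self.inside:
            self.rows += 1

    def handle_endtag(self, tag):
        if tag == "tbody":
            self.inside = False


def count_rows(html, tbody_id):
    parser = TbodyRowCounter(tbody_id)
    parser.feed(html)
    parser.close()
    return parser.rows


if __name__ == "__main__":
    print(count_rows(RAW_HTML, "odds-body"), count_rows(RENDERED_HTML, "odds-body"))
```

Scrapy does not execute JavaScript, so a spider only ever sees something like the first version, which is why a selector for those rows comes back empty.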
My experience with dynamic web content is limited, so this is something I'm having trouble getting my head around.
The scraper is simply an odds comparison engine. Some sites have APIs, but I need this for the ones that don't. I'm using the Scrapy library with Python 2.7.
I apologize if this question is too open-ended. In short: how can Scrapy be used to scrape this dynamic data so that I can use it, ideally getting the betting odds in real time?
Solution for: Can Scrapy be used to scrape dynamic content from websites that are using AJAX?
WebKit-based browsers (like Google Chrome or Safari) have built-in developer tools. In Chrome you can open them via Menu->Tools->Developer Tools. The Network tab lets you see all the information about every request and response.
At the bottom of the picture you can see that I've filtered the requests down to …
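Once the request that delivers the odds has been spotted in the Network tab, the scraper can usually request that URL directly instead of parsing the page. The sketch below rebuilds such a request with Python 3's `urllib.request.Request` without sending it. The endpoint `https://example.com/api/odds`, the `race_id`/`market` parameters, the referer and the helper name `build_xhr_request` are all hypothetical stand-ins for whatever the Network tab actually shows.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_xhr_request(endpoint, params, referer):
    """Rebuild a GET request like one seen in the browser's Network tab.

    All arguments are hypothetical; copy the real values from the
    request's Headers view in the developer tools.
    """
    url = endpoint + "?" + urlencode(params)
    headers = {
        "Accept": "application/json",
        # jQuery-style AJAX calls send this header; some servers check it.
        "X-Requested-With": "XMLHttpRequest",
        "Referer": referer,
    }
    return Request(url, headers=headers)


if __name__ == "__main__":
    req = build_xhr_request(
        "https://example.com/api/odds",
        {"race_id": 42, "market": "win"},
        "https://example.com/racing",
    )
    print(req.get_method(), req.full_url)
```

Sending it would then be `urllib.request.urlopen(req)`. Inside a Scrapy spider the rough equivalent is to yield `scrapy.Request(url, headers=headers, callback=...)` with the same URL and headers; copying the headers matters because a server may reject requests that lack them.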
Tip: the log is cleared every time you load a page. The black dot button at the bottom of the picture preserves the log.
Firefox has a similar extension called Firebug. Some will argue that Firebug is even more powerful, but I like the simplicity of WebKit.
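Whichever browser tool is used to find it, a request made by JavaScript often returns JSON rather than HTML, and JSON is far easier to work with than the rendered page. The sketch below assumes a hypothetical response body of the form `{"runners": [{"name": ..., "price": ...}]}`; the real field names have to be read off the response in the developer tools. It turns the payload into rows and writes them as CSV text, which Excel can open.

```python
import csv
import io
import json

# Hypothetical XHR response body; a real site's field names will differ.
SAMPLE_BODY = '{"runners": [{"name": "Horse A", "price": 3.5}, {"name": "Horse B", "price": 8.0}]}'


def parse_odds(body):
    """Turn a JSON odds payload into (horse, price) tuples."""
    data = json.loads(body)
    return [(runner["name"], float(runner["price"])) for runner in data["runners"]]


def rows_to_csv(rows):
    """Serialise rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["horse", "price"])
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    print(rows_to_csv(parse_odds(SAMPLE_BODY)))
```

In a Scrapy callback the body would come from `response.body` (or `response.text` in newer Scrapy versions), and the CSV text could be written to a file opened with `newline=""` for Excel to pick up.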