Thursday, 23 April 2015

Extracting Live Cricket Scores

One thing every beginner should know about Python is that with only a few lines of code, one can pull off pretty interesting stuff.

When I meet people who want to start learning Python, I typically ask them what they are interested in apart from work or academics. Some are crazy about gadgets, some are really into movies, some are fitness freaks, and some love travelling. I believe everyone can be motivated to learn Python (yes, it is that cool!).

Here is what worked for me when we had to build a Python dev team with these guys. Everyone was asked to write a small piece of code that would scrape some sort of data from the web:

- The gadget guy had to list the top 10 mobiles on flipkart.com
- The fitness freak had to list the top 10 protein supplements on healthkart.com
- The movie maniac had to list the top 10 movies on imdb.com
- The traveler had to list the top 10 destinations in India on tripadvisor.com

They all came up with wonderful pieces of Python code and were motivated to learn Python further!


Let us take another similar example. Since the IPL season is on (and since I am a big fan), in this post we shall see how one can print live scores using Python. The approach to this problem is quite similar. Here is what we need:

- A data source that maintains live cricket scores [www.espncricinfo.com]
- A way to request data from the selected source [Python requests]
- A mechanism to extract only the required data [regular expressions]
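
To make the three ingredients above concrete, here is a minimal sketch of how they could fit together. It is only an illustration under a few assumptions: the feed URL is a guess at an RSS endpoint and may have moved or been retired, the feed is assumed to wrap each match in an `<item>` with a `<title>` holding the score line, and the names `FEED_URL`, `extract_scores` and `fetch_feed` are made up for this example.

```python
import re

# Assumed feed URL (hypothetical for this sketch; it may have moved or been retired).
FEED_URL = "http://static.cricinfo.com/rss/livescores.xml"

# Non-greedy patterns; DOTALL lets a match span several lines.
ITEM_RE = re.compile(r"<item>(.*?)</item>", re.DOTALL)
TITLE_RE = re.compile(r"<title>(.*?)</title>", re.DOTALL)


def extract_scores(feed_text):
    """Return the <title> text of every <item> block in the feed."""
    scores = []
    for item in ITEM_RE.findall(feed_text):
        match = TITLE_RE.search(item)
        if match:
            scores.append(match.group(1).strip())
    return scores


def fetch_feed(url=FEED_URL):
    """Download the feed and return its body as text."""
    # Imported here so extract_scores can be tried without the
    # third-party dependency (pip install requests).
    import requests

    response = requests.get(url, timeout=10)
    response.raise_for_status()  # fail loudly on HTTP errors
    return response.text
```

With a network connection, looping over `extract_scores(fetch_feed())` and printing each entry would give one line per match. Keeping the download and the regex step in separate functions means the extraction can be tried on a saved copy of the feed first.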

If we think about it, the above three requirements are pretty generic, and a simple Python script can be written to scrape (extract) the required information from the web. Below is the code that did the job for me:


Note: This was meant as a quick script that could easily do the job. For more complex web content parsing and scraping, consider using BeautifulSoup and Scrapy!

Keep Coding! :)