Greetings Sir,
I have scraped the Google Play store before.
I use the ?gl={COUNTRY} query parameter to get data by country. I figured this out in my previous Google Play scraping jobs.
To get more than the top 100 apps, I simulate scrolling down by sending an HTTP POST with parameters such as the following:
start=120&num=60&numChildren=0&cctcss=square-cover&cllayout=NORMAL&ipf=1&xhr=1
By changing the numbers (such as start) on each request, I can get more top apps.
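The paging idea can be sketched as follows. This is only an illustration in Python, not the C# crawler itself. It assumes that start advances by num on each request and that the first page begins at 0; the function name page_bodies is hypothetical, and the fixed fields are copied from the example above.

```python
from urllib.parse import urlencode

# Fixed fields copied from the example request; their meaning is not verified here.
FIXED_FIELDS = {
    "numChildren": 0,
    "cctcss": "square-cover",
    "cllayout": "NORMAL",
    "ipf": 1,
    "xhr": 1,
}


def page_bodies(max_apps, page_size=60):
    """Yield one POST body per page, advancing start by page_size each time.

    Assumption: the server treats start as a zero-based offset.
    """
    for start in range(0, max_apps, page_size):
        fields = {"start": start, "num": page_size, **FIXED_FIELDS}
        yield urlencode(fields)


# Bodies for the first 180 apps: start=0, start=60, start=120.
bodies = list(page_bodies(180))
for body in bodies:
    print(body)
```

Each body would then be sent as the POST payload for one simulated scroll.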
However, for some countries and categories the list does not reach the top 1500 apps.
Because you need all apps from all categories and all countries, retrieving all the information can take many hours with a single thread.
PROPOSAL 1 (Single Thread)
- Create a C#.NET web crawler that runs on a single thread to scrape Google Play app info.
- The scraped data will be saved in a local database.
- If scraping fails at some point due to a network failure, the crawler will remember where it stopped and resume from that point, not from the first app or category. Previously scraped data won't be lost because it is already in the database.
- Every 12 hours it will run again, scrape everything, and update the information in the database.
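The resume behaviour in Proposal 1 could work roughly like the sketch below. It is written in Python with SQLite only for illustration; the real crawler would be in C#. The table names, the crawl and flaky_fetch functions, and the choice to checkpoint once per (country, category) pair are all assumptions made for this example.

```python
import sqlite3


def open_db(path=":memory:"):
    """Open the local database and create the (hypothetical) tables."""
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE IF NOT EXISTS apps (country TEXT, category TEXT, "
               "app_id TEXT, PRIMARY KEY (country, category, app_id))")
    db.execute("CREATE TABLE IF NOT EXISTS done (country TEXT, category TEXT, "
               "PRIMARY KEY (country, category))")
    return db


def crawl(db, jobs, fetch):
    """Process each (country, category) job once; skip jobs already marked done."""
    for country, category in jobs:
        row = db.execute("SELECT 1 FROM done WHERE country=? AND category=?",
                         (country, category)).fetchone()
        if row:
            continue  # finished in an earlier run
        app_ids = fetch(country, category)  # may raise on a network failure
        with db:  # one transaction: the data and its checkpoint commit together
            db.executemany("INSERT OR REPLACE INTO apps VALUES (?, ?, ?)",
                           [(country, category, a) for a in app_ids])
            db.execute("INSERT INTO done VALUES (?, ?)", (country, category))


# Demo with a fake fetcher that fails once, standing in for a network error.
jobs = [("us", "GAME"), ("us", "TOOLS"), ("de", "GAME")]
calls = []


def flaky_fetch(country, category):
    calls.append((country, category))
    if (country, category) == ("us", "TOOLS") and calls.count(("us", "TOOLS")) == 1:
        raise ConnectionError("simulated network failure")
    return [f"{country}.{category}.app{i}" for i in range(2)]


db = open_db()
try:
    crawl(db, jobs, flaky_fetch)   # stops at ("us", "TOOLS")
except ConnectionError:
    pass
crawl(db, jobs, flaky_fetch)       # resumes; ("us", "GAME") is not fetched again
```

For the 12-hour refresh, one option under the same assumptions would be to clear the done table before each scheduled run so that every job is fetched again and the apps rows are overwritten.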
PROPOSAL 2 (Multiple Threads)
- It will have the same features as Proposal 1, plus it will run on multiple threads. It can work 200 or 300 times faster than Proposal 1, depending on the number of threads and the server bandwidth.
- It will use all of the server's internet bandwidth.
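A minimal sketch of the multi-threaded idea, again in Python for illustration rather than the proposed C#: the (country, category) jobs are spread over a pool of worker threads. The names crawl_parallel and fake_fetch and the worker count are hypothetical, and the sketch makes no claim about the actual speed-up.

```python
from concurrent.futures import ThreadPoolExecutor


def crawl_parallel(jobs, fetch, workers=8):
    """Run fetch(country, category) for every job on a pool of worker threads."""
    jobs = list(jobs)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map keeps results in the same order as jobs
        results = pool.map(lambda job: fetch(*job), jobs)
        return dict(zip(jobs, results))


# Demo with a fake fetcher standing in for the real HTTP requests.
def fake_fetch(country, category):
    return [f"{country}.{category}.app{i}" for i in range(2)]


parallel_jobs = [("us", "GAME"), ("us", "TOOLS"), ("de", "GAME")]
parallel_results = crawl_parallel(parallel_jobs, fake_fetch, workers=3)
```

Because the work is mostly waiting on network responses, threads can overlap that waiting; the practical limit is the bandwidth and whatever request rate the server tolerates.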