We would like to scrape the Czech real estate registry. [login to view URL]
The registry is divided into katastrální území (cadastral areas, i.e. property zones), and we have a list of them. Within each katastrální území, every LV (owner's list) is identified by a number from 1 to XXXX. We don't know how many LVs a given katastrální území contains, and the count is different in each one, so you have to build in a safeguard that stops trying further LV numbers. Alternatively, save the number of LVs per katastrální území to a DB and update it from time to time.
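One possible shape for that safeguard, as a minimal sketch: stop after a run of consecutive missing LV numbers. It assumes that LV numbers may have gaps and that a lookup can report "not found". The names `iter_lvs` and `fetch_lv` and the threshold values are hypothetical, not something the registry defines.

```python
from typing import Callable, Iterator, Optional, Tuple


def iter_lvs(
    fetch_lv: Callable[[int], Optional[dict]],
    max_consecutive_misses: int = 50,
    hard_limit: int = 100_000,
) -> Iterator[Tuple[int, dict]]:
    """Yield (lv_number, record) for LV numbers 1, 2, 3, ...

    fetch_lv is a hypothetical callable that returns the parsed LV as a
    dict, or None when that LV number does not exist.
    Probing stops after max_consecutive_misses misses in a row, or at
    hard_limit, whichever comes first.
    """
    misses = 0
    for lv_number in range(1, hard_limit + 1):
        record = fetch_lv(lv_number)
        if record is None:
            misses += 1
            if misses >= max_consecutive_misses:
                break  # assume we have run past the last LV
            continue
        misses = 0  # a hit resets the run of misses
        yield lv_number, record


# Example with a fake registry that only has LVs 1, 2, 3, 5 and 8.
existing = {1, 2, 3, 5, 8}
fake_fetch = lambda n: {"lv": n} if n in existing else None
found = [n for n, _ in iter_lvs(fake_fetch, max_consecutive_misses=3)]
print(found)
```

The highest LV number found per katastrální území could then be stored in the DB, so that later weekly runs start from a known count instead of probing from scratch.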
We would like to scrape it every week. The code has to be very fast, because there are about 7 million rows. The output should be JSON. The site has some protection against scraping, so you need to change IP about every 100 requests.
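The IP change could be handled by a small rotation helper. The sketch below only covers the bookkeeping (which proxy to use for the next request) and makes no network calls. The class name `ProxyRotator`, the proxy addresses and the default of 100 are assumptions for illustration; the real proxy list would come from whichever proxy service is chosen.

```python
from itertools import cycle
from typing import Iterable


class ProxyRotator:
    """Hand out a proxy URL and switch to the next one every N requests."""

    def __init__(self, proxies: Iterable[str], requests_per_proxy: int = 100):
        proxy_list = list(proxies)
        if not proxy_list:
            raise ValueError("at least one proxy is required")
        self._pool = cycle(proxy_list)  # wraps around when the list ends
        self._limit = requests_per_proxy
        self._used = 0
        self._current = next(self._pool)

    def next_proxy(self) -> str:
        """Return the proxy to use for the next request."""
        if self._used >= self._limit:
            self._current = next(self._pool)
            self._used = 0
        self._used += 1
        return self._current


# Example with placeholder addresses and a small limit for readability.
rotator = ProxyRotator(["http://proxy-a:8080", "http://proxy-b:8080"],
                       requests_per_proxy=3)
sequence = [rotator.next_proxy() for _ in range(7)]
print(sequence)
```

With the `requests` library, the returned URL would typically be passed as `proxies={"http": proxy, "https": proxy}` on each call.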
We need all the information from each LV (the owner's list, in Czech). I have prepared screenshots and I hope they are understandable.
When you find an LV, there is information about [login to view URL] can find links to Pozemky (Properties), Stavby (Buildings) and Jednotky (Flats).
These links lead to more information about the items on the LV. I took screenshots of jednotky, stavby and pozemky.
Sometimes there is red text with links; you can see it in the screenshot below. We need to open these links too.
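Following the Pozemky, Stavby and Jednotky links and the red-text links for one LV could look roughly like the sketch below. It assumes a hypothetical `fetch_page` function that downloads and parses one page and returns its extracted fields together with the detail links found on it. The URLs, field names and the `max_pages` cap are made up for the example. A real scraper would also need to decide which links count as belonging to the LV, so that it does not wander into the rest of the registry.

```python
import json
from collections import deque
from typing import Callable, Dict, List, Tuple

# (extracted fields, outgoing detail links); the shape is an assumption.
Page = Tuple[dict, List[str]]


def collect_lv(
    start_url: str,
    fetch_page: Callable[[str], Page],
    max_pages: int = 500,
) -> dict:
    """Breadth-first walk from an LV page through its detail links.

    Each URL is visited at most once; max_pages caps the walk in case
    the links lead further than expected.
    """
    seen = {start_url}
    queue = deque([start_url])
    pages: Dict[str, dict] = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        fields, links = fetch_page(url)
        pages[url] = fields
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return {"lv_url": start_url, "pages": pages}


# Example with a fake in-memory site instead of real HTTP requests.
fake_site = {
    "lv/1": ({"owner": "Jan Novák"}, ["pozemek/10", "stavba/20"]),
    "pozemek/10": ({"area_m2": 350}, ["lv/1", "note/99"]),
    "stavba/20": ({"type": "dům"}, []),
    "note/99": ({"text": "řízení"}, []),
}
result = collect_lv("lv/1", fake_site.__getitem__)
# ensure_ascii=False keeps Czech characters readable in the JSON output.
print(json.dumps(result, ensure_ascii=False, indent=2))
```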
The final delivery should be a program that we can run ourselves, together with a recommendation of the service we have to buy for changing proxies. We prefer the code to be in Python.