Rebuild a scraper, build a new one, and set up a working database
$30-250 USD
Paid on delivery
We want to highlight buildings that contain apartment listings on Mapbox. Each highlighted building should carry information about the apartments inside it (the data we get from our working scraper, which should be reworked by adding some new fields). Mapbox will highlight buildings via a certain parameter, in our case av (availability): av = 0 means the building has no apartments and is not highlighted; av = 1 means it has apartments and should be highlighted.
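One way to drive this highlighting is a Mapbox GL `match` expression on the av property. The sketch below shows such a layer spec as a Python dict; the layer id, source name, and colors are placeholder assumptions, not part of the spec above:

```python
# Sketch of a Mapbox GL fill-extrusion layer that colors buildings by "av".
# Layer id, source name, and colors are hypothetical placeholders.
highlight_layer = {
    "id": "buildings-3d",            # hypothetical layer id
    "type": "fill-extrusion",
    "source": "nyc-buildings",       # hypothetical GeoJSON source name
    "paint": {
        # av == 1 -> highlight color, anything else -> neutral gray
        "fill-extrusion-color": [
            "match", ["get", "av"],
            1, "#ff5533",            # highlighted: building has apartments
            "#cccccc",               # default: av == 0, not highlighted
        ],
        # extrude each polygon using its "height" property
        "fill-extrusion-height": ["get", "height"],
    },
}
```

Because the expression falls back to the gray default, buildings with av = 0 (or a missing av) stay unhighlighted without a separate layer.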
What we need:
A database (we are thinking of SQL, but there may be better options) that will store the following data: coordinates of the building (in which the apartment is located), address, link to the apartment on [login to view URL], description of the apartment (needs to be added to the scraper), price, area, number of bathrooms and bedrooms, and links to images.
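A minimal schema sketch for the fields listed above, using in-memory SQLite purely for illustration; the table and column names are our assumptions, and a production setup might prefer PostgreSQL with PostGIS for the spatial parts:

```python
import sqlite3

# In-memory SQLite used only to illustrate the schema; all names are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE apartments (
        id          INTEGER PRIMARY KEY,
        pid         INTEGER,      -- polygon id of the containing building
        lon         REAL,         -- building coordinates
        lat         REAL,
        address     TEXT,
        url         TEXT UNIQUE,  -- link to the listing
        description TEXT,
        price       REAL,
        area        REAL,
        bathrooms   INTEGER,
        bedrooms    INTEGER,
        image_urls  TEXT          -- e.g. a JSON-encoded list of links
    )
""")
conn.execute(
    "INSERT INTO apartments (lon, lat, address, price, bedrooms)"
    " VALUES (?, ?, ?, ?, ?)",
    (-73.9857, 40.7484, "Example Ave 1, New York, NY", 3200.0, 1),  # invented row
)
row = conn.execute("SELECT address, price FROM apartments").fetchone()
```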
We have one big .geojson file with all the buildings in New York City; it serves as the system layer in Mapbox. This file contains parameters such as PID (polygon id), height (for 3D buildings), the coordinates of the polygon, and the parameter av = 0 (since, initially, no building is highlighted).
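For reference, a single system-layer feature in the shape described above would look roughly like this (the PID, height, and coordinates are invented for illustration):

```python
import json

# One building feature as the system layer describes it; all values invented.
feature = {
    "type": "Feature",
    "properties": {"PID": 12345, "height": 38.5, "av": 0},
    "geometry": {
        "type": "Polygon",
        "coordinates": [[
            [-73.990, 40.750], [-73.989, 40.750],
            [-73.989, 40.751], [-73.990, 40.751],
            [-73.990, 40.750],   # ring closes on the first vertex
        ]],
    },
}
# Round-trip through JSON, as the file itself would be read/written.
decoded = json.loads(json.dumps(feature))
```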
We have a working scraper. It should scrape the necessary information and output it as .geojson, adding the parameter av = 1 (since such buildings should be highlighted). In the scraper output, the key "Number of bedrooms" should be renamed to "Number_of_bedrooms". When this file is ready, it should be uploaded to the database.

In the database, the two .geojson files (the system layer and the scraped one) must be synchronized: the apartment info is filled into the empty rows for the corresponding coordinates. Before inserting, it is necessary to check whether a given apartment is already in the database (the database should be updated every 4 hours, so the coordinates and the other fields must be compared against what is already stored).

Since the database already holds the system layer with the polygon information (i.e. the buildings themselves), keep in mind that the scraped data carries no polygon info, only the coordinate of the building. You therefore need to determine which polygon contains the coordinate of each scraped apartment and add the information we need to the database. There are two options. First: an inside() function, or another routine that checks whether a given coordinate lies inside a given polygon (in this case the check runs against our system .geojson, which contains all polygons). Second: our own "polygon-coordinates" table, where each polygon corresponds to certain coordinates. Lookups on this table should be faster than programmatically comparing each coordinate against all polygons every time (the system .geojson contains more than 1 million buildings and weighs 400+ MB). We have a csv file with a list of addresses and 700k+ coordinates, so you should write a small scraper that uses inside() (or an analogue) and outputs everything into one table in which each coordinate corresponds to a specific polygon.
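A minimal pure-Python sketch of the two options above: a ray-casting stand-in for the inside() test, and a coordinate-to-PID lookup table consulted first. Function and table names are our assumptions, and a real implementation over 1M+ polygons would want a spatial index or library rather than a linear scan:

```python
def inside(lon, lat, ring):
    """Ray-casting point-in-polygon test against one polygon ring
    (a list of [lon, lat] vertices). A stand-in for the inside()
    function mentioned in the spec."""
    hit = False
    j = len(ring) - 1
    for i in range(len(ring)):
        xi, yi = ring[i]
        xj, yj = ring[j]
        # Count edge crossings of a horizontal ray from the point.
        if (yi > lat) != (yj > lat) and \
           lon < (xj - xi) * (lat - yi) / (yj - yi) + xi:
            hit = not hit
        j = i
    return hit

# Option 2: precomputed coordinate -> PID lookup table (illustrative entry).
coord_to_pid = {(-73.9895, 40.7505): 12345}

def find_pid(lon, lat, polygons):
    """Resolve a scraped coordinate to a building PID: check the
    lookup table first, then fall back to scanning `polygons`
    (a dict mapping PID -> outer ring) with inside()."""
    pid = coord_to_pid.get((lon, lat))
    if pid is not None:
        return pid
    for pid, ring in polygons.items():
        if inside(lon, lat, ring):
            return pid
    return None
```

The same inside() routine can be reused by the small one-off scraper that builds the "polygon-coordinates" table from the 700k+ csv coordinates.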
There may be cases where not all coordinates of all the NYC buildings appear in the csv file. When such data arrives, it must first be checked against the "polygon-coordinates" table and then, if there is no match, checked separately with the inside() function or an analogue. When everything is done, the database should look like this: many polygons (the system layer plus the highlighted ones), where the system-layer polygons have av = 0 and the highlighted ones have av = 1 plus the apartment information. The database then needs to export all of this to a new .geojson file. This file will be regenerated every 4 hours, since scraping runs every 4 hours. It is therefore important to constantly check whether a particular apartment is already in the database, comparing everything from the coordinates and description down to the price (prices are often updated, and there may be 5 apartments in the same building).
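A sketch of the 4-hourly dedup/update step, assuming a SQL apartments table and using the listing URL as the uniqueness key; the table shape, column names, and choice of key are our assumptions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE apartments"
    " (url TEXT PRIMARY KEY, pid INTEGER, price REAL, description TEXT)"
)

def upsert_apartment(conn, url, pid, price, description):
    """Insert a newly scraped apartment, or update price/description
    if the listing URL is already present (run every 4 hours)."""
    conn.execute(
        """INSERT INTO apartments (url, pid, price, description)
           VALUES (?, ?, ?, ?)
           ON CONFLICT(url) DO UPDATE SET
               price = excluded.price,
               description = excluded.description""",
        (url, pid, price, description),
    )

# Hypothetical listing URL; the second call simulates a price update
# seen on the next 4-hour scrape of the same apartment.
upsert_apartment(conn, "https://example.com/apt/1", 12345, 3200.0, "1BR")
upsert_apartment(conn, "https://example.com/apt/1", 12345, 3100.0, "1BR, renovated")
row = conn.execute(
    "SELECT price, description FROM apartments WHERE url = ?",
    ("https://example.com/apt/1",),
).fetchone()
```

Keying on the URL keeps 5 apartments in the same building as 5 distinct rows sharing one PID, while a re-scraped listing updates in place instead of duplicating.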
Each polygon corresponds to only one building.
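Finally, a sketch of the export step that turns database rows back into the output .geojson, setting av per building from whether any apartments are attached to its PID; the input row shapes and property names are our assumptions:

```python
import json

def export_geojson(buildings, apartments_by_pid):
    """buildings: list of (pid, outer_ring) pairs from the system layer.
    apartments_by_pid: dict mapping pid -> list of apartment dicts.
    Returns a FeatureCollection with av = 1 on buildings that have
    apartments and av = 0 on the rest."""
    features = []
    for pid, ring in buildings:
        apartments = apartments_by_pid.get(pid, [])
        props = {"PID": pid, "av": 1 if apartments else 0}
        if apartments:
            props["apartments"] = apartments
        features.append({
            "type": "Feature",
            "properties": props,
            "geometry": {"type": "Polygon", "coordinates": [ring]},
        })
    return {"type": "FeatureCollection", "features": features}

# Two invented buildings; only PID 2 has an apartment attached.
fc = export_geojson(
    [(1, [[0, 0], [1, 0], [1, 1], [0, 1], [0, 0]]),
     (2, [[2, 0], [3, 0], [3, 1], [2, 1], [2, 0]])],
    {2: [{"price": 3200.0, "Number_of_bedrooms": 1}]},
)
text = json.dumps(fc)  # the file that replaces the previous one every 4 hours
```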
Project reference: #20963041
Project info
16 freelancers bid an average of $196 for this job
Hello, I have experience with scraping websites like FB, IG and Telegram, and experience with Google APIs. I have 5 years of experience with Python. I have worked with many Python libraries for tasks such as data analy…
Hello, I am a Python developer. I have great experience in web scraping and I am an expert in it. I have all the necessary skills to scrape any website. Please message me to discuss in detail.
Greetings. I have 3 years of experience in Python and web scraping. I have scraped more than 100 websites. I can help you scrape this website; I have looked at it. Ping me for further discussion.
Hello, I am an experienced developer and coder with very good experience in data scraping. I can make the scraper exactly as per your requirements. As the data set contains a large amount of data, using an SQL-type databa…
Hi, I have already created a scraper which will scrape all these details from the trulia site. Kindly message me if you need this. Thanks