Remake a scraper, build a new one, and make a working database

Closed · Posted 4 years ago · Paid on delivery

We want to highlight buildings that contain apartments in Mapbox. Each highlighted building should carry information about the apartments in it (the information we get from our working scraper, which we should remake by adding some new fields). Mapbox decides whether to highlight a building via a parameter, in our case av (availability): av = 0 means the building has no apartments listed and is not highlighted, while av = 1 means it has apartments and should be highlighted.

What we need:

A database (we are thinking of SQL, but there may be better options) that will store the following data for each apartment: coordinates of the building the apartment is in, address, link to the apartment on [login to view URL], description of the apartment (this field needs to be added to the scraper), price, area, number of bathrooms and bedrooms, and links to images.
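As a minimal sketch, the field list above could map onto a single table like the one below. SQLite is used purely for illustration (the project may well prefer PostgreSQL/PostGIS); all table and column names here are assumptions, not a fixed schema.

```python
import sqlite3

# Illustrative schema, assuming SQLite. Column names are assumptions
# derived from the field list in the brief.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE apartments (
        id                 INTEGER PRIMARY KEY,
        lon                REAL NOT NULL,  -- building coordinates
        lat                REAL NOT NULL,
        address            TEXT,
        listing_url        TEXT,           -- link to the apartment listing
        description        TEXT,           -- new field to be added to the scraper
        price              REAL,
        area               REAL,
        bathrooms          INTEGER,
        Number_of_bedrooms INTEGER,        -- underscored key, as required later
        image_urls         TEXT            -- e.g. a JSON-encoded list of links
    )
""")

# Hypothetical row, for illustration only.
conn.execute(
    "INSERT INTO apartments (lon, lat, address, price) VALUES (?, ?, ?, ?)",
    (-73.9857, 40.7484, "350 5th Ave, New York, NY", 3200.0),
)
row = conn.execute("SELECT address, price FROM apartments").fetchone()
```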

We have one big .geojson file with all the buildings in New York City. It serves as the system layer in Mapbox. This file contains parameters such as PID (polygon ID), height (for 3D buildings), the polygon coordinates, and the parameter av = 0 (since these buildings start out unhighlighted).
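A single feature in that system layer might look roughly like the sketch below. The property names PID, height and av come from the description above; the geometry coordinates are invented for illustration and are not real NYC data.

```json
{
  "type": "Feature",
  "properties": { "PID": 123456, "height": 42.5, "av": 0 },
  "geometry": {
    "type": "Polygon",
    "coordinates": [[[-73.9860, 40.7482], [-73.9855, 40.7482],
                     [-73.9855, 40.7486], [-73.9860, 40.7486],
                     [-73.9860, 40.7482]]]
  }
}
```

The scraper's output would use the same shape but with "av": 1 and the apartment fields added to "properties".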

We have a working scraper. It should scrape the necessary information and output it in .geojson format, adding the parameter av = 1 (because such buildings should be highlighted). In the scraper's output, the key Number of bedrooms should be renamed to Number_of_bedrooms. When this file is ready, it should be uploaded to the database.

In the database, the two .geojson files (the system one and the scraped one) must be synchronized: apartment info is filled into the empty rows at the corresponding coordinates. Since the database is updated every 4 hours, it is necessary to check (by coordinates and the other fields) whether a given apartment is already in it. The database already holds the system layer with the polygon (building) information, and keep in mind that the scraped data contains no polygon info, only the coordinate of the building, so you need to determine which polygon contains the coordinate of each scraped apartment and add the information we need to the database.

There are two options. The first is an inside() function (or an analogue) that checks whether a given coordinate lies inside a given polygon (checking against our system .geojson, which contains all the polygons). The second is our own "polygon-coordinates" table: each polygon corresponds to certain [login to view URL] on this table, which should be faster than programmatically comparing coordinates with all the polygons each time (the system .geojson contains more than 1 million buildings and weighs 400+ MB). We have a csv file with a list of addresses and 700k+ coordinates, so you should build a small scraper using inside() (or an analogue) and output it all into one table in which each coordinate corresponds to a specific polygon.
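The inside() check mentioned above can be sketched with a plain ray-casting point-in-polygon test; this is only an analogue under the assumption that polygon rings are simple, and a GIS library such as Shapely (Point.within) would be the more robust choice in practice. The footprint and test points below are invented for illustration.

```python
def inside(lon, lat, ring):
    """Ray-casting point-in-polygon test, an analogue of the inside()
    function mentioned in the brief. `ring` is a list of (lon, lat)
    vertices; the polygon is treated as implicitly closed."""
    hit = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Count edges crossed by a horizontal ray cast from the point.
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if x_cross > lon:
                hit = not hit
    return hit

# Illustrative building footprint (not real NYC data).
square = [(-73.986, 40.748), (-73.985, 40.748),
          (-73.985, 40.749), (-73.986, 40.749)]
p_in = inside(-73.9855, 40.7485, square)   # point inside the footprint
p_out = inside(-73.990, 40.7485, square)   # point outside the footprint
```

Scanning 1M+ polygons this way for every scraped coordinate is the slow path the brief warns about, which is exactly why the precomputed "polygon-coordinates" table is proposed.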
There may be cases where not all the coordinates of all the NYC buildings are in the csv file, which means that when such data arrives, it must first be checked against the "polygon-coordinates" table and then, if there is no match, checked separately with inside() or an analogue. When this is done, the database should look like this: there are many polygons (the system layer plus the highlighted ones); the system-layer polygons have av = 0, while the highlighted ones have av = 1 and the apartment information. The database then needs to export it all to a new .geojson file. This file will be updated every 4 hours, since scraping will take place every 4 hours. It is therefore important to constantly check whether a particular apartment is already in the database, starting with the coordinates and description and ending with the price, since prices are often updated and there may be 5 apartments in the same building.
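The table-first lookup with a point-in-polygon fallback described above can be sketched as follows. This is a sketch only: the table layout, exact-match-by-coordinate strategy, and function names are assumptions, and a production version would likely key on rounded or geohashed coordinates rather than raw floats.

```python
import sqlite3

# Sketch of the "polygon-coordinates" lookup table described in the brief.
# Assumed layout: one row per known coordinate, mapping it to a polygon PID.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE polygon_coordinates (
        lon REAL, lat REAL, pid INTEGER,
        PRIMARY KEY (lon, lat)
    )
""")
conn.execute("INSERT INTO polygon_coordinates VALUES (-73.9857, 40.7484, 123456)")

def pid_for(conn, lon, lat, fallback):
    """First consult the lookup table; only if the coordinate is unknown,
    fall back to the slow inside()-style scan over the system .geojson
    (passed in here as `fallback`), then cache the result."""
    row = conn.execute(
        "SELECT pid FROM polygon_coordinates WHERE lon = ? AND lat = ?",
        (lon, lat)).fetchone()
    if row:
        return row[0]
    pid = fallback(lon, lat)   # hypothetical scan over all polygons
    conn.execute("INSERT INTO polygon_coordinates VALUES (?, ?, ?)",
                 (lon, lat, pid))
    return pid

# Known coordinate: resolved from the table, fallback never invoked.
result = pid_for(conn, -73.9857, 40.7484, fallback=lambda lon, lat: None)
```

Caching fallback results back into the table means each unknown coordinate pays the full polygon scan at most once across the 4-hourly update cycles.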

Each polygon corresponds to only one building.

Web Scraping · Database Development · API

Project ID: #20963041

About the project

16 proposals · Remote project · Active 4 years ago

16 freelancers are bidding on average $196 for this job

zekovicm

Hi there, I am a web scraping expert from Bosnia & Herzegovina, Europe. I have carefully gone through your requirements and I would like to help you with this project! I can start immediately and finish it within the More

$450 USD in 7 days
(122 reviews)
7.4
p4logics

Dear Sir, I am interested in your project. I have gone through your requirements. I'm an expert in web scraping and web automation using Selenium and jsoup, data management, and data mining. I assure you, I will do my best to w More

$200 USD in 7 days
(29 reviews)
5.5
zeke

I have lots of experience writing web automation scripts. Available to start immediately and finish as soon as possible. Please contact me to discuss details if you are interested. Looking forward to working on this project. More

$140 USD in 7 days
(30 reviews)
5.6
ferozstk

Hello, after reading your project details I believe I'm suitable for this project, as I'm an expert in it with more than 7 years of experience. Please feel free to contact me. I am looking forward to hearing from you. More

$70 USD in 4 days
(25 reviews)
5.0
fahdlyousfi

Hello, I have experience scraping websites like FB, IG and Telegram, and experience with Google APIs. I have 5 years of experience with Python. I have worked with many libraries in Python for tasks such as Data Analy More

$60 USD in 3 days
(14 reviews)
4.3
DarkKnight2206

Hello. I am a python developer. I have great experience in web scraping and I am an expert in it. I have all the necessary skills to scrape any website. Please message me to discuss in detail.

$140 USD in 2 days
(17 reviews)
4.8
gourav845

Greetings, I have 3 years of experience in Python and web scraping. I have scraped more than 100 websites. I can help you scrape this website; I have looked at it. Ping me for further discussion.

$140 USD in 7 days
(5 reviews)
3.8
deco017

Hello sir, I have worked on several scraping projects. I read the full description you wrote and I understand what you want; I just need to discuss some details with you. I am available, ready and highly motivated to work More

$140 USD in 4 days
(2 reviews)
2.8
mayanktech9

Hello, I am an experienced developer and coder with very good experience in data scraping. I can make the scraper exactly as per your requirements. As the data set contains a large amount of data, using an SQL-type databa More

$300 USD in 20 days
(2 reviews)
2.1
sajez

Greetings! My name is Daniel and I am a software developer from Germany, specialized in Python applications. I have finished many web scraping projects, and after looking at your attached flowchart I am quite confident More

$250 USD in 7 days
(0 reviews)
0.0
araza754

Hi, I have already created a scraper which will scrape all these details from the Trulia site. Kindly message me if you need this. Thanks.

$133 USD in 1 day
(0 reviews)
0.0
tsft

Dear Sir or Madam: I have been doing crawling, scraping and ETL for a long time, across many projects. I have also worked with Google's geolocation API. According to your description, a successful system could be achieved within a m More

$500 USD in 30 days
(1 review)
0.0