The project involved extracting real estate data from some of the most popular real estate listing portals in the US. The client needed a daily push of new listings in an easy-to-consume format for database ingestion. The crawling infrastructure also had to support high-volume extraction, scaling up to millions of listings per week, without compromising data quality.
The client specified the commercial real estate data points that had to be extracted.
The client delivered a list of source websites to be crawled, and the data had to be extracted daily so the feed always reflected fresh listings. Our team set up crawlers to fetch the required data points from the provided sources. Since every website in the list had a different structure and design, site-specific crawling and extraction was the offering that suited this case.
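To illustrate what "site-specific extraction" means in practice, here is a minimal sketch of a parser tailored to one hypothetical site's markup. The class names (`listing`, `price`, `address`) and HTML shape are invented for the example; each real source site would get its own parser matched to its actual structure. The sketch uses only Python's standard-library `html.parser`:

```python
from html.parser import HTMLParser

# Hypothetical page structure for one source site: each listing is a
# <div class="listing"> containing <span class="price"> and
# <span class="address"> elements. Real sites each need their own rules.
class ListingParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.listings = []      # extracted records
        self._current = None    # record being built
        self._field = None      # field whose text we are reading

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class", "")
        if tag == "div" and cls == "listing":
            self._current = {}
        elif self._current is not None and tag == "span" and cls in ("price", "address"):
            self._field = cls

    def handle_data(self, data):
        if self._field is not None:
            self._current[self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None
        elif tag == "div" and self._current is not None:
            self.listings.append(self._current)
            self._current = None

# Sample page fragment in the assumed structure.
page = """
<div class="listing"><span class="price">$1,200,000</span>
<span class="address">42 Main St</span></div>
<div class="listing"><span class="price">$950,000</span>
<span class="address">7 Oak Ave</span></div>
"""
parser = ListingParser()
parser.feed(page)
print(parser.listings)
```

Because each site has its own parser, a layout change on one portal only requires updating that portal's rules, which is what keeps quality stable at high volume.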
Once the initial setup was complete, the crawlers started delivering data. Per the client's preference, the data was uploaded to the client's Azure servers in XML format. After stabilizing the crawlers, PromptCloud was able to deliver millions of real estate listings on a weekly basis.
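A delivery feed like this can be serialized with the standard library alone. The sketch below shows one plausible way to turn extracted records into an XML document; the element names and record fields are assumptions for illustration, not the client's actual schema. The resulting string could then be uploaded to Azure, for example with the `azure-storage-blob` SDK's `upload_blob` method:

```python
import xml.etree.ElementTree as ET

# Hypothetical record shape; the actual fields were defined by the client.
listings = [
    {"id": "A100", "price": "1200000", "address": "42 Main St"},
    {"id": "A101", "price": "950000", "address": "7 Oak Ave"},
]

def to_xml(records):
    """Serialize extracted listing records into a single XML feed."""
    root = ET.Element("listings")
    for rec in records:
        node = ET.SubElement(root, "listing", id=rec["id"])
        for field in ("price", "address"):
            ET.SubElement(node, field).text = rec[field]
    return ET.tostring(root, encoding="unicode")

xml_feed = to_xml(listings)
print(xml_feed)
```

One document per daily batch keeps ingestion simple on the database side: the consumer only has to parse a single well-formed file per day rather than reconcile many partial uploads.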
As a leading real estate services and investment firm, the client wanted to build and maintain a database of commercial real estate listings to power its business intelligence capability and improve its service portfolio with better data-backed research.