The client wanted car data extracted from automobile websites, including classifieds sites, auto spare parts sites, automobile blogs and forums. The required data points were product ID, post title, location, seller details, make and model, description and price.
The client provided the list of source websites to be crawled. The data had to be extracted daily so that fresh information reached the client every day. Because every website in the source list had a different structure and design, site-specific crawling and extraction was the offering that suited this case, and our team set up crawlers tailored to each source site to fetch the required data points. As the client preferred, the data was uploaded to the client's Amazon S3 storage in CSV format. Once the setup was complete, our crawlers started delivering about 300,000 records every day.
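To make the site-specific approach concrete, here is a minimal sketch of how such a pipeline could be organized. It is not the crawler used in this project. The `CarListing` record, the `register`/`extract` helpers, the `classifieds.example.com` domain and the `car-listings/<date>.csv` key layout are all hypothetical. Each source site gets its own parser, and every parser emits the same record, so the daily CSV file has one fixed header regardless of where a row came from. Real extractors would parse HTML; here each page is assumed to arrive as an already-parsed dict.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from datetime import date
from typing import Callable, Dict, List


@dataclass
class CarListing:
    # Hypothetical record holding the data points the client asked for.
    product_id: str
    post_title: str
    location: str
    seller_details: str
    make_model: str
    description: str
    price: str


Extractor = Callable[[dict], CarListing]

# Registry mapping each source domain to its own site-specific parser.
EXTRACTORS: Dict[str, Extractor] = {}


def register(domain: str) -> Callable[[Extractor], Extractor]:
    """Decorator that records a parser for one source domain."""
    def wrap(fn: Extractor) -> Extractor:
        EXTRACTORS[domain] = fn
        return fn
    return wrap


@register("classifieds.example.com")  # hypothetical source site
def parse_classifieds(page: dict) -> CarListing:
    # Assumption: `page` is a pre-parsed dict; the keys are site-specific.
    return CarListing(
        product_id=page["id"],
        post_title=page["title"],
        location=page["city"],
        seller_details=page["seller"],
        make_model=f'{page["make"]} {page["model"]}',
        description=page["desc"],
        price=page["price"],
    )


def extract(domain: str, page: dict) -> CarListing:
    """Dispatch a page to the parser registered for its domain."""
    return EXTRACTORS[domain](page)


def to_csv(records: List[CarListing]) -> str:
    """Serialize records to CSV text with a header row in field order."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=[f.name for f in fields(CarListing)])
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))
    return buf.getvalue()


def daily_key(day: date, prefix: str = "car-listings") -> str:
    """Build a hypothetical per-day object key for the upload."""
    return f"{prefix}/{day.isoformat()}.csv"
```

Adding a new source site means writing and registering one more parser, without touching the CSV or upload steps. The upload itself could then be a single boto3 call such as `s3.put_object(Bucket=..., Key=daily_key(date.today()), Body=to_csv(records).encode("utf-8"))`, with the bucket name supplied by the client.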