The client wanted to build an internal system for hotel price matching using pricing data extracted from their competitors. Since the target websites had complex, dynamic page elements and hundreds of thousands of hotel listings, scraping the hotel data demanded extensive infrastructure and high-end resources. The client didn’t have the technical know-how to go about this and wanted a fully managed service that could take end-to-end ownership of the data mining process. Another key requirement was that the data be extracted as often as twice a day, which is also resource-intensive.
The client shared detailed requirements, including the target sites, the crawling frequency, their preferred data delivery format and the data points they wanted to extract from these sites. This use case falls under our site crawling service, since the websites on the list differed in structure and design. The client needed the extracted data in JSON format and was ready to use the PromptCloud API to access it at their end.
As per their instructions, the target sites had to be crawled at different frequencies: twice a day, daily and fortnightly. Our team completed the web crawler setup for the three target sites in just five days and delivered the initial set of data files to the client. About 2.5 million records were delivered during the first crawl, giving the client the data they needed for hotel price matching.
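To illustrate how per-site crawl frequencies like these can be expressed, here is a minimal Python sketch. It is not PromptCloud's actual scheduler: the site names (`site_a`, `site_b`, `site_c`) and the helper functions `next_crawl` and `due_sites` are hypothetical, and "twice a day" is assumed to mean an even 12-hour interval.

```python
from datetime import datetime, timedelta

# Hypothetical site names. The case study only says that three sites were
# crawled twice a day, daily and fortnightly; which site had which
# frequency is an assumption made for this sketch.
CRAWL_INTERVALS = {
    "site_a": timedelta(hours=12),  # twice a day (assumed evenly spaced)
    "site_b": timedelta(days=1),    # daily
    "site_c": timedelta(days=14),   # fortnightly
}


def next_crawl(site: str, last_crawl: datetime) -> datetime:
    """Return the time at which the given site is next due for a crawl."""
    return last_crawl + CRAWL_INTERVALS[site]


def due_sites(last_crawls: dict[str, datetime], now: datetime) -> list[str]:
    """Return the sites whose next crawl time has been reached, sorted by name."""
    return sorted(
        site
        for site, last in last_crawls.items()
        if next_crawl(site, last) <= now
    )
```

Given the time of each site's last crawl, `due_sites` returns the sites that should be crawled again. A production setup would more likely rely on a job scheduler than on a lookup table like this one.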