Building the web's largest review database by aggregating scattered reviews of hotels and destinations from multiple sources.
Social Travel Engine
The client wanted to build one of the largest review databases by aggregating scattered reviews of hotels and destinations from multiple sources. They had tried a few web crawling solutions, but issues crept in as the data scaled, since they needed fresh data regularly. The number of sources on the web was also growing exponentially, and so was the data. In addition, they wanted reviews from all countries and in all languages, along with author profiles, images, and other details from the web pages, so they decided to scrape hotel data with PromptCloud.
All historical data from each source was extracted in parallel with incremental data as new reviews were published. Data was de-duplicated before delivery so that only new data was uploaded. Machine learning techniques were employed for adaptive crawling, so that more active pages were crawled more often than others. The site list was modified dynamically based on client requirements. Over 20 million structured records were delivered over a period of 2 months.
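To make the de-duplication and adaptive crawling ideas concrete, here is a minimal sketch in Python. It is illustrative only and not PromptCloud's actual implementation: the review fields (`source`, `author`, `text`), the `PageSchedule` class, and the interval formula are all assumptions, and a simple smoothed activity score stands in for the machine learning techniques mentioned above.

```python
import hashlib
from dataclasses import dataclass


def review_key(review: dict) -> str:
    """Fingerprint a review from fields assumed (hypothetically) to identify it."""
    parts = (review["source"], review["author"], review["text"].strip().lower())
    return hashlib.sha256("\x1f".join(parts).encode("utf-8")).hexdigest()


def new_reviews(batch: list, seen: set) -> list:
    """Return only reviews not delivered before, recording their keys in `seen`."""
    fresh = []
    for review in batch:
        key = review_key(review)
        if key not in seen:
            seen.add(key)
            fresh.append(review)
    return fresh


@dataclass
class PageSchedule:
    """Hypothetical per-page crawl schedule driven by observed review activity."""
    url: str
    activity: float = 0.0  # smoothed number of new reviews seen per crawl

    def record_crawl(self, new_count: int, alpha: float = 0.5) -> None:
        # Exponential moving average: recent crawls weigh more than old ones.
        self.activity = alpha * new_count + (1 - alpha) * self.activity

    def next_interval_hours(self, min_hours: float = 1.0,
                            max_hours: float = 168.0) -> float:
        # More active pages get a shorter wait before the next crawl.
        interval = max_hours / (1.0 + self.activity)
        return max(min_hours, min(max_hours, interval))
```

In this sketch, each crawl's output would pass through `new_reviews` before delivery, and the count of fresh reviews would feed `record_crawl`, so that pages publishing reviews frequently are revisited sooner than quiet ones.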