A popular media website from Brazil was looking for a content extraction and mining service.
The client wanted content extracted continuously from Brazilian news sites to power their news portal. The list of websites included popular blogs, news sites, forums and a few content bookmarking sites. The required data points were publication date, author name, title, main text content and tags.
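As an illustration only, the five data points above could be modelled as one record per article, with a small check that flags fields a crawler failed to fill. This is a minimal sketch and not the actual delivery format: the names `ArticleRecord` and `missing_fields`, the field names, and the use of an ISO 8601 string for the date are all assumptions made for the example.

```python
from dataclasses import dataclass, field, fields
from typing import List, Optional


@dataclass
class ArticleRecord:
    """Hypothetical record holding the five requested data points."""
    published_at: Optional[str]  # assumed to be an ISO 8601 date string
    author: Optional[str]
    title: Optional[str]
    body: Optional[str]          # main text content
    tags: List[str] = field(default_factory=list)


def missing_fields(record: ArticleRecord) -> List[str]:
    """Return the names of fields that are None, blank, or an empty list."""
    missing = []
    for f in fields(record):
        value = getattr(record, f.name)
        if value is None:
            missing.append(f.name)
        elif isinstance(value, str) and not value.strip():
            missing.append(f.name)
        elif isinstance(value, list) and not value:
            missing.append(f.name)
    return missing
```

A check of this kind is one simple way that per-site monitoring could notice when a source changes its layout and a field stops being extracted.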
Benefits to the client:
- Our team handled every technical aspect of the crawling process
- The initial setup took only 3 days to complete, and the supply of data was consistent
- Since monitoring was set up for each site, the quality and consistency of the data were top notch
- Although some of the sites used dynamic coding practices, our tech stack could handle them well
- The client could launch their news portal with the data at short notice
- The cost incurred was far lower than what an in-house crawling setup would have cost them