For example, if you wish to crawl hundreds of thousands of blogs, news, or forum sites to extract very high-level information like article URL, date, title, author and content, mass-scale crawls will provide this data in a structured format as continuous feeds. Combine it with our low latency component, and you have all data at your disposal in near real-time. You could then ask us to filter these crawls based on a list of keywords and also have us index all this data for you to make it searchable via our hosted indexing offering.
Similarly, if you’re interested in meta information from a number of product sites without bothering about the product-level details, mass-scale crawls are for you. As part of this offering, we could also help you find which links/domains are live and which have been parked or gone stale. Irrespective of your use case, all data gets delivered in a structured format as per the schema and frequency that you desire.
Geo-Specific-data:Continuous feeds from social media from >5000 sources every few minutes
Finance Data in Real Time: News, blog and article feeds delivered continuously for signaling investment options
Data for Media House: News feeds aggregated from various sources based on keywords for online media