Continuous news feeds in near real-time
Use case: Low Latency Crawls
The Client: A media house with editorial strength
The Challenge: The client was a media company targeting a broad audience that reads about everyday topics such as politics, sports, and celebrities. They wanted a data acquisition engine that would not only collect all the relevant data they specified, but also do so within seconds of a news item being published on one of their target sources on the web. The goal was an extremely powerful web crawler, but building one required high-level expertise and would have shifted some focus away from editorial work.
The Solution: The keywords and the list of target sources provided by the client were fed into PromptCloud’s low-latency component. The pipeline was set up, and the extracted data was indexed along with a markup indicating the category it belonged to. Only the data API layer was exposed to the client, who downloaded new data from it as soon as it appeared and used it to build content for their own portal.
- Complete data coverage and a single point of lookup
- Automatic feeds arriving every time an article was published
- Zero data-processing effort at the client’s end
- Zero manpower required at the client’s end
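The flow described in the solution above (match articles against keywords, mark each with a category, and expose only a feed the client polls) can be illustrated with a minimal sketch. This is not PromptCloud's implementation: the `Article` and `Feed` classes, the `tag_categories` function, and the sample keyword lists are all hypothetical names chosen for illustration, and a plain substring match stands in for whatever matching the real pipeline used.

```python
from dataclasses import dataclass, field


@dataclass
class Article:
    url: str
    title: str
    body: str
    categories: list[str] = field(default_factory=list)


# Hypothetical keyword lists per category; the client's real lists are not given.
CATEGORY_KEYWORDS = {
    "politics": ["election", "parliament"],
    "sports": ["cricket", "football"],
}


def tag_categories(article: Article, category_keywords: dict[str, list[str]]) -> Article:
    """Attach every category whose keywords appear in the title or body."""
    text = f"{article.title} {article.body}".lower()
    article.categories = sorted(
        category
        for category, keywords in category_keywords.items()
        if any(keyword.lower() in text for keyword in keywords)
    )
    return article


class Feed:
    """In-memory stand-in for the data API layer the client polls."""

    def __init__(self) -> None:
        self._items: list[Article] = []

    def publish(self, article: Article) -> None:
        # Only articles that matched at least one category are exposed.
        if article.categories:
            self._items.append(article)

    def fetch_since(self, cursor: int) -> tuple[list[Article], int]:
        """Return items added after `cursor` and the new cursor position."""
        return self._items[cursor:], len(self._items)
```

A client would keep the returned cursor and pass it to the next `fetch_since` call, so each poll yields only the articles published since the previous one.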
Celebrity gossip from around the web
Use case: Hosted Indexing
The Client: An entertainment publisher focused on Hollywood celebrities, with sites across multiple countries
The Challenge: The client curated content from the web about celebrities' likes, dislikes, and other activities, drawn from interviews, articles, blogs, and tweets. Because the content they wanted to publish revolved around a limited set of topics, they maintained a predefined list of around 500 keywords, which their editors used to search the web manually. This manual process did not scale, so automation was required.
The Solution: PromptCloud used its core technology to crawl multiple sources at the same time and extract all meaningful information that matched the predefined set of keywords. This process ran every day to curate the content, which was then indexed using PromptCloud’s hosted indexing solution. The indexed data was uploaded to an interface where the client’s editors could look up the list of all URLs relevant to a particular celebrity. The material at these URLs was then used to create content on the client’s website, which catered to a huge audience interested in celebrity gossip.
- Editors’ effort redirected toward creating content from the provided data rather than researching it
- 100% coverage and data uploaded at desired frequencies
- Dynamic list of celebrities, keywords and web sources
- 50% reduction in costs at the client’s end
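The lookup the editors relied on (all URLs relevant to a given celebrity or keyword) amounts to a keyword-to-URL index. The sketch below shows that idea only, under stated assumptions: `build_keyword_index` is a hypothetical function, the crawled pages are passed in as a plain dictionary of URL to text, and a case-insensitive substring match stands in for the real matching logic, which the case study does not describe.

```python
from collections import defaultdict


def build_keyword_index(pages: dict[str, str], keywords: list[str]) -> dict[str, list[str]]:
    """Map each keyword to the sorted URLs whose text mentions it."""
    index: dict[str, set[str]] = defaultdict(set)
    for url, text in pages.items():
        lowered = text.lower()
        for keyword in keywords:
            # Case-insensitive substring match; an assumption, not the real matcher.
            if keyword.lower() in lowered:
                index[keyword].add(url)
    # Keywords with no matching page are simply absent from the result.
    return {keyword: sorted(urls) for keyword, urls in index.items()}


# Hypothetical sample data for illustration.
pages = {
    "https://example.com/b": "An interview with Jane Doe about her new film.",
    "https://example.com/a": "Jane Doe and John Roe attend a premiere.",
    "https://example.com/c": "Stock markets closed higher today.",
}
index = build_keyword_index(pages, ["Jane Doe", "John Roe", "Sam Poe"])
```

Because the keyword list is just an argument, adding or removing a celebrity only requires rebuilding the index with the updated list, which matches the dynamic list of keywords and sources mentioned above.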