Geographically-sparse Feeds Aggregation
The Client: Social media intelligence house
The Challenge: The client wanted to collect news feeds and social media data scattered across various geographic locations, coming from more than 5,000 sources. They wanted this data delivered in a structured, collated format that they could simply import every week. An earlier in-house attempt had proved unsatisfactory, as the data lacked both quality and quantity, and the geographic location associated with a feed was incorrect in numerous cases. They also wanted the data to be searchable against more than 1,000 keywords and specific queries.
The Solution: We addressed this requirement by setting up a mass-scale crawl that collected from numerous sources in parallel at regular intervals throughout the day, while adhering to politeness policies so that the servers of these sources were never hit excessively. Feeds from various social media platforms were aggregated intelligently through a Geo-Intelligence API, which ensured that feeds were captured only from the desired locations. The list of locations, sources, keywords, and queries was modified dynamically based on client requirements and feedback. Over 200,000 feeds were collected from various continents within two months. Every week, fresh data is collated location-wise and delivered.
- Parallel collection of data from numerous sources without any infrastructural concerns
- Uniform Data schema irrespective of number of sources and heterogeneity of content
- Periodic delivery of fresh data, reducing further data processing efforts
- Geo-Intelligence API ensuring that data belongs to the specified geography
- Solution scales as the number of sources, locations and keywords grows
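The politeness-aware parallel crawl described above can be sketched as follows. This is a minimal, hypothetical illustration, not PromptCloud's actual implementation: sources are fetched concurrently across hosts, but a shared scheduler ensures each individual host is hit no more often than a fixed crawl delay. The class, function names, and the fetch stub are all assumptions for illustration.

```python
import time
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from threading import Lock

CRAWL_DELAY = 0.2  # illustrative minimum gap (seconds) between hits to one host

class PoliteScheduler:
    """Throttles requests per host while allowing parallelism across hosts."""

    def __init__(self, delay=CRAWL_DELAY):
        self.delay = delay
        self.last_hit = defaultdict(float)  # host -> time of last request
        self.lock = Lock()

    def wait_turn(self, host):
        """Block until this host may be hit again, then claim the slot."""
        while True:
            with self.lock:
                now = time.monotonic()
                if now - self.last_hit[host] >= self.delay:
                    self.last_hit[host] = now
                    return
            time.sleep(0.01)

def fetch(url, scheduler):
    host = url.split("/")[2]
    scheduler.wait_turn(host)      # respect the per-host politeness policy
    return f"<feed from {url}>"    # stand-in for the real HTTP request

# Nine crawl jobs spread over three hosts, run by four parallel workers:
urls = [f"http://source{i % 3}.example/feed" for i in range(9)]
scheduler = PoliteScheduler()
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(lambda u: fetch(u, scheduler), urls))
print(len(results))  # all 9 feeds collected, each host throttled independently
```

In a real deployment the delay would typically come from each source's robots.txt, and the worker pool would be sized per the infrastructure available.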
Use case from Hosted Indexing
Twitter data for a set of keywords
The Client: A social listening platform for enterprises
The Challenge: The client wished to use its social media intelligence platform to run analysis on specific tweets: Twitter was to be monitored for tweets containing any of a set of keywords or phrases, and tweets were to be collected only from specific countries in Europe. Since the infrastructure involvement was high and crawling large datasets was not the client's core competency, they wanted a facilitator for all this data.
The Solution: PromptCloud monitored tweets from the specified countries using geo-location tagging capabilities and extracted tweets that matched any keyword or phrase from the list of 5,000 keywords provided by the client. The extracted data was indexed behind PromptCloud’s API so that the client could query it using combinations of “AND”, “OR” and “NOT”. Queries returned results in JSON format, which the client could consume for further use.
- Dynamic lists of keywords that could be modified anytime
- Continuous feed of relevant tweets as they appeared
- Single-line queries such as “Honda AND sports” or “Hugo Boss AND NOT fragrance” (examples only)
- Ready-to-use data, taking the technology headaches off the client’s plate
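The AND / OR / NOT query behavior above can be illustrated with a small sketch. This is an assumed, simplified model of boolean keyword matching over indexed tweets, not PromptCloud's actual API; the tweet data, function names, and index layout are all hypothetical.

```python
import json

# Hypothetical indexed tweets (stand-ins for data collected from Twitter):
tweets = [
    {"id": 1, "text": "Honda unveils new sports coupe"},
    {"id": 2, "text": "Hugo Boss launches fragrance line"},
    {"id": 3, "text": "Hugo Boss spring suits collection"},
]

def matches(text, term):
    """Case-insensitive keyword/phrase match."""
    return term.lower() in text.lower()

# "Honda AND sports": both terms must appear
hits = [t for t in tweets
        if matches(t["text"], "Honda") and matches(t["text"], "sports")]

# "Hugo Boss AND NOT fragrance": first term present, second absent
hits += [t for t in tweets
         if matches(t["text"], "Hugo Boss")
         and not matches(t["text"], "fragrance")]

# Results delivered as JSON, as in the described workflow:
print(json.dumps([t["id"] for t in hits]))  # → [1, 3]
```

A production index would of course use an inverted index rather than linear scans, but the boolean semantics of the queries are the same.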