The client wanted to monitor social media sites such as Twitter and Instagram for mentions of their brand and product names. Whenever a mention was found, they wanted the post content extracted along with supporting details: post URL, profile username, number of likes, comments and retweets, and the hashtags used.
The client provided us with the list of sites to be monitored, Twitter and Instagram, and the list of keywords to look for. Since the requirement was brand monitoring, the sites had to be crawled daily. Our team programmed web crawlers to find instances of the client's keywords and extract the required data points for each match. Since Twitter and Instagram have their own APIs, we used them for extracting the data. This use case falls under site-specific crawl and extraction, since the setup is tailored to each site being crawled.

The client chose to have the data delivered in JSON format. The initial setup was completed within 3 days, after which the data started flowing in. As per the client's preference, the data was uploaded directly to their S3 servers. We began delivering all records of Twitter and Instagram posts mentioning the client's brand and product names on a daily basis.
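The keyword matching and JSON output described above can be illustrated with a minimal sketch. This is not the production crawler: it assumes the posts have already been fetched through the platforms' APIs and normalized into plain dictionaries, and the `Mention` record, the field names, and the helper functions are all hypothetical names chosen for illustration.

```python
import json
import re
from dataclasses import asdict, dataclass
from typing import Iterable, List, Optional


@dataclass
class Mention:
    """One matched post; field names are assumed, not the client's schema."""
    platform: str
    post_url: str
    username: str
    content: str
    likes: int
    comments: int
    retweets: Optional[int]  # None for platforms without retweets
    hashtags: List[str]
    matched_keywords: List[str]


def build_keyword_pattern(keywords: Iterable[str]) -> re.Pattern:
    """Compile one case-insensitive pattern that matches whole keywords only."""
    escaped = sorted((re.escape(k) for k in keywords), key=len, reverse=True)
    if not escaped:
        raise ValueError("at least one keyword is required")
    # Longest keywords come first so "Acme Phone" is preferred over "Acme".
    # The lookarounds stop "acme" from matching inside "acmeology".
    return re.compile(r"(?<!\w)(?:" + "|".join(escaped) + r")(?!\w)", re.IGNORECASE)


def extract_mention(post: dict, pattern: re.Pattern) -> Optional[Mention]:
    """Return a Mention if the post text contains a keyword, else None."""
    content = post.get("content", "")
    matched = sorted({m.lower() for m in pattern.findall(content)})
    if not matched:
        return None
    return Mention(
        platform=post["platform"],
        post_url=post["post_url"],
        username=post["username"],
        content=content,
        likes=post.get("likes", 0),
        comments=post.get("comments", 0),
        retweets=post.get("retweets"),
        hashtags=re.findall(r"#(\w+)", content),
        matched_keywords=matched,
    )


def to_json_lines(mentions: Iterable[Mention]) -> str:
    """Serialize mentions as one JSON object per line for a daily delivery file."""
    return "\n".join(json.dumps(asdict(m), ensure_ascii=False) for m in mentions)
```

A daily run would pass every fetched post through `extract_mention`, drop the `None` results, and write the output of `to_json_lines` to a file for upload. Emitting one JSON object per line is only one possible layout; a single JSON array per day would work equally well.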