While data scraping is quite challenging in itself, we also think about how opinion mining can serve our enterprise clients better. Opinion mining, better known as sentiment analysis, deals with automatically scanning text and establishing its nature or purpose. Fundamentally, it is important to determine whether text scraped and extracted from a website is useful, or even whether it relates to the subject mentioned in the title.
Sentiment analysis can be used to analyse entries (user reviews, product feedback, service feedback forms etc.) and indicate the feelings expressed (happiness, dissatisfaction etc.). At its simplest, this can be achieved with a scoring system from 1 to 10, with 10 being the most positive (or a similar measure), in which each word is associated with an emotion. The score of each word, and then of the whole text, is calculated to see what opinion or sentiment is indicated.
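To make the word-scoring idea concrete, here is a minimal sketch in Python. The word list, its scores, the averaging rule and the function names are all illustrative assumptions rather than a description of any particular production system; a real lexicon would be far larger and would need to handle negation, sarcasm and context.

```python
import re

# Hypothetical word scores on a 1-10 scale (10 = most positive).
# These values are illustrative assumptions, not a published lexicon.
WORD_SCORES = {
    "excellent": 10,
    "love": 9,
    "good": 8,
    "okay": 5,
    "slow": 3,
    "bad": 2,
    "terrible": 1,
}

NEUTRAL = 5.5  # midpoint of the 1-10 scale


def score_text(text):
    """Return the mean score of the words found in the lexicon.

    Returns None when the text contains no scored word at all.
    """
    words = re.findall(r"[a-z']+", text.lower())
    scores = [WORD_SCORES[w] for w in words if w in WORD_SCORES]
    if not scores:
        return None
    return sum(scores) / len(scores)


def label(score):
    """Map a numeric score to a coarse sentiment label."""
    if score is None:
        return "unknown"
    if score > NEUTRAL:
        return "positive"
    if score < NEUTRAL:
        return "negative"
    return "neutral"
```

Averaging is only one possible way to combine word scores into a text score; summing or weighting by word frequency are equally plausible choices, and the right one depends on the data.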
Another methodology is subjectivity/objectivity identification. Here, extracted data is tested to see whether it is subjective or objective. However, this can prove difficult, since the judgement itself varies from person to person (that is, it is subjective).
Perhaps the most refined kind is ‘feature-based sentiment analysis’. Here, individual users’ opinions about a certain product or service are extracted from text and then evaluated to see whether the consumer is satisfied or not. This is where PromptCloud’s mass-scale crawling solution helps. For example, if you wish to crawl hundreds of thousands of blogs, news, or forum sites to extract high-level information like article URL, date, title, author and content, mass-scale crawls will provide this data in a structured format as continuous feeds.
We could also filter these crawls against a list of keywords, so that sentiment analysis can be narrowed by subject, language and even detected keywords. Our named-entity recognition service further enriches this information.
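As a rough illustration of keyword filtering over structured crawl output, the sketch below models each record with the fields mentioned above (URL, date, title, author, content) and keeps only records that mention at least one keyword. The `Article` class, the function names and the whole-word matching rule are assumptions made for this example, not PromptCloud's actual feed format or filtering logic.

```python
import re
from dataclasses import dataclass


@dataclass
class Article:
    """One crawled record; the field names are assumed for illustration."""
    url: str
    date: str
    title: str
    author: str
    content: str


def matches_keywords(article, keywords):
    """True if any keyword appears as a whole word or phrase in title or content."""
    text = f"{article.title} {article.content}".lower()
    return any(
        re.search(r"\b" + re.escape(keyword.lower()) + r"\b", text)
        for keyword in keywords
    )


def filter_articles(articles, keywords):
    """Keep only the articles that mention at least one of the keywords."""
    return [a for a in articles if matches_keywords(a, keywords)]
```

Matching on whole words avoids false hits such as "fan" inside "fantastic", at the cost of missing plurals and misspellings, which a fuller pipeline would need to handle.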
We helped a client with sentiment analysis for a product. The client wanted to capture comments about it from forums and websites, written by everyone from retailers and distributors to enthusiasts and the average consumer. The client’s use case was to gather data to understand how favourably users viewed the product and what consumers were saying about it on the Internet. Competitive analysis was another scenario to study. While Twitter provided a very clear picture, it wasn’t going to give our client the breadth of insights desired.
Considering that there are hundreds of websites that may include product reviews, and numerous online forums focused on consumer durables and/or related topics, you have a valuable collection of insights. We set up crawls to automatically extract reviews from a select set of highly valued sites with hundreds of URLs.
Our automated web data extraction and monitoring solution targeted these sites and delivered precise results. Moreover, with normalizations in place, we delivered analysis-ready structured data.