Data Aggregation for Twitter Sentiment Analysis
Microblogging is one of the most popular forms of communication on the web, and Twitter is largely responsible for popularizing it. Twitter posts, popularly known as tweets, are limited to 140 characters. Broadly, Twitter users fall into two categories: those who post about themselves and those who share third-party information. Most users express their emotions on various issues in their tweets. At a macroscopic level, these tweets convey a great deal about the mood or sentiment of the users. Even when a tweet does not state an emotion outright, the emotion often shows through in its language and word choice.
At present, sentiment analysis is one of the ways to predict the opinion of a community. A collection of tweets over a given time period reveals shifts in the public mood at large. Empirical sentiment analysis is being carried out on blogs, reviews, tweets, Facebook posts, and other posts on microblogging and social networking sites. These posts are extracted using information retrieval techniques and combined using different aggregation rules. However, these rules may prove inaccurate when different, possibly correlated items are under consideration.
Data aggregation offers a variety of tools and definitions that are useful in formulating a collective sentiment. The field of computational social choice, which studies the computational properties of collective choice in artificial intelligence and multi-agent systems, is especially relevant.
There are different preference aggregation methods to choose from. Collective sentiment analysis is tested in different scenarios that predict real-world events, such as a product launch or an election. Machine learning techniques are being used to learn the best aggregation method, and voting theory has proven to be the most accurate. In addition, classical voting theory, with its axiomatic properties and its results on the computational complexity of aggregation rules, helps justify the choice of one aggregation method over others.
Collective sentiment analysis applies when sentiments about a single topic are analyzed together. Each sentiment can be identified as positive, negative, or neutral. More complex approaches widen this three-valued polarity to a “5-stars” scale, as used in rating systems that define graded polarities. Once the set of individual opinions has been extracted as polarities or graded polarities, this information is aggregated into a collective sentiment. However, more sophisticated theories are required when collective judgments must compare two or more objects.
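As a rough illustration of the aggregation step, the following Python sketch combines individual polarities by majority and graded polarities by their mean. The function names, the tie-breaking rule, and the sample data are assumptions made for this example and do not come from any particular system.

```python
from collections import Counter
from statistics import mean


def aggregate_polarity(labels):
    """Return the most frequent polarity label.

    Assumption for this sketch: an empty input or a tie for first
    place is reported as "neutral".
    """
    if not labels:
        return "neutral"
    counts = Counter(labels)
    top = max(counts.values())
    winners = [label for label, n in counts.items() if n == top]
    return winners[0] if len(winners) == 1 else "neutral"


def aggregate_stars(ratings):
    """Return the mean of a non-empty list of 1-5 star ratings."""
    return mean(ratings)


# Made-up individual opinions about a single topic.
polarities = ["positive", "negative", "positive", "neutral", "positive"]
stars = [5, 2, 4, 3, 4]

collective_polarity = aggregate_polarity(polarities)
collective_stars = aggregate_stars(stars)
print(collective_polarity, collective_stars)
```

Majority and mean are only two of many possible aggregation rules, and both treat every opinion as independent of the others.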
Machine learning is the ability of a machine to improve its performance using artificial intelligence techniques. It involves constructing and studying algorithms that build a model from the data and inputs received. The model is then used to make predictions and decisions. Machine learning supports sentiment analysis through programs that can adapt to the data they are given.
Voting theory is used to aggregate the preferences of individuals, and a number of voting rules can be used to define the sentiment of a collective. Methods from the fields of voting rules, knowledge representation, and preference aggregation should therefore be adapted to model the opinion of each individual and to aggregate individual opinions into a collective one. According to the paper titled “From Sentiment Analysis to Preference Aggregation”, one of the most crucial aspects is aggregating the information about individual sentiments together with the information about individual preference comparisons. Related research can be found in the work on social choice theory. Voting theory is a more appropriate approach than polarity when individuals express more than a simple like or dislike over several items.
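To make the idea of a voting rule concrete, here is a minimal Python sketch of two classical rules, plurality and the Borda count, applied to ranked preferences. The items "A", "B", and "C" and the seven rankings are invented for illustration, and the sketch assumes every individual ranks all items.

```python
from collections import Counter


def plurality(rankings):
    """Score each item by the number of voters who rank it first."""
    return Counter(ranking[0] for ranking in rankings)


def borda(rankings):
    """Borda count: with m items, a voter's top choice earns m - 1
    points, the next m - 2, and so on down to 0 for the last."""
    scores = Counter()
    for ranking in rankings:
        m = len(ranking)
        for position, item in enumerate(ranking):
            scores[item] += m - 1 - position
    return scores


# Hypothetical preference orders extracted from seven individuals.
rankings = (
    [["A", "B", "C"]] * 3
    + [["B", "C", "A"]] * 2
    + [["C", "B", "A"]] * 2
)

plurality_winner = plurality(rankings).most_common(1)[0][0]
borda_winner = borda(rankings).most_common(1)[0][0]
print(plurality_winner, borda_winner)
```

On this data the two rules disagree: "A" has the most first places, while "B" is ranked well by everyone and wins under Borda. This is why the choice of aggregation rule matters once opinions compare several items.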
Sentiment analysis has recently gained widespread attention in industry and the media. Several web tools, commercial products, and applications are being developed in this field. Some of this analysis is based on specific events, such as the death of a popular public figure, while other analysis relates tweets to socio-economic trends such as political opinion and stock market fluctuations.
The results obtained from collective mood aggregators are convincing and show that accurate sentiment analysis can be derived from online posts. Performing sentiment analysis online reduces the cost, effort, and time needed to conduct public surveys and questionnaires. This data is of great value to social scientists and psychologists.
The research paper “Modeling Public Mood and Emotion: Twitter Sentiment and Socio-Economic Phenomena” was published by Johan Bollen and Huina Mao (School of Informatics and Computing) of Indiana University and Alberto Pepe (Center for Astrophysics) of Harvard University. The paper performed sentiment analysis of tweets from the latter half of 2008. The sentiment of each tweet was measured with an extended version of the Profile of Mood States (POMS), and the results were compared to important events that took place in that period. The researchers found that social, political, cultural, and economic events of the period were correlated with the sentiments of the tweets, even though some fluctuations in public mood were delayed. They concluded that the sentiment of tweets can be retrieved through a syntactic, term-based approach that needs no special expertise. Sentiment analysis techniques rooted in machine learning offer accurate results when a sufficiently large data set is available for testing. However, short texts such as microblog posts remain quite challenging for this approach.
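A term-based approach of this kind can be sketched in a few lines of Python. The six mood dimensions below follow the standard POMS factors, but the word list is a toy lexicon invented for this example, not the extended POMS term list used in the paper, and the sample tweets are made up.

```python
import re
from collections import Counter

# Toy lexicon mapping terms to mood dimensions. Hypothetical:
# this is NOT the term list used by Bollen, Mao, and Pepe.
MOOD_LEXICON = {
    "tense": "tension", "nervous": "tension",
    "sad": "depression", "hopeless": "depression",
    "angry": "anger", "furious": "anger",
    "energetic": "vigour", "cheerful": "vigour",
    "tired": "fatigue", "exhausted": "fatigue",
    "confused": "confusion", "uncertain": "confusion",
}


def mood_vector(tweet):
    """Count lexicon terms in one tweet, grouped by mood dimension."""
    tokens = re.findall(r"[a-z']+", tweet.lower())
    return Counter(MOOD_LEXICON[t] for t in tokens if t in MOOD_LEXICON)


def daily_mood(tweets):
    """Sum the mood vectors of a day's tweets and normalise to shares."""
    total = Counter()
    for tweet in tweets:
        total.update(mood_vector(tweet))
    n = sum(total.values())
    return {mood: count / n for mood, count in total.items()} if n else {}


# Made-up tweets standing in for one day of data.
tweets = [
    "So tired and exhausted after the debate",
    "Feeling cheerful about the results!",
    "Markets make me nervous and uncertain",
]
day = daily_mood(tweets)
print(day)
```

Computing such a vector for each day yields a time series of public mood that can be compared against the dates of real-world events.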