Data can be loaded from a local file (CSV format) or from a database table or view. Import from a table is done through ODBC. The data is expected in tabular format, with clear headers for ID, Text, Date, Location, etc. An ID identifying each comment is mandatory; Date and Location are optional and are used primarily for Trend Analysis.
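The header requirements above can be checked before loading a file. The following is a minimal sketch, not the utility's own code. The exact header spellings (ID, Text, Date, Location) and the decision to treat Text as required are assumptions; the function name check_headers is hypothetical.

```python
import csv
import io

# ID is mandatory per the description above; treating Text as
# required is an assumption made for this sketch.
REQUIRED = {"ID", "Text"}
OPTIONAL = {"Date", "Location"}  # only needed for Trend Analysis


def check_headers(csv_text):
    """Return (missing_required, missing_optional) for the CSV header row."""
    reader = csv.reader(io.StringIO(csv_text))
    headers = {h.strip() for h in next(reader, [])}
    return sorted(REQUIRED - headers), sorted(OPTIONAL - headers)


sample = "ID,Text,Date\n1,Great service,05-03-2021\n"
missing_required, missing_optional = check_headers(sample)
print("missing required:", missing_required)
print("missing optional:", missing_optional)
```

A file with missing optional columns can still be loaded; it would simply not support the geography or time views.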
This utility shows the most frequent mentions in the form of a Wordcloud. The user can generate Wordclouds in two ways: Overall or Context Based. The Overall option generates a Wordcloud for all words combined. The Context Based option generates three separate Wordclouds grouped by part of speech: Nouns, Verbs and Adjectives. The Context Based word distribution can be more insightful, but it takes more time because every word has to be tagged with its part of speech.
The Overall option generates two Wordclouds. The Wordcloud on the left is a Global Wordcloud, based on the entire text corpus. Minimum Frequency is the cutoff value for word frequency and removes very infrequent words. Maximum Words controls the number of words displayed. Their default values are 5 and 50 respectively. A lower Maximum Words and a higher Minimum Frequency are recommended as the data size grows. The user can query all comments containing a particular word by entering it in the Keyword Input Box. Combinations of words can also be given, using AND and OR. A Wordcloud of the query results is displayed on the right.
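To illustrate how Minimum Frequency, Maximum Words and keyword queries interact, here is a small sketch. It is not the utility's implementation. It assumes that the cutoff is inclusive (a word that occurs exactly Minimum Frequency times is kept), that words are matched case-insensitively, and that in a query AND binds more tightly than OR. The names top_words and matches are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def top_words(comments, min_freq=5, max_words=50):
    """Drop words seen fewer than min_freq times, keep the max_words most frequent."""
    counts = Counter(word for c in comments for word in tokenize(c))
    frequent = [(w, n) for w, n in counts.most_common() if n >= min_freq]
    return frequent[:max_words]


def matches(comment, query):
    """Evaluate a flat query such as 'slow AND app' or 'price OR cost'.

    Assumption: OR separates alternatives, and every term inside an
    alternative (joined by AND) must be present in the comment.
    """
    words = set(tokenize(comment))
    return any(
        all(term.strip().lower() in words for term in group.split(" AND "))
        for group in query.split(" OR ")
    )


comments = ["The app is slow", "Slow app", "App price is high"]
print(top_words(comments, min_freq=2, max_words=2))
print([c for c in comments if matches(c, "slow AND app")])
```

Raising min_freq or lowering max_words shrinks the list that feeds the Wordcloud, which is why those settings help with larger data sets.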
The user can remove an obvious or redundant word from the Wordcloud by typing it in the STOPWORD Input Box. Here too, multiple words can be given by separating them with commas (no space after the comma).
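The stopword input described above amounts to splitting a comma-separated string and filtering the word counts. A minimal sketch, with hypothetical function names; unlike the input box, this version also tolerates stray spaces around the commas, which is a choice made here and not documented behaviour.

```python
def parse_stopwords(raw):
    """Split 'word1,word2' into a set of lowercase stopwords."""
    return {w.strip().lower() for w in raw.split(",") if w.strip()}


def remove_stopwords(word_counts, raw_stopwords):
    """Return a copy of word_counts without the listed stopwords."""
    stop = parse_stopwords(raw_stopwords)
    return {w: n for w, n in word_counts.items() if w.lower() not in stop}


counts = {"app": 3, "slow": 2, "price": 1}
print(remove_stopwords(counts, "app,price"))
```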
Sentiment Analysis provides a Sentiment Score for each Dimension extracted from the overall text corpus, as well as for each comment, which is identified by its ID. Positive and Negative Sentiment Scores are provided separately. The histogram (bar chart) shows the distribution of Sentiment Scores. The score near the peak gives an overall indication of the sentiment: if the peak and the taller bars are on the positive side of the X-axis, the overall sentiment should be read as positive, and vice versa. In addition, the number of occurrences of each Dimension is shown as Count. These measures can be sorted in the table, and a Dimension can be searched for. A Criticality Score combining all these measures is also provided. Sentiment Analysis may take several minutes if the number of comments is large.
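Reading the histogram as described above can be expressed in a few lines. This is only an illustration of the "peak on the positive or negative side" rule. It assumes signed scores centred on zero and an arbitrary bin width of 0.5; neither the score scale nor the binning used by the utility is stated in this document, and the function name overall_sentiment is hypothetical.

```python
import math
from collections import Counter


def overall_sentiment(scores, bin_width=0.5):
    """Label the corpus by the side of zero on which the tallest bar sits."""
    # Assign each score to a bin; bin k covers [k * width, (k + 1) * width).
    bins = Counter(math.floor(s / bin_width) for s in scores)
    peak_bin, _ = max(bins.items(), key=lambda item: item[1])
    # The bin centre is never exactly zero, so the sign is always defined.
    centre = (peak_bin + 0.5) * bin_width
    return "positive" if centre > 0 else "negative"


print(overall_sentiment([1.2, 1.4, 1.1, -0.3, 2.0, 0.2]))
```

A single peak is only a rough summary; a distribution with two peaks of similar height on opposite sides of zero would need a closer look at the chart itself.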
This analysis presents sentiment by geography and time, if the text data has clearly defined fields for them. The Geo Analysis is done at country level, and country names are expected as ISO 3166 codes or their English equivalents. Average Positive and Negative Sentiment Scores are displayed on a Geomap in two charts. Time Analysis provides a Motion Chart depicting the movement of sentiment over time. The date format in the input file can cause problems in some cases; it is safest to use the dd-mm-yyyy format.
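Dates can be checked against the dd-mm-yyyy format before the file is loaded. The sketch below uses Python's standard datetime.strptime with the format string "%d-%m-%Y"; the helper name split_dates is hypothetical and not part of the utility.

```python
from datetime import datetime


def split_dates(values):
    """Separate values that parse as dd-mm-yyyy from those that do not."""
    good, bad = [], []
    for value in values:
        try:
            good.append(datetime.strptime(value.strip(), "%d-%m-%Y").date())
        except ValueError:
            bad.append(value)
    return good, bad


good, bad = split_dates(["05-03-2021", "2021/03/05", "31-12-2020"])
print("parsed:", [d.isoformat() for d in good])
print("rejected:", bad)
```

Rejected values can then be corrected in the source file, which avoids ambiguity between day-first and month-first dates.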
Sometimes the presence of non-UTF-8 characters may cause problems. One can save the file as Unicode Text (*.txt), open it in Notepad and replace all tabs with commas. The tab character can be selected by dragging across the gap between two column headers. The file can then be saved as *.csv with UTF-8 encoding.
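The same conversion can be scripted instead of done by hand in Notepad. This is a sketch under the assumption that the Unicode Text export is tab-delimited UTF-16 with a byte-order mark; the function name unicode_text_to_csv is hypothetical. Using the csv module instead of a plain find-and-replace has the advantage that fields which themselves contain commas are quoted.

```python
import csv
import io


def unicode_text_to_csv(raw: bytes) -> bytes:
    """Convert tab-delimited UTF-16 bytes into comma-delimited UTF-8 bytes."""
    text = raw.decode("utf-16")  # assumption: UTF-16 with a BOM
    out = io.StringIO(newline="")
    writer = csv.writer(out, lineterminator="\n")
    for row in csv.reader(io.StringIO(text, newline=""), delimiter="\t"):
        writer.writerow(row)
    return out.getvalue().encode("utf-8")


# With real files, read the input as bytes and write the result as bytes,
# e.g. via pathlib.Path(...).read_bytes() and .write_bytes().
exported = "ID\tText\n1\tGood, fast\n".encode("utf-16")
print(unicode_text_to_csv(exported).decode("utf-8"))
```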