Baseline Run: First, we used the training set of 1.6 million tweets from the sentiment140 dataset to train a Naive Bayes classifier for classifying the sentiment of the test set; we considered this as our baseline. In this run, we used only the bag-of-words (BoW) feature and did not perform any text preprocessing. Run1: In Run1, we considered a setup similar to the Baseline Run, but here we incorporated our text preprocessing strategy to improve the classification results. Run2: In Run2, we trained the Naive Bayes classifier with several sentiment lexicons instead of the large training set of 1.6 million tweets. We also incorporated our text preprocessing strategy.
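The Baseline Run can be sketched roughly as follows. This is a minimal illustration only: it uses scikit-learn's `CountVectorizer` and `MultinomialNB` as stand-ins for the actual classifier, and toy tweets in place of the 1.6 million sentiment140 training tweets.

```python
# Minimal sketch of the baseline setup: a Naive Bayes classifier trained on
# raw bag-of-words (BoW) counts, with no text preprocessing.
# Toy data stands in for the sentiment140 training set.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

train_tweets = ["i love this", "this is awful", "great day", "so bad"]
train_labels = ["positive", "negative", "positive", "negative"]

vectorizer = CountVectorizer()            # bag-of-words features
X_train = vectorizer.fit_transform(train_tweets)

clf = MultinomialNB()
clf.fit(X_train, train_labels)

X_test = vectorizer.transform(["what a great movie"])
print(clf.predict(X_test)[0])
```

Run1 would differ only in passing each tweet through the preprocessing pipeline before vectorizing; Run2 would replace the training corpus with entries drawn from the sentiment lexicons.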
The strategy is that first our rule-based classifier is applied to classify the tweet sentiment as positive, negative, or unknown. Since our goal is to classify tweet sentiment into only the positive or negative class, for the tweets labeled as unknown by the rule-based classifier we take the predictions of the Naive Bayes classifier as the final labels.
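This cascade can be expressed in a few lines. The two classifiers below are hypothetical placeholders (the real rule-based classifier and trained Naive Bayes model are described elsewhere in the thesis); only the combination logic is the point here.

```python
# Sketch of the combination strategy: the rule-based classifier decides first,
# and the Naive Bayes prediction is used only when the rules return "unknown".

def rule_based_classify(tweet):
    # Toy stand-in for the proposed rule-based classifier.
    if "love" in tweet:
        return "positive"
    if "hate" in tweet:
        return "negative"
    return "unknown"

def naive_bayes_classify(tweet):
    # Stand-in for the trained Naive Bayes classifier.
    return "negative"

def combined_classify(tweet):
    label = rule_based_classify(tweet)
    if label == "unknown":
        # Fall back to Naive Bayes for tweets the rules cannot decide.
        label = naive_bayes_classify(tweet)
    return label

print(combined_classify("i love mondays"))   # rule fires
print(combined_classify("meh, whatever"))    # falls back to Naive Bayes
```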
Run4: In this run, we combined our proposed rule-based classifier with the setup of Run2 to improve classification performance. Here, we used the same combination strategy already described in the experimental setup of Run3.
Run5: In this run, we used the training set of 1.6 million tweets from the sentiment140 dataset to train the multiclass SVM classifier from Cornell University [67] for classifying the sentiment of the test set. In this run, we used our preprocessing strategy and the bag-of-words (BoW) feature. For feature weighting, we used the TF-IDF weighting scheme.
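The Run5 pipeline, TF-IDF-weighted BoW features feeding an SVM, can be approximated as below. Note this uses scikit-learn's `TfidfVectorizer` and `LinearSVC` as rough stand-ins for the Cornell multiclass SVM [67], again on toy data.

```python
# Sketch of Run5: bag-of-words features with TF-IDF weighting, classified by a
# linear SVM. LinearSVC substitutes for the Cornell multiclass SVM here.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

train_tweets = ["i love this", "this is awful", "great day", "so bad"]
train_labels = ["positive", "negative", "positive", "negative"]

vectorizer = TfidfVectorizer()            # BoW with TF-IDF weighting
X_train = vectorizer.fit_transform(train_tweets)

svm = LinearSVC()
svm.fit(X_train, train_labels)

print(svm.predict(vectorizer.transform(["great stuff"]))[0])
```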
Run8: In Run8, we trained Weka's [69] multinomial Naive Bayes classifier with our selected 35 features. Then, we combined it with our proposed rule-based classifier to improve classification performance. Here, we also used the same combination strategy already described in the experimental setup of Run3.
Run9: In Run9, first our rule-based classifier is applied to classify the tweet sentiment as positive, negative, or unknown. For the tweets labeled as unknown by the rule-based classifier, we take the majority-vote prediction from the several classifiers stated below:
Probabilistic Naive Bayes Classifier: Trained with the sentiment140 dataset. (BoW feature)
Probabilistic Naive Bayes Classifier: Trained with our combined sentiment lexicons. (BoW feature)
Probabilistic Multiclass SVM Classifier: Trained with the sentiment140 dataset. (BoW feature with TF-IDF weighting scheme)
Probabilistic SMO Classifier: Trained with the sentiment140 dataset. (Selected 35 features)
Probabilistic Naive Bayes Multinomial Classifier: Trained with the sentiment140 dataset. (Selected 35 features)
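The fallback step of Run9 can be sketched as a simple majority vote over the labels produced by the five classifiers above. The vote values here are placeholders, not real model outputs.

```python
# Sketch of the Run9 fallback: for tweets the rule-based classifier marks
# "unknown", the final label is the majority vote over the five classifiers.
from collections import Counter

def majority_vote(predictions):
    # predictions: one label per classifier, e.g. five labels for Run9
    return Counter(predictions).most_common(1)[0][0]

votes = ["positive", "negative", "positive", "positive", "negative"]
print(majority_vote(votes))  # three of five classifiers vote positive
```

With an odd number of voters (five here), a positive/negative tie cannot occur, which is presumably why an odd-sized ensemble was chosen.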