
Application of semantic spaces to sentiment analysis for words

Marcin Tatjewski, Wojciech P. Jaworski

University of Warsaw, Institute of Informatics, Banacha 2, Warsaw 02-097, Poland

Abstract

We propose a novel approach to mining opinion data from large-scale text corpora using semantic modeling. The objective of the presented method is to determine numeric, real-valued sentiment scores for words within a text corpus. Our technique consists of two main stages: the first is the generation of a semantic space over a given corpus; the second is the construction of a regression model over the generated semantic space, using a non-binary (higher-cardinality) sentiment lexicon as the source of training data. By semantic spaces we mean high-dimensional matrix methods of representing semantic relations in text, such as Hyperspace Analogue to Language or Latent Semantic Analysis. For our purposes we applied the Correlated Occurrence Analogue to Lexical Semantics (COALS) method, which has proven especially successful in the synonymy detection task. The main learning algorithm we used for constructing regression models was the Support Vector Machine, which we tested with several different kernels; Linear Regression and kNN were used as benchmarks. To evaluate the proposed approach, we tested it on the American National Corpus, using the SentiStrength lexicon as the sentiment lexicon.
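The abstract does not spell out implementation details, so the following is only a minimal, dependency-free Python sketch of the two-stage idea under stated assumptions. Stage one builds COALS-style word vectors as we understand the published method: ramped-window co-occurrence counts (window of 4), converted to correlations, with negative values set to zero and positive values square-rooted; the full method's restriction of columns to frequent words and its optional SVD step are omitted. Stage two substitutes kNN regression (one of the benchmarks named above) for the SVM to avoid external libraries, and uses cosine similarity as a simplification. All function names and the toy lexicon scores are hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical sketch: COALS-style semantic space + kNN sentiment regression.
# Not the authors' implementation; SVM is replaced by kNN for brevity.


def cooccurrence_counts(tokens, window=4):
    """Ramped-window counts: a neighbour at distance d (1..window)
    contributes window + 1 - d, on both sides of the focus word."""
    counts = defaultdict(lambda: defaultdict(float))
    for i, word in enumerate(tokens):
        for d in range(1, window + 1):
            weight = window + 1 - d
            for j in (i - d, i + d):
                if 0 <= j < len(tokens):
                    counts[word][tokens[j]] += weight
    return counts


def coals_vectors(counts):
    """Correlation-normalise the counts, drop negative correlations
    and square-root the positive ones (sparse dict per word)."""
    row_sums = {a: sum(row.values()) for a, row in counts.items()}
    col_sums = defaultdict(float)
    for row in counts.values():
        for b, c in row.items():
            col_sums[b] += c
    total = sum(row_sums.values())
    vectors = {}
    for a, row in counts.items():
        vec = {}
        for b, c in row.items():
            num = total * c - row_sums[a] * col_sums[b]
            den = math.sqrt(row_sums[a] * (total - row_sums[a])
                            * col_sums[b] * (total - col_sums[b]))
            if den > 0 and num > 0:  # zero-count cells are always negative
                vec[b] = math.sqrt(num / den)
        vectors[a] = vec
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(x * v.get(k, 0.0) for k, x in u.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def knn_sentiment(word, vectors, lexicon, k=3):
    """Predict a real-valued score for `word` (which must have a vector)
    as the mean lexicon score of its k most similar labelled words."""
    target = vectors[word]
    neighbours = sorted(
        ((cosine(target, vectors[w]), score)
         for w, score in lexicon.items() if w in vectors and w != word),
        reverse=True,
    )[:k]
    if not neighbours:
        return 0.0
    return sum(score for _, score in neighbours) / len(neighbours)


if __name__ == "__main__":
    corpus = ("the good film was great fun but "
              "the bad film was awful and dull").split()
    vecs = coals_vectors(cooccurrence_counts(corpus))
    toy_lexicon = {"great": 4, "fun": 3, "awful": -4, "dull": -2}  # hypothetical
    print(knn_sentiment("good", vecs, toy_lexicon, k=2))
```

Swapping in an SVM regressor would mean treating each labelled word's vector as a feature row and its lexicon score as the regression target; the kNN version is shown only because it fits in a few self-contained lines.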

The results of our computational experiments were analyzed using both standard measures and novel measures that we designed specifically for evaluating methods of this type. The obtained results show that it is possible to extract sentiment from semantics using the proposed technique. Future perspectives for the introduced method are promising, as its complexity and dependence on resources leave several opportunities for enhancement. In particular, using other non-binary sentiment lexicons, such as the MPQA Subjectivity Lexicon or SentiWordNet, as well as adding lexicon preprocessing steps, might significantly improve future results.

Our research was inspired by tools and tasks introduced to us by psychologists, who wanted to analyze the emotional value of words in text even when sentiment was not expressed explicitly. Our method makes it possible to obtain comparable sentiment values even for common words such as man, woman, church, government, abortion, taxes, etc. While similar problems have been considered in the past, semantic space modeling was rarely used to solve them, which also limited the use of advanced machine learning tools.

 


Presentation: Oral at CyberEmotions conference, by Wojciech P. Jaworski
See On-line Journal of CyberEmotions conference

Submitted: 2012-11-04 23:11
Revised:   2012-12-12 16:03