Application of semantic spaces to sentiment analysis for words |
Marcin Tatjewski, Wojciech P. Jaworski |
University of Warsaw, Institute of Informatics, Banacha 2, Warsaw 02-097, Poland |
Abstract |
We propose a novel approach to mining opinion data from large-scale text corpora using semantic modeling. The objective of the presented method is to determine real-valued sentiment scores for words within a text corpus. Our technique consists of two main stages: first, generation of a semantic space over a given corpus; second, construction of a regression model over the generated semantic space, using a non-binary (higher-cardinality) sentiment lexicon as the source of training data. By semantic spaces we mean high-dimensional matrix methods of representing semantic relations in text, such as Hyperspace Analogue to Language or Latent Semantic Analysis. For our purposes we applied the Correlated Occurrence Analogue to Lexical Semantics (COALS) method, which has proven especially successful in the synonymy detection task. The main learning algorithm we used for constructing regression models was the Support Vector Machine, which we tested with several different kernels. Linear Regression and kNN were used as benchmarks. To evaluate our method, we tested the proposed approach on the American National Corpus, using the SentiStrength lexicon as the sentiment lexicon. |
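The two stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy corpus, the lexicon scores, the window size and all function names are assumptions made for illustration. The semantic space follows the general shape of Correlated Occurrence Analogue to Lexical Semantics (ramped-window co-occurrence counts converted to correlations, negatives zeroed, positives square-rooted) without the method's column selection or dimensionality reduction. To stay dependency-free, the regression stage uses kNN, which the abstract names as a benchmark, in place of the Support Vector Machine.

```python
import math
from collections import defaultdict


def cooccurrence(tokens, window=4):
    """Symmetric co-occurrence counts; nearer neighbours get a larger weight."""
    counts = defaultdict(float)
    for i, word in enumerate(tokens):
        for d in range(1, window + 1):
            if i + d < len(tokens):
                weight = window - d + 1  # ramped window: 4, 3, 2, 1
                counts[(word, tokens[i + d])] += weight
                counts[(tokens[i + d], word)] += weight
    return counts


def semantic_space(tokens, window=4):
    """Stage 1: map each word to a vector of normalised correlations."""
    vocab = sorted(set(tokens))
    counts = cooccurrence(tokens, window)
    row = defaultdict(float)
    for (a, _), c in counts.items():
        row[a] += c
    total = sum(row.values())
    vectors = {}
    for a in vocab:
        vec = []
        for b in vocab:
            num = total * counts.get((a, b), 0.0) - row[a] * row[b]
            den = math.sqrt(row[a] * (total - row[a]) * row[b] * (total - row[b]))
            r = num / den if den > 0 else 0.0
            # Negative correlations are discarded, positive ones square-rooted.
            vec.append(math.sqrt(r) if r > 0 else 0.0)
        vectors[a] = vec
    return vectors


def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def knn_score(word, vectors, lexicon, k=2):
    """Stage 2 (kNN stand-in): average the scores of the k most similar lexicon words."""
    candidates = [
        (cosine(vectors[word], vectors[w]), s)
        for w, s in lexicon.items()
        if w in vectors and w != word
    ]
    nearest = sorted(candidates, reverse=True)[:k]
    if not nearest:
        return 0.0
    return sum(s for _, s in nearest) / len(nearest)


# Hypothetical toy data; a real run would use a large corpus and a full lexicon.
tokens = (
    "the film was great and the acting was wonderful "
    "but the plot was awful and the ending was terrible"
).split()
lexicon = {"great": 3, "awful": -3, "terrible": -4}  # made-up scores

vectors = semantic_space(tokens)
score = knn_score("wonderful", vectors, lexicon)
print(f"predicted score for 'wonderful': {score:.2f}")
```

On a corpus this small the prediction carries no meaning; the sketch only shows how a word absent from the lexicon receives a real-valued score from its neighbours in the semantic space.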
Presentation: Oral at CyberEmotions conference, by Wojciech P. Jaworski. See On-line Journal of CyberEmotions conference. Submitted: 2012-11-04 23:11. Revised: 2012-12-12 16:03.