ABSTRACT
The explosion of user-generated content on the Web has led to new opportunities and significant challenges for companies that are increasingly concerned about monitoring the discussion around their products. Tracking such discussion on weblogs provides useful insight into how to improve products or market them more effectively. An important component of such analysis is characterizing the sentiment expressed in blogs about specific brands and products. Sentiment analysis focuses on this task of automatically identifying whether a piece of text expresses a positive or negative opinion about its subject matter. Most previous work in this area uses prior lexical knowledge in the form of the sentiment polarity of words. In contrast, some recent approaches treat the task as a text classification problem and learn to classify sentiment based only on labeled training data. In this paper, we present a unified framework in which one can use background lexical information in terms of word-class associations, and refine this information for specific domains using any available training examples. Empirical results on diverse domains show that our approach performs better than using background knowledge or training data in isolation, and better than alternative approaches to using lexical knowledge with text classification.
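The abstract does not say how the lexical knowledge and the training data are combined. As a rough illustration only, and not necessarily the method of the paper, the sketch below builds one multinomial word-class model from a sentiment lexicon, a second from labeled examples (naive Bayes with add-one smoothing), and mixes them with a linear pool. The lexicon, the training sentences, the `strength` weight, and the mixing weight `alpha` are all hypothetical values chosen for the example.

```python
import math
from collections import Counter

# Hypothetical toy lexicon and training data, for illustration only.
POSITIVE_WORDS = {"great", "love", "excellent"}
NEGATIVE_WORDS = {"poor", "hate", "terrible"}
TRAIN = [
    ("love the battery great screen", "pos"),
    ("terrible battery poor screen", "neg"),
    ("the camera is sharp", "pos"),
    ("the camera is blurry", "neg"),
]
CLASSES = ("pos", "neg")


def lexicon_model(vocab, strength=4.0):
    """Word-class probabilities built from the lexicon alone.

    A word listed for a class gets `strength` times the weight of an
    unlisted word; each class distribution is then normalised.
    """
    listed = {"pos": POSITIVE_WORDS, "neg": NEGATIVE_WORDS}
    model = {}
    for c in CLASSES:
        weights = {w: (strength if w in listed[c] else 1.0) for w in vocab}
        total = sum(weights.values())
        model[c] = {w: v / total for w, v in weights.items()}
    return model


def trained_model(docs, vocab):
    """Multinomial naive Bayes word-class probabilities, add-one smoothed."""
    counts = {c: Counter() for c in CLASSES}
    for text, label in docs:
        counts[label].update(text.split())
    model = {}
    for c in CLASSES:
        total = sum(counts[c][w] for w in vocab) + len(vocab)
        model[c] = {w: (counts[c][w] + 1) / total for w in vocab}
    return model


def pooled_model(lex, trained, alpha=0.5):
    """Linear pool: a convex combination of the two word distributions."""
    return {
        c: {w: alpha * lex[c][w] + (1 - alpha) * trained[c][w] for w in lex[c]}
        for c in CLASSES
    }


def classify(text, model):
    """Pick the class with the higher log-likelihood (uniform class prior)."""
    scores = {
        c: sum(math.log(model[c][w]) for w in text.split() if w in model[c])
        for c in CLASSES
    }
    return max(scores, key=scores.get)


vocab = POSITIVE_WORDS | NEGATIVE_WORDS | {w for t, _ in TRAIN for w in t.split()}
model = pooled_model(lexicon_model(vocab), trained_model(TRAIN, vocab))

print(classify("excellent camera", model))  # lexicon word never seen in training
print(classify("blurry screen", model))     # domain word absent from the lexicon
```

In this toy setup the lexicon supplies evidence for words that never occur in the labeled data, while the training counts supply evidence for domain-specific words the lexicon omits; `alpha` controls how much each source is trusted.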
CITED BY 3
Vikas Sindhwani, Prem Melville, Richard D. Lawrence, Uncertainty sampling and transductive experimental design for active dual supervision, Proceedings of the 26th Annual International Conference on Machine Learning, p.953-960, June 14-18, 2009, Montreal, Quebec, Canada
INDEX TERMS
Primary Classification: I. Computing Methodologies; I.2 ARTIFICIAL INTELLIGENCE; I.2.6 Learning
Additional Classification: I. Computing Methodologies; I.5 PATTERN RECOGNITION; I.5.1 Models
General Terms: Algorithms, Economics, Experimentation
Keywords: background knowledge, blog analysis, dual supervision, movie reviews, naive Bayes, opinion mining, political blogs, prior knowledge, sentiment analysis, technology blogs, text mining