Mining the peanut gallery: opinion extraction and semantic classification of product reviews
Source: International World Wide Web Conference archive
Proceedings of the 12th international conference on World Wide Web
Budapest, Hungary
SESSION: Data mining
Pages: 519 - 528  
Year of Publication: 2003
ISBN:1-58113-680-3
Authors
Kushal Dave  NEC Laboratories America, Princeton, NJ
Steve Lawrence  NEC Laboratories America, Princeton, NJ
David M. Pennock  Overture Services, Inc., Pasadena, CA
Sponsor
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/775152.775226

ABSTRACT

The web contains a wealth of product reviews, but sifting through them is a daunting task. Ideally, an opinion mining tool would process a set of search results for a given item, generating a list of product attributes (quality, features, etc.) and aggregating opinions about each of them (poor, mixed, good). We begin by identifying the unique properties of this problem and develop a method for automatically distinguishing between positive and negative reviews. Our classifier draws on information retrieval techniques for feature extraction and scoring, and the results for various metrics and heuristics vary depending on the testing situation. The best methods work as well as or better than traditional machine learning. When operating on individual sentences collected from web searches, performance is limited due to noise and ambiguity. But in the context of a complete web-based tool and aided by a simple method for grouping sentences into attributes, the results are qualitatively quite useful.
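The abstract describes a classifier that extracts features from reviews, scores them, and labels each review positive or negative, but it does not give the scoring formula. The sketch below illustrates that general idea only and is not the authors' method: it assumes unigram features, a per-term score equal to the normalized difference of the term's relative frequency in positive and negative training text, and a decision based on the sign of the summed scores. The names `tokenize`, `train_scores`, and `classify` are hypothetical.

```python
from collections import Counter

PUNCT = ".,!?;:\"'()"

def tokenize(text):
    # Lowercase, split on whitespace, and strip surrounding punctuation.
    tokens = (w.strip(PUNCT) for w in text.lower().split())
    return [t for t in tokens if t]

def train_scores(pos_docs, neg_docs):
    # Count term occurrences separately in positive and negative reviews.
    pos = Counter(t for d in pos_docs for t in tokenize(d))
    neg = Counter(t for d in neg_docs for t in tokenize(d))
    n_pos = sum(pos.values()) or 1
    n_neg = sum(neg.values()) or 1
    scores = {}
    for term in set(pos) | set(neg):
        p = pos[term] / n_pos  # relative frequency in positive text
        q = neg[term] / n_neg  # relative frequency in negative text
        # Assumed scoring rule: ranges from -1 (only negative) to +1 (only positive).
        scores[term] = (p - q) / (p + q)
    return scores

def classify(text, scores):
    # Sum the scores of known terms; unknown terms contribute nothing.
    total = sum(scores.get(t, 0.0) for t in tokenize(text))
    # Ties (including text with no known terms) fall to "negative" here.
    return "positive" if total > 0 else "negative"

if __name__ == "__main__":
    scores = train_scores(
        ["great camera, excellent pictures", "excellent battery, great value"],
        ["terrible battery, poor pictures", "poor value, terrible camera"],
    )
    print(classify("great pictures", scores))
    print(classify("poor battery", scores))
```

In this toy setup, terms that appear equally often in both classes (such as product attribute names) score zero, so only opinion-bearing terms influence the label. A real system would need smoothing, larger training sets, and some handling of the noise and ambiguity the abstract mentions for individual sentences.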


