Opinion mining from noisy text data
Source: AND; Vol. 303
Proceedings of the second workshop on Analytics for noisy unstructured text data
Singapore
Pages 83-90  
Year of Publication: 2008
ISBN:978-1-60558-196-5
Authors
Lipika Dey  TCS Innovation Lab Delhi, Udyog Vihar, Gurgaon, India
S K Mirajul Haque  TCS Innovation Lab Delhi, Udyog Vihar, Gurgaon, India
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1390749.1390763

ABSTRACT

The proliferation of the Internet has not only generated huge volumes of unstructured information in the form of web documents; a large amount of text is also generated in the form of emails, blogs, and customer feedback. The data generated from online communication is a potential gold mine for discovering knowledge. Text analytics has matured and is being successfully employed to mine important information from unstructured text documents. Most text analytics techniques rely on Natural Language Processing methods, which assume that the underlying text is clean and correct. Statistical techniques, though not as accurate as linguistic mechanisms, are also employed to overcome the dependence on clean text. The chief bottleneck in designing statistical mechanisms, however, is their dependence on appropriately annotated training data. Neither class of methods is well suited to mining information from online communication text, because such text is often noisy. These texts are informally written and suffer from spelling mistakes, grammatical errors, improper punctuation and irregular capitalization. This paper focuses on opinion extraction from noisy text data. It aims to extract and consolidate customer opinions from blogs and feedback at multiple levels of granularity. Ours is a hybrid approach: we first employ a semi-supervised method to learn domain knowledge from a training repository that contains both noisy and clean text, and then employ localized linguistic techniques to extract opinion expressions from noisy text. We have developed a system based on this approach, which provides the user with a platform to analyze opinion expressions extracted from a repository.
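
The abstract does not give implementation details, so the following is only a minimal sketch of the general idea of localized extraction from noisy text, not the authors' method. It assumes a small hand-written lexicon (`FEATURES`, `OPINIONS`) in place of the domain knowledge the paper learns semi-supervised, uses fuzzy string matching from Python's standard `difflib` to tolerate misspellings, and pairs each feature term with the nearest opinion word inside a small token window within the same clause. All names, word lists, and thresholds here are hypothetical.

```python
import difflib
import re

# Hypothetical, hand-picked lexicons (assumption): the paper instead learns
# domain knowledge from a training repository of noisy and clean text.
FEATURES = ["battery", "screen", "camera", "service"]
OPINIONS = {"good": 1, "great": 1, "excellent": 1,
            "bad": -1, "poor": -1, "terrible": -1}


def normalize(token, vocabulary, cutoff=0.8):
    """Map a possibly misspelled token to its closest vocabulary entry, or None."""
    matches = difflib.get_close_matches(token, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else None


def extract_opinions(text, window=3):
    """Return (feature, opinion_word, polarity) triples, one per feature mention.

    Each clause is processed separately; a feature is paired with the nearest
    opinion word that lies within `window` tokens of it.
    """
    results = []
    # Lowercasing sidesteps irregular capitalization; splitting on runs of
    # punctuation copes with repeated marks such as "!!".
    for clause in re.split(r"[.!?;]+", text.lower()):
        tokens = re.findall(r"[a-z']+", clause)
        for i, token in enumerate(tokens):
            feature = normalize(token, FEATURES)
            if feature is None:
                continue
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            candidates = []
            for j in range(lo, hi):
                if j == i:
                    continue
                opinion = normalize(tokens[j], list(OPINIONS))
                if opinion is not None:
                    candidates.append((abs(j - i), opinion))
            if candidates:
                _, opinion = min(candidates)  # nearest opinion word wins
                results.append((feature, opinion, OPINIONS[opinion]))
    return results


print(extract_opinions("Teh batery is EXCELENT!! servce was terible."))
# -> [('battery', 'excellent', 1), ('service', 'terrible', -1)]
```

The window keeps the analysis local, so no full parse of the ungrammatical sentence is needed; the price is that opinions expressed far from the feature term, or negated ones, are missed by this sketch.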


