ABSTRACT
Discussion boards and online forums are important platforms for people to share information. Users post questions or problems to discussion boards and rely on others to provide possible solutions; such question-related content sometimes even dominates the whole board. However, retrieving this kind of information automatically and effectively is still a non-trivial task. In addition, the presence of other types of information (e.g., announcements, plans, and elaborations) makes it difficult to assume that every thread in a discussion board is about a question. We treat the identification of question-related threads and of their potential answers as classification tasks. Experimental results across multiple datasets demonstrate that our method significantly improves performance on both the question detection and answer finding subtasks. We also carefully compare how different types of features contribute to the final result and show that non-content features play a key role in improving overall performance. Finally, we show that a ranking scheme based on our classification approach yields much better performance than previously published methods.