Document retrieval for question answering: a quantitative evaluation of text preprocessing
Source
Conference on Information and Knowledge Management
Proceedings of the ACM first Ph.D. workshop in CIKM
Lisbon, Portugal
SESSION: Session 3
Pages 125-130
Year of Publication: 2007
ISBN: 978-1-59593-832-9
Authors
Gracinda Carvalho  Universidade Aberta, Lisboa, Portugal
David Martins de Matos  IST - Instituto Superior Técnico, Lisboa, Portugal
Vitor Rocio  Universidade Aberta, Lisboa, Portugal
Sponsors
SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web
SIGIR: ACM Special Interest Group on Information Retrieval
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1316874.1316894

ABSTRACT

Question Answering (QA) has been an area of interest for researchers, motivated in part by the international QA evaluation forums, namely the Text REtrieval Conference (TREC) and, more recently, the Cross Language Evaluation Forum (CLEF) through QA@CLEF, which has included the Portuguese language since 2004. In these forums, a collection of written documents is provided, together with a set of questions to be answered by the participating systems. Each system is evaluated as a whole by its capacity to answer the questions, and relatively few published results focus on the performance of individual components and their influence on overall system performance. That is the case for the Information Retrieval (IR) component, which is widely used in QA systems.

Our work concentrates on the different options for preprocessing Portuguese text before feeding it to the IR component, evaluating their impact on IR performance in the specific context of QA, so that we can make a well-founded choice among them. From this work we conclude that the basic preprocessing techniques, case folding and removal of punctuation marks, offer a clear advantage. Among the other techniques considered, stop word removal improved the performance of the IR system, whereas stemming and lemmatization did not.
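The preprocessing steps that the abstract reports as helpful (case folding, punctuation removal and stop word removal) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the tiny `STOP_WORDS` set and the sample question are assumptions made for illustration, and a real system would use a full Portuguese stop word list.

```python
import unicodedata

# Hypothetical, deliberately tiny Portuguese stop word list (illustration only).
STOP_WORDS = {"a", "o", "as", "os", "de", "do", "da", "em", "e", "que", "um", "uma"}


def case_fold(text: str) -> str:
    """Case folding: map every character to a caseless form."""
    return text.casefold()


def remove_punctuation(text: str) -> str:
    """Replace each Unicode punctuation character with a space.

    A space (rather than nothing) keeps words separated in input
    such as "Lisboa,Portugal".
    """
    return "".join(
        " " if unicodedata.category(ch).startswith("P") else ch for ch in text
    )


def preprocess(text: str, drop_stop_words: bool = True) -> list[str]:
    """Case-fold, strip punctuation, tokenize on whitespace, and
    optionally drop stop words. Accented letters are preserved."""
    tokens = remove_punctuation(case_fold(text)).split()
    if drop_stop_words:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    return tokens


if __name__ == "__main__":
    question = "Quem é o Presidente da República de Portugal?"
    print(preprocess(question))
    print(preprocess(question, drop_stop_words=False))
```

The same function would be applied to both the document collection before indexing and the questions before querying, so that both sides are normalized identically. Stemming and lemmatization are left out of the sketch because the abstract reports that they did not improve retrieval.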



