Extraction of coherent relevant passages using hidden Markov models
Source: ACM Transactions on Information Systems (TOIS)
Volume 24, Issue 3 (July 2006)
Pages: 295-319
Year of Publication: 2006
ISSN: 1046-8188
Authors
Jing Jiang, University of Illinois, Urbana, IL
Chengxiang Zhai, University of Illinois, Urbana, IL
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/1165774.1165775

ABSTRACT

In information retrieval, retrieving relevant passages, as opposed to whole documents, not only directly benefits the end user by filtering out the irrelevant information within a long relevant document, but also improves retrieval accuracy in general. A critical problem in passage retrieval is to extract coherent relevant passages accurately from a document, which we refer to as passage extraction. While much work has been done on passage retrieval, the passage extraction problem has not been seriously studied. Most existing work tends to rely on presegmenting documents into fixed-length passages, which are unlikely to be optimal because the length of a relevant passage is presumably highly sensitive to both the query and the document.

In this article, we present a new method for accurately detecting coherent relevant passages of variable lengths using hidden Markov models (HMMs). The HMM-based method naturally captures the topical boundaries between passages relevant and nonrelevant to the query. Pseudo-feedback mechanisms can be naturally incorporated into such an HMM-based framework to improve parameter estimation. We show that with appropriate parameter estimation, the HMM method outperforms a number of strong baseline methods on two datasets. We further show how the HMM method can be applied on top of any basic passage extraction method to improve passage boundaries.
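
As an illustration only, and not the authors' implementation, the general idea of marking a variable-length relevant passage with an HMM can be sketched as Viterbi decoding over a two-state model. Everything specific here is an assumption made for the example: the two states (background and relevant), the function names `viterbi_passage` and `relevant_spans`, the `stay` and `floor` parameters, and the toy unigram language models. The HMM topology and parameter estimation described in the article may differ.

```python
import math


def viterbi_passage(tokens, relevant_lm, background_lm, stay=0.9, floor=1e-6):
    """Label each token 1 (relevant) or 0 (background) with a two-state HMM.

    relevant_lm / background_lm are hypothetical unigram models: dicts that
    map a word to its probability. Unseen words get the probability `floor`.
    `stay` is the assumed probability of remaining in the current state.
    """
    if not tokens:
        return []
    states = (0, 1)  # 0 = background, 1 = relevant
    log_trans = {
        (s, t): math.log(stay if s == t else 1.0 - stay)
        for s in states
        for t in states
    }

    def log_emit(state, word):
        lm = relevant_lm if state == 1 else background_lm
        return math.log(lm.get(word, floor))

    # Uniform start distribution over the two states (an assumption).
    score = {s: math.log(0.5) + log_emit(s, tokens[0]) for s in states}
    backpointers = []
    for word in tokens[1:]:
        new_score, pointers = {}, {}
        for t in states:
            best_prev = max(states, key=lambda s: score[s] + log_trans[(s, t)])
            new_score[t] = (
                score[best_prev] + log_trans[(best_prev, t)] + log_emit(t, word)
            )
            pointers[t] = best_prev
        score = new_score
        backpointers.append(pointers)

    # Trace the best path backwards from the best final state.
    state = max(states, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


def relevant_spans(path):
    """Return (start, end) token spans labeled relevant; end is exclusive."""
    spans, start = [], None
    for i, state in enumerate(path):
        if state == 1 and start is None:
            start = i
        elif state == 0 and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(path)))
    return spans


# Toy data, invented for this example.
tokens = "the cat sat hidden markov models decode the dog ran".split()
relevant_lm = {"hidden": 0.3, "markov": 0.3, "models": 0.3, "decode": 0.1}
background_lm = {"the": 0.2, "cat": 0.1, "sat": 0.1, "dog": 0.1, "ran": 0.1}

path = viterbi_passage(tokens, relevant_lm, background_lm)
print(relevant_spans(path))  # [(3, 7)]
```

Because the passage boundaries come out of the decoded state sequence, the passage length is not fixed in advance; in this toy setup the `stay` probability is what discourages the path from switching states on every word.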


