Improving retrievability of patents with cluster-based pseudo-relevance feedback documents selection
Source
Conference on Information and Knowledge Management
Proceedings of the 18th ACM Conference on Information and Knowledge Management
Hong Kong, China
Poster session 6: IR track
Pages: 1863-1866  
Year of Publication: 2009
ISBN:978-1-60558-512-3
Authors
Shariq Bashir, Vienna University of Technology, Vienna, Austria
Andreas Rauber, Vienna University of Technology, Vienna, Austria
Sponsors
SIGIR: ACM Special Interest Group on Information Retrieval
SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/1645953.1646250
ABSTRACT

High findability of documents within a given cut-off rank is considered an important factor in recall-oriented application domains such as patent or legal document retrieval. Findability is hindered by two factors: the inherent bias of the retrieval model, which favors some types of documents over others, and the failure to correctly capture and interpret the context of queries, which are conventionally rather short. In this paper, we analyze the bias impact of different retrieval models and query expansion strategies. We furthermore propose a novel query expansion strategy that uses document clustering to identify dominant relevant documents. This helps to overcome a limitation of conventional query expansion strategies, which suffer strongly from the noise that imperfect initial query results introduce into the selection of pseudo-relevance feedback documents. Experiments with different collections of patent documents suggest that clustering-based document selection for pseudo-relevance feedback is an effective approach for increasing the findability of individual documents and decreasing the bias of a retrieval system.
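
The general idea of choosing pseudo-relevance feedback documents by clustering can be illustrated with a minimal sketch. This is not the authors' algorithm, whose details the abstract does not give. It assumes that the top-ranked documents of an initial retrieval run are available as sets of terms, that a simple greedy Jaccard clustering is an acceptable stand-in for the clustering step, and that the largest cluster represents the "dominant" documents. All function names, the similarity threshold, and the toy patent data are hypothetical.

```python
from collections import Counter


def jaccard(a, b):
    """Jaccard similarity between two term sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_documents(docs, threshold=0.3):
    """Greedy single-pass clustering (an assumed stand-in technique):
    a document joins the first cluster whose seed document is similar
    enough, otherwise it starts a new cluster."""
    clusters = []  # each cluster is a list of doc ids; the first id is the seed
    for doc_id, terms in docs.items():
        for cluster in clusters:
            if jaccard(terms, docs[cluster[0]]) >= threshold:
                cluster.append(doc_id)
                break
        else:
            clusters.append([doc_id])
    return clusters


def select_feedback_docs(docs, threshold=0.3):
    """Return the ids of the largest cluster of the initial result list."""
    return max(cluster_documents(docs, threshold), key=len)


def expansion_terms(docs, feedback_ids, query_terms, n_terms=3):
    """Pick the terms that occur in the most feedback documents,
    excluding terms already in the query (ties broken alphabetically)."""
    counts = Counter(t for d in feedback_ids for t in docs[d] - query_terms)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:n_terms]]


# Hypothetical top-ranked results for the short query "engine".
top_ranked = {
    "p1": {"engine", "fuel", "injection", "valve"},
    "p2": {"engine", "fuel", "injection", "pump"},
    "p3": {"engine", "fuel", "valve", "pump"},
    "p4": {"engine", "software", "search", "index"},  # off-topic noise
}
query = {"engine"}

feedback = select_feedback_docs(top_ranked)
terms = expansion_terms(top_ranked, feedback, query)
print(feedback)  # ['p1', 'p2', 'p3']
print(terms)     # ['fuel', 'injection', 'pump']
```

In this toy run the off-topic document p4 falls into its own cluster and contributes no expansion terms, which is the kind of noise reduction the abstract attributes to cluster-based selection. A real system would use a proper retrieval model, weighted term vectors, and a more robust clustering method.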

