On single-pass indexing with MapReduce
Source
Annual ACM Conference on Research and Development in Information Retrieval
Proceedings of the 32nd International ACM SIGIR Conference on Research and Development in Information Retrieval
Boston, MA, USA
POSTER SESSION: Posters
Pages 742-743
Year of Publication: 2009
ISBN: 978-1-60558-483-6
Authors
Richard M. C. McCreadie, University of Glasgow, Glasgow, Scotland, UK
Craig Macdonald, University of Glasgow, Glasgow, Scotland, UK
Iadh Ounis, University of Glasgow, Glasgow, Scotland, UK
Sponsors
SIGIR: ACM Special Interest Group on Information Retrieval
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 62,   Downloads (12 Months): 162,   Citation Count: 0
DOI: http://doi.acm.org/10.1145/1571941.1572106

ABSTRACT

Indexing is an important Information Retrieval (IR) operation, which must be parallelised to support large-scale document corpora. We propose a novel adaptation of the state-of-the-art single-pass indexing algorithm to the MapReduce programming model. We then evaluate this adaptation using Hadoop, an open-source implementation of MapReduce. In particular, we explore the scale of improvements that can be achieved, first when using more processing hardware and second when indexing larger corpora. Our results show that indexing speed increases in a close to linear fashion when scaling either the corpus size or the number of processing machines. This suggests that the proposed indexing implementation is viable for supporting upcoming large-scale corpora.
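The abstract does not spell out how single-pass indexing maps onto MapReduce. The following is a minimal Python sketch of one plausible arrangement, not the authors' implementation: each map task builds posting lists in memory and flushes a partial index whenever a memory budget (the hypothetical parameter `max_postings_in_memory`) is reached, and each reducer merges the partial posting lists for one term. The names `map_task`, `reduce_task` and `run_job` are illustrative only (they are not Hadoop APIs), and `run_job` simulates the shuffle sequentially in a single process rather than running in parallel.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Posting = Tuple[str, int]                   # (doc_id, term frequency)
PartialList = Tuple[int, int, List[Posting]]  # (map task id, flush number, postings)


def map_task(task_id: int, docs: Iterable[Tuple[str, str]],
             max_postings_in_memory: int) -> Iterator[Tuple[str, PartialList]]:
    """Accumulate postings in memory; emit a partial index (one record per
    term) each time the hypothetical memory budget is reached."""
    partial: Dict[str, List[Posting]] = defaultdict(list)
    held = 0
    flush_no = 0
    for doc_id, text in docs:
        # Naive whitespace tokenisation, purely for illustration.
        tf: Dict[str, int] = defaultdict(int)
        for token in text.lower().split():
            tf[token] += 1
        for term, freq in tf.items():
            partial[term].append((doc_id, freq))
            held += 1
        if held >= max_postings_in_memory:
            # Flush: emit every in-memory posting list, tagged with this flush.
            for term in sorted(partial):
                yield term, (task_id, flush_no, partial[term])
            partial = defaultdict(list)
            held = 0
            flush_no += 1
    # Final flush of whatever is still held in memory (may be empty).
    for term in sorted(partial):
        yield term, (task_id, flush_no, partial[term])


def reduce_task(term: str, values: List[PartialList]) -> Tuple[str, List[Posting]]:
    """Merge the partial posting lists of one term, ordered by
    (map task id, flush number) so postings follow input order."""
    ordered = sorted(values, key=lambda v: (v[0], v[1]))
    merged: List[Posting] = []
    for _, _, postings in ordered:
        merged.extend(postings)
    return term, merged


def run_job(splits: List[List[Tuple[str, str]]],
            max_postings_in_memory: int) -> Dict[str, List[Posting]]:
    """Sequentially simulate map, shuffle (group by term) and reduce."""
    grouped: Dict[str, List[PartialList]] = defaultdict(list)
    for task_id, split in enumerate(splits):
        for term, value in map_task(task_id, split, max_postings_in_memory):
            grouped[term].append(value)
    return dict(reduce_task(term, values) for term, values in sorted(grouped.items()))
```

Sorting by (map task, flush number) in the reducer is an assumption made here because MapReduce does not guarantee the order in which values reach a reducer; it keeps each merged posting list in document order provided the input splits are numbered in document order. For example, `run_job([[("d1", "map reduce index"), ("d2", "index index terms")], [("d3", "reduce terms")]], max_postings_in_memory=3)` produces one posting list per term, built from two flushes in the first map task and one in the second.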


