A time machine for text search
Source
Annual ACM Conference on Research and Development in Information Retrieval
Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval
Amsterdam, The Netherlands
Session: Index structures
Pages: 519-526
Year of Publication: 2007
ISBN: 978-1-59593-597-7
Authors
Klaus Berberich, Max-Planck Institute for Informatics, Saarbruecken, Germany
Srikanta Bedathur, Max-Planck Institute for Informatics, Saarbruecken, Germany
Thomas Neumann, Max-Planck Institute for Informatics, Saarbruecken, Germany
Gerhard Weikum, Max-Planck Institute for Informatics, Saarbruecken, Germany
Sponsors
ACM: Association for Computing Machinery
SIGIR: ACM Special Interest Group on Information Retrieval
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/1277741.1277831

ABSTRACT

Text search over temporally versioned document collections such as web archives has received little attention as a research problem. As a consequence, there is no scalable and principled solution to search such a collection as of a specified time. In this work, we address this shortcoming and propose an efficient solution for time-travel text search by extending the inverted file index to make it ready for temporal search. We introduce approximate temporal coalescing as a tunable method to reduce the index size without significantly affecting the quality of results. In order to further improve the performance of time-travel queries, we introduce two principled techniques to trade off index size for its performance. These techniques can be formulated as optimization problems that can be solved to near-optimality. Finally, our approach is evaluated in a comprehensive series of experiments on two large-scale real-world datasets. Results unequivocally show that our methods make it possible to build an efficient "time machine" scalable to large versioned text collections.
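
The abstract gives no implementation detail, so the following is only a minimal sketch of the two ideas it names: an inverted index whose postings carry validity time intervals, and approximate temporal coalescing of those postings. Everything here is an assumption made for illustration and not the authors' method: the `Posting` fields, the relative tolerance `eps`, the merge rule (compare each score with the first score of the current run), and the function names `coalesce` and `search_as_of` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Posting:
    doc_id: str
    score: float  # payload for one document version, e.g. a term weight
    begin: int    # validity interval is [begin, end)
    end: int


def coalesce(postings: List[Posting], eps: float) -> List[Posting]:
    """Merge temporally adjacent postings of the same document whose scores
    stay within a relative tolerance eps of the run's first score.
    (Hypothetical merge rule, chosen for illustration.)"""
    merged: List[Posting] = []
    for p in sorted(postings, key=lambda q: (q.doc_id, q.begin)):
        last = merged[-1] if merged else None
        if (last is not None
                and last.doc_id == p.doc_id
                and last.end == p.begin
                and abs(p.score - last.score) <= eps * abs(last.score)):
            # Extend the interval and keep the run's first score as representative.
            last.end = p.end
        else:
            # Copy so the caller's postings are never mutated.
            merged.append(Posting(p.doc_id, p.score, p.begin, p.end))
    return merged


def search_as_of(index: Dict[str, List[Posting]], term: str, t: int) -> List[Posting]:
    """Return the postings for `term` that are valid at time t."""
    return [p for p in index.get(term, []) if p.begin <= t < p.end]


# Toy data: three versions of d1 and one version of d2 for the term "archive".
raw = [
    Posting("d1", 1.00, 0, 10),
    Posting("d1", 1.05, 10, 20),
    Posting("d1", 2.00, 20, 30),
    Posting("d2", 0.50, 5, 15),
]
index = {"archive": coalesce(raw, eps=0.1)}
hits = search_as_of(index, "archive", t=12)
```

In this toy run the first two versions of `d1` collapse into one posting because their scores differ by less than 10%, which shrinks the list while changing the score seen at time 12 only slightly. A larger `eps` merges more aggressively, which is the size-versus-quality knob the abstract describes as tunable.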


