|
ABSTRACT
The state-of-the-art techniques for processing multi-term queries in P2P environments are query flooding and inverted list intersection. However, it has been shown that due to scalability reasons both methods fail to support full-text search in large scale document collections distributed among the nodes in a P2P network. Although a number of optimizations have been suggested recently based on the aforementioned techniques, little evidence is given on their scalability. In this paper we suggest a novel query-driven indexing strategy which generates and maintains only those index entries that are actually used for query processing. In our approach called Distributed Cache Table<u>1 By analogy with Distributed Hash Table (DHT) (DCT) we suggest to abandon the difference between data indexing and query caching, and to store result sets (caches) for the most profitable queries. DCT employs a distributed index to efficiently locate caches that can answer a given multi-term query and broadcasts the query to all the peers only if no such caches were found. Evaluations on real data and query loads show that DCT converges to a high cache-hit ratio and indeed offers a large-scale distributed solution for storing and efficient querying of vast amounts of documents in the P2P setting. DCT achieves two orders of magnitude improvement in traffic consumption compared to a standard distributed single-term indexing approach.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
Ashwin R. Bharambe , Mukesh Agrawal , Srinivasan Seshan, Mercury: supporting scalable multi-attribute range queries, Proceedings of the 2004 conference on Applications, technologies, architectures, and protocols for computer communications, August 30-September 03, 2004, Portland, Oregon, USA
|
| |
2
|
B. Bhattacharjee, S. Chawathe, V. Gopalakrishnan, P. Keleher, and B. Silaghi. Efficient peer-to-peer searches using result-caching. In IPTPS'03, Berkeley, CA, USA, 2003.
|
| |
3
|
M. Cai, M. Frank, J. Chen, and P. Szekely. Maan: A multi-attribute addressable network for grid information services. Journal of Grid Computing, 2(1), 2004.
|
| |
4
|
I. Clarke, O. Sandberg, B. Wiley, and T. W. Hong. Freenet: A distributed anonymous information storage and retrieval system. Lecture Notes in Computer Science, 2009, 2001.
|
| |
5
|
P. Cudré-Mauroux and K. Aberer. A decentralized architecture for adaptive media dissemination. In ICME'02, Lausanne, Switzerland, 2002.
|
| |
6
|
|
| |
7
|
|
| |
8
|
|
| |
9
|
|
| |
10
|
B. T. Loo, R. Huebsch, J. M. Hellerstein, S. Shenker, and I. Stoica. Enhancing p2p file-sharing with an internet-scale query processor. In VLDB'04, Toronto, Canada, 2004.
|
| |
11
|
J. Lu and J. Callan. Federated search of text-based digital libraries in hierarchical peer-to-peer networks. In ECIR'05, Santiago de Compostela, Spain, 2005.
|
| |
12
|
|
| |
13
|
I. Podnar, T. Luu, M. Rajman, F. Klemm, and K. Aberer. A peer-to-peer architecture for information retrieval across digital library collections. In ECDL'06, Alicante, Spain (to appear), 2006.
|
| |
14
|
M. F. Porter. An algorithm for suffix stripping. Program, 14(3):130--137, 1980.
|
| |
15
|
P. Reynolds and A. Vahdat. Efficient peer-to-peer keyword searching. In Middleware'03, Rio de Janeiro, Brazil, 2003.
|
| |
16
|
|
| |
17
|
G. Skobeltsyn, M. Hauswirth, and K. Aberer. Efficient processing of XPath queries with structured overlay networks. In ODBASE'05, Agia Napa, Cyprus, 2005.
|
| |
18
|
T. Suel, C. Mathur, J.-W. Wu, J. Zhang, A. Delis, M. Kharrazi, X. Long, and K. Shanmugasundaram. ODISSEA: A Peer-to-Peer Architecture for Scalable Web Search and Information Retrieval. In WebDB'03, San Diego, California, 2003.
|
| |
19
|
C. Tang and S. Dwarkadas. Hybrid global-local indexing for efficient peer-to-peer information retrieval. In NSDI'04, San Francisco, CA, USA, 2004.
|
| |
20
|
C. Tryfonopoulos, S. Idreos, and M. Koubarakis. Publish/subscribe functionalities for future digital libraries using structured overlay networks. In DELOS'05, Schloss Dagstuhl, Germany, 2005.
|
| |
21
|
|
| |
22
|
|
| |
23
|
M. R. Yong Yang, Rocky Dunlap and B. F. Cooper. Performance of full text search in structured and unstructured peer-to-peer systems. In INFOCOM'06, Barcelona, Spain, 2006.
|
| |
24
|
|
CITED BY 6
|
|
|
|
|
Gleb Skobeltsyn , Toan Luu , Karl Aberer , Martin Rajman , Ivana Podnar Zarko, Query-driven indexing for peer-to-peer text retrieval, Proceedings of the 16th international conference on World Wide Web, May 08-12, 2007, Banff, Alberta, Canada
|
|
|
|
|
|
Gleb Skobeltsyn , Toan Luu , Ivana Podnar Zarko , Martin Rajman , Karl Aberer, Web text retrieval with a P2P query-driven index, Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval, July 23-27, 2007, Amsterdam, The Netherlands
|
|
|
Gleb Skobeltsyn , Toan Luu , Ivana Podnar Žarko , Martin Rajman , Karl Aberer, Query-driven indexing for scalable peer-to-peer text retrieval, Proceedings of the 2nd international conference on Scalable information systems, June 06-08, 2007, Suzhou, China
|
|
|
Gleb Skobeltsyn , Toan Luu , Ivana Podnar arko , Martin Rajman , Karl Aberer, Query-driven indexing for scalable peer-to-peer text retrieval, Future Generation Computer Systems, v.25 n.1, p.89-99, January, 2009
|
|