ABSTRACT
Content-based full-text search is a challenging problem in Peer-to-Peer (P2P) systems. Traditional approaches have either been centralized or used flooding to ensure the accuracy of the returned results. In this paper, we present pSearch, a decentralized, non-flooding P2P information retrieval system. pSearch distributes document indices through the P2P network based on document semantics generated by Latent Semantic Indexing (LSI). Because the indices of semantically related documents are likely to be co-located in the network, the search cost for a given query (in terms of the number of nodes searched and the amount of data transmitted) is reduced. We also describe techniques that distribute the indices more evenly across the nodes and further reduce the number of nodes accessed, both through appropriate index distribution and by using index samples and recently processed queries to guide the search. Experiments show that pSearch can achieve performance comparable to centralized information retrieval systems while searching only a small number of nodes. For a system with 128,000 nodes and 528,543 documents (from news, magazines, etc.), pSearch searches only 19 nodes and transmits only 95.5 KB of data during the search, while the top 15 documents returned by pSearch and by LSI still overlap by 91.7%.
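The abstract combines three ideas: store each document's index at the node responsible for its LSI semantic vector, answer a query by visiting only the few nodes around the query's own vector, and measure accuracy as the overlap between the distributed and centralized top-k lists. The Python sketch below is a toy illustration of those ideas under stated assumptions, not pSearch's implementation. The 2-D semantic vectors, the document names, the 4 × 4 grid of zones standing in for the overlay's key space, and the rule of visiting the home zone plus its face-sharing neighbors are all hypothetical. The LSI step itself (a truncated SVD of the term-document matrix) is assumed to have already produced the vectors.

```python
import math

# Hypothetical 2-D semantic vectors standing in for LSI output; a real system
# would derive them from a term-document matrix and keep many more dimensions.
DOC_VECTORS = {
    "p2p-routing": (0.95, 0.20),
    "p2p-overlay": (0.90, 0.35),
    "dht-lookup": (0.85, 0.30),
    "lsi-ranking": (0.15, 0.95),
    "query-expansion": (0.30, 0.90),
    "index-compression": (-0.20, 0.95),
    "image-search": (-0.90, 0.20),
}

CELLS_PER_DIM = 4  # hypothetical toy overlay: a 4 x 4 grid of zones, one node per zone


def normalize(vec):
    """Scale a vector to unit length so every component lies in [-1, 1]."""
    norm = math.sqrt(sum(x * x for x in vec))
    return tuple(x / norm for x in vec)


def cosine(a, b):
    """Cosine similarity, the usual ranking function in LSI."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def owner_zone(vec):
    """Map a semantic vector to the grid zone (node) that owns that point."""
    return tuple(
        min(int((x + 1) / 2 * CELLS_PER_DIM), CELLS_PER_DIM - 1)
        for x in normalize(vec)
    )


def place_indices(doc_vectors):
    """Store each document's index entry at the node owning its semantic vector."""
    nodes = {}
    for doc_id, vec in doc_vectors.items():
        nodes.setdefault(owner_zone(vec), []).append(doc_id)
    return nodes


def face_neighbors(zone):
    """Zones that differ from `zone` by one step along exactly one dimension."""
    result = []
    for dim in range(len(zone)):
        for step in (-1, 1):
            coord = zone[dim] + step
            if 0 <= coord < CELLS_PER_DIM:
                result.append(zone[:dim] + (coord,) + zone[dim + 1:])
    return result


def distributed_search(query, doc_vectors, nodes, k):
    """Rank only documents held by the query's home zone and its face neighbors."""
    home = owner_zone(query)
    contacted = [home] + face_neighbors(home)
    candidates = [d for zone in contacted for d in nodes.get(zone, [])]
    ranked = sorted(candidates, key=lambda d: cosine(query, doc_vectors[d]), reverse=True)
    return ranked[:k], len(contacted)


def centralized_search(query, doc_vectors, k):
    """Baseline: rank every document, as a centralized LSI engine would."""
    ranked = sorted(doc_vectors, key=lambda d: cosine(query, doc_vectors[d]), reverse=True)
    return ranked[:k]


def top_k_overlap(a, b, k):
    """Fraction of the top-k results that the two result lists share."""
    return len(set(a[:k]) & set(b[:k])) / k


if __name__ == "__main__":
    nodes = place_indices(DOC_VECTORS)
    query = (0.9, 0.3)  # hypothetical query vector close to the P2P documents
    found, contacted = distributed_search(query, DOC_VECTORS, nodes, k=3)
    baseline = centralized_search(query, DOC_VECTORS, k=3)
    print(found, contacted, top_k_overlap(found, baseline, 3))
    # prints: ['dht-lookup', 'p2p-overlay', 'p2p-routing'] 4 1.0
```

Because the three P2P documents land in the same zone, the query finds all of them after contacting 4 of the 16 zones, and its top 3 matches the exhaustive ranking exactly. A relevant document whose vector falls in a zone outside that neighborhood would simply be missed, which is why guidance such as index samples and recently processed queries matters when deciding which additional nodes to visit.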
CITED BY 77
O. D. Sahin, A. Gulbeden, F. Emekci, D. Agrawal, A. El Abbadi, PRISM: indexing multi-dimensional data in P2P networks using reference vectors, Proceedings of the 13th annual ACM international conference on Multimedia, November 6-11, 2005, Hilton, Singapore
R. Akavipat, L.-S. Wu, F. Menczer, A. G. Maguitman, Emerging semantic communities in peer web search, Proceedings of the international workshop on Information retrieval in peer-to-peer networks, November 11, 2006, Arlington, Virginia, USA
Jik-Soo Kim, Peter Keleher, Michael Marsh, Bobby Bhattacharjee, Alan Sussman, Using content-addressable networks for load balancing in desktop grids, Proceedings of the 16th international symposium on High performance distributed computing, June 25-29, 2007, Monterey, California, USA
Jik-Soo Kim, Beomseok Nam, Peter Keleher, Michael Marsh, Bobby Bhattacharjee, Alan Sussman, Trade-offs in matching jobs and balancing load for distributed desktop grids, Future Generation Computer Systems, v.24 n.5, p.415-424, May 2008
Elena Meshkova, Janne Riihijärvi, Marina Petrova, Petri Mähönen, A survey on resource discovery mechanisms, peer-to-peer and service discovery frameworks, Computer Networks: The International Journal of Computer and Telecommunications Networking, v.52 n.11, p.2097-2128, August 2008
Aoying Zhou, Rong Zhang, Weining Qian, Quang Hieu Vu, Tianming Hu, Adaptive indexing for content-based search in P2P systems, Data & Knowledge Engineering, v.67 n.3, p.381-398, December 2008
Jik-Soo Kim, Beomseok Nam, Peter Keleher, Michael Marsh, Bobby Bhattacharjee, Alan Sussman, Resource Discovery Techniques in Distributed Desktop Grid Environments, Proceedings of the 7th IEEE/ACM International Conference on Grid Computing, p.9-16, September 28-29, 2006