|
ABSTRACT
We consider in this paper a class of Publish-Subscribe (pub-sub) systems called topic-based systems, where users subscribe to topics and are notified on events that belong to those subscribed topics. With the recent flourishing of RSS news syndication, these systems are regaining popularity and are raising new challenging problems. In most of the modern topics-based systems, the events in each topic are delivered to the subscribers via a supporting, distributed, data structure (typically a multicast tree). Since peers in the network may come and go frequently, this supporting structure must be continuously maintained so that "holes" do not disrupt the events delivery. The dissemination of events in each topic thus incurs two main costs: (1) the actual transmission cost for the topic events,and (2) the maintenance cost for its supporting structure. This maintenance overhead becomes particularly dominating when a pub-sub system supports a large number of topics with moderate event frequency; a typical scenario in nowadays news syndication scene. The goal of this paper is to devise a method for reducing this maintenance overhead to the minimum. Our aim is not to invent yet another topic-based pub-sub system, but rather to develop a generic technique for better utilization of existing platforms. Our solution is based on a novel distributed clustering algorithm that utilizes correlations between user subscriptions to dynamically group topics together, into virtual topics (called topic-clusters), andt hereby unifies their supporting structures and reduces costs. Our technique continuously adapts the topic-clusters and the user subscriptions to the system state, and incurs only very minimal overhead. We have implemented our solution in the Tamara pub-sub system. Our experimental study shows this approach to be extremely effective, improving the performance by an order of magnitude.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
J. E. Bartlett, J. W. Kotrlik, and C. C. Higgins. Organizational research: Determining appropriate sample size in survey research. Information Technology, Learning, and Performance, 19(1), 2001.
|
| |
2
|
F. Cao and J. P. Singh. Efficient event routing in content-based publish-subscribe service networks. IEEE INFOCOM, 2004.
|
 |
3
|
|
| |
4
|
M. Castro, P. Druschel, A. Kermarrec, and A. Rowstron. SCRIBE: A large-scale and decentralized application-level multicast infrastructure. IEEE Journal on Selected Areas in communications (JSAC), 20(8):1489--1499, 2002.
|
 |
5
|
|
| |
6
|
|
| |
7
|
Y. Diao, P. Fischer, M. J. Franklin, and R. To. Yfilter: Efficient and scalable filtering of xml documents. In Proc. Intl. Conf. on Data Engineering (ICDE), pages 341--342, 2002.
|
| |
8
|
Y. Diao, S. Rizvi, and M. J. Franklin. Towards an internet-scale xml dissemination service. In Proc. VLDB, pages 612--623, 2004.
|
| |
9
|
Freshmeat. The largest index of unix and cross-platform softwares. http://freshmeat.net/projects/gimp/.
|
| |
10
|
Y. Gourhant, S. Louboutin, V. Cahill, A. Condon, G. Starovic, and B. Tangney. Dynamic clustering in an object-oriented distributed system. In Proc. Objects in Large Distributed Applications OLDA-II, Ottawa, Canada, 1992.
|
| |
11
|
E. Januzaj, H. Kriegel, and M. Pfeifle. Towards effective and efficient distributed clustering. In Workshop on Clustering Large Data Sets (ICDM2003), Melbourne, FL, 2003.
|
| |
12
|
|
| |
13
|
Tapas Kanungo , David M. Mount , Nathan S. Netanyahu , Christine D. Piatko , Ruth Silverman , Angela Y. Wu, An Efficient k-Means Clustering Algorithm: Analysis and Implementation, IEEE Transactions on Pattern Analysis and Machine Intelligence, v.24 n.7, p.881-892, July 2002
[doi> 10.1109/TPAMI.2002.1017616]
|
| |
14
|
mandriva.com. Mandrive linux distribution. http://www.mandriva.com/.
|
| |
15
|
S. Moritz and B. Ernst W. DDC: A dynamic and distributed clustering algorithm for networked virtual environments based on p2p networks. In Proceedings of CoNEXT'05, France, 2005.
|
| |
16
|
nielsen-netratings.com. Nielsen/NetRating. http://www.nielsen-netratings.com/pr/pr-050920.
|
 |
17
|
|
| |
18
|
V. Ramasubramanian, R. Peterson, and Emin Gun Sirer. Corona: A high performance publish-subscribe system for the world wide web. In Proc. of Networked System Design and Implementation, San Jose, California, 2006.
|
| |
19
|
|
| |
20
|
|
| |
21
|
A. Rowstron and P. Druschel. Pastry: Scalable, decentralized object location, and routing for large-scale peer-to-peer systems. In Proceedings of IFIP/ACM Middleware, Germany, 2001.
|
| |
22
|
D. Sandler, A. Mislove, A. Post, and P. Druschel. FeedTree: Sharing web micronews with peer-to-peer event notification. In Proc. Int. Workshop on Peer-to-Peer Systems (IPTPS'05), New York, 2005.
|
| |
23
|
T. Sivaharan, G. Blair, and G. Coulson. GREEN: A configurable and re-configurable publish-subscribe middleware for pervasive computing. In Proc, Int. Symposium on Distributed Objects and Applications (DOA05), Agia Napa, Cyprus, 2005.
|
 |
24
|
Ion Stoica , Robert Morris , David Karger , M. Frans Kaashoek , Hari Balakrishnan, Chord: A scalable peer-to-peer lookup service for internet applications, Proceedings of the 2001 conference on Applications, technologies, architectures, and protocols for computer communications, p.149-160, August 2001, San Diego, California, United States
|
| |
25
|
GIMP.GNU image manipulation program. http://www.gimp.org/.
|
| |
26
|
E. Patrick Th., G. Rachid, and S. Joe. Type-Based Publish/Subscribe. Technical report, 2000.
|
| |
27
|
S. Voulgaris, E. Riviere, A. Kermarrec, and M. van Steen. Sub-2-sub: Self-organizing content-based publish and subscribe for dynamic and large scale collaborative networks. In Proc. Int'l Workshop on Peer-to-Peer Systems (IPTPS), California, USA, February 2006.
|
| |
28
|
|
| |
29
|
B. Y. Zhao, J. D. Kubiatowicz, and A. D. Joseph. Tapestry: An infrastructure for fault-tolerant wide-area location and routing. Technical Report UCB/CSD-01-1141, UC Berkeley, April 2001.
|
 |
30
|
Shelley Q. Zhuang , Ben Y. Zhao , Anthony D. Joseph , Randy H. Katz , John D. Kubiatowicz, Bayeux: an architecture for scalable and fault-tolerant wide-area data dissemination, Proceedings of the 11th international workshop on Network and operating systems support for digital audio and video, p.11-20, January 2001, Port Jefferson, New York, United States
[doi> 10.1145/378344.378347]
|
CITED BY 5
|
|
Serge Abiteboul , Itay Dar , Radu Pop , Gabriel Vasile , Dan Vodislav , Nicoleta Preda, Large scale P2P distribution of open-source software, Proceedings of the 33rd international conference on Very large data bases, September 23-27, 2007, Vienna, Austria
|
|
|
|
|
|
|
|
|
|
|
|
|
|