|
ABSTRACT
In a variety of applications, ranging from data integration to distributed query evaluation, there is a need to obtain sets of data items from several sources (peers) and compute their union. As these sets often contain common data items, avoiding the transmission of redundant information is essential for effective union computation. In this paper we define the notion of optimal union plans for non-disjoint data sets residing on distinct peers, and present efficient algorithms for computing and executing such optimal plans. Our algorithms avoid redundant data transmission and optimally exploit the network bandwidth capabilities. A challenge in the design of optimal plans is the lack of a complete map of the distribution of the data items among peers. We analyze the information required for optimal planning and propose novel techniques to obtain compact, cheap to communicate, description of the data sources. We then exploit it for efficient union computation with reasonable accuracy. We demonstrate experimentally the superiority of our approach over the common naive union computation, showing it improves the performance by an order of magnitude.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
ADSL - Asymmetric Digital Subscriber Line. http://en.wikipedia.org/wiki/Adsl.
|
| |
2
|
S. Abiteboul, I. Manolescu, N. Polyzotis, N. Preda, and C. Sun. Xml processing in dht networks. In ICDE '08, 2008.
|
| |
3
|
Ravindra K. Ahuja , Thomas L. Magnanti , James B. Orlin, Network flows: theory, algorithms, and applications, Prentice-Hall, Inc., Upper Saddle River, NJ, 1993
|
 |
4
|
Philip A. Bernstein , Nathan Goodman , Eugene Wong , Christopher L. Reeve , James B. Rothnie, Jr., Query processing in a system for distributed databases (SDD-1), ACM Transactions on Database Systems (TODS), v.6 n.4, p.602-625, Dec. 1981
[doi> 10.1145/319628.319650]
|
 |
5
|
|
| |
6
|
A. Broder and M. Mitzenmacher. Network applications of bloom filters: A survey. Internet Mathematics, 1(4):485--509, 2004.
|
 |
7
|
Andrei Z. Broder , Moses Charikar , Alan M. Frieze , Michael Mitzenmacher, Min-wise independent permutations (extended abstract), Proceedings of the thirtieth annual ACM symposium on Theory of computing, p.327-336, May 24-26, 1998, Dallas, Texas, United States
[doi> 10.1145/276698.276781]
|
| |
8
|
J. Byers, J. Considine, and M. Mitzenmacher. Fast approximate reconciliation of set differences. Technical report, 2002. Boston University Computer Science Tech. Rep. 2002--019.
|
 |
9
|
John Byers , Jeffrey Considine , Michael Mitzenmacher , Stanislav Rost, Informed content delivery across adaptive overlay networks, Proceedings of the 2002 conference on Applications, technologies, architectures, and protocols for computer communications, August 19-23, 2002, Pittsburgh, Pennsylvania, USA
|
 |
10
|
Zhiyuan Chen , Nick Koudas , Flip Korn , S. Muthukrishnan, Selectively estimation for Boolean queries, Proceedings of the nineteenth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, p.216-225, May 15-18, 2000, Dallas, Texas, United States
[doi> 10.1145/335168.335225]
|
| |
11
|
B. Cohen. Incentives build robustness in bittorrent. In Proc. Workshop on the Economics of Peer-to-Peer Systems, 2003.
|
 |
12
|
|
 |
13
|
|
| |
14
|
|
 |
15
|
|
| |
16
|
|
 |
17
|
|
| |
18
|
|
| |
19
|
|
| |
20
|
|
| |
21
|
|
 |
22
|
|
| |
23
|
Y. Minsky and A. Trachtenberg. Practical set reconciliation. Technical report, 2002. Dept. Elec. Comput. Eng., Boston Univ., Boston, MA, Tech. Rep. BU-ECE-2002-01.
|
| |
24
|
Y. Minsky, A. Trachtenberg, and R. Zippel. Set reconciliation with nearly optimal communication complexity. In Proc. International Symposium on Information Theory, 2001.
|
| |
25
|
|
| |
26
|
A. Sedeno-Noda, D. Alcaide, and C. Gonzalez-Martin. Network flow approaches to pre-emptive open-shop scheduling problems with time-windows. European Journal of Operational Research, 174(3):1501--1518, 2006.
|
| |
27
|
G. Sivek, S. Sivek, J. Wolfe, and M. Zhivich. Webtorrent: a bittorrent extension for high availability servers. http://pdos.csail.mit.edu/6.824-2004/reports/jwolfe.pdf.
|
 |
28
|
Ion Stoica , Robert Morris , David Karger , M. Frans Kaashoek , Hari Balakrishnan, Chord: A scalable peer-to-peer lookup service for internet applications, Proceedings of the 2001 conference on Applications, technologies, architectures, and protocols for computer communications, p.149-160, August 2001, San Diego, California, United States
|
| |
29
|
|
| |
30
|
|
 |
31
|
|
| |
32
|
R. Xu and I. Wunsch. Survey of clustering algorithms. IEEE Transactions on Neural Networks, 16(3):645--678, 2005.
|
 |
33
|
|
|