ACM Home Page
Please provide us with feedback. Feedback
Optimized union of non-disjoint distributed data sets
Full text PdfPdf (520 KB)
Source Extending Database Technology; Vol. 360 archive
Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology table of contents
Saint Petersburg, Russia
SESSION: Research sessions: System architectures table of contents
Pages 12-23  
Year of Publication: 2009
ISBN:978-1-60558-422-5
Authors
Itay Dar  Tel Aviv University
Tova Milo  Tel Aviv University
Elad Verbin  Tel Aviv University
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 13,   Downloads (12 Months): 78,   Citation Count: 0
Additional Information:

abstract   references   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1516360.1516364
What is a DOI?

ABSTRACT

In a variety of applications, ranging from data integration to distributed query evaluation, there is a need to obtain sets of data items from several sources (peers) and compute their union. As these sets often contain common data items, avoiding the transmission of redundant information is essential for effective union computation. In this paper we define the notion of optimal union plans for non-disjoint data sets residing on distinct peers, and present efficient algorithms for computing and executing such optimal plans.

Our algorithms avoid redundant data transmission and optimally exploit the network bandwidth capabilities. A challenge in the design of optimal plans is the lack of a complete map of the distribution of the data items among peers. We analyze the information required for optimal planning and propose novel techniques to obtain compact, cheap to communicate, description of the data sources. We then exploit it for efficient union computation with reasonable accuracy. We demonstrate experimentally the superiority of our approach over the common naive union computation, showing it improves the performance by an order of magnitude.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
ADSL - Asymmetric Digital Subscriber Line. http://en.wikipedia.org/wiki/Adsl.
 
2
S. Abiteboul, I. Manolescu, N. Polyzotis, N. Preda, and C. Sun. Xml processing in dht networks. In ICDE '08, 2008.
 
3
4
5
 
6
A. Broder and M. Mitzenmacher. Network applications of bloom filters: A survey. Internet Mathematics, 1(4):485--509, 2004.
7
 
8
J. Byers, J. Considine, and M. Mitzenmacher. Fast approximate reconciliation of set differences. Technical report, 2002. Boston University Computer Science Tech. Rep. 2002--019.
9
10
 
11
B. Cohen. Incentives build robustness in bittorrent. In Proc. Workshop on the Economics of Peer-to-Peer Systems, 2003.
12
13
 
14
15
 
16
17
 
18
 
19
 
20
 
21
22
 
23
Y. Minsky and A. Trachtenberg. Practical set reconciliation. Technical report, 2002. Dept. Elec. Comput. Eng., Boston Univ., Boston, MA, Tech. Rep. BU-ECE-2002-01.
 
24
Y. Minsky, A. Trachtenberg, and R. Zippel. Set reconciliation with nearly optimal communication complexity. In Proc. International Symposium on Information Theory, 2001.
 
25
 
26
A. Sedeno-Noda, D. Alcaide, and C. Gonzalez-Martin. Network flow approaches to pre-emptive open-shop scheduling problems with time-windows. European Journal of Operational Research, 174(3):1501--1518, 2006.
 
27
G. Sivek, S. Sivek, J. Wolfe, and M. Zhivich. Webtorrent: a bittorrent extension for high availability servers. http://pdos.csail.mit.edu/6.824-2004/reports/jwolfe.pdf.
28
 
29
 
30
31
 
32
R. Xu and I. Wunsch. Survey of clustering algorithms. IEEE Transactions on Neural Networks, 16(3):645--678, 2005.
33
Collaborative Colleagues:
Itay Dar: colleagues
Tova Milo: colleagues
Elad Verbin: colleagues