ACM Home Page
Please provide us with feedback. Feedback
Efficient discovery of join plans in schemaless data
Full text PdfPdf (850 KB)
Source
ACM International Conference Proceeding Series archive
Proceedings of the 2009 International Database Engineering & Applications Symposium table of contents
Cetraro - Calabria, Italy
SESSION: Full papers table of contents
Pages 1-11  
Year of Publication: 2009
ISBN:978-1-60558-402-7
Authors
Aybar C. Acar  Bilkent University, Ankara, Turkey
Amihai Motro  George Mason University, Fairfax, VA
Sponsors
: BytePress
Concordia University : Concordia University
: ACM
: Universita della Calabria, Rende(CS), Italy
: ICAR-CNR, Rende (CS), Italy
: ACM International Conference Proceeding Series
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 10,   Downloads (12 Months): 10,   Citation Count: 0
Additional Information:

abstract   references   index terms  

Tools and Actions: Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1620432.1620434
What is a DOI?

ABSTRACT

We describe a method of inferring join plans for a set of relation instances, in the absence of any metadata, such as attribute domains, attribute names, or constraints (e.g., keys or foreign keys). Our method enumerates the possible join plans in order of likelihood, based on the compatibility of a pair of columns and their suitability as join attributes (i.e. their appropriateness as keys). We outline two variants of the approach. The first variant is accurate but potentially time-consuming, especially for large relations that do not fit in memory. The second variant is an approximation of the former and hence less accurate, but is considerably more efficient, allowing the method to be used online, even for large relations. We provide experimental results showing how both forms scale in terms of performance as the number of candidate join attributes and the size of the relations increase. We also characterize the accuracy of the approximate variant with respect to the exact variant.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
A. C. Acar and A. Motro. Query Consolidation: Interpreting a Set of Independent Queries Using a Multidatabase Architecture in the Reverse Direction. In Proc. of the International Workshop on New Trends in Information Integration, VLDB' 09, pages 4--7, 2008.
 
2
S. Agrawal, S. Chaudhuri, and G. Das. DBXplorer: Enabling Keyword Search over Relational Databases. In Proc. of ICDE '02, pages 5--16, 2002.
 
3
D. Angluin. A Note on the Number of Queries Needed to Identify Regular Languages. Information and Control, 51(1):76--87, 1981.
 
4
J. Bauckmann, U. Leser, F. Naumann, and V. Tietz. Efficiently Detecting Inclusion Dependencies. In Proc. of ICDE '07, pages 1448--1450, 2007.
 
5
S. Bell and P. Brockhausen. Discovery of Data Dependencies in Relational Databases. Technical report, Universitat Dortmund, 1995.
 
6
J. Berlin and A. Motro. Database Schema Matching Using Machine Learning with Feature Selection. In Proc. of CAiSE 02, pages 452--466, 2002.
 
7
P. G. Brown and P. J. Haas. BHUNT: Automatic Discovery of Fuzzy Algebraic Constraints in Relational Data. In Proc. of VLDB '03, pages 668--679, 2003.
 
8
P. Buneman. Semistructured Data. In Proc of PODS '97, pages 117--121, 1997.
 
9
S. Castano and V. De Antonellis. A Schema Analysis and Reconciliation Tool Environment. In Proc. of IDEAS '99, pages 53--62, 1999.
 
10
T. Catarci. What Happened When Database Researchers Met Usability. Information Systems, 25(3):177--212, May 2000.
 
11
A. Doan, P. Domingos, and A. Halevy. Reconciling Schemas of Disparate Data Sources: A Machine Learning Approach. In Proc. of SIGMOD '01, pages 509--520, 2001.
 
12
D. Gunopulos, R. Khardon, H. Mannila, S. Saluja, H. Toivonen and R. S. Sharma. Discovering All Most Specific Sentences. ACM Transactions on Database Systems. 28(2):140--174, 2003.
 
13
V. Hristidis and Y. Papakonstantinou. DISCOVER: Keyword Search in Relational Databases. In Proc. of VLDB '02, pages 670--681, 2002.
 
14
I. F. Ilyas, V. Markl, P. Haas, P. Brown, and A. Aboulnaga. CORDS: Automatic Discovery of Correlations and Soft Functional Dependencies. In Proc. of SIGMOD '04, pages 647--658, 2004.
 
15
S. Kapoor and H. Ramesh. Algorithms for Enumerating All Spanning Trees of an Undirected Graph. SIAM Journal on Computing, 24(2):247--265, 1995.
 
16
J. Larson, S. Navathe, R. Elmasri, and M. Honeywell. A Theory of Attributed Equivalence in Databases with Application to Schema Integration. IEEE Transactions on Software Engineering, 15(4):449--463, 1989.
 
17
W. Li and C. Clifton. SEMINT: A Tool for Identifying Attribute Correspondences in Heterogeneous Databases using Neural Networks. Data & Knowledge Engineering, 33(1):49--84, 2000.
 
18
D. Maier and J. D. Ullman. Maximal Objects and the Semantics of Universal Relation Databases. ACM Transactions on Database Systems, 8(1):1--14, 1983.
 
19
D. Maier, J. D. Ullman, and M. Y. Vardi. On the Foundations of the Universal Relation Model. ACM Transactions on Database Systems, 9(2):283--308, 1984.
 
20
T. Mason and R. Lawrence. INFER: A Relational Query Language without the Complexity of SQL. In Proceedings of CIKM '05, pages 241--242, 2005.
 
21
G. A. Miller. The Magical Number Seven, Plus or Minus Two: Some Limits on Our Capacity for Processing Information. Psychological Review, 63(2):81--97, 1956.
 
22
A. Motro. FLEX: A Tolerant and Cooperative User Interface to Databases. IEEE Transactions on Knowledge and Data Engineering, 2(2):231--246, 1990.
 
23
R. Parekh and V. Honavar. An Incremental Interactive Algorithm for Regular Grammar Inference. In Proc. of ICGI 96, 1996.
 
24
D. Pinto, A. McCallum, X. Wei, and W. Croft. Table Extraction using Conditional Random Fields. In Proceedings of SIGIR '03, pages 235--242, 2003.
 
25
E. Rahm and P. A. Bernstein. A Survey of Approaches to Automatic Schema Matching. The VLDB Journal, 10(4):334--350, 2001.
 
26
Transaction Processing Performance Council. TPC Benchmark H Rev 2.6.1. Standard Specification, 2007.
 
27
J. A. Wald and P. G. Sorenson. Resolving the Query Inference Problem using Steiner Trees. ACM Transactions on Database Systems, 9(3):348--368, 1984.