Efficient join processing over uncertain data
Source
Conference on Information and Knowledge Management
Proceedings of the 15th ACM international conference on Information and knowledge management
Arlington, Virginia, USA
SESSION: Join processing and indexing
Pages: 738 - 747
Year of Publication: 2006
ISBN: 1-59593-433-2
Authors
Reynold Cheng, Hong Kong Polytechnic University, Hung Hom, Hong Kong
Sarvjeet Singh, Purdue University, West Lafayette, Indiana
Sunil Prabhakar, Purdue University, West Lafayette, Indiana
Rahul Shah, Purdue University, West Lafayette, Indiana
Jeffrey Scott Vitter, Purdue University, West Lafayette, Indiana
Yuni Xia, Indiana University - Purdue University Indianapolis, Indianapolis, Indiana
Bibliometrics
Downloads (6 Weeks): 13, Downloads (12 Months): 76, Citation Count: 5
ABSTRACT
In many applications, data values are inherently uncertain. Examples include moving-object, sensor, and biological databases. There has been recent interest in developing database management systems that can handle uncertain data. Some proposals for such systems include attribute values that are uncertain. In particular, an attribute value can be modeled as a range of possible values associated with a probability density function. Previous efforts for this type of data have addressed only simple queries, such as range and nearest-neighbor queries. Queries that join multiple relations have not been addressed in earlier work, despite the significance of joins in databases. In this paper we address join queries over uncertain data. We propose semantics for the join operation, define probabilistic operators over uncertain data, and propose join algorithms that execute probabilistic joins efficiently. The paper focuses on an important class of joins, termed probabilistic threshold joins, that avoid some of the semantic complexities of dealing with uncertain data. For this class of joins we develop three sets of optimization techniques: item-level, page-level, and index-level pruning. These techniques facilitate pruning with little space and time overhead and are easily adapted to most join algorithms. We verify the performance of these techniques experimentally.
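The abstract's data model (a range of possible values with a probability density function) and the idea of a probabilistic threshold join can be illustrated with a small sketch. Everything below is an assumption made for illustration and is not taken from the paper: the `Uncertain` class, the choice of a uniform pdf, the "greater than" join predicate, the independence of the two values, the `>=` comparison against the threshold, and the simple interval test used as a stand-in for item-level pruning.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Uncertain:
    """Hypothetical uncertain value: uniform pdf over [lo, hi], with lo < hi."""
    lo: float
    hi: float

    def cdf(self, x: float) -> float:
        # Probability that the value is at most x.
        if x <= self.lo:
            return 0.0
        if x >= self.hi:
            return 1.0
        return (x - self.lo) / (self.hi - self.lo)


def prob_greater(a: Uncertain, b: Uncertain, steps: int = 1000) -> float:
    """Approximate P(a > b) for independent a and b with the midpoint rule."""
    width = (a.hi - a.lo) / steps
    total = 0.0
    for i in range(steps):
        x = a.lo + (i + 0.5) * width
        total += b.cdf(x)  # P(b < x) at the midpoint of the i-th slice
    # The density of a is 1 / (a.hi - a.lo), so density * width == 1 / steps.
    return total / steps


def threshold_join(
    r: List[Uncertain], s: List[Uncertain], threshold: float
) -> List[Tuple[Uncertain, Uncertain, float]]:
    """Nested-loop join keeping pairs with P(a > b) >= threshold, threshold in (0, 1]."""
    result = []
    for a in r:
        for b in s:
            if a.hi <= b.lo:
                continue  # intervals rule out a > b: probability 0, skip integration
            if a.lo >= b.hi:
                p = 1.0  # intervals guarantee a > b: no integration needed
            else:
                p = prob_greater(a, b)
            if p >= threshold:
                result.append((a, b, p))
    return result


if __name__ == "__main__":
    r = [Uncertain(0.0, 1.0), Uncertain(2.0, 3.0)]
    s = [Uncertain(0.0, 1.0)]
    for a, b, p in threshold_join(r, s, threshold=0.6):
        print(a, b, round(p, 3))
```

In this sketch the two interval checks decide many pairs without evaluating any integral, which is the general flavor of pruning at the level of individual items; the page-level and index-level techniques named in the abstract are not modeled here.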
CITED BY 5
Sarvjeet Singh, Chris Mayfield, Sagar Mittal, Sunil Prabhakar, Susanne Hambrusch, Rahul Shah, Orion 2.0: native support for uncertain data, Proceedings of the 2008 ACM SIGMOD international conference on Management of data, June 09-12, 2008, Vancouver, Canada
Wan D. Bae, Petr Vojtěchovský, Shayma Alkobaisi, Scott T. Leutenegger, Seon Ho Kim, An interactive framework for raster data spatial joins, Proceedings of the 15th annual ACM international symposium on Advances in geographic information systems, November 07-09, 2007, Seattle, Washington
Ravi Jampani, Fei Xu, Mingxi Wu, Luis Leopoldo Perez, Christopher Jermaine, Peter J. Haas, MCDB: a Monte Carlo approach to managing uncertain data, Proceedings of the 2008 ACM SIGMOD international conference on Management of data, June 09-12, 2008, Vancouver, Canada