ABSTRACT
Real-world data, especially when generated by distributed measurement infrastructures such as sensor networks, tends to be incomplete, imprecise, and erroneous, making it impossible to present it to users or feed it directly into applications. The traditional approach to dealing with this problem is to first process the data using statistical or probabilistic models that can provide more robust interpretations of the data. Current database systems, however, do not provide adequate support for applying models to such data, especially when those models need to be frequently updated as new data arrives in the system. Hence, most scientists and engineers who depend on models for managing their data do not use database systems for archival or querying at all; at best, databases serve as a persistent raw data store.

In this paper we define a new abstraction called model-based views and present the architecture of MauveDB, the system we are building to support such views. Just as traditional database views provide logical data independence, model-based views provide independence from the details of the underlying data-generating mechanism and hide the irregularities of the data by using models to present a consistent view to the users. MauveDB supports a declarative language for defining model-based views, allows declarative querying over such views using SQL, and supports several different materialization strategies and techniques to efficiently maintain them in the face of frequent updates. We have implemented a prototype system that currently supports views based on regression and interpolation, using the Apache Derby open-source DBMS, and we present results that show the utility and performance benefits that can be obtained by supporting several different types of model-based views in a database system.
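To make the idea of a model-based view concrete, the sketch below shows what the two model types mentioned above (interpolation and regression) compute when presenting irregular, gappy sensor readings as values on a regular grid. This is a minimal illustration under stated assumptions, not MauveDB's implementation or its declarative view-definition syntax (the prototype is built inside Apache Derby). The function names `interpolated_view` and `regression_view`, the (time, value) input format, and the clamping of grid points outside the observed range are all hypothetical choices made for this example.

```python
from bisect import bisect_left


def interpolated_view(readings, grid):
    """Hypothetical interpolation-based view (illustration only).

    readings: list of (t, value) pairs sorted by t.
    grid: query times at which the view should expose a value.
    Returns (t, estimate) pairs using piecewise-linear interpolation;
    grid points outside the observed range are clamped to the nearest
    endpoint value (an assumption of this sketch).
    """
    times = [t for t, _ in readings]
    values = [v for _, v in readings]
    out = []
    for t in grid:
        i = bisect_left(times, t)  # index of first reading with time >= t
        if i == 0:
            est = values[0]  # at or before the first reading: clamp
        elif i == len(times):
            est = values[-1]  # after the last reading: clamp
        elif times[i] == t:
            est = values[i]  # exact observation exists
        else:
            t0, t1 = times[i - 1], times[i]
            v0, v1 = values[i - 1], values[i]
            est = v0 + (v1 - v0) * (t - t0) / (t1 - t0)
        out.append((t, est))
    return out


def regression_view(readings, grid):
    """Hypothetical regression-based view (illustration only).

    Fits value = intercept + slope * t by ordinary least squares over all
    readings, then evaluates the fitted line at each grid time.
    Requires at least two distinct times so the denominator is nonzero.
    """
    n = len(readings)
    mean_t = sum(t for t, _ in readings) / n
    mean_v = sum(v for _, v in readings) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in readings)
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in readings)
    slope = sxy / sxx
    intercept = mean_v - slope * mean_t
    return [(t, intercept + slope * t) for t in grid]
```

For example, with `readings = [(0, 10.0), (2, 14.0), (5, 20.0)]`, the interpolation view returns 12.0 at t = 1 and 16.0 at t = 3, even though no reading exists at either time. In a database setting, the interesting question the abstract raises is when to recompute these outputs (eagerly on every insert, lazily at query time, or somewhere in between) as new readings arrive.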