Querying continuous functions in a database system
Source
International Conference on Management of Data
Proceedings of the 2008 ACM SIGMOD international conference on Management of data
Vancouver, Canada
SESSION: Research Session 17: Probabilistic II
Pages 791-804
Year of Publication: 2008
ISBN: 978-1-60558-102-6
ABSTRACT
Many scientific, financial, data mining, and sensor network applications need to work with continuous rather than discrete data, e.g., temperature as a function of location, or stock prices or vehicle trajectories as a function of time. Querying raw or discrete data is unsatisfactory for these applications: in a sensor network, for example, it is necessary to interpolate sensor readings to predict values at locations where sensors are not deployed. In other situations, raw data can be inaccurate owing to measurement errors, and it is useful to fit continuous functions to the raw data and query the functions rather than the raw data itself, e.g., fitting a smooth curve to noisy sensor readings, or a smooth trajectory to GPS data containing gaps or outliers. Existing databases do not support storing or querying continuous functions, short of brute-force discretization of functions into a collection of tuples. We present FunctionDB, a novel database system that treats mathematical functions as first-class citizens that can be queried like traditional relations. The key contribution of FunctionDB is an efficient and accurate algebraic query processor: for the broad class of multi-variable polynomial functions, FunctionDB executes queries directly on the algebraic representation of functions, without materializing them into discrete points, using the symbolic operations of zero finding, variable substitution, and integration. Even when closed-form solutions are intractable, FunctionDB leverages symbolic approximation operations to improve performance. We evaluate FunctionDB on real data sets from a temperature sensor network and on traffic traces from Boston roads. We show that operating in the functional domain has substantial advantages in accuracy (15-30%) and yields order-of-magnitude (10x-100x) performance gains over existing approaches that represent models as discrete collections of points.
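To make the idea of querying the algebraic representation concrete, the following is a minimal sketch and not FunctionDB's implementation. It assumes a simplified setting that the abstract does not specify: a single-variable model stored as piecewise polynomials of degree at most two. The names `Piece`, `crossings`, `select_above`, and `average` are hypothetical and chosen only for illustration. A selection predicate is answered by zero finding and an aggregate by integration, with no discretization into points.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Piece:
    """Hypothetical model piece: y = c0 + c1*x + c2*x**2 on [lo, hi]."""
    lo: float
    hi: float
    coeffs: Tuple[float, float, float]  # (c0, c1, c2)

    def value(self, x: float) -> float:
        c0, c1, c2 = self.coeffs
        return c0 + c1 * x + c2 * x * x

    def integral(self, a: float, b: float) -> float:
        """Exact integral over [a, b] using the antiderivative."""
        c0, c1, c2 = self.coeffs

        def antiderivative(x: float) -> float:
            return c0 * x + c1 * x ** 2 / 2 + c2 * x ** 3 / 3

        return antiderivative(b) - antiderivative(a)


def crossings(piece: Piece, threshold: float) -> List[float]:
    """Zero finding: solve piece(x) == threshold for x within [lo, hi]."""
    c0, c1, c2 = piece.coeffs
    c0 -= threshold
    if c2 == 0:
        if c1 == 0:
            return []  # constant piece: no isolated crossing
        candidates = [-c0 / c1]
    else:
        disc = c1 * c1 - 4 * c2 * c0
        if disc < 0:
            return []
        root = math.sqrt(disc)
        candidates = [(-c1 - root) / (2 * c2), (-c1 + root) / (2 * c2)]
    return sorted({r for r in candidates if piece.lo <= r <= piece.hi})


def select_above(pieces: List[Piece], threshold: float) -> List[Tuple[float, float]]:
    """Intervals of x where the model exceeds threshold (a selection query)."""
    result = []
    for p in pieces:
        # The crossings split the piece into sub-intervals of constant sign.
        cuts = [p.lo] + crossings(p, threshold) + [p.hi]
        for a, b in zip(cuts, cuts[1:]):
            if a < b and p.value((a + b) / 2) > threshold:
                result.append((a, b))
    return result


def average(pieces: List[Piece], a: float, b: float) -> float:
    """Mean value of the model over [a, b] (an aggregate via integration)."""
    total = 0.0
    for p in pieces:
        lo, hi = max(a, p.lo), min(b, p.hi)
        if lo < hi:
            total += p.integral(lo, hi)
    return total / (b - a)


# Made-up example model: rises from 20 to 30 on [0, 10], then falls to 25.
model = [Piece(0.0, 10.0, (20.0, 1.0, 0.0)), Piece(10.0, 20.0, (35.0, -0.5, 0.0))]
print(select_above(model, 27.0))
print(average(model, 0.0, 20.0))
```

For this made-up model, the selection returns the intervals (7.0, 10.0) and (10.0, 16.0), and the average over [0, 20] is 26.25. Both answers are exact for the model, whereas a discretized table of points could only bracket the crossing locations to within its sampling step. The sketch does not merge adjacent result intervals and does not cover the multi-variable case or the variable substitution operation mentioned in the abstract.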
INDEX TERMS
Primary Classification:
H. Information Systems
H.2 DATABASE MANAGEMENT
H.2.4 Systems
Subjects: Query processing
Additional Classification:
H. Information Systems
H.2 DATABASE MANAGEMENT
H.2.3 Languages
Subjects: Query languages
I. Computing Methodologies
I.1 SYMBOLIC AND ALGEBRAIC MANIPULATION
I.1.4 Applications
General Terms: Algorithms, Experimentation, Languages, Performance
Keywords: continuous data, erroneous data, functions, imprecise data, model based views, regression, symbolic query processing, uncertain data