The case for a wide-table approach to manage sparse relational data sets
Source: Proceedings of the 2007 ACM SIGMOD International Conference on Management of Data, Beijing, China
Session: Indexing
Pages: 821-832
Year of Publication: 2007
ISBN: 978-1-59593-686-8
ABSTRACT
A "sparse" data set typically has hundreds or even thousands of attributes, but most objects have non-null values for only a small number of these attributes. A popular view is that sparse data arises merely as the result of poor schema design. In this paper, we argue that, rather than being the result of inept schema design, storing a sparse data set in a single table is the right way to proceed. For this to be the case, however, RDBMSs must provide sparse data management facilities that go beyond the previously studied requirement of storing such data sets efficiently. In particular, an RDBMS must 1) enable users to effectively build ad hoc queries over a very large number of attributes, and 2) support efficient evaluation of these queries over a wide, sparse table. We propose techniques that provide these capabilities, and argue that the single-table approach is a necessary component of self-managing database systems because it frees users from a tedious and potentially ineffective schema-design phase when managing sparse data sets.
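The abstract does not spell out the proposed techniques, so what follows is only a minimal, hypothetical Python sketch of the general idea of a wide, sparse table, not the authors' method. Each row stores just its non-null attribute/value pairs, and a per-attribute index of row ids (an assumption made here for illustration) lets a query that references a few of the many attributes skip every row that is null on them. The class `SparseWideTable`, its methods, and the product-catalog data are all illustrative names, not part of the paper.

```python
# Hypothetical sketch, not the paper's implementation: a wide, sparse table
# that stores only non-null attribute/value pairs per row and keeps a
# per-attribute index of which rows are non-null for that attribute.
from collections import defaultdict
from typing import Any, Callable, Dict, List, Set


class SparseWideTable:
    def __init__(self) -> None:
        # Row id -> {attribute: value}; null (None) attributes are not stored.
        self.rows: List[Dict[str, Any]] = []
        # Attribute -> ids of rows holding a non-null value for it.
        self.attr_index: Dict[str, Set[int]] = defaultdict(set)

    def insert(self, **values: Any) -> int:
        # Drop None values so absent attributes cost no storage.
        row = {attr: v for attr, v in values.items() if v is not None}
        row_id = len(self.rows)
        self.rows.append(row)
        for attr in row:
            self.attr_index[attr].add(row_id)
        return row_id

    def attributes(self) -> List[str]:
        # The schema is the union of all attributes that have appeared so far.
        return sorted(self.attr_index)

    def select(self, predicates: Dict[str, Callable[[Any], bool]]) -> List[Dict[str, Any]]:
        # Conjunctive selection: a row can only match if it is non-null on
        # every referenced attribute, so intersect those row-id sets first
        # and evaluate the predicates on the surviving rows only.
        if not predicates:
            return list(self.rows)
        candidates = set.intersection(
            *(self.attr_index.get(attr, set()) for attr in predicates)
        )
        return [
            self.rows[i]
            for i in sorted(candidates)
            if all(pred(self.rows[i][attr]) for attr, pred in predicates.items())
        ]


if __name__ == "__main__":
    # Hypothetical product-catalog rows: each product uses only a few attributes.
    table = SparseWideTable()
    table.insert(name="camera A", megapixels=12, zoom=3)
    table.insert(name="laptop B", ram_gb=16, screen_in=13.3)
    table.insert(name="camera C", megapixels=24)
    print(table.attributes())
    # ['megapixels', 'name', 'ram_gb', 'screen_in', 'zoom']
    print(table.select({"megapixels": lambda v: v >= 20}))
    # [{'name': 'camera C', 'megapixels': 24}]
```

Intersecting the per-attribute row-id sets before evaluating predicates reflects the defining property of sparse data: most rows are null on any given attribute, so a predicate over that attribute can rule them out without inspecting them. A real RDBMS would of course use its own storage format and access methods rather than Python dictionaries.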
CITED BY 4

Eric Chu, Akanksha Baid, Ting Chen, AnHai Doan, Jeffrey Naughton. A relational approach to incrementally extracting and querying structure in unstructured data. Proceedings of the 33rd International Conference on Very Large Data Bases, September 23-27, 2007, Vienna, Austria.

Stefan Aulbach, Dean Jacobs, Alfons Kemper, Michael Seibold. A comparison of flexible schemas for software as a service. Proceedings of the 35th SIGMOD International Conference on Management of Data, June 29-July 02, 2009, Providence, Rhode Island, USA.