The case for a wide-table approach to manage sparse relational data sets
Source
International Conference on Management of Data
Proceedings of the 2007 ACM SIGMOD international conference on Management of data
Beijing, China
SESSION: Indexing
Pages: 821-832
Year of Publication: 2007
ISBN: 978-1-59593-686-8
Authors
Eric Chu  University of Wisconsin-Madison, Madison, WI
Jennifer Beckmann  Microsoft Corporation, Redmond, WA
Jeffrey Naughton  University of Wisconsin-Madison, Madison, WI
Sponsors
ACM: Association for Computing Machinery
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 8,   Downloads (12 Months): 105,   Citation Count: 4
DOI: http://doi.acm.org/10.1145/1247480.1247571

ABSTRACT

A "sparse" data set typically has hundreds or even thousands of attributes, but most objects have non-null values for only a small number of these attributes. A popular view about sparse data is that it arises merely as the result of poor schema design. In this paper, we argue that rather than being the result of inept schema design, storing a sparse data set in a single table is the right way to proceed. However, for this to be the case, RDBMSs must provide sparse data management facilities that go beyond the previously studied requirement of storing such data sets efficiently. In particular, an RDBMS must 1) enable users to effectively build ad hoc queries over a very large number of attributes, and 2) support efficient evaluation of these queries over a wide, sparse table. We propose techniques that provide these capabilities, and argue that the single-table approach is a necessary component of self-managing database systems because it frees users from a tedious and potentially ineffective schema-design phase when managing sparse data sets.
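
To make the notion of a wide, sparse table concrete, the following is a minimal Python sketch. It is only an illustration and not the storage format or query techniques proposed in the paper. The attribute names, the sample rows, and the helpers `to_sparse` and `select` are all hypothetical. Each object keeps only its non-null attributes, and an ad hoc equality query is evaluated over those compact rows.

```python
# Illustrative sketch only: all names and data here are hypothetical,
# not taken from the paper.
from typing import Any, Dict, List

# Each object keeps only its non-null attributes (attribute -> value),
# instead of one slot per attribute of the wide table.
SparseRow = Dict[str, Any]


def to_sparse(columns: List[str], row: List[Any]) -> SparseRow:
    """Drop the null entries of one wide-table row."""
    return {c: v for c, v in zip(columns, row) if v is not None}


def select(rows: List[SparseRow], **predicates: Any) -> List[SparseRow]:
    """Return rows whose attributes equal every given predicate value.

    An attribute missing from a row is treated as null and never matches.
    """
    return [
        r for r in rows
        if all(k in r and r[k] == v for k, v in predicates.items())
    ]


# A tiny "wide" table: five attributes, most of them null in every row.
columns = ["name", "megapixels", "screen_inches", "rpm", "wattage"]
wide_rows = [
    ["camera-a", 10, None, None, None],
    ["monitor-b", None, 24, None, None],
    ["disk-c", None, None, 7200, None],
]

sparse_rows = [to_sparse(columns, r) for r in wide_rows]
matches = select(sparse_rows, rpm=7200)
print(matches)  # [{'name': 'disk-c', 'rpm': 7200}]
```

In this sketch the user still sees one logical table with many attributes, while the space used per object grows with its number of non-null values rather than with the width of the table.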



