Horizontal aggregations for building tabular data sets
Source: Data Mining And Knowledge Discovery
Proceedings of the 9th ACM SIGMOD workshop on Research issues in data mining and knowledge discovery
Paris, France
SESSION: Full papers
Pages: 35-42
Year of Publication: 2004
ISBN:1-58113-908-X
Author
Carlos Ordonez  Teradata, NCR, San Diego, CA
Sponsor
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1008694.1008700

ABSTRACT

In a data mining project, a significant portion of time is devoted to building a data set suitable for analysis. In a relational database environment, building such a data set usually requires joining tables and aggregating columns with SQL queries. Existing SQL aggregations are limited because they return a single number per aggregated group, producing one row for each computed number. These aggregations help, but significant effort is still required to build data sets suitable for data mining, where a tabular format is generally required. This work proposes very simple, yet powerful, extensions to SQL aggregate functions that produce aggregations in tabular form, returning a set of numbers instead of one number per row. We call this new class of functions horizontal aggregations. Horizontal aggregations help build answer sets in tabular form (e.g., point-dimension, observation-variable, instance-feature), which is the standard form needed by most data mining algorithms. Two common data preparation tasks are explained: transposition/aggregation and transforming categorical attributes into binary dimensions. We propose two strategies to evaluate horizontal aggregations using standard SQL. The first strategy is based only on relational operators and the second one uses the "case" construct. Experiments with large data sets study the proposed query optimization strategies.
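
The two evaluation strategies named in the abstract can be illustrated with a minimal sketch in standard SQL, run here through Python's built-in sqlite3 module. Everything specific in it is an assumption made for illustration: the `sales` table, its columns, and the sample rows are hypothetical, and the queries show only the general idea of each strategy (outer joins of per-value aggregates versus `CASE` expressions inside aggregates), not the syntax extension or the exact query plans proposed in the paper.

```python
import sqlite3

# Hypothetical toy data: one row per sale (illustrative only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (store_id INTEGER, day TEXT, amount REAL);
    INSERT INTO sales VALUES
        (1, 'Mon', 10.0), (1, 'Mon', 5.0), (1, 'Tue', 7.0),
        (2, 'Mon', 3.0),  (2, 'Wed', 8.0);
""")

# Ordinary (vertical) aggregation: one row per computed number.
vertical = conn.execute("""
    SELECT store_id, day, SUM(amount)
    FROM sales GROUP BY store_id, day ORDER BY store_id, day
""").fetchall()

# Strategy based on the CASE construct: one row per store,
# one column per distinct value of `day`.
horizontal_case = conn.execute("""
    SELECT store_id,
           SUM(CASE WHEN day = 'Mon' THEN amount END) AS mon,
           SUM(CASE WHEN day = 'Tue' THEN amount END) AS tue,
           SUM(CASE WHEN day = 'Wed' THEN amount END) AS wed
    FROM sales GROUP BY store_id ORDER BY store_id
""").fetchall()

# Strategy based only on relational operators: aggregate each value of
# `day` separately, then left-outer-join the results on the grouping key.
def per_day(day):
    # `day` comes from a fixed list below, never from user input.
    return (f"(SELECT store_id, SUM(amount) AS total FROM sales "
            f"WHERE day = '{day}' GROUP BY store_id)")

horizontal_join = conn.execute(f"""
    SELECT k.store_id, m.total, t.total, w.total
    FROM (SELECT DISTINCT store_id FROM sales) AS k
    LEFT OUTER JOIN {per_day('Mon')} AS m ON m.store_id = k.store_id
    LEFT OUTER JOIN {per_day('Tue')} AS t ON t.store_id = k.store_id
    LEFT OUTER JOIN {per_day('Wed')} AS w ON w.store_id = k.store_id
    ORDER BY k.store_id
""").fetchall()

# Categorical attribute coded as binary dimensions (1 if the store
# has any sale on that day, else 0).
binary = conn.execute("""
    SELECT store_id,
           MAX(CASE WHEN day = 'Mon' THEN 1 ELSE 0 END) AS is_mon,
           MAX(CASE WHEN day = 'Tue' THEN 1 ELSE 0 END) AS is_tue,
           MAX(CASE WHEN day = 'Wed' THEN 1 ELSE 0 END) AS is_wed
    FROM sales GROUP BY store_id ORDER BY store_id
""").fetchall()

print(vertical)         # four rows, one per (store, day) pair
print(horizontal_case)  # [(1, 15.0, 7.0, None), (2, 3.0, None, 8.0)]
print(horizontal_join)  # same rows as horizontal_case
print(binary)           # [(1, 1, 1, 0), (2, 1, 0, 1)]
```

In this sketch a missing combination (for example, store 1 on Wednesday) appears as NULL in both horizontal results, and both strategies return the same table; which one runs faster on large data is the question the paper's experiments address.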

