Efficient computation of PCA with SVD in SQL
Source
International Conference on Knowledge Discovery and Data Mining archive
Proceedings of the 2nd Workshop on Data Mining using Matrices and Tensors table of contents
Paris, France
Article No. 5  
Year of Publication: 2009
ISBN:978-1-60558-673-1
Authors
Mario Navas  University of Houston, Houston, TX
Carlos Ordonez  University of Houston, Houston, TX
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 25,   Downloads (12 Months): 64,   Citation Count: 0
DOI: http://doi.acm.org/10.1145/1581114.1581119

ABSTRACT

PCA is one of the most common dimensionality reduction techniques, with broad applications in data mining, statistics and signal processing. In this work we study how to leverage a DBMS's computing capabilities to solve PCA. We propose a solution that combines a summarization of the data set with the correlation or covariance matrix and then solves PCA with Singular Value Decomposition (SVD). Deriving the summary matrices allows analyzing large data sets, since they can be computed in a single pass. Solving SVD without external libraries proves to be a challenge in SQL. We introduce two solutions: one based on SQL queries and a second based on User-Defined Functions. Experimental evaluation shows our method can solve larger problems in less time than external statistical packages.
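
The pipeline outlined in the abstract (single-pass summarization, then a covariance or correlation matrix, then SVD) can be illustrated with a minimal sketch. It is written in Python with NumPy rather than the SQL queries or User-Defined Functions the paper proposes, and it is not the authors' implementation. The summary used here (row count n, linear sum L, and sum of outer products Q) is an assumption about what the summary matrices contain, and all function names are hypothetical.

```python
import numpy as np


def summarize(rows, d):
    """Scan the data once, accumulating n, L = sum(x) and Q = sum(x x^T).

    Assumed form of the summary; the abstract does not spell it out.
    """
    n = 0
    L = np.zeros(d)
    Q = np.zeros((d, d))
    for x in rows:
        x = np.asarray(x, dtype=float)
        n += 1
        L += x
        Q += np.outer(x, x)
    return n, L, Q


def covariance_from_summary(n, L, Q):
    """Population covariance derived from the summary alone."""
    mean = L / n
    return Q / n - np.outer(mean, mean)


def correlation_from_summary(n, L, Q):
    """Correlation matrix: covariance rescaled by the standard deviations."""
    cov = covariance_from_summary(n, L, Q)
    std = np.sqrt(np.diag(cov))
    return cov / np.outer(std, std)


def pca_from_summary(n, L, Q, use_correlation=False):
    """Solve PCA by applying SVD to the d x d matrix, not to the raw data."""
    if use_correlation:
        M = correlation_from_summary(n, L, Q)
    else:
        M = covariance_from_summary(n, L, Q)
    # M is symmetric positive semidefinite, so the left singular vectors
    # are the principal components and the singular values are the
    # variances along them, in descending order.
    U, s, _ = np.linalg.svd(M)
    return U, s
```

Because the decomposition operates only on a d x d matrix, its cost does not depend on the number of rows once the summary exists, which is consistent with the abstract's claim that large data sets become tractable. One caveat of this sketch: computing the covariance as Q/n minus the outer product of the means can lose precision when the means are large relative to the variances.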


