| REDUS: finding reducible subspaces in high dimensional data |
| Full text |
Pdf
(914 KB)
|
Source
|
Conference on Information and Knowledge Management
archive
Proceeding of the 17th ACM conference on Information and knowledge management
table of contents
Napa Valley, California, USA
SESSION: KM: feature selection
table of contents
Pages 961-970
Year of Publication: 2008
ISBN:978-1-59593-991-3
|
|
Authors
|
|
Xiang Zhang
|
University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
|
|
Feng Pan
|
University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
|
|
Wei Wang
|
University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
|
|
| Sponsors |
|
| Publisher |
|
| Bibliometrics |
Downloads (6 Weeks): 7, Downloads (12 Months): 101, Citation Count: 0
|
|
|
ABSTRACT
Finding latent patterns in high dimensional data is an important research problem with numerous applications. The most well known approaches for high dimensional data analysis are feature selection and dimensionality reduction. Being widely used in many applications, these methods aim to capture global patterns and are typically performed in the full feature space. In many emerging applications, however, scientists are interested in the local latent patterns held by feature subspaces, which may be invisible via any global transformation. In this paper, we investigate the problem of finding strong linear and nonlinear correlations hidden in feature subspaces of high dimensional data. We formalize this problem as identifying reducible subspaces in the full dimensional space. Intuitively, a reducible subspace is a feature subspace whose intrinsic dimensionality is smaller than the number of features. We present an effective algorithm, REDUS, for finding the reducible subspaces. Two key components of our algorithm are finding the overall reducible subspace, and uncovering the individual reducible subspaces from the overall reducible subspace. A broad experimental evaluation demonstrates the effectiveness of our algorithm.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
|
| |
2
|
A. Alizadeh and et al. Distinct types of diffuse large b-cell lymphoma identified by gene expression profiling. Nature, 403:503--11, 2000.
|
 |
3
|
|
| |
4
|
|
 |
5
|
|
| |
6
|
|
 |
7
|
|
| |
8
|
I. Borg and P. Groenen. Modern multidimensional scaling. New York: Springer, 1997.
|
| |
9
|
|
| |
10
|
|
| |
11
|
M. Eisen, P. Spellman, P. Brown, and D. Botstein. Cluster analysis and display of genome-wide expression patterns. Proc. Natl. Acad. Sci. USA, 95:14863--68, 1998.
|
 |
12
|
|
| |
13
|
K. Fukunaga. Intrinsic dimensionality extraction. Classification, Pattern recongnition and Reduction of Dimensionality, Volume 2 of Handbook of Statistics, pages 347--360, P. R. Krishnaiah and L. N. Kanal eds., Amsterdam, North Holland, 1982.
|
| |
14
|
|
 |
15
|
Aristides Gionis , Alexander Hinneburg , Spiros Papadimitriou , Panayiotis Tsaparas, Dimension induced clustering, Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining, August 21-24, 2005, Chicago, Illinois, USA
[doi> 10.1145/1081870.1081880]
|
| |
16
|
G. Golub and A. Loan. Matrix computations. Johns Hopkins University Press, Baltimore, Maryland, 1996.
|
| |
17
|
V. Iyer and et. al. The transcriptional program in the response of human fibroblasts to serum. Science, 283:83--87, 1999.
|
| |
18
|
I. Jolliffe. Principal component analysis. New York: Springer, 1986.
|
| |
19
|
M. Kendall and J. D. Gibbons. Rank Correlation Methods. New York: Oxford University Press, 1990.
|
| |
20
|
D. C. Lay. Linear Algebra and Its Applications. Addison Wesley, 2005.
|
| |
21
|
E. Levina and P. J. Bickel. Maximum likelihood estimation of intrinsic dimension. Advances in Neural Information Processing Systems, 2005.
|
| |
22
|
|
| |
23
|
B.-U. Pagel, F. Korn, and C. Faloutsos. De ating the dimensionality curse using multiple fractal dimensions. ICDE, 2000.
|
| |
24
|
S. Papadimitriou, H. Kitawaga, P. B. Gibbons, and C. Faloutsos. Loci: Fast outlier detection using the local correlation integral. ICDE, 2003.
|
| |
25
|
S. N. Rasband. Chaotic Dynamics of Nonlinear Systems. Wiley-Interscience, 1990.
|
| |
26
|
H. T. Reynolds. The analysis of cross-classifications. The Free Press, New York, 1977.
|
| |
27
|
S. Roweis and L. Saul. Nonlinear dimensionality reduction by locally linear embedding. Science, 290 (5500):2323--2326, 2000.
|
| |
28
|
M. Schroeder. Fractals, Chaos, Power Lawers: Minutes from an Infinite Paradise. W. H. Freeman, New York, 1991.
|
| |
29
|
J. B. Tenenbaum, V. de Silva, and J. C. Langford. A global geometric framework for nonlinear dimensionality reduction. Science, 290 (5500):2319--2323, 2000.
|
 |
30
|
|
| |
31
|
L. Yu and H. Liu. Feature selection for high-dimensional data: a fast correlation-based filter solution. ICML, 2003.
|
| |
32
|
X. Zhang, F. Pan, and W. Wang. Care: Finding local linear correlations in high dimensional data. ICDE, 2008.
|
| |
33
|
Z. Zhao and H. Liu. Searching for interacting features. IJCAI, 2007.
|
|