ACM Home Page
Please provide us with feedback. Feedback
REDUS: finding reducible subspaces in high dimensional data
Full text PdfPdf (914 KB)
Source
Conference on Information and Knowledge Management archive
Proceeding of the 17th ACM conference on Information and knowledge management table of contents
Napa Valley, California, USA
SESSION: KM: feature selection table of contents
Pages 961-970  
Year of Publication: 2008
ISBN:978-1-59593-991-3
Authors
Xiang Zhang  University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
Feng Pan  University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
Wei Wang  University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
Sponsors
ACM: Association for Computing Machinery
SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web
SIGIR: ACM Special Interest Group on Information Retrieval
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 7,   Downloads (12 Months): 101,   Citation Count: 0
Additional Information:

abstract   references   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1458082.1458209
What is a DOI?

ABSTRACT

Finding latent patterns in high dimensional data is an important research problem with numerous applications. The most well known approaches for high dimensional data analysis are feature selection and dimensionality reduction. Being widely used in many applications, these methods aim to capture global patterns and are typically performed in the full feature space. In many emerging applications, however, scientists are interested in the local latent patterns held by feature subspaces, which may be invisible via any global transformation.

In this paper, we investigate the problem of finding strong linear and nonlinear correlations hidden in feature subspaces of high dimensional data. We formalize this problem as identifying reducible subspaces in the full dimensional space. Intuitively, a reducible subspace is a feature subspace whose intrinsic dimensionality is smaller than the number of features. We present an effective algorithm, REDUS, for finding the reducible subspaces. Two key components of our algorithm are finding the overall reducible subspace, and uncovering the individual reducible subspaces from the overall reducible subspace. A broad experimental evaluation demonstrates the effectiveness of our algorithm.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

1
 
2
A. Alizadeh and et al. Distinct types of diffuse large b-cell lymphoma identified by gene expression profiling. Nature, 403:503--11, 2000.
3
 
4
5
 
6
7
 
8
I. Borg and P. Groenen. Modern multidimensional scaling. New York: Springer, 1997.
 
9
 
10
 
11
M. Eisen, P. Spellman, P. Brown, and D. Botstein. Cluster analysis and display of genome-wide expression patterns. Proc. Natl. Acad. Sci. USA, 95:14863--68, 1998.
12
 
13
K. Fukunaga. Intrinsic dimensionality extraction. Classification, Pattern recongnition and Reduction of Dimensionality, Volume 2 of Handbook of Statistics, pages 347--360, P. R. Krishnaiah and L. N. Kanal eds., Amsterdam, North Holland, 1982.
 
14
15
 
16
G. Golub and A. Loan. Matrix computations. Johns Hopkins University Press, Baltimore, Maryland, 1996.
 
17
V. Iyer and et. al. The transcriptional program in the response of human fibroblasts to serum. Science, 283:83--87, 1999.
 
18
I. Jolliffe. Principal component analysis. New York: Springer, 1986.
 
19
M. Kendall and J. D. Gibbons. Rank Correlation Methods. New York: Oxford University Press, 1990.
 
20
D. C. Lay. Linear Algebra and Its Applications. Addison Wesley, 2005.
 
21
E. Levina and P. J. Bickel. Maximum likelihood estimation of intrinsic dimension. Advances in Neural Information Processing Systems, 2005.
 
22
 
23
B.-U. Pagel, F. Korn, and C. Faloutsos. De ating the dimensionality curse using multiple fractal dimensions. ICDE, 2000.
 
24
S. Papadimitriou, H. Kitawaga, P. B. Gibbons, and C. Faloutsos. Loci: Fast outlier detection using the local correlation integral. ICDE, 2003.
 
25
S. N. Rasband. Chaotic Dynamics of Nonlinear Systems. Wiley-Interscience, 1990.
 
26
H. T. Reynolds. The analysis of cross-classifications. The Free Press, New York, 1977.
 
27
S. Roweis and L. Saul. Nonlinear dimensionality reduction by locally linear embedding. Science, 290 (5500):2323--2326, 2000.
 
28
M. Schroeder. Fractals, Chaos, Power Lawers: Minutes from an Infinite Paradise. W. H. Freeman, New York, 1991.
 
29
J. B. Tenenbaum, V. de Silva, and J. C. Langford. A global geometric framework for nonlinear dimensionality reduction. Science, 290 (5500):2319--2323, 2000.
30
 
31
L. Yu and H. Liu. Feature selection for high-dimensional data: a fast correlation-based filter solution. ICML, 2003.
 
32
X. Zhang, F. Pan, and W. Wang. Care: Finding local linear correlations in high dimensional data. ICDE, 2008.
 
33
Z. Zhao and H. Liu. Searching for interacting features. IJCAI, 2007.

Collaborative Colleagues:
Xiang Zhang: colleagues
Feng Pan: colleagues
Wei Wang: colleagues