ABSTRACT
Existing axis scaling and dimensionality reduction methods focus on preserving structure, usually as determined by the Euclidean distance. In other words, they inherently assume that the Euclidean distance is already correct. We instead propose a novel nonlinear approach driven by an information-theoretic viewpoint, which we show is also strongly linked to intrinsic dimensionality (degrees of freedom) and to uniformity. Nonlinear transformations based on common probability distributions, combined with information-driven selection, simultaneously reduce the number of dimensions required and increase the value of those we retain. Experiments on real data confirm that this approach reveals correlations, finds novel attributes, and scales well.