Additional limitations of the clustering validation method figure of merit
Source
ACM Southeast Regional Conference
Proceedings of the 45th annual southeast regional conference
Winston-Salem, North Carolina
Pages: 238-243
Year of Publication: 2007
ISBN: 978-1-59593-629-5
ABSTRACT
Clustering analysis is an important exploratory tool that aids in the analysis and organization of genomic data. Each biological dataset has different characteristics, so deciding which clustering method is appropriate and how many clusters are optimal must be done dataset by dataset, which can be problematic. The Figure of Merit (FOM) is a quantitative clustering validation method designed to aid in these decisions. While the FOM is useful, it has limitations that must be considered when using it. This research shows that the FOM is biased toward Euclidean distance. Performing FOM analysis on clusters created using Pearson's correlation coefficient as a similarity measure is shown to be non-optimal and mathematically inadvisable. A new, correlation coefficient-biased version of the FOM has been developed, and preliminary results indicate that this new FOM is effectively biased toward clusters generated using the correlation coefficient.
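The abstract does not spell out the FOM formula, so the following is only a minimal sketch, assuming the leave-one-condition-out definition commonly attributed to Yeung, Haynor and Ruzzo: cluster the genes without condition e, then take the root-mean-square deviation of condition e from its cluster means. The toy matrix, the two hand-picked labelings, and the `cluster_fn` callable are illustrative assumptions, not material from the paper. Because the score is built from squared deviations, a grouping by magnitude can score better than a grouping of perfectly correlated profiles, which is one way to see the Euclidean bias the abstract describes.

```python
import numpy as np


def pearson_distance(x, y):
    """Return 1 - r, where r is Pearson's correlation coefficient."""
    return 1.0 - np.corrcoef(x, y)[0, 1]


def fom_for_condition(data, labels, e):
    """RMS deviation of left-out condition e from its cluster means."""
    n = data.shape[0]
    total = 0.0
    for c in np.unique(labels):
        values = data[labels == c, e]
        total += np.sum((values - values.mean()) ** 2)
    return np.sqrt(total / n)


def aggregate_fom(data, cluster_fn, k):
    """Sum the FOM over all conditions, re-clustering without condition e.

    cluster_fn(reduced_data, k) is a hypothetical callable that returns
    one integer label per row (gene).
    """
    total = 0.0
    for e in range(data.shape[1]):
        reduced = np.delete(data, e, axis=1)
        labels = np.asarray(cluster_fn(reduced, k))
        total += fom_for_condition(data, labels, e)
    return total


# Toy expression matrix (assumed data): rows are genes, columns are conditions.
data = np.array([
    [1.0, 2.0, 3.0, 4.0],      # rising, low magnitude
    [10.0, 20.0, 30.0, 40.0],  # rising, high magnitude
    [4.0, 3.0, 2.0, 1.0],      # falling, low magnitude
    [40.0, 30.0, 20.0, 10.0],  # falling, high magnitude
])

by_shape = np.array([0, 0, 1, 1])      # rising genes together, falling genes together
by_magnitude = np.array([0, 1, 0, 1])  # low-magnitude genes together, high together

print(pearson_distance(data[0], data[1]))        # approximately 0
print(np.linalg.norm(data[0] - data[1]))         # about 49.3
print(fom_for_condition(data, by_shape, 3))      # about 13.1
print(fom_for_condition(data, by_magnitude, 3))  # about 10.7
```

Genes 0 and 1 are perfectly correlated yet far apart in Euclidean terms, and the shape-based grouping receives the worse (higher) score on the left-out condition. In practice `aggregate_fom` would be called with a real clustering routine in place of the fixed labelings used here.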
INDEX TERMS
Primary Classification: J. Computer Applications; J.3 LIFE AND MEDICAL SCIENCES
Subjects: Biology and genetics
General Terms: Algorithms, Measurement
Keywords: Euclidean distance, FOM, K-means, Pearson's correlation coefficient, cluster analysis, cluster validation method, clustering, correlation coefficient, distance metric, figure of merit, gene expression, similarity measure