Additional limitations of the clustering validation method figure of merit
Source: ACM Southeast Regional Conference
Proceedings of the 45th Annual Southeast Regional Conference
Winston-Salem, North Carolina
Session: Papers
Pages: 238 - 243  
Year of Publication: 2007
ISBN: 978-1-59593-629-5
Authors
Amy L. Olex  Wake Forest University, Winston-Salem, NC
David J. John  Wake Forest University, Winston-Salem, NC
Elizabeth M. Hiltbold  Wake Forest University, Winston-Salem, NC
Jacquelyn S. Fetrow  Wake Forest University, Winston-Salem, NC
Sponsor
SIGAPP: ACM Special Interest Group on Applied Computing
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1233341.1233384

ABSTRACT

Clustering analysis is an important exploratory tool that aids in the analysis and organization of genomic data. Each biological data set has different characteristics, so the choice of clustering method and of the optimal number of clusters must be made on a dataset-by-dataset basis, which can be problematic. The Figure of Merit (FOM) is a quantitative clustering validation method designed to aid in these decisions. While the FOM is useful, it has limitations that must be considered when using it. This research shows that the FOM is biased toward Euclidean distance. Performing FOM analysis on clusters created using Pearson's correlation coefficient as a similarity measure is shown to be non-optimal and mathematically inadvisable. A new, correlation coefficient-biased version of the FOM has been developed, and preliminary results indicate that this new FOM is effectively biased toward clusters generated using the correlation coefficient.
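
To make the Euclidean bias concrete, the following is a minimal sketch, not the authors' implementation. It assumes the FOM definition commonly attributed to Yeung, Haynor and Ruzzo (2001): for a left-out condition e, the root mean square deviation of each gene's value at e from the mean of its cluster at e. The function name `fom_for_condition`, the toy data, and the hand-assigned cluster labels are all hypothetical; no clustering algorithm is run.

```python
import math


def fom_for_condition(data, labels, e):
    """Root mean square deviation of condition e from its cluster means.

    data   : list of genes, each a list of expression values
    labels : one cluster label per gene, assumed to come from clustering
             on every condition except e (that step is not shown here)
    e      : index of the left-out condition
    """
    # Collect the left-out condition's values per cluster.
    clusters = {}
    for row, label in zip(data, labels):
        clusters.setdefault(label, []).append(row[e])

    # Sum squared deviations from each cluster's mean at condition e.
    squared_error = 0.0
    for values in clusters.values():
        mean = sum(values) / len(values)
        squared_error += sum((v - mean) ** 2 for v in values)

    return math.sqrt(squared_error / len(data))


# Toy data (hypothetical): genes 0 and 1 rise, genes 2 and 3 fall;
# genes 0 and 2 are low in magnitude, genes 1 and 3 are high.
data = [
    [1, 2, 3],
    [10, 20, 30],
    [3, 2, 1],
    [30, 20, 10],
]

# Hand-assigned labels standing in for two clustering outcomes.
magnitude_labels = [0, 1, 0, 1]  # grouping a Euclidean distance would favor
shape_labels = [0, 0, 1, 1]      # grouping a correlation measure would favor

fom_magnitude = fom_for_condition(data, magnitude_labels, e=2)
fom_shape = fom_for_condition(data, shape_labels, e=2)

print("FOM, magnitude-based clusters:", fom_magnitude)
print("FOM, shape-based clusters:   ", fom_shape)
```

In this toy case the magnitude-based grouping receives the lower (better) score even though the shape-based grouping pairs perfectly correlated profiles, which is the kind of behavior the abstract describes. It is an illustration under the stated assumptions, not a result from the paper.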


