A scalable framework for discovering coherent co-clusters in noisy data
Source
ACM International Conference Proceeding Series; Vol. 382
Proceedings of the 26th Annual International Conference on Machine Learning
Montreal, Quebec, Canada
Pages 241-248
Year of Publication: 2009
ISBN: 978-1-60558-516-1
Authors
Meghana Deodhar, University of Texas at Austin, Austin, TX
Gunjan Gupta, University of Texas at Austin, Austin, TX
Joydeep Ghosh, University of Texas at Austin, Austin, TX
Hyuk Cho, University of Texas at Austin, Austin, TX
Inderjit Dhillon, University of Texas at Austin, Austin, TX
ABSTRACT
Clustering problems often involve datasets in which only part of the data is relevant to the problem; e.g., in microarray data analysis, only a subset of the genes shows cohesive expression within a subset of the conditions/features. The large number of non-informative data points and features makes it challenging to find coherent and meaningful clusters in such datasets. Additionally, since clusters can exist in different subspaces of the feature space, a co-clustering algorithm that simultaneously clusters objects and features is often more suitable than one restricted to traditional "one-sided" clustering. We propose Robust Overlapping Co-Clustering (ROCC), a scalable and versatile framework that addresses the problem of efficiently mining dense, arbitrarily positioned, possibly overlapping co-clusters from large, noisy datasets. ROCC has several desirable properties that make it well suited to a number of real-life applications.