A scalable framework for discovering coherent co-clusters in noisy data
Source: ACM International Conference Proceeding Series, Vol. 382
Proceedings of the 26th Annual International Conference on Machine Learning
Montreal, Quebec, Canada
Pages 241-248  
Year of Publication: 2009
ISBN: 978-1-60558-516-1
Authors
Meghana Deodhar  University of Texas at Austin, Austin, TX
Gunjan Gupta  University of Texas at Austin, Austin, TX
Joydeep Ghosh  University of Texas at Austin, Austin, TX
Hyuk Cho  University of Texas at Austin, Austin, TX
Inderjit Dhillon  University of Texas at Austin, Austin, TX
Sponsors
MITACS
NSF
Microsoft Research
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1553374.1553405

ABSTRACT

Clustering problems often involve datasets where only a part of the data is relevant to the problem, e.g., in microarray data analysis only a subset of the genes show cohesive expressions within a subset of the conditions/features. The existence of a large number of non-informative data points and features makes it challenging to hunt for coherent and meaningful clusters from such datasets. Additionally, since clusters could exist in different subspaces of the feature space, a co-clustering algorithm that simultaneously clusters objects and features is often more suitable as compared to one that is restricted to traditional "one-sided" clustering. We propose Robust Overlapping Co-Clustering (ROCC), a scalable and very versatile framework that addresses the problem of efficiently mining dense, arbitrarily positioned, possibly overlapping co-clusters from large, noisy datasets. ROCC has several desirable properties that make it extremely well suited to a number of real life applications.
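As a rough illustration of what a "coherent co-cluster in noisy data" means, the sketch below plants a small block of near-constant values inside a random matrix and compares the spread of values inside the block with the spread over the whole matrix. This is not the ROCC algorithm, which the abstract does not specify; the helper names (`cocluster_mse`, `make_noisy_matrix`), the constant-value notion of coherence, and all sizes and noise levels are assumptions chosen only for the example.

```python
import random


def cocluster_mse(matrix, rows, cols):
    """Mean squared deviation of the selected submatrix from its own mean.

    A low value means the chosen rows agree closely on the chosen columns.
    (Hypothetical coherence measure, used here for illustration only.)
    """
    values = [matrix[i][j] for i in rows for j in cols]
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def make_noisy_matrix(n_rows, n_cols, rows, cols, level, seed=0):
    """Fill a matrix with uniform noise, then plant one near-constant block."""
    rng = random.Random(seed)
    m = [[rng.uniform(-10.0, 10.0) for _ in range(n_cols)]
         for _ in range(n_rows)]
    for i in rows:
        for j in cols:
            # Planted entries sit close to `level`; everything else is noise.
            m[i][j] = level + rng.uniform(-0.1, 0.1)
    return m


# Only 3 of 10 rows and 3 of 8 columns take part in the planted co-cluster.
planted_rows, planted_cols = [1, 4, 7], [0, 2, 5]
data = make_noisy_matrix(10, 8, planted_rows, planted_cols, level=3.0)

inside = cocluster_mse(data, planted_rows, planted_cols)
whole = cocluster_mse(data, range(10), range(8))
print(inside < whole)
```

The planted rows look unremarkable when judged across all columns, which is why a one-sided clustering over the full feature space can miss the block while a method that selects rows and columns together can find it.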



