Multi-view clustering via canonical correlation analysis
Source: ACM International Conference Proceeding Series; Vol. 382
Proceedings of the 26th Annual International Conference on Machine Learning
Montreal, Quebec, Canada
Pages 129-136  
Year of Publication: 2009
ISBN:978-1-60558-516-1
Authors
Kamalika Chaudhuri  ITA, UC San Diego, La Jolla, CA
Sham M. Kakade  Toyota Technological Institute at Chicago, Chicago, IL
Karen Livescu  Toyota Technological Institute at Chicago, Chicago, IL
Karthik Sridharan  Toyota Technological Institute at Chicago, Chicago, IL
Sponsors
MITACS
NSF
Microsoft Research
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1553374.1553391

ABSTRACT

Clustering data in high dimensions is believed to be a hard problem in general. A number of efficient clustering algorithms developed in recent years address this problem by projecting the data into a lower-dimensional subspace, e.g. via Principal Components Analysis (PCA) or random projections, before clustering. Here, we consider constructing such projections using multiple views of the data, via Canonical Correlation Analysis (CCA).

Under the assumption that the views are uncorrelated given the cluster label, we show that the separation conditions required for the algorithm to be successful are significantly weaker than prior results in the literature. We provide results for mixtures of Gaussians and mixtures of log-concave distributions. We also provide empirical support from audio-visual speaker clustering (where we desire the clusters to correspond to speaker ID) and from hierarchical Wikipedia document clustering (where one view is the words in the document and the other is the link structure).
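
The general idea in the abstract, projecting one view onto its top canonical directions and then clustering in the low-dimensional subspace, can be illustrated with a minimal sketch. This is not the paper's algorithm or its experimental setup: the function names (cca_projection, kmeans), the ridge term reg, the plain Lloyd-style k-means step, and the synthetic two-view Gaussian data are all assumptions made for illustration. The data are generated so that the two views are independent given the cluster label, which is a special case of the uncorrelatedness assumption stated above.

```python
import numpy as np

def cca_projection(X1, X2, k, reg=1e-6):
    """Project view 1 onto its top-k canonical directions (illustrative helper)."""
    X1c = X1 - X1.mean(axis=0)
    X2c = X2 - X2.mean(axis=0)
    n = X1.shape[0]
    # Sample covariances; the small ridge term `reg` is an assumption for stability.
    C11 = X1c.T @ X1c / n + reg * np.eye(X1.shape[1])
    C22 = X2c.T @ X2c / n + reg * np.eye(X2.shape[1])
    C12 = X1c.T @ X2c / n
    # Whiten each view with its Cholesky factor: M = L1^{-1} C12 L2^{-T}.
    L1 = np.linalg.cholesky(C11)
    L2 = np.linalg.cholesky(C22)
    M = np.linalg.solve(L1, C12)
    M = np.linalg.solve(L2, M.T).T
    # Singular values of M are the sample canonical correlations.
    U, s, _ = np.linalg.svd(M, full_matrices=False)
    # Map the top-k left singular vectors back to view-1 coordinates.
    A = np.linalg.solve(L1.T, U[:, :k])
    return X1c @ A, s[:k]

def kmeans(Z, k, iters=50):
    """Plain Lloyd iterations with a deterministic, quantile-based initialization."""
    order = np.argsort(Z[:, 0])
    centers = Z[order[np.linspace(0, len(Z) - 1, k).astype(int)]].copy()
    for _ in range(iters):
        dist = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = Z[labels == j].mean(axis=0)
    return labels

# Synthetic two-view data (assumed for illustration): two clusters whose
# means differ along one coordinate in each view, with independent noise.
rng = np.random.default_rng(0)
n, d = 1000, 20
y = rng.integers(0, 2, size=n)
signs = 2 * y - 1
mu = np.zeros(d)
mu[0] = 3.0
X1 = signs[:, None] * mu + rng.standard_normal((n, d))
X2 = signs[:, None] * mu + rng.standard_normal((n, d))

# Two clusters, so project to a single canonical direction before clustering.
Z, corr = cca_projection(X1, X2, k=1)
labels = kmeans(Z, 2)
# Cluster indices are arbitrary, so score agreement up to relabeling.
acc = max(np.mean(labels == y), np.mean(labels != y))
print("top canonical correlation:", corr[0], "clustering agreement:", acc)
```

In this toy setting the only correlation between the views comes from the cluster means, so the leading canonical direction lines up with the direction that separates the clusters. That is the intuition behind using the second view to choose the projection.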

