ACM Home Page
Please provide us with feedback. Feedback
On mining cross-graph quasi-cliques
Full text PdfPdf (574 KB)
Source International Conference on Knowledge Discovery and Data Mining archive
Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining table of contents
Chicago, Illinois, USA
SESSION: Research track paper table of contents
Pages: 228 - 238  
Year of Publication: 2005
ISBN:1-59593-135-X
Authors
Jian Pei  Simon Fraser University, Canada
Daxin Jiang  State University of New York at Buffalo
Aidong Zhang  State University of New York at Buffalo
Sponsors
SIGKDD: ACM Special Interest Group on Knowledge Discovery in Data
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 9,   Downloads (12 Months): 109,   Citation Count: 13
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1081870.1081898
What is a DOI?

ABSTRACT

Joint mining of multiple data sets can often discover interesting, novel, and reliable patterns which cannot be obtained solely from any single source. For example, in cross-market customer segmentation, a group of customers who behave similarly in multiple markets should be considered as a more coherent and more reliable cluster than clusters found in a single market. As another example, in bioinformatics, by joint mining of gene expression data and protein interaction data, we can find clusters of genes which show coherent expression patterns and also produce interacting proteins. Such clusters may be potential pathways.In this paper, we investigate a novel data mining problem, mining cross-graph quasi-cliques, which is generalized from several interesting applications such as cross-market customer segmentation and joint mining of gene expression data and protein interaction data. We build a general model for mining cross-graph quasi-cliques, show why the complete set of cross-graph quasi-cliques cannot be found by previous data mining methods, and study the complexity of the problem. While the problem is difficult, we develop an efficient algorithm, Crochet, which exploits several interesting and effective techniques and heuristics to efficaciously mine cross-graph quasi-cliques. A systematic performance study is reported on both synthetic and real data sets. We demonstrate some interesting and meaningful cross-graph quasi-cliques in bioinformatics. The experimental results also show that algorithm Crochet is efficient and scalable.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
 
2
Bader, G.D. and Hogue, C.W. An automated method for finding molecular complexes in large protein interaction networks. BMC Bioinformatics, 4(1)(2), Jan. 2003.
 
3
Barabási, A.L. and Albert, R. Emergence of scaling in random networks. Science, 286, Oct. 1999.
 
4
Bayada, D.M. et al. An algorithm for the nultiple common subgraph problem. J. of Chem Info & Comp Sci, pages 680--685, 1992.
 
5
Ben-Dor, A. et al. Clustering gene expression patterns. J. of Comp Biology, 6(3/4):281--297, 1999.
 
6
Bu, D. et al. Topological structure analysis of the protein-protein interaction network in budding yeast. Nucleic Acids Research, 31(9):2443--2450, 2003.
 
7
 
8
Cho, R.J. et al. A Genome-Wide Transcriptional Analysis of the Mitotic Cell Cycle. Molecular Cell, Vol. 2(1):65--73, July 1998.
 
9
 
10
Dongen, S.V. Graph clustering by flow simulation. PhD Thesis, University of Utrecht, The Netherlands, 2000.
11
 
12
Enright, A.J., Dongen, S.V. and Ouzounis, C.A. An efficient algorithm for large-scale detection of protein families. Nucleic Acids Reserach, 30(7):1575--1584, 2002.
13
 
14
 
15
 
16
Holder, L.B. et al. Substructure discovery in the subdue system. In KDD'94.
 
17
18
 
19
Kasturi, J. et al. An information theoretic approach for analyzing temporal patterns of gene expression. Bioinformatics, pages 449--458, 2003.
 
20
 
21
 
22
Ng, A.Y. et al. On spectral clustering: analysis and an algorithm. Advances in Neural Information Processing Systems, 14, 2001.
23
 
24
Rymon, R. Search through systematic set enumeration. In KR'92.
 
25
Segal, E. et al. Discovering molecular pathways from protein interaction and gene expression data. Bioinformatics, 19:i264--i272, 2003.
 
26
 
27
Takahashi, Y. et al. Recognition of largest common fragment among a variety of chemical structures. Analytical Sciences, pages 23--28, 1987.
28
29
 
30
Xu, Y. et al. Clustering gene expression data using a graph-theoretic approach: an application of minimum spanning trees. Bioinformatics, 18:536--545, 2002.
31
 
32
33

CITED BY  13

Collaborative Colleagues:
Jian Pei: colleagues
Daxin Jiang: colleagues
Aidong Zhang: colleagues