ACM Home Page
Please provide us with feedback. Feedback
Large-scale graph mining using backbone refinement classes
Full text MovMov (14:58),  PdfPdf (571 KB)
Source
International Conference on Knowledge Discovery and Data Mining archive
Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining table of contents
Paris, France
SESSION: Research track papers table of contents
Pages 617-626  
Year of Publication: 2009
ISBN:978-1-60558-495-9
Authors
Andreas Maunz  Albert-Ludwigs-Universität, Freiburg, Germany
Christoph Helma  in-silico Toxicology, Basel, Switzerland
Stefan Kramer  Technische Universität, München, Germany
Sponsors
ACM: Association for Computing Machinery
SIGKDD: ACM Special Interest Group on Knowledge Discovery in Data
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 39,   Downloads (12 Months): 130,   Citation Count: 0
Additional Information:

abstract   references   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1557019.1557089
What is a DOI?

ABSTRACT

We present a new approach to large-scale graph mining based on so-called backbone refinement classes. The method efficiently mines tree-shaped subgraph descriptors under minimum frequency and significance constraints, using classes of fragments to reduce feature set size and running times. The classes are defined in terms of fragments sharing a common backbone. The method is able to optimize structural inter-feature entropy as opposed to occurrences, which is characteristic for open or closed fragment mining. In the experiments, the proposed method reduces feature set sizes by >90 % and >30 % compared to complete tree mining and open tree mining, respectively. Evaluation using crossvalidation runs shows that their classification accuracy is similar to the complete set of trees but significantly better than that of open trees. Compared to open or closed fragment mining, a large part of the search space can be pruned due to an improved statistical constraint (dynamic upper bound adjustment), which is also confirmed in the experiments in lower running times compared to ordinary (static) upper bound pruning. Further analysis using large-scale datasets yields insight into important properties of the proposed descriptors, such as the dataset coverage and the class size represented by each descriptor. A final cross-validation run confirms that the novel descriptors render large training sets feasible which previously might have been intractable.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
 
2
B. Bringmann, A. Zimmermann, L. de Raedt, and S. Nijssen. Don't Be Afraid of Simpler Patterns. In Proceedings 10th PKDD, pages 55--66. Springer-Verlag, 2006.
 
3
C. Helma. Lazy Structure-Activity Relationships (lazar) for the Prediction of Rodent Carcinogenicity and Salmonella Mutagenicity. Molecular Diversity, pages 147--158, 2006.
4
 
5
K. Jahn and S. Kramer. Optimizing gSpan for Molecular Datasets. In Proceedings of the Third International Workshop on Mining Graphs, Trees and Sequences (MGTS-2005), 2005.
6
7
8
 
9
S. Nijssen and J. N. Kok. Frequent Subgraph Miners: Runtimes Don't Say Everything. In Proceedings of the International Workshop on Mining and Learning with Graphs (MLG 2006, pages 173--180, 2006.
 
10
 
11
F. Wilcoxon. Individual comparisons by ranking methods. Biometrics Bulletin, 1(6):80--83, 1945.
 
12
M. Worlein, T. Meinl, I. Fischer, and M. Philippsen. A Quantitative Comparison of the Subgraph Miners MoFa, gSpan, FFSM, and Gaston. In Proceedings of PKDD, pages 392--403, 2005.
 
13
14

Collaborative Colleagues:
Andreas Maunz: colleagues
Christoph Helma: colleagues
Stefan Kramer: colleagues