ABSTRACT
Attributed graphs are increasingly common in many application domains, such as chemistry, biology and text processing. A central issue in graph mining is how to collect informative subgraph patterns for a given learning task. We propose an iterative mining method based on partial least squares regression (PLS). To apply PLS to graph data, we first develop a sparse version of PLS and then combine it with a weighted pattern mining algorithm. The mining algorithm is called iteratively with different weight vectors, creating one latent component per mining call. Our method, graph PLS, is efficient and easy to implement, because the weight vector is updated with elementary matrix calculations. In experiments, our graph PLS algorithm achieved competitive prediction accuracy on many chemical datasets and was significantly more efficient than graph boosting (gBoost) and a naive method based on frequent graph mining.