| A new efficient probabilistic model for mining labeled ordered trees |
| Full text |
Pdf
(946 KB)
|
| Source
|
International Conference on Knowledge Discovery and Data Mining
archive
Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining
table of contents
Philadelphia, PA, USA
SESSION: Research track papers
table of contents
Pages: 177 - 186
Year of Publication: 2006
ISBN:1-59593-339-5
|
|
Authors
|
|
Kosuke Hashimoto
|
Kyoto University, Gokasho Uji, Japan
|
|
Kiyoko F. Aoki-Kinoshita
|
Kyoto University, Gokasho Uji, Japan
|
|
Nobuhisa Ueda
|
Kyoto University, Gokasho Uji, Japan
|
|
Minoru Kanehisa
|
Kyoto University, Gokasho Uji, Japan
|
|
Hiroshi Mamitsuka
|
Kyoto University, Gokasho Uji, Japan
|
|
| Sponsors |
|
| Publisher |
|
| Bibliometrics |
Downloads (6 Weeks): 9, Downloads (12 Months): 93, Citation Count: 2
|
|
|
ABSTRACT
Mining frequent patterns is a general and important issue in data mining. Complex and unstructured (or semi-structured) datasets have appeared in major data mining applications, including text mining, web mining and bioinformatics. Mining patterns from these datasets is the focus of many of the current data mining approaches. We focus on labeled ordered trees, typical datasets of semi-structured data in data mining, and propose a new probabilistic model and its efficient learning scheme for mining labeled ordered trees. The proposed approach significantly improves the time and space complexity of an existing probabilistic modeling for labeled ordered trees, while maintaining its expressive power. We evaluated the performance of the proposed model, comparing it with that of the existing model, using synthetic as well as real datasets from the field of glycobiology. Experimental results showed that the proposed model drastically reduced the computation time of the competing model, keeping the predictive power and avoiding overfitting to the training data. Finally, we assessed our results using the proposed model on real data from a variety of biological viewpoints, verifying known facts in glycobiology.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
|
 |
2
|
|
| |
3
|
E. Baum and T. Petrie. Statistical inference for probabilistic functions of infinite state Markov chains. Ann. Math. Stat., 37:1554--1563, 1966.
|
| |
4
|
S. Chakrabarti. Mining the Web: Analysis of Hypertext and Semi Structured Data. Morgan Kaufmann, 2002.
|
| |
5
|
A. Dempster, N. Laird, and D. Rubin. Maximum likelihood from incomplete data via the EM algorithm. J. Royal Stat. Soc., B, 39:1--38, 1977.
|
| |
6
|
|
| |
7
|
|
| |
8
|
J. A. Hanley and B. J. McNeil. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology, 143:29--36, 1982.
|
| |
9
|
K. Hashimoto, S. Goto, S. Kawano, K. Aoki-Kinoshita, N. Ueda, M. Hamajima, T. Kawasaki, and M. Kanehisa. KEGG as a glycome informatics resource. Glycobiology, 16(5):63R--70R, 2005.
|
| |
10
|
M. Kanehisa, S. Goto, M. Hattori, K. F. Aoki-Kinoshita, M. Itoh, S. Kawashima, T. Katayama, M. Araki, and M. Hirakawa. From genomics to chemical genomics: New developments in KEGG. Nucl. Acids Res., 34:D354--D357, 2006.
|
| |
11
|
|
| |
12
|
K. Lari and S. J. Young. The estimation of stochastic context-free grammars using the inside-outside algorithm. Computer Speech and Language, 4:35--56, 1990.
|
| |
13
|
S. L. Lauritzen and D. J. Spiegelhalter. Local computations with probabilities on graphical structures and their application to expert systems (with discussion). J. Royal Stat. Soc. B, 50(2):157--224, 1988.
|
| |
14
|
|
| |
15
|
G. J. McLachlan and T. Krishnan. The EM Algorithm and Extensions. Wiley-Interscience, 1996.
|
| |
16
|
|
| |
17
|
L. R. Rabiner and B. H. Juang. An introduction to hidden Markov models. IEEE ASSP Magazine, 3(1):4--16, 1986.
|
| |
18
|
N. Ueda, K. F. Aoki, and H. Mamitsuka. A general probabilistic framework for mining labeled ordered trees. In Proc. Fourth SDM, pages 357--368, 2004.
|
| |
19
|
|
| |
20
|
A. Varki, R. Cummings, J. Esko, H. Freeze, G. Hart, and J. Marth, editors. Essentials of Glycobiology. Cold Spring Harbor Laboratory Press, New York, 1999.
|
| |
21
|
|
 |
22
|
|
| |
23
|
|
|