ACM Home Page
Please provide us with feedback. Feedback
A new efficient probabilistic model for mining labeled ordered trees
Full text PdfPdf (946 KB)
Source International Conference on Knowledge Discovery and Data Mining archive
Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining table of contents
Philadelphia, PA, USA
SESSION: Research track papers table of contents
Pages: 177 - 186  
Year of Publication: 2006
ISBN:1-59593-339-5
Authors
Kosuke Hashimoto  Kyoto University, Gokasho Uji, Japan
Kiyoko F. Aoki-Kinoshita  Kyoto University, Gokasho Uji, Japan
Nobuhisa Ueda  Kyoto University, Gokasho Uji, Japan
Minoru Kanehisa  Kyoto University, Gokasho Uji, Japan
Hiroshi Mamitsuka  Kyoto University, Gokasho Uji, Japan
Sponsors
ACM: Association for Computing Machinery
SIGKDD: ACM Special Interest Group on Knowledge Discovery in Data
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 9,   Downloads (12 Months): 93,   Citation Count: 2
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1150402.1150425
What is a DOI?

ABSTRACT

Mining frequent patterns is a general and important issue in data mining. Complex and unstructured (or semi-structured) datasets have appeared in major data mining applications, including text mining, web mining and bioinformatics. Mining patterns from these datasets is the focus of many of the current data mining approaches. We focus on labeled ordered trees, typical datasets of semi-structured data in data mining, and propose a new probabilistic model and its efficient learning scheme for mining labeled ordered trees. The proposed approach significantly improves the time and space complexity of an existing probabilistic modeling for labeled ordered trees, while maintaining its expressive power. We evaluated the performance of the proposed model, comparing it with that of the existing model, using synthetic as well as real datasets from the field of glycobiology. Experimental results showed that the proposed model drastically reduced the computation time of the competing model, keeping the predictive power and avoiding overfitting to the training data. Finally, we assessed our results using the proposed model on real data from a variety of biological viewpoints, verifying known facts in glycobiology.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
2
 
3
E. Baum and T. Petrie. Statistical inference for probabilistic functions of infinite state Markov chains. Ann. Math. Stat., 37:1554--1563, 1966.
 
4
S. Chakrabarti. Mining the Web: Analysis of Hypertext and Semi Structured Data. Morgan Kaufmann, 2002.
 
5
A. Dempster, N. Laird, and D. Rubin. Maximum likelihood from incomplete data via the EM algorithm. J. Royal Stat. Soc., B, 39:1--38, 1977.
 
6
 
7
 
8
J. A. Hanley and B. J. McNeil. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology, 143:29--36, 1982.
 
9
K. Hashimoto, S. Goto, S. Kawano, K. Aoki-Kinoshita, N. Ueda, M. Hamajima, T. Kawasaki, and M. Kanehisa. KEGG as a glycome informatics resource. Glycobiology, 16(5):63R--70R, 2005.
 
10
M. Kanehisa, S. Goto, M. Hattori, K. F. Aoki-Kinoshita, M. Itoh, S. Kawashima, T. Katayama, M. Araki, and M. Hirakawa. From genomics to chemical genomics: New developments in KEGG. Nucl. Acids Res., 34:D354--D357, 2006.
 
11
 
12
K. Lari and S. J. Young. The estimation of stochastic context-free grammars using the inside-outside algorithm. Computer Speech and Language, 4:35--56, 1990.
 
13
S. L. Lauritzen and D. J. Spiegelhalter. Local computations with probabilities on graphical structures and their application to expert systems (with discussion). J. Royal Stat. Soc. B, 50(2):157--224, 1988.
 
14
 
15
G. J. McLachlan and T. Krishnan. The EM Algorithm and Extensions. Wiley-Interscience, 1996.
 
16
 
17
L. R. Rabiner and B. H. Juang. An introduction to hidden Markov models. IEEE ASSP Magazine, 3(1):4--16, 1986.
 
18
N. Ueda, K. F. Aoki, and H. Mamitsuka. A general probabilistic framework for mining labeled ordered trees. In Proc. Fourth SDM, pages 357--368, 2004.
 
19
 
20
A. Varki, R. Cummings, J. Esko, H. Freeze, G. Hart, and J. Marth, editors. Essentials of Glycobiology. Cold Spring Harbor Laboratory Press, New York, 1999.
 
21
22
 
23


Collaborative Colleagues:
Kosuke Hashimoto: colleagues
Kiyoko F. Aoki-Kinoshita: colleagues
Nobuhisa Ueda: colleagues
Minoru Kanehisa: colleagues
Hiroshi Mamitsuka: colleagues