Molecular feature mining in HIV data
Source
Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining (International Conference on Knowledge Discovery and Data Mining)
San Francisco, California
Pages: 136-143
Year of Publication: 2001
ISBN: 1-58113-391-X
Authors
Stefan Kramer
Institute for Computer Science, Machine Learning Lab, Albert-Ludwigs-University Freiburg, Georges-Köhler-Allee Geb. 79, D-79110 Freiburg/Br., Germany
Luc De Raedt
Institute for Computer Science, Machine Learning Lab, Albert-Ludwigs-University Freiburg, Georges-Köhler-Allee Geb. 79, D-79110 Freiburg/Br., Germany
Christoph Helma
Institute for Computer Science, Machine Learning Lab, Albert-Ludwigs-University Freiburg, Georges-Köhler-Allee Geb. 79, D-79110 Freiburg/Br., Germany
Bibliometrics
Downloads (6 Weeks): 12, Downloads (12 Months): 64, Citation Count: 31
ABSTRACT
We present the application of feature mining techniques to the Developmental Therapeutics Program's AIDS antiviral screen database. The database consists of 43,576 compounds, which were measured for their capability to protect human cells from HIV-1 infection. According to these measurements, the compounds were classified as active, moderately active, or inactive. The distribution of classes is extremely skewed: only 1.3% of the molecules are known to be active, and 2.7% are known to be moderately active.
Given this database, we were interested in molecular substructures (i.e., features) that are frequent in the active molecules and infrequent in the inactive ones. In data mining terms, we focused on features with a minimum support in active compounds and a maximum support in inactive compounds. We analyzed the database using the levelwise version space algorithm that forms the basis of the inductive query and database system MOLFEA (Molecular Feature Miner). Within this framework, it is possible to specify the features of interest declaratively, by stating constraints on their frequency in (possibly different) datasets as well as on their generality and syntax. Assuming that the detected substructures are causally related to biochemical mechanisms, it should be possible to facilitate the development of new pharmaceuticals with improved activities.
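The combination of a minimum support in the actives and a maximum support in the inactives can be illustrated with a small sketch. This is not MOLFEA and not the levelwise version space algorithm: it enumerates candidates naively, and it assumes that molecules are plain strings and that a feature is any contiguous substring. The function names, the toy strings, and the thresholds are all hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Set


def substrings(s: str) -> Set[str]:
    """All non-empty contiguous substrings of s (toy stand-in for fragments)."""
    return {s[i:j] for i in range(len(s)) for j in range(i + 1, len(s) + 1)}


def support(feature: str, dataset: Sequence[str]) -> int:
    """Number of strings in the dataset that contain the feature."""
    return sum(1 for mol in dataset if feature in mol)


def mine_features(
    actives: Sequence[str],
    inactives: Sequence[str],
    min_active: int,
    max_inactive: int,
) -> List[str]:
    """Features with support >= min_active in actives and <= max_inactive in inactives.

    Naive enumeration: every substring of every active string is a candidate.
    """
    candidates: Set[str] = set()
    for mol in actives:
        candidates |= substrings(mol)
    return sorted(
        f
        for f in candidates
        if support(f, actives) >= min_active
        and support(f, inactives) <= max_inactive
    )


# Made-up toy strings, not compounds from the screen database.
ACTIVES = ["CCNO", "CNOC", "NOCC"]
INACTIVES = ["CCCC", "CCNC", "OCCN"]

if __name__ == "__main__":
    print(mine_features(ACTIVES, INACTIVES, min_active=3, max_inactive=0))
```

On this toy input, only the fragment `NO` occurs in all three active strings and in none of the inactive ones. Relaxing `max_inactive` or lowering `min_active` admits more features, which is the kind of trade-off the two support constraints express.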
CITED BY 31
Francesco Bonchi, Fosca Giannotti, Claudio Lucchese, Salvatore Orlando, Raffaele Perego, Roberto Trasarti, A constraint-based querying system for exploratory pattern discovery, Information Systems, v.34 n.1, p.3-27, March 2009
Wei Fan, Kun Zhang, Hong Cheng, Jing Gao, Xifeng Yan, Jiawei Han, Philip Yu, Olivier Verscheure, Direct mining of discriminative and essential frequent patterns via model-based search tree, Proceeding of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, August 24-27, 2008, Las Vegas, Nevada, USA