On the potential of domain literature for clustering and Bayesian network learning
Source: International Conference on Knowledge Discovery and Data Mining
Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
Edmonton, Alberta, Canada
SESSION: Industry track papers
Pages: 405 - 414  
Year of Publication: 2002
ISBN:1-58113-567-X
Authors
Peter Antal  Katholieke Universiteit Leuven, El. Eng. ESAT-SCD (SISTA), Kasteelpark Arenberg 10, B-3001 Leuven, Belgium
Patrick Glenisson  Katholieke Universiteit Leuven, El. Eng. ESAT-SCD (SISTA), Kasteelpark Arenberg 10, B-3001 Leuven, Belgium
Geert Fannes  Katholieke Universiteit Leuven, El. Eng. ESAT-SCD (SISTA), Kasteelpark Arenberg 10, B-3001 Leuven, Belgium
Sponsors
SIGKDD: ACM Special Interest Group on Knowledge Discovery in Data
SIGMOD: ACM Special Interest Group on Management of Data
AAAI
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/775047.775105

ABSTRACT

Thanks to its increasing availability, electronic literature can now serve as a major source of information when developing complex statistical models for which data are scarce or noisy. This raises the question of how to integrate information from domain literature with statistical data. Because quantifying similarities or dependencies between variables is a basic building block of knowledge discovery, we consider the following question: which vector representations of text, and which statistical scores of similarity or dependency, best support the use of literature in statistical models? As the text source, we assume that annotations of the domain variables are available as short free-text descriptions and, optionally, that a large literature repository is available from which the annotations can be expanded further. For evaluation, we contrast the variable similarities or dependencies obtained from text, using different annotation sources and vector representations, with those obtained from measurement data or expert assessments. Specifically, we consider two learning problems: clustering and Bayesian network learning. First, we report performance (against an expert reference) for clustering yeast genes from textual annotations. Second, we assess the agreement between text-based and data-based scores of variable dependencies when learning Bayesian network substructures for the task of modeling the joint distribution of clinical measurements of ovarian tumors.
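
As an illustration of what a vector representation of text with a similarity score can look like, the following minimal sketch builds TF-IDF vectors from short free-text annotations and compares variables by cosine similarity. It is not the authors' method: the abstract does not say which representations or scores were used, so TF-IDF weighting, the cosine score, the whitespace tokenizer, and the three example annotations (gene_a, gene_b, gene_c) are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace, keeping alphabetic tokens only."""
    return [tok for tok in text.lower().split() if tok.isalpha()]


def tfidf_vectors(annotations):
    """Map each variable name to a sparse TF-IDF vector (dict term -> weight).

    Uses a smoothed inverse document frequency (an assumption, not taken
    from the paper): idf = log((1 + N) / (1 + df)) + 1.
    """
    counts = {name: Counter(tokenize(text)) for name, text in annotations.items()}
    n_docs = len(counts)
    # Iterating over a Counter yields each distinct term once per annotation.
    doc_freq = Counter(term for c in counts.values() for term in c)
    vectors = {}
    for name, c in counts.items():
        vectors[name] = {
            term: tf * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, tf in c.items()
        }
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical annotations, invented for this example only.
example_annotations = {
    "gene_a": "ribosomal protein involved in translation",
    "gene_b": "ribosomal protein of the large subunit",
    "gene_c": "kinase regulating cell cycle progression",
}

example_vectors = tfidf_vectors(example_annotations)
similarity_ab = cosine(example_vectors["gene_a"], example_vectors["gene_b"])
similarity_ac = cosine(example_vectors["gene_a"], example_vectors["gene_c"])
```

A pairwise similarity matrix computed this way could feed a clustering algorithm, or be compared against similarities derived from measurement data, which is the kind of contrast the abstract describes.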


