Anticipating annotations and emerging trends in biomedical literature
Source: International Conference on Knowledge Discovery and Data Mining
Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Las Vegas, Nevada, USA
Session: Industrial papers
Pages: 954-962
Year of publication: 2008
ISBN: 978-1-60558-193-4

Authors
Fabian Mörchen, Siemens Corporate Research, Princeton, NJ, USA
Mathäus Dejori, Siemens Corporate Research, Princeton, NJ, USA
Dmitriy Fradkin, Siemens Corporate Research, Princeton, NJ, USA
Julien Etienne, Siemens Corporate Research, Princeton, NJ, USA
Bernd Wachmann, Siemens Corporate Research, Princeton, NJ, USA
Markus Bundschus, Ludwig-Maximilians-University, Munich, Germany

ABSTRACT
The BioJournalMonitor is a decision support system for the analysis of trends and topics in the biomedical literature. Its main goal is to identify potential diagnostic and therapeutic biomarkers for specific diseases. Several data sources are continuously integrated to provide the user with up-to-date information on current research in this field. State-of-the-art text mining technologies are deployed to provide added value on top of the original content, including named entity detection, relation extraction, classification, clustering, ranking, summarization, and visualization. We present two novel technologies for analyzing the temporal dynamics of text archives and their associated ontologies. Currently, the MeSH ontology is used to annotate the scientific articles entering the PubMed database with medical terms. Both the maintenance of the ontology and the annotation of new articles are performed largely manually. We describe how probabilistic topic models can be used to annotate recent articles with the most likely MeSH terms. This gives our users a competitive advantage: when they search by MeSH term, relevant articles are found long before they are manually annotated. We further present a study on how to predict the inclusion of new terms in the MeSH ontology. The results suggest that early prediction of emerging trends is possible. The trend ranking functions are deployed in our system to enable interactive searches for the hottest new trends relating to a disease.
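The abstract does not spell out how the topic model assigns MeSH terms. As an illustration only, assume a topic model in which each topic z carries both a word distribution p(w | z) and a MeSH-term distribution p(m | z). A not-yet-indexed article d could then be scored as p(m | d) = Σ_z p(m | z) θ_{d,z}, where the topic mixture θ_d is estimated by folding the article's words into the trained model. The sketch below shows that scoring step with a PLSA-style EM fold-in. The function names, toy topics, and probabilities are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical trained model: p(word | topic) and p(MeSH term | topic) for 2 topics.
P_WORD_GIVEN_TOPIC: List[Dict[str, float]] = [
    {"tumor": 0.5, "biomarker": 0.3, "serum": 0.2},
    {"insulin": 0.5, "glucose": 0.4, "serum": 0.1},
]
P_MESH_GIVEN_TOPIC: List[Dict[str, float]] = [
    {"Neoplasms": 0.6, "Biomarkers, Tumor": 0.4},
    {"Diabetes Mellitus": 0.7, "Blood Glucose": 0.3},
]


def fold_in_topics(
    doc_words: Sequence[str],
    p_word_given_topic: List[Dict[str, float]],
    n_iter: int = 50,
) -> List[float]:
    """Estimate theta = p(topic | doc) for an unseen article by EM,
    keeping the trained p(word | topic) fixed (PLSA-style fold-in)."""
    k = len(p_word_given_topic)
    theta = [1.0 / k] * k  # start from a uniform topic mixture
    for _ in range(n_iter):
        expected = [0.0] * k
        for w in doc_words:
            # E-step: posterior p(topic | word, doc) for this token
            weights = [theta[z] * p_word_given_topic[z].get(w, 0.0) for z in range(k)]
            total = sum(weights)
            if total == 0.0:
                continue  # word unseen by every topic: ignore it
            for z in range(k):
                expected[z] += weights[z] / total
        norm = sum(expected)
        if norm == 0.0:
            break  # no known words: keep the current mixture
        # M-step: new mixture proportional to expected topic counts
        theta = [c / norm for c in expected]
    return theta


def rank_mesh_terms(
    theta: Sequence[float],
    p_mesh_given_topic: List[Dict[str, float]],
    top_k: int = 5,
) -> List[Tuple[str, float]]:
    """Score each MeSH term m by p(m | doc) = sum_z p(m | z) * theta_z."""
    scores: Dict[str, float] = {}
    for z, weight in enumerate(theta):
        for term, p in p_mesh_given_topic[z].items():
            scores[term] = scores.get(term, 0.0) + weight * p
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]


# A not-yet-indexed abstract, already tokenized (toy example).
doc = ["tumor", "biomarker", "serum", "tumor"]
theta = fold_in_topics(doc, P_WORD_GIVEN_TOPIC)
ranking = rank_mesh_terms(theta, P_MESH_GIVEN_TOPIC, top_k=2)
print(ranking)
```

Keeping p(word | topic) fixed during the fold-in means a newly published article can be scored without retraining the model, which is what allows suggested MeSH terms to be available before manual indexing.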