| Classifying biological articles using web resources |
| Source
|
Symposium on Applied Computing
Proceedings of the 2004 ACM symposium on Applied computing
Nicosia, Cyprus
SESSION: Bioinformatics (BIO)
Pages: 111 - 115
Year of Publication: 2004
ISBN: 1-58113-812-1
|
|
Authors
|
|
Francisco M. Couto
|
Faculdade de Ciências da Universidade de Lisboa, Campo Grande, Lisboa, Portugal
|
|
Bruno Martins
|
Faculdade de Ciências da Universidade de Lisboa, Campo Grande, Lisboa, Portugal
|
|
Mário J. Silva
|
Faculdade de Ciências da Universidade de Lisboa, Campo Grande, Lisboa, Portugal
|
|
ABSTRACT
Text classification systems for biomedical literature aim to select, from large corpora, the articles relevant to a specific issue. Most systems with acceptable accuracy are based on domain knowledge, which is very expensive and does not provide a general solution. This paper presents a novel approach to text classification of biomedical literature that uses information extracted from related web resources. We validated this approach by implementing the proposed method and testing it on the bio-text task of the KDD Cup 2002 challenge. Results show that our approach can effectively improve the efficiency of text classification systems for biomedical literature.