ABSTRACT
Finding biological entities (such as genes or proteins) in text that satisfy given conditions is an important and challenging task in biomedical information retrieval and text mining. It is essential for many biomedical applications, such as drug discovery, which normally requires collecting existing scientific facts from documents. This paper presents an effective IR system for this task, in which 1) domain knowledge is incorporated to improve retrieval effectiveness; 2) query expansion with related concepts on multiple semantic levels is employed; and 3) a gene symbol disambiguation technique is implemented. We evaluated these techniques and examined two different concept-based IR models. Experiments based on the proposed framework yield significant improvements (22% for automatic and 16.7% for non-automatic) over the best reported results for passage retrieval in the Genomics track of TREC 2007.