ABSTRACT
We present a passage relevance model that integrates semantic and statistical evidence of biomedical concepts and topics within a probabilistic graphical model. Component models of topics, concepts, terms, and documents are represented as potential functions within a Markov Random Field. The probability that a passage is relevant to a biologist's information need is represented as the joint distribution across all potential functions. Relevance model feedback from top-ranked passages is used to improve distributional estimates of concepts and topics in context, and a dimensional indexing strategy is used to aggregate concept and term statistics efficiently. By integrating multiple sources of evidence, including dependencies between topics, concepts, and terms, we seek to improve the precision of genomics literature passage retrieval. Using this model, we demonstrate statistically significant improvements in retrieval precision on a large genomics literature corpus.