ABSTRACT
In previous work, we have shown that using terms from around citations in citing papers to index the cited paper, in addition to the cited paper's own terms, can improve retrieval effectiveness. Now, we investigate how to select text from around the citations in order to extract good index terms. We compare the retrieval effectiveness that results from a range of contexts around the citations, including no context, the entire citing paper, some fixed windows and several variations with linguistic motivations. We conclude with an analysis of the benefits of more complex, linguistically motivated methods for extracting citation index terms, over using a fixed window of terms. We speculate that there might be some advantage to using computational linguistic techniques for this task.