| Learning metadata from the evidence in an on-line citation matching scheme |
Source
International Conference on Digital Libraries
Proceedings of the 6th ACM/IEEE-CS joint conference on Digital libraries
Chapel Hill, NC, USA
SESSION: Information retrieval 2
Pages: 276 - 285
Year of Publication: 2006
ISBN: 1-59593-354-9
Authors
Isaac G. Councill, The Pennsylvania State University, University Park, PA
Huajing Li, The Pennsylvania State University, University Park, PA
Ziming Zhuang, The Pennsylvania State University, University Park, PA
Sandip Debnath, The Pennsylvania State University, University Park, PA
Levent Bolelli, The Pennsylvania State University, University Park, PA
Wang Chien Lee, The Pennsylvania State University, University Park, PA
Anand Sivasubramaniam, The Pennsylvania State University, University Park, PA
C. Lee Giles, The Pennsylvania State University, University Park, PA
ABSTRACT
Citation matching, or the automatic grouping of bibliographic references that refer to the same document, is a data management problem faced by automatic digital libraries for scientific literature such as CiteSeer and Google Scholar. Although several solutions have been offered for citation matching in large bibliographic databases, these solutions typically require expensive batch clustering operations that must be run offline. Large digital libraries containing citation information can reduce maintenance costs and provide new services through efficient online processing of citation data, resolving document citation relationships as new records become available. Additionally, information found in citations can be used to supplement document metadata, which requires generating a canonical citation record by merging variant citation subfields into a unified "best guess" from which to draw information. Citation information must also be merged with other information sources in order to provide a complete document record. This paper outlines a system and algorithms for online citation matching and canonical metadata generation. A Bayesian framework is employed to build the ideal citation record for a document; the framework carries the added advantages of fusing information from disparate sources and increasing system resilience to erroneous data.
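The abstract names two ideas without giving details: matching each citation online as it arrives, and fusing variant subfields into a canonical "best guess". The sketch below illustrates both under loud assumptions and is not the paper's algorithm. The class `OnlineMatcher`, the function `canonical_field`, the title-token Jaccard similarity, the 0.6 threshold, the per-source reliability values, and the toy records are all hypothetical choices made for illustration.

```python
import math
import re
from collections import defaultdict


def title_tokens(title):
    """Lowercase alphanumeric tokens of a title, as a set."""
    return set(re.findall(r"[a-z0-9]+", title.lower()))


def jaccard(a, b):
    """Jaccard similarity of two token sets (0.0 if either is empty)."""
    return len(a & b) / len(a | b) if a and b else 0.0


class OnlineMatcher:
    """Hypothetical incremental matcher: each new citation joins the most
    similar existing cluster, or starts a new one. No batch re-clustering."""

    def __init__(self, threshold=0.6):
        self.threshold = threshold          # assumed similarity cutoff
        self.clusters = []                  # cluster id -> list of citations
        self.index = defaultdict(set)       # title token -> cluster ids

    def add(self, citation):
        toks = title_tokens(citation["title"])
        # Only clusters sharing at least one title token are compared.
        candidates = set()
        for tok in toks:
            candidates |= self.index.get(tok, set())
        best_id, best_sim = None, 0.0
        for cid in sorted(candidates):
            sim = max(jaccard(toks, title_tokens(c["title"]))
                      for c in self.clusters[cid])
            if sim > best_sim:
                best_id, best_sim = cid, sim
        if best_id is None or best_sim < self.threshold:
            best_id = len(self.clusters)
            self.clusters.append([])
        self.clusters[best_id].append(citation)
        for tok in toks:
            self.index[tok].add(best_id)
        return best_id


def canonical_field(observations):
    """Pick the most probable field value from (value, reliability) pairs.

    Assumes a uniform prior over the observed values, independent sources,
    and that each source reports the true value with probability
    `reliability` (strictly between 0 and 1), otherwise a uniformly chosen
    wrong value. Returns (best_value, posterior_probability).
    """
    if not observations:
        raise ValueError("need at least one observation")
    values = sorted({v for v, _ in observations})
    if len(values) == 1:
        return values[0], 1.0
    k = len(values)
    log_post = {}
    for cand in values:
        log_post[cand] = sum(
            math.log(r if v == cand else (1.0 - r) / (k - 1))
            for v, r in observations
        )
    top = max(log_post.values())
    weights = {v: math.exp(lp - top) for v, lp in log_post.items()}
    best = max(weights, key=weights.get)
    return best, weights[best] / sum(weights.values())


# Toy input: two variants of one title, then an unrelated title.
matcher = OnlineMatcher()
ids = [matcher.add({"title": t}) for t in (
    "Identity uncertainty and citation matching",
    "Identity Uncertainty and Citation Matching.",
    "Distributed error correction",
)]

# Conflicting year values with assumed source reliabilities.
best_year, posterior = canonical_field(
    [("2003", 0.9), ("2003", 0.7), ("2008", 0.6)]
)
```

In this toy run the two title variants land in the same cluster and the unrelated title opens a new one. The inverted index on title tokens keeps each insertion from scanning every cluster, which is the property an online scheme needs; a production system would likely use a full-text index and richer evidence than title overlap.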