FLUX-CIM: flexible unsupervised extraction of citation metadata
Source
International Conference on Digital Libraries
Proceedings of the 7th ACM/IEEE-CS Joint Conference on Digital Libraries
Vancouver, BC, Canada
Session: Information extraction 3
Pages: 215-224
Year of Publication: 2007
ISBN: 978-1-59593-644-8
Authors
Eli Cortez, Universidade Federal do Amazonas, Manaus, Brazil
Altigran S. da Silva, Universidade Federal do Amazonas, Manaus, Brazil
Marcos André Gonçalves, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
Filipe Mesquita, Universidade Federal do Amazonas, Manaus, Brazil
Edleno S. de Moura, Universidade Federal do Amazonas, Manaus, Brazil
Bibliometrics
Downloads (6 weeks): 17; downloads (12 months): 110; citation count: 2
ABSTRACT
In this paper we propose a knowledge-based approach to help extract the correct components of citations in any given format. Unlike related approaches that rely on manually built knowledge bases (KBs) to recognize the components of a citation, our approach constructs such a KB automatically from an existing set of sample metadata records from a given area (e.g., computer science or health sciences). Our approach does not rely on patterns encoding the specific delimiters of a particular citation style. It is also unsupervised, in the sense that it does not rely on a learning method that requires a training phase. These features give our technique a high degree of automation and flexibility. To demonstrate the effectiveness and applicability of the proposed approach, we ran experiments in which we applied it to extract information from citations in papers from two different domains. The results of these experiments indicate precision and recall levels above 94% and perfect extraction for the large majority of the citations tested.
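The abstract describes the idea only at a high level: derive a KB automatically from sample metadata records, then label the components of an unseen citation by how well they match that KB, with no style-specific delimiter patterns and no training phase. The Python sketch below illustrates that general idea under stated assumptions; it is not the FLUX-CIM algorithm. Everything in it is hypothetical: the toy sample records and field names (author, title, venue, year), the helper names build_kb, field_scores and extract, the generic split on commas, periods and semicolons, and the scoring rule (for each term of a segment, the fraction of that term's KB occurrences that fall in a given field, averaged over the segment).

```python
import re
from collections import Counter, defaultdict

# Hypothetical sample metadata records (toy data, not from the paper).
SAMPLE_RECORDS = [
    {"author": "J. Smith and A. B. Jones",
     "title": "Unsupervised extraction of citation metadata",
     "venue": "Joint Conference on Digital Libraries",
     "year": "2007"},
    {"author": "K. Lee and J. S. Park",
     "title": "Publishing relational databases on the web",
     "venue": "Information Processing and Management",
     "year": "2007"},
    {"author": "M. A. Brown and E. A. Clark",
     "title": "Quality models for digital libraries",
     "venue": "Information Processing and Management",
     "year": "2006"},
]


def terms(text):
    """Split a string into lowercase alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_kb(records):
    """Build the KB automatically: per field, count the terms seen in its sample values."""
    kb = defaultdict(Counter)
    for record in records:
        for field, value in record.items():
            kb[field].update(terms(value))
    return kb


def field_scores(segment, kb):
    """Assumed scoring: average over the segment's terms of count(field, term) / count(term)."""
    seg_terms = terms(segment)
    scores = {field: 0.0 for field in kb}
    for term in seg_terms:
        total = sum(counts[term] for counts in kb.values())
        if total == 0:
            continue  # unseen term: no evidence for any field
        for field, counts in kb.items():
            scores[field] += counts[term] / total
    return {field: s / len(seg_terms) for field, s in scores.items()}


def extract(citation, kb):
    """Label punctuation-separated segments by KB affinity, then merge adjacent same-label ones."""
    labeled = []  # entries are [label, start, end] character spans in the citation
    for match in re.finditer(r"[^,.;]+", citation):  # generic split, not a style-specific pattern
        if not terms(match.group()):
            continue  # skip whitespace-only pieces
        scores = field_scores(match.group(), kb)
        best = max(scores, key=scores.get)
        label = best if scores[best] > 0 else "unknown"
        if labeled and labeled[-1][0] == label:
            labeled[-1][2] = match.end()  # extend the previous span with the same label
        else:
            labeled.append([label, match.start(), match.end()])
    return [(label, citation[start:end].strip()) for label, start, end in labeled]


if __name__ == "__main__":
    kb = build_kb(SAMPLE_RECORDS)
    citation = ("A. B. Jones, J. Smith. Extraction of metadata from the web. "
                "Joint Conference on Digital Libraries, 2006.")
    for field, value in extract(citation, kb):
        print(f"{field}: {value}")
```

Run as a script, this prints the author, title, venue and year spans of the example citation. Because the toy KB holds only term counts, a segment with no previously seen term (for example, a year absent from the sample records) is labeled unknown; handling such values would need evidence beyond this sketch.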