FLUX-CIM: flexible unsupervised extraction of citation metadata
Source
International Conference on Digital Libraries
Proceedings of the 7th ACM/IEEE-CS Joint Conference on Digital Libraries
Vancouver, BC, Canada
SESSION: Information extraction 3
Pages: 215-224
Year of Publication: 2007
ISBN: 978-1-59593-644-8
Authors
Eli Cortez  Universidade Federal do Amazonas, Manaus, Brazil
Altigran S. da Silva  Universidade Federal do Amazonas, Manaus, Brazil
Marcos André Gonçalves  Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
Filipe Mesquita  Universidade Federal do Amazonas, Manaus, Brazil
Edleno S. de Moura  Universidade Federal do Amazonas, Manaus, Brazil
Sponsors
ACM: Association for Computing Machinery
SIGIR: ACM Special Interest Group on Information Retrieval
SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1255175.1255219

ABSTRACT

In this paper we propose a knowledge-based approach to help extract the correct components of citations in any given format. Unlike related approaches that rely on manually built knowledge bases (KBs) for recognizing the components of a citation, in our case the KB is automatically constructed from an existing set of sample metadata records from a given area (e.g., computer science or health sciences). Our approach does not rely on patterns encoding the specific delimiters of a particular citation style. It is also unsupervised, in the sense that it does not rely on a learning method that requires a training phase. These features give our technique a high degree of automation and flexibility. To demonstrate the effectiveness and applicability of the proposed approach, we ran experiments in which we applied it to extract information from citations in papers from two different domains. The results of these experiments indicate precision and recall levels above 94% and perfect extraction for the large majority of the citations tested.
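
The general idea in the abstract (a KB built automatically from sample metadata records, then used to recognize citation components without style-specific patterns) can be illustrated with a minimal sketch. This is not the authors' algorithm: the scoring rule (fraction of a block's terms seen in a field), the function names `build_kb`, `label_block`, and `extract`, and the sample records are all assumptions made for illustration.

```python
import re
from collections import Counter, defaultdict

def build_kb(records):
    """Map each field name to a Counter of lowercased terms seen in that field."""
    kb = defaultdict(Counter)
    for record in records:
        for field, value in record.items():
            kb[field].update(re.findall(r"\w+", value.lower()))
    return kb

def label_block(block, kb):
    """Return the field whose known terms best cover the block, or None."""
    terms = re.findall(r"\w+", block.lower())
    if not terms:
        return None
    # Assumed scoring rule: share of the block's terms that occur in the field.
    scores = {
        field: sum(1 for t in terms if t in counts) / len(terms)
        for field, counts in kb.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

def extract(citation, kb):
    """Split on generic punctuation (no style-specific pattern) and label each block."""
    blocks = [b.strip() for b in re.split(r"[.,;:()]+", citation) if b.strip()]
    return [(block, label_block(block, kb)) for block in blocks]

# Hypothetical sample metadata records used to build the KB.
records = [
    {"author": "Eli Cortez",
     "title": "flexible unsupervised extraction of citation metadata",
     "venue": "joint conference on Digital libraries"},
    {"author": "Altigran da Silva",
     "title": "a knowledge-based approach to citation extraction",
     "venue": "International Conference on Information Reuse and Integration"},
]
kb = build_kb(records)

citation = "Cortez, Eli. Extraction of citation metadata. Conference on Digital libraries"
for block, field in extract(citation, kb):
    print(f"{field}: {block}")
```

No training phase is involved: the KB is only a term index over existing records, so changing the domain means swapping the sample records rather than retraining or rewriting patterns.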



