CEBBIP: a parser of bibliographic information in Chinese electronic books
Source
International Conference on Digital Libraries
Proceedings of the 9th ACM/IEEE-CS joint conference on Digital libraries
Austin, TX, USA
Pages 73-76
Year of Publication: 2009
ISBN: 978-1-60558-322-8
Authors
Liangcai Gao, Institute of Computer Science and Technology of Peking University, Beijing, China
Zhi Tang, Institute of Computer Science and Technology of Peking University, Beijing, China
Xiaofan Lin, Vobile Inc., Santa Clara, USA
ABSTRACT
Bibliographic information is essential for many digital library applications, such as citation analysis, academic search, and topic discovery, and bibliographic data extraction has attracted a great deal of attention in recent years. In this paper, we address the problem of automatically extracting bibliographic data from Chinese electronic books and propose a tool called CEBBIP for the task. The tool comprises three main subsystems: data preprocessing, data parsing, and data postprocessing. In the data preprocessing subsystem, the tool uses a rule-based method to locate citation data in a book and to segment that data into citation strings, one per referenced work. In the data parsing subsystem, a learning-based approach, Conditional Random Fields (CRF), is employed to parse the citation strings. Finally, the tool exploits the local format consistency intrinsic to a document to improve citation segmentation and parsing through clustering techniques. CEBBIP has been used in a commercial e-book production system. Experimental results show that CEBBIP achieves very high precision. More specifically, exploiting the document's intrinsic local format consistency clearly improves the accuracy of citation segmentation and parsing.
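The abstract does not give the actual rules CEBBIP uses, so the following is only a minimal sketch of what a rule-based segmentation step of this kind might look like. It assumes that each citation in a reference block starts with a numeric marker such as "[1]", "(1)", or "1." and that unmarked lines continue the previous citation. The names `MARKER` and `segment_citations` are hypothetical and are not taken from the paper.

```python
import re

# Assumed marker rule (not from the paper): a citation starts with
# "[12]", "(12)", or "12." followed by whitespace.
MARKER = re.compile(r"^\s*(?:\[\d+\]|\(\d+\)|\d+\.)\s+")


def segment_citations(block: str) -> list[str]:
    """Split a reference block into one string per cited work."""
    citations: list[str] = []
    for line in block.splitlines():
        text = line.strip()
        if not text:
            continue  # skip blank lines between entries
        if MARKER.match(line):
            # A marker opens a new citation; drop the marker itself.
            citations.append(MARKER.sub("", line, count=1).strip())
        elif citations:
            # No marker: treat the line as a continuation of the last citation.
            citations[-1] += " " + text
        # Lines that appear before the first marker are ignored.
    return citations


if __name__ == "__main__":
    sample = (
        "[1] A. Author. Title one,\n"
        "    continued line. 2004.\n"
        "[2] B. Author. Title two. 1999.\n"
    )
    for entry in segment_citations(sample):
        print(entry)
```

A fixed rule like this breaks when a continuation line happens to begin with a number and a period, such as a year. That is the kind of error the abstract's postprocessing step, which relies on format consistency within a document, is meant to reduce.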