Developing practical automatic metadata assignment and evaluation tools for internet resources
Source: International Conference on Digital Libraries
Proceedings of the 5th ACM/IEEE-CS Joint Conference on Digital Libraries
Denver, CO, USA
Session: Tools & techniques track: applying machine learning to collection development
Pages: 291-300
Year of Publication: 2005
ISBN: 1-58113-876-8
Author
Gordon W. Paynter, The INFOMINE Project, Riverside, CA
Sponsors
ACM: Association for Computing Machinery
SIGIR: ACM Special Interest Group on Information Retrieval
SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/1065385.1065454

ABSTRACT

This paper describes the development of practical automatic metadata assignment tools to support automatic record creation for virtual libraries, metadata repositories and digital libraries, with particular reference to library-standard metadata. The development process is incremental in nature, and depends upon an automatic metadata evaluation tool to objectively measure its progress. The evaluation tool is based on and informed by the metadata created and maintained by librarian experts at the INFOMINE Project, and uses different metrics to evaluate different metadata fields. In this paper, we describe the form and function of common metadata fields, and identify appropriate performance measures for these fields. The automatic metadata assignment tools in the iVia virtual library software are described, and their performance is measured. Finally, we discuss the limitations of automatic metadata evaluation, and cases where we choose to ignore its evidence in favor of human judgment.

