Automatically generating high quality metadata by analyzing the document code of common file types
Full text: PDF (1.80 MB)
Source
International Conference on Digital Libraries
Proceedings of the 9th ACM/IEEE-CS Joint Conference on Digital Libraries
Austin, TX, USA
Session 1
Pages 29-38
Year of Publication: 2009
ISBN:978-1-60558-322-8
Authors
Lars Fredrik Høimyr Edvardsen  Intelligent Communication AS/The Norwegian University of Science and Technology, Oslo, Norway
Ingeborg Torvik Sølvberg  The Norwegian University of Science and Technology, Trondheim, Norway
Trond Aalberg  The Norwegian University of Science and Technology, Trondheim, Norway
Hallvard Trætteberg  The Norwegian University of Science and Technology, Trondheim, Norway
Sponsors
SIGIR: ACM Special Interest Group on Information Retrieval
SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 44,   Downloads (12 Months): 112,   Citation Count: 0
DOI: http://doi.acm.org/10.1145/1555400.1555406

ABSTRACT

A major challenge for content management in intranets and other large-scale document storage and retrieval services is the generation of high-quality metadata. Manual generation of metadata is resource-demanding and is often viewed by collection managers and document authors as an inefficient use of their time, so there is a desire for other ways to create the needed metadata. Automatic Metadata Generation (AMG) comprises methods that generate metadata without manual interaction, using computer programs to interpret the document and possibly its context. Current AMG research has been limited to collections of similarly formatted documents. The research presented in this paper expands the field of AMG by presenting an approach that is independent of a common visualization scheme: AMG based on document code analysis. This is done by showing the AMG possibilities of LaTeX, Word and PowerPoint documents and how this approach can significantly increase the quality of the generated metadata. By giving AMG algorithms direct access to the user-specified intellectual content and the file formatting, the approach avoids common quality-reducing factors such as lack of completeness, low accuracy, poor logical consistency and coherence, and lack of timeliness. This research also shows how this AMG approach can be combined with other AMG approaches, drawing on their strengths in order to achieve the desired high-quality metadata entities.
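The abstract describes AMG based on document code analysis without showing what reading the "document code" can look like. The Python sketch below is an illustration under stated assumptions, not the authors' implementation: it reads `\title` and `\author` from LaTeX source with a simple regular expression (no nested braces or macros), reads Dublin Core fields from the `docProps/core.xml` part that Office Open XML packages (`.docx`, `.pptx`) carry, and merges candidate records so that earlier sources win per field. The function names `latex_metadata`, `ooxml_metadata` and `combine`, and the first-source-wins precedence rule, are hypothetical choices made for this example; legacy binary `.doc`/`.ppt` files are not handled.

```python
import re
import zipfile
import xml.etree.ElementTree as ET

# Namespaces used in the OOXML core-properties part (docProps/core.xml).
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def latex_metadata(source: str) -> dict:
    """Hypothetical helper: pull title/author from LaTeX commands.

    Only handles arguments without nested braces, e.g. \\title{Plain text}.
    """
    meta = {}
    for field, command in (("title", "title"), ("creator", "author")):
        match = re.search(r"\\" + command + r"\{([^{}]*)\}", source)
        if match:
            # Collapse runs of whitespace and line breaks inside the argument.
            meta[field] = " ".join(match.group(1).split())
    return meta


def ooxml_metadata(path_or_file) -> dict:
    """Hypothetical helper: read Dublin Core fields from a .docx/.pptx package.

    Raises KeyError if the package has no docProps/core.xml part.
    """
    with zipfile.ZipFile(path_or_file) as package:
        root = ET.fromstring(package.read("docProps/core.xml"))
    meta = {}
    for field in ("title", "creator", "subject"):
        element = root.find(f"dc:{field}", NS)
        if element is not None and element.text and element.text.strip():
            meta[field] = element.text.strip()
    return meta


def combine(*candidates: dict) -> dict:
    """Merge candidate records; earlier sources take precedence per field."""
    merged = {}
    for record in candidates:
        for field, value in record.items():
            merged.setdefault(field, value)
    return merged
```

For example, `combine(latex_metadata(src), ooxml_metadata("slides.pptx"))` would prefer values found in the LaTeX source and fall back to the package properties for fields the source lacks. A fixed precedence order is only the simplest merge policy; a fuller system would likely weigh each source by how reliable it is for a given metadata element.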


