Unsupervised semantic markup of literature for biodiversity digital libraries
Source
International Conference on Digital Libraries
Proceedings of the 8th ACM/IEEE-CS joint conference on Digital libraries
Pittsburgh, PA, USA
SESSION: Automatic tools for digital libraries
Pages 25-28
Year of Publication: 2008
ISBN: 978-1-59593-998-2
Author
Hong Cui
University of Arizona, Tucson, AZ, USA
ABSTRACT
This paper reports the further development of machine learning techniques for semantic markup of biodiversity literature, especially morphological descriptions of living organisms such as those hosted at efloras.org and algaebase.org. Earlier research explored syntactic parsing and supervised machine learning techniques. The limitations of these techniques prompted our investigation of an unsupervised learning approach that combines the strengths of the earlier techniques while avoiding their limitations. Semantic markup at the organ and character levels is discussed. Research on semantic markup of natural heritage literature has a direct impact on the development of semantic-based access in biodiversity digital libraries.
INDEX TERMS
Primary Classification: H. Information Systems; H.3 INFORMATION STORAGE AND RETRIEVAL; H.3.7 Digital Libraries; Subjects: Collection
Additional Classification: H. Information Systems; H.3 INFORMATION STORAGE AND RETRIEVAL; H.3.7 Digital Libraries; Subjects: Systems issues
General Terms: Algorithms, Design, Experimentation, Performance
Keywords: biodiversity informatics, morphological description, natural heritage literature, semantic annotation, semantic markup, tagging, unsupervised machine learning, xml