ABSTRACT
Indexing with classifications or thesauri offers Digital Libraries many advantages, but the lack of flexible retrieval tools that deal with compound descriptors is currently a disincentive. This paper discusses a matching function for compound descriptors, or multi-concept subject headings, that does not rely on exact matching but incorporates term expansion via thesaurus semantic relationships to produce ranked results that take account of missing and partially matching terms. The matching function is based on a measure of semantic closeness between terms, which has the potential to help with recall problems. The work reported is part of the ongoing FACET project, carried out in collaboration with the National Museum of Science and Industry and using its collections database. The architecture of the prototype system and its interface are outlined. The matching problem for compound descriptors is reviewed and the FACET implementation is described. Results are discussed from scenarios using the faceted Getty Art and Architecture Thesaurus (AAT). We argue that automatic traversal of thesaurus relationships can augment the user's browsing possibilities. The techniques can be applied to unstructured multi-concept subject headings and potentially to more syntactically structured strings. The matching function uses the notion of a focus term to model AAT modified descriptors (noun phrases). The relevance of the approach to precoordinated indexing and to matching faceted strings is discussed.
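As an illustration only, the kind of matching function the abstract describes might be sketched as follows. This is not the FACET implementation: the toy thesaurus, the uniform traversal costs, the expansion threshold, the linear closeness formula, and the names `closeness`, `match`, `rank` and `focus_weight` are all assumptions made for the sketch. It shows term expansion by traversing thesaurus links, a closeness score that falls with traversal cost, and a compound score in which missing terms contribute nothing and partial matches contribute a fraction.

```python
import heapq

# Hypothetical toy thesaurus: term -> [(related term, traversal cost)].
# Terms and costs are illustrative assumptions, not AAT data.
THESAURUS = {
    "chairs": [("seating furniture", 1.0), ("armchairs", 1.0)],
    "armchairs": [("chairs", 1.0)],
    "seating furniture": [("chairs", 1.0), ("sofas", 1.0)],
    "sofas": [("seating furniture", 1.0)],
    "mahogany": [("wood", 1.0)],
    "wood": [("mahogany", 1.0), ("oak", 1.0)],
    "oak": [("wood", 1.0)],
}

MAX_DISTANCE = 3.0  # assumed threshold beyond which expansion stops


def closeness(source, target, graph=THESAURUS, max_distance=MAX_DISTANCE):
    """Closeness in [0, 1]: 1 for identical terms, decreasing linearly with
    the cheapest traversal cost, 0 if the target is not reached within
    max_distance (assumed formula)."""
    best = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        dist, term = heapq.heappop(queue)
        if term == target:
            return 1.0 - dist / max_distance
        if dist > best.get(term, float("inf")):
            continue  # stale queue entry
        for neighbour, cost in graph.get(term, []):
            new_dist = dist + cost
            if new_dist < max_distance and new_dist < best.get(neighbour, float("inf")):
                best[neighbour] = new_dist
                heapq.heappush(queue, (new_dist, neighbour))
    return 0.0


def match(query_terms, index_terms, focus=None, focus_weight=2.0):
    """Score an indexed compound descriptor against a compound query.
    Each query term takes its best closeness to any index term; a query
    term with no close index term contributes 0. The focus term, if given,
    is weighted more heavily (the weighting scheme is an assumption)."""
    total = 0.0
    weight_sum = 0.0
    for query_term in query_terms:
        weight = focus_weight if query_term == focus else 1.0
        best = max((closeness(query_term, t) for t in index_terms), default=0.0)
        total += weight * best
        weight_sum += weight
    return total / weight_sum if weight_sum else 0.0


def rank(query_terms, items, focus=None):
    """Return (score, item name) pairs, best match first."""
    scored = [(match(query_terms, terms, focus), name) for name, terms in items.items()]
    return sorted(scored, key=lambda pair: (-pair[0], pair[1]))
```

With a query such as `["chairs", "mahogany"]`, an item indexed with exactly those terms would rank above one indexed with `["armchairs", "oak"]` (a partial match on both terms), which in turn would rank above one indexed only with `["sofas"]` (one partial match and one missing term).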
CITED BY
Douglas Tudhope, Ceri Binding, Dorothee Blocks, Daniel Cunliffe, FACET: thesaurus retrieval with semantic term expansion, Proceedings of the 2nd ACM/IEEE-CS joint conference on Digital libraries, July 14-18, 2002, Portland, Oregon, USA
INDEX TERMS
Primary Classification:
H. Information Systems
H.3 INFORMATION STORAGE AND RETRIEVAL
H.3.1 Content Analysis and Indexing
Subjects: Thesauruses
Additional Classification:
H. Information Systems
H.3 INFORMATION STORAGE AND RETRIEVAL
H.3.3 Information Search and Retrieval
H.3.7 Digital Libraries
H.5 INFORMATION INTERFACES AND PRESENTATION (I.7)
H.5.4 Hypertext/Hypermedia
General Terms: Algorithms, Measurement, Performance
Keywords: compound descriptors, faceted classification, knowledge organization systems, matching functions, postcoordination, precoordination, semantic distance measures, similarity coefficients, term expansion