ABSTRACT
The potential of automatically generated indexes for information access has been recognized for several decades (e.g., Bush 1945 [2], Edmundson and Wyllys 1961 [4]), but the quantity of text and the ambiguity of natural language processing have made progress on this task more difficult than was originally foreseen. Recently, a body of work on the development of interactive systems to support phrase browsing has begun to emerge (e.g., Anick and Vaithyanathan 1997 [1], Gutwin et al. [10], Nevill-Manning et al. 1997 [17], Godby and Reighart 1998 [9]). In this paper, we consider two issues related to the use of automatically identified phrases as index terms in a dynamic text browser (DTB), a user-centered system for navigating and browsing index terms: 1) What criteria are appropriate for assessing the usefulness of automatically identified index terms? and 2) Is the quality of the terms identified by automatic indexing such that they provide useful access to document content? The terms that we focus on have been identified by LinkIT, a software tool for identifying significant topics in text [7]. Over 90% of the terms identified by LinkIT are coherent and therefore merit inclusion in the dynamic text browser. Terms identified by LinkIT are input to Intell-Index, a prototype DTB that supports interactive navigation of index terms. The distinction between phrasal heads (the most important words in a coherent term) and modifiers serves as the basis for a hierarchical organization of terms. This linguistically motivated structure helps users browse and disambiguate terms efficiently. We conclude that the approach to information access discussed in this paper is very promising, and also that there is much room for further research. In the meantime, this research contributes to a solid foundation for assessing the usability of terms in phrase browsing applications.
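The head/modifier organization described in the abstract can be illustrated with a minimal sketch. It is not LinkIT's or Intell-Index's actual algorithm: the function name `group_by_head`, the sample terms, and the rule that the head is the last word of the phrase are all assumptions made for illustration (the last-word rule holds for many simple English noun phrases, but not all).

```python
from collections import defaultdict


def group_by_head(terms):
    """Group noun phrases under their head word.

    Assumption (hypothetical simplification): the head of each phrase
    is its last word, and everything before it is the modifier string.
    """
    index = defaultdict(list)
    for term in terms:
        words = term.lower().split()
        if not words:
            continue  # skip empty strings
        head = words[-1]
        modifiers = " ".join(words[:-1])  # "" when the term is a bare head
        index[head].append(modifiers)
    # Sort heads and modifiers so the listing is stable and browsable.
    return {head: sorted(mods) for head, mods in sorted(index.items())}


# Hypothetical sample terms, not taken from the paper's data.
terms = ["filter", "coffee filter", "oil filter",
         "text browser", "dynamic text browser"]
index = group_by_head(terms)

for head, mods in index.items():
    print(head)
    for mod in mods:
        print("   ", (mod + " " + head).strip())
```

Listing every term under its head places related phrases such as "coffee filter" and "oil filter" side by side, which is the kind of grouping that can help a user tell apart different senses of the same head word.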
REFERENCES
[2] Bush, Vannevar (1945) "As we may think", Atlantic Monthly. Available from http://www.theatlantic.com/unbound/flashbks/computer/bushf.htm
[6] Evans, David K. (1998) LinkIT Documentation, Columbia University Department of Computer Science Report. Available at http://www.cs.columbia.edu/~devans/papers/LinkITTechDoc/
[7] Evans, David K., Klavans, Judith, and Wacholder, Nina (2000) "Document processing with LinkIT", Proc. of the RIAO Conference, Paris, France.
[9] Godby, Carol Jean and Ray Reighart (1998) "Using machine-readable text as a source of novel vocabulary to update the Dewey Decimal Classification", presented at the SIG-CR Workshop, ASIS. Available at http://orc.rsch.oclc.org:5061/papers/sigcr98.html
[13] Jackendoff, Ray (1977) X-Bar Syntax: A Study of Phrase Structure, MIT Press, Cambridge, MA.
[14] Justeson, John S. and Slava M. Katz (1995) "Technical terminology: some linguistic properties and an algorithm for identification in text", Natural Language Engineering 1(1):9-27.
[17] Craig G. Nevill-Manning, Ian H. Witten, Gordon W. Paynter, Browsing in digital libraries: a phrase-based approach, Proceedings of the second ACM international conference on Digital libraries, p.230-236, July 23-26, 1997, Philadelphia, Pennsylvania, United States. doi:10.1145/263690.263826
[19] Tomek Strzalkowski, Fang Lin, Jose Perez-Carballo, Jin Wang, Building effective queries in natural language information retrieval, Proceedings of the fifth conference on Applied natural language processing, p.299-306, March 31-April 03, 1997, Washington, DC. doi:10.3115/974557.974601
[21] Wacholder, Nina (1998) "Simplex noun phrases clustered by head: a method for identifying significant topics in a document", Proc. of Workshop on the Computational Treatment of Nominals, edited by Federica Busa, Inderjeet Mani and Patrick Saint-Dizier, pp.70-79. COLING-ACL, October 16, 1998, Montreal.
[23] Wall Street Journal (1988) Available from Penn Treebank, Linguistic Data Consortium, University of Pennsylvania, Philadelphia, PA.
[25] Zhou, Joe (1999) "Phrasal terms in real-world applications". In Natural Language Information Retrieval, edited by Tomek Strzalkowski, Kluwer Academic Publishers, Boston, pp.215-259.
CITED BY 9
Douglas Tudhope, Ceri Binding, Dorothee Blocks, Daniel Cunliffe, Compound descriptors in context: a matching function for classifications and thesauri, Proceedings of the 2nd ACM/IEEE-CS joint conference on Digital libraries, July 14-18, 2002, Portland, Oregon, USA
INDEX TERMS
Primary Classification:
H. Information Systems
H.3 INFORMATION STORAGE AND RETRIEVAL
Additional Classification:
H. Information Systems
H.3 INFORMATION STORAGE AND RETRIEVAL
H.3.1 Content Analysis and Indexing
Subjects: Indexing methods
H.5 INFORMATION INTERFACES AND PRESENTATION (I.7)
H.5.2 User Interfaces (D.2.2, H.1.2, I.3.6)
Subjects: Natural language; Interaction styles (e.g., commands, menus, forms, direct manipulation)
I. Computing Methodologies
I.2 ARTIFICIAL INTELLIGENCE
I.2.1 Applications and Expert Systems
Subjects: Natural language interfaces
General Terms: Design, Documentation, Human Factors, Languages, Management, Measurement, Performance, Theory
Keywords: browsing, genre, indexing, natural language processing, phrases