Automatic identification and organization of index terms for interactive browsing
Source International Conference on Digital Libraries archive
Proceedings of the 1st ACM/IEEE-CS joint conference on Digital libraries table of contents
Roanoke, Virginia, United States
Pages: 126 - 134  
Year of Publication: 2001
ISBN:1-58113-345-6
Authors
Nina Wacholder  Columbia University, New York, NY
David K. Evans  Columbia University, New York, NY
Judith L. Klavans  Columbia University, New York, NY
Sponsor
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
DOI
http://doi.acm.org/10.1145/379437.379468

ABSTRACT

The potential of automatically generated indexes for information access has been recognized for several decades (e.g., Bush 1945 [2], Edmundson and Wyllys 1961 [4]), but the quantity of text and the ambiguity of natural language processing have made progress at this task more difficult than was originally foreseen. Recently, a body of work on the development of interactive systems to support phrase browsing has begun to emerge (e.g., Anick and Vaithyanathan 1997 [1], Gutwin et al. [10], Nevill-Manning et al. 1997 [17], Godby and Reighart 1998 [9]). In this paper, we consider two issues related to the use of automatically identified phrases as index terms in a dynamic text browser (DTB), a user-centered system for navigating and browsing index terms: 1) What criteria are appropriate for assessing the usefulness of automatically identified index terms? and 2) Is the quality of the terms identified by automatic indexing such that they provide useful access to document content? The terms that we focus on have been identified by LinkIT, a software tool for identifying significant topics in text [7]. Over 90% of the terms identified by LinkIT are coherent and therefore merit inclusion in the dynamic text browser. Terms identified by LinkIT are input to Intell-Index, a prototype DTB that supports interactive navigation of index terms. The distinction between phrasal heads (the most important words in a coherent term) and modifiers serves as the basis for a hierarchical organization of terms. This linguistically motivated structure helps users browse and disambiguate terms efficiently. We conclude that the approach to information access discussed in this paper is very promising, and also that there is much room for further research. In the meantime, this research contributes to establishing a solid foundation for assessing the usability of terms in phrase browsing applications.
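
The head/modifier organization described in the abstract can be illustrated with a minimal sketch. This is not LinkIT or Intell-Index code: the function name group_by_head and the sample terms are hypothetical, and the sketch assumes the head of a term is simply its last word, a common simplification for simplex English noun phrases.

```python
from collections import defaultdict


def group_by_head(terms):
    """Group noun-phrase terms under their head word.

    Assumption (not from the paper's implementation): the head is the
    final word of the phrase; everything before it is a modifier.
    """
    index = defaultdict(set)
    for term in terms:
        words = term.lower().split()
        if not words:
            continue  # skip empty strings
        head = words[-1]
        index[head].add(" ".join(words))
    # Sort heads and the phrases under each head for stable browsing order.
    return {head: sorted(phrases) for head, phrases in sorted(index.items())}


if __name__ == "__main__":
    sample = ["filter", "coffee filter", "oil filter", "oil price"]
    for head, phrases in group_by_head(sample).items():
        print(head)
        for phrase in phrases:
            print("    " + phrase)
```

Listing every term under its head places related phrases such as "coffee filter" and "oil filter" side by side, which is the kind of grouping a browser could use to help a reader tell apart different senses of the same head.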


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

1
 
2
Bush, Vannevar (1945) "As we may think," Atlantic Monthly. Available from http://www.theatlantic.com/unbound/flashbks/computer/bushf.htm
3
4
 
5
 
6
Evans, David K. (1998) LinkIT Documentation, Columbia University Department of Computer Science Report. Available at <http://www.cs.columbia.edu/~devans/papers/LinkITTechDoc/>
 
7
Evans, David K., Klavans, Judith, and Wacholder, Nina (2000) "Document processing with LinkIT", Proc. of the RIAO Conference, Paris, France.
8
 
9
Godby, Carol Jean and Ray Reighart (1998) "Using machine-readable text as a source of novel vocabulary to update the Dewey Decimal Classification", presented at the SIG-CR Workshop, ASIS, <http://orc.rsch.oclc.org:5061/papers/sigcr98.html>.
 
10
 
11
 
12
 
13
Jackendoff, Ray (1977) X-Bar Syntax: A Study of Phrase Structure, MIT Press, Cambridge, MA.
 
14
Justeson, John S. and Slava M. Katz (1995) "Technical terminology: some linguistic properties and an algorithm for identification in text", Natural Language Engineering 1(1):9-27.
 
15
 
16
17
 
18
 
19
 
20
 
21
Wacholder, Nina (1998) "Simplex noun phrases clustered by head: a method for identifying significant topics in a document", Proc. of Workshop on the Computational Treatment of Nominals, edited by Federica Busa, Inderjeet Mani and Patrick Saint-Dizier, pp.70-79. COLING-ACL, October 16, 1998, Montreal.
 
22
 
23
Wall Street Journal (1988) Available from Penn Treebank, Linguistic Data Consortium, University of Pennsylvania, Philadelphia, PA.
 
24
 
25
Zhou, Joe (1999) "Phrasal terms in real-world applications". In Natural Language Information Retrieval, edited by Tomek Strzalkowski, Kluwer Academic Publishers, Boston, pp.215-259.
