Inferring hierarchical descriptions
Source: Conference on Information and Knowledge Management
Proceedings of the eleventh international conference on Information and knowledge management
McLean, Virginia, USA
SESSION: Web clustering
Pages: 507-514
Year of Publication: 2002
ISBN: 1-58113-492-4
Authors
Eric Glover  NEC Research Institute, Princeton, NJ
David M. Pennock  NEC Research Institute, Princeton, NJ
Steve Lawrence  NEC Research Institute, Princeton, NJ
Robert Krovetz  NEC Research Institute, Princeton, NJ
Sponsors
SIGMIS: ACM Special Interest Group on Management Information Systems
ACM: Association for Computing Machinery
SIGIR: ACM Special Interest Group on Information Retrieval
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/584792.584876

ABSTRACT

We create a statistical model for inferring hierarchical term relationships about a topic, given only a small set of example web pages on the topic, without prior knowledge of any hierarchical information. The model can utilize either the full text of the pages in the cluster or the context of links to the pages. To support the model, we use "ground truth" data taken from the category labels in the Open Directory. We show that the model accurately separates terms in the following classes: self terms describing the cluster, parent terms describing more general concepts, and child terms describing specializations of the cluster. For example, for a set of biology pages, sample parent, self, and child terms are science, biology, and genetics respectively. We create an algorithm to predict parent, self, and child terms using the new model, and compare the predictions to the ground truth data. The algorithm accurately ranks a majority of the ground truth terms highly, and identifies additional complementary terms missing in the Open Directory.
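
The parent/self/child distinction in the abstract can be illustrated with a minimal sketch. The abstract does not specify the model's features or thresholds, so everything below is an assumption made for illustration: the sketch compares how often a term occurs in the example cluster with how often it occurs in a background collection, and the function names, threshold values, and toy documents are all hypothetical rather than taken from the paper.

```python
from typing import Dict, List, Set

# Toy data (hypothetical): each document is reduced to its set of terms.
CLUSTER: List[Set[str]] = [
    {"science", "biology", "genetics"},
    {"science", "biology"},
    {"science", "biology", "genetics"},
    {"science", "biology"},
]
COLLECTION: List[Set[str]] = [
    {"science", "physics"}, {"science", "chemistry"},
    {"science", "biology"}, {"science", "news"},
    {"sports"}, {"music"}, {"travel"}, {"cooking"},
    {"news"}, {"sports", "news"},
]


def doc_fraction(term: str, docs: List[Set[str]]) -> float:
    """Fraction of documents in `docs` that contain `term`."""
    if not docs:
        return 0.0
    return sum(term in doc for doc in docs) / len(docs)


def classify_terms(
    cluster: List[Set[str]],
    collection: List[Set[str]],
    frequent: float = 0.6,  # assumed: "common within the cluster"
    common: float = 0.3,    # assumed: "common in the background collection"
    rare: float = 0.2,      # assumed: minimum cluster frequency for a child term
) -> Dict[str, str]:
    """Label each cluster term as parent, self, child, or other.

    Assumed heuristic (not the paper's stated model):
      parent: frequent in the cluster and common in the collection
      self:   frequent in the cluster, uncommon in the collection
      child:  moderately frequent in the cluster, uncommon in the collection
    """
    vocabulary: Set[str] = set()
    for doc in cluster:
        vocabulary |= doc

    labels: Dict[str, str] = {}
    for term in sorted(vocabulary):
        in_cluster = doc_fraction(term, cluster)
        in_collection = doc_fraction(term, collection)
        if in_cluster >= frequent and in_collection >= common:
            labels[term] = "parent"
        elif in_cluster >= frequent:
            labels[term] = "self"
        elif in_cluster >= rare and in_collection < common:
            labels[term] = "child"
        else:
            labels[term] = "other"
    return labels


if __name__ == "__main__":
    for term, label in classify_terms(CLUSTER, COLLECTION).items():
        print(f"{term}: {label}")
```

On this toy data the sketch reproduces the abstract's biology example: science is labeled a parent term, biology a self term, and genetics a child term. Real pages would need tokenization, stop-word handling, and thresholds tuned against labeled data such as the Open Directory categories.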


