Inferring hierarchical descriptions
Source
Conference on Information and Knowledge Management
Proceedings of the eleventh international conference on Information and knowledge management
McLean, Virginia, USA
SESSION: Web clustering
Pages: 507 - 514
Year of Publication: 2002
ISBN: 1-58113-492-4
ABSTRACT
We create a statistical model for inferring hierarchical term relationships about a topic, given only a small set of example web pages on the topic, without prior knowledge of any hierarchical information. The model can utilize either the full text of the pages in the cluster or the context of links to the pages. To support the model, we use "ground truth" data taken from the category labels in the Open Directory. We show that the model accurately separates terms in the following classes: self terms describing the cluster, parent terms describing more general concepts, and child terms describing specializations of the cluster. For example, for a set of biology pages, sample parent, self, and child terms are science, biology, and genetics respectively. We create an algorithm to predict parent, self, and child terms using the new model, and compare the predictions to the ground truth data. The algorithm accurately ranks a majority of the ground truth terms highly, and identifies additional complementary terms missing in the Open Directory.
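The three term classes can be pictured with a minimal sketch. It is not the paper's model, which the abstract does not specify. It assumes a simple document-frequency rule: a term common both in the cluster and in the wider collection is treated as a parent term, a term common in the cluster but rare in the collection as a self term, and a term that appears in only part of the cluster and is rare in the collection as a child term. The function names and all threshold values are hypothetical.

```python
from collections import Counter


def document_frequency(docs):
    """Return, for each term, the fraction of documents that contain it."""
    counts = Counter()
    for doc in docs:
        # Count each term once per document, ignoring case.
        counts.update(set(doc.lower().split()))
    total = len(docs)
    return {term: count / total for term, count in counts.items()}


def classify_terms(cluster_docs, collection_docs,
                   self_min=0.6, child_min=0.2, common_min=0.3):
    """Label each cluster term as 'parent', 'self', or 'child'.

    The thresholds are hypothetical values chosen for illustration only;
    they are not taken from the paper.
    """
    in_cluster = document_frequency(cluster_docs)
    in_collection = document_frequency(collection_docs)
    labels = {}
    for term, f_cluster in in_cluster.items():
        f_collection = in_collection.get(term, 0.0)
        if f_cluster >= self_min and f_collection >= common_min:
            # Frequent everywhere: a more general concept.
            labels[term] = "parent"
        elif f_cluster >= self_min:
            # Frequent in the cluster, rare elsewhere: describes the cluster.
            labels[term] = "self"
        elif f_cluster >= child_min and f_collection < common_min:
            # Covers only part of the cluster: a specialization.
            labels[term] = "child"
    return labels
```

On a toy collection where science is common everywhere, biology is concentrated in the cluster, and genetics appears in only some cluster pages, this rule reproduces the parent, self, and child split from the abstract's example. Very common words such as stop words would also land in the parent class under this rule, so a real system would need to filter them.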