ABSTRACT
Automatic extraction of semantic information from text and links in Web pages is key to improving the quality of search results. However, the assessment of automatic semantic measures is limited by the coverage of user studies, which do not scale with the size, heterogeneity, and growth of the Web. Here we propose to leverage human-generated metadata, namely topical directories, to measure semantic relationships among massive numbers of pairs of Web pages or topics. The Open Directory Project classifies millions of URLs in a topical ontology, providing a rich source from which semantic relationships between Web pages can be derived. While semantic similarity measures based on taxonomies (trees) are well studied, the design of well-founded similarity measures for objects stored in the nodes of arbitrary ontologies (graphs) is an open problem. This paper defines an information-theoretic measure of semantic similarity that exploits both the hierarchical and non-hierarchical structure of an ontology. An experimental study shows that this measure improves significantly on the traditional taxonomy-based approach. This novel measure allows us to address the general question of how text and link analyses can be combined to derive measures of relevance that are in good agreement with semantic similarity. Surprisingly, the traditional use of text similarity turns out to be ineffective for relevance ranking.
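To make the "traditional taxonomy-based approach" concrete, here is a minimal sketch of one well-known information-theoretic similarity on a tree (Lin's measure): twice the log-probability of the lowest common ancestor, divided by the sum of the log-probabilities of the two topics. The abstract does not state which formula the paper uses, so this choice is an assumption. The toy taxonomy, the page counts, and all function names are hypothetical and exist only for illustration. The graph-based measure proposed in the paper is not reproduced here.

```python
import math

# Hypothetical toy taxonomy: child -> parent (the root has parent None).
PARENT = {
    "Top": None,
    "Science": "Top",
    "Arts": "Top",
    "Physics": "Science",
    "Biology": "Science",
    "Music": "Arts",
}

# Hypothetical number of pages filed directly under each topic.
PAGES = {"Top": 0, "Science": 2, "Arts": 1, "Physics": 3, "Biology": 2, "Music": 2}


def ancestors(topic):
    """Return the path from topic up to the root, topic included."""
    path = []
    while topic is not None:
        path.append(topic)
        topic = PARENT[topic]
    return path


def subtree_probability(topic):
    """Fraction of all pages stored in the subtree rooted at topic.

    Assumes every subtree holds at least one page, so the log is defined.
    """
    total = sum(PAGES.values())
    in_subtree = sum(n for t, n in PAGES.items() if topic in ancestors(t))
    return in_subtree / total


def lowest_common_ancestor(t1, t2):
    """Deepest topic that lies on both root paths."""
    up2 = set(ancestors(t2))
    for t in ancestors(t1):
        if t in up2:
            return t
    raise ValueError("topics do not share a root")


def tree_similarity(t1, t2):
    """Lin-style similarity: 2 log P(lca) / (log P(t1) + log P(t2))."""
    lca = lowest_common_ancestor(t1, t2)
    num = 2 * math.log(subtree_probability(lca))
    den = math.log(subtree_probability(t1)) + math.log(subtree_probability(t2))
    # Both topics are the root (probability 1): treat as identical.
    return num / den if den != 0 else 1.0


if __name__ == "__main__":
    for pair in [("Physics", "Physics"), ("Physics", "Biology"), ("Physics", "Music")]:
        print(pair, round(tree_similarity(*pair), 3))
```

In this sketch two topics that meet only at the root score zero, because the root subtree has probability 1 and contributes no information. A measure of this kind sees only the hierarchical links of the tree, which is the limitation the abstract says the graph-based measure is designed to overcome.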