Algorithmic detection of semantic similarity
Source: International World Wide Web Conference archive
Proceedings of the 14th International Conference on World Wide Web
Chiba, Japan
SESSION: Semantic querying
Pages: 107 - 116  
Year of Publication: 2005
ISBN:1-59593-046-9
Authors
Ana G. Maguitman  Indiana University, Bloomington, IN
Filippo Menczer  Indiana University, Bloomington, IN
Heather Roinestad  Indiana University, Bloomington, IN
Alessandro Vespignani  Indiana University, Bloomington, IN
Sponsor
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1060745.1060765

ABSTRACT

Automatic extraction of semantic information from text and links in Web pages is key to improving the quality of search results. However, the assessment of automatic semantic measures is limited by the coverage of user studies, which do not scale with the size, heterogeneity, and growth of the Web. Here we propose to leverage human-generated metadata, namely topical directories, to measure semantic relationships among massive numbers of pairs of Web pages or topics. The Open Directory Project classifies millions of URLs in a topical ontology, providing a rich source from which semantic relationships between Web pages can be derived. While semantic similarity measures based on taxonomies (trees) are well studied, the design of well-founded similarity measures for objects stored in the nodes of arbitrary ontologies (graphs) is an open problem. This paper defines an information-theoretic measure of semantic similarity that exploits both the hierarchical and non-hierarchical structure of an ontology. An experimental study shows that this measure improves significantly on the traditional taxonomy-based approach. This novel measure allows us to address the general question of how text and link analyses can be combined to derive measures of relevance that are in good agreement with semantic similarity. Surprisingly, the traditional use of text similarity turns out to be ineffective for relevance ranking.
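The abstract contrasts the proposed graph-based measure with a traditional taxonomy-based approach but gives no formulas. As a rough illustration of the tree-based idea only, the sketch below computes a Lin-style information-theoretic similarity, 2 log Pr[t0] / (log Pr[t1] + log Pr[t2]), where t0 is the lowest common ancestor of topics t1 and t2 and Pr[t] is the fraction of pages stored in the subtree rooted at t. Treating this as the baseline is an assumption; the toy taxonomy, the page counts, and all function names are hypothetical, and the paper's graph-based measure is not reproduced here.

```python
import math

# Hypothetical toy taxonomy (child -> parent); the root has parent None.
PARENT = {
    "Top": None,
    "Science": "Top",
    "Arts": "Top",
    "Physics": "Science",
    "Biology": "Science",
}

# Hypothetical number of pages filed directly under each topic.
PAGES = {"Top": 0, "Science": 2, "Arts": 4, "Physics": 1, "Biology": 1}


def ancestors(topic):
    """Return the path from a topic up to the root, topic first."""
    path = []
    while topic is not None:
        path.append(topic)
        topic = PARENT[topic]
    return path


def subtree_probability(topic):
    """Fraction of all pages stored in the subtree rooted at topic."""
    total = sum(PAGES.values())
    in_subtree = sum(n for t, n in PAGES.items() if topic in ancestors(t))
    return in_subtree / total


def lowest_common_ancestor(t1, t2):
    """Deepest topic that is an ancestor of (or equal to) both topics."""
    seen = set(ancestors(t1))
    for t in ancestors(t2):
        if t in seen:
            return t
    raise ValueError("topics do not share a root")


def tree_similarity(t1, t2):
    """Lin-style similarity on a tree: 2 log Pr[t0] / (log Pr[t1] + log Pr[t2])."""
    t0 = lowest_common_ancestor(t1, t2)
    denom = math.log(subtree_probability(t1)) + math.log(subtree_probability(t2))
    if denom == 0:
        # Both topics cover every page (e.g. both are the root).
        return 1.0
    return 2 * math.log(subtree_probability(t0)) / denom


if __name__ == "__main__":
    print(tree_similarity("Physics", "Biology"))  # siblings under Science
    print(tree_similarity("Physics", "Arts"))     # related only through the root
```

In this toy tree, two sibling topics score higher than two topics whose only common ancestor is the root, because the root's subtree contains every page and so carries no information. A tree-only measure of this kind cannot see non-hierarchical links between topics, which is the limitation the abstract's graph-based measure is meant to address.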


