ABSTRACT
The notion of similarity between objects finds use in many contexts, for example, in search engines, collaborative filtering, and clustering. Objects being compared are often modeled as sets, with their similarity traditionally determined based on set intersection. Intersection-based measures do not accurately capture similarity in certain domains, such as when the data is sparse or when there are known relationships between items within sets. We propose new measures that exploit a hierarchical domain structure in order to produce more intuitive similarity scores. We extend our similarity measures to provide appropriate results in the presence of multisets (also handled unsatisfactorily by traditional measures), for example, to correctly compute the similarity between customers who buy several instances of the same product (say, milk) or who buy several products in the same category (say, dairy products). We also provide an experimental comparison of our measures against traditional similarity measures and report on a user study that evaluated how well our measures match human intuition.
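To make the abstract's motivating problem concrete, the Python sketch below contrasts a traditional intersection-based measure (the Jaccard coefficient) with two simple variants: a weighted Jaccard over multisets and a naive comparison at the category level. The toy hierarchy `PARENT`, the item names, and all three functions are illustrative assumptions. None of them is the paper's proposed measure.

```python
from collections import Counter

# Hypothetical two-level hierarchy (item -> category); an illustrative
# assumption, not data or structure taken from the paper.
PARENT = {
    "milk": "dairy",
    "cheese": "dairy",
    "yogurt": "dairy",
    "bread": "bakery",
    "bagel": "bakery",
}


def jaccard(a, b):
    """Intersection-based similarity of two sets: |A ∩ B| / |A ∪ B|."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # assumed convention: two empty sets are identical
    return len(a & b) / len(a | b)


def multiset_jaccard(a, b):
    """Weighted Jaccard on multisets: sum of min counts / sum of max counts."""
    ca, cb = Counter(a), Counter(b)
    keys = ca.keys() | cb.keys()  # every item that appears in either multiset
    num = sum(min(ca[k], cb[k]) for k in keys)
    den = sum(max(ca[k], cb[k]) for k in keys)
    return num / den if den else 1.0


def category_jaccard(a, b, parent=PARENT):
    """Naive hierarchy-aware variant: Jaccard over parent categories.

    Raises KeyError for items missing from `parent`.
    """
    return jaccard({parent[x] for x in a}, {parent[x] for x in b})


if __name__ == "__main__":
    alice = ["milk", "bread"]
    bob = ["cheese", "bagel"]
    print(jaccard(alice, bob))           # 0.0: no product in common
    print(category_jaccard(alice, bob))  # 1.0: both bought {dairy, bakery}
    print(jaccard(["milk"] * 3, ["milk"]))  # 1.0: repeat purchases ignored
    print(round(multiset_jaccard(["milk"] * 3, ["milk"]), 3))  # 0.333
```

The category-level variant overcorrects. Alice and Bob score 1.0, the same as two identical baskets, because collapsing items to a single level discards which product was bought. Plain Jaccard and this flattened variant sit at opposite extremes. Measures that account for where matching items sit in the hierarchy aim to land between them. The paper's own measures are not reproduced here.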

CITED BY 38

Lubomira Stoilova, Todd Holloway, Ben Markines, Ana G. Maguitman, Filippo Menczer, GiveALink: mining a semantic network of bookmarks for web search and recommendation, Proceedings of the 3rd international workshop on Link discovery, p.66-73, August 21-25, 2005, Chicago, Illinois

Ilaria Bartolini, Paolo Ciaccia, Irene Ntoutsi, Marco Patella, Yannis Theodoridis, The Panda framework for comparing patterns, Data & Knowledge Engineering, v.68 n.2, p.244-260, February 2009

Yolanda Blanco-Fernández, José J. Pazos-Arias, Alberto Gil-Solla, Manuel Ramos-Cabrer, Martín López-Nores, Jorge García-Duque, Ana Fernández-Vilas, Rebeca P. Díaz-Redondo, Jesús Bermejo-Muñoz, A flexible semantic inference methodology to reason about user preferences in knowledge-based recommender systems, Knowledge-Based Systems, v.21 n.4, p.305-320, May 2008

Wilma Penzo, Stefano Lodi, Federica Mandreoli, Riccardo Martoglia, Simona Sassatelli, Semantic peer, here are the neighbors you want!, Proceedings of the 11th international conference on Extending database technology: Advances in database technology, March 25-29, 2008, Nantes, France

Yolanda Blanco-Fernández, José J. Pazos-Arias, Alberto Gil-Solla, Manuel Ramos-Cabrer, Martín López-Nores, Jorge García-Duque, Ana Fernández-Vilas, Rebeca P. Díaz-Redondo, Exploiting synergies between semantic reasoning and personalization strategies in intelligent recommender systems: A case study, Journal of Systems and Software, v.81 n.12, p.2371-2385, December 2008

Yolanda Blanco-Fernández, José J. Pazos-Arias, Alberto Gil-Solla, Manuel Ramos-Cabrer, Martín López-Nores, Jorge García-Duque, Ana Fernández-Vilas, Rebeca P. Díaz-Redondo, Jesús Bermejo-Muñoz, An MHP framework to provide intelligent personalized recommendations about digital TV contents, Software: Practice and Experience, v.38 n.9, p.925-960, July 2008

Martín López-Nores, Yolanda Blanco-Fernández, José J. Pazos-Arias, Jorge García-Duque, Manuel Ramos-Cabrer, Alberto Gil-Solla, Rebeca P. Díaz-Redondo, Ana Fernández-Vilas, Receiver-side semantic reasoning for digital TV personalization in the absence of return channels, Multimedia Tools and Applications, v.41 n.3, p.407-436, February 2009

Benjamin Markines, Ciro Cattuto, Filippo Menczer, Dominik Benz, Andreas Hotho, Gerd Stumme, Evaluating similarity measures for emergent semantics of social tagging, Proceedings of the 18th international conference on World Wide Web, April 20-24, 2009, Madrid, Spain

Shlomo Berkovsky, Dan Goldwasser, Tsvi Kuflik, Francesco Ricci, Identifying Inter-Domain Similarities through Content-Based Analysis of Hierarchical Web-Directories, Proceeding of the 2006 conference on ECAI 2006: 17th European Conference on Artificial Intelligence, August 29 - September 1, 2006, Riva del Garda, Italy, p.789-790, May 22, 2006