SimFusion: measuring similarity using unified relationship matrix
Source: Annual ACM Conference on Research and Development in Information Retrieval
Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
Location: Salvador, Brazil
Session: Categorization and classification
Pages: 130-137
Year of Publication: 2005
ISBN: 1-59593-034-5
Authors
Wensi Xi  Virginia Tech, Blacksburg, VA
Edward A. Fox  Virginia Tech, Blacksburg, VA
Weiguo Fan  Virginia Tech, Blacksburg, VA
Benyu Zhang  Microsoft Research Asia, Beijing, China
Zheng Chen  Microsoft Research Asia, Beijing, China
Jun Yan  Beijing University, Beijing, China
Dong Zhuang  Beijing Institute of Technology, Beijing, China
Sponsor
SIGIR: ACM Special Interest Group on Information Retrieval
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1076034.1076059

ABSTRACT

In this paper we use a Unified Relationship Matrix (URM) to represent a set of heterogeneous data objects (e.g., web pages, queries) and their interrelationships (e.g., hyperlinks, user click-through sequences). We claim that iterative computations over the URM can help overcome the data sparseness problem and detect latent relationships among heterogeneous data objects, and can thus improve the quality of information applications that require combination of information from heterogeneous sources. To support our claim, we present a unified similarity-calculating algorithm, SimFusion. By iteratively computing over the URM, SimFusion can effectively integrate relationships from heterogeneous sources when measuring the similarity of two data objects. Experiments based on a web search engine query log and a web page collection demonstrate that SimFusion can improve similarity measurement of web objects over both traditional content-based algorithms and the cutting-edge SimRank algorithm.
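
The abstract does not spell out the update rule, so the following is only an illustrative sketch of what "iteratively computing over the URM" might look like. It assumes a row-normalized relationship matrix L and a similarity matrix S that starts as the identity and is updated as S <- L S L^T. The function names, the toy data, and the update form itself are assumptions made for illustration, not details taken from this page.

```python
def matmul(a, b):
    """Multiply two dense matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    """Return the transpose of a dense matrix."""
    return [list(col) for col in zip(*m)]


def row_normalize(m):
    """Scale each row to sum to 1; all-zero rows are left unchanged."""
    out = []
    for row in m:
        total = sum(row)
        out.append([v / total for v in row] if total else list(row))
    return out


def iterate_similarity(relations, iterations=3):
    """Hypothetical sketch: repeat S <- L S L^T, starting from S = identity."""
    n = len(relations)
    L = row_normalize(relations)
    Lt = transpose(L)
    S = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(iterations):
        S = matmul(matmul(L, S), Lt)
    return S


# Toy URM over 4 objects: two queries (0, 1) and two pages (2, 3).
# Entry [i][j] > 0 means object i is related to object j
# (self-relations are included on the diagonal; this is an assumption).
urm = [
    [1, 0, 1, 0],  # query 0 led to a click on page 2
    [0, 1, 1, 1],  # query 1 led to clicks on pages 2 and 3
    [1, 1, 1, 0],  # page 2 was clicked from both queries
    [0, 1, 0, 1],  # page 3 was clicked from query 1
]
S = iterate_similarity(urm, iterations=2)
```

In this toy example the two queries have no direct relationship in `urm`, yet `S[0][1]` becomes positive because both queries relate to page 2. That is the kind of latent relationship the abstract describes, under the stated assumptions.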


