ABSTRACT
We argue that relationships between Web pages are functions of the user's intent. We identify a class of Web tasks, information-gathering tasks, that can be facilitated by providing links to pages related to the page the user is currently viewing. We define three kinds of intentional relationships, corresponding to whether the user is a) seeking sources of information, b) reading pages that provide information, or c) surfing through pages as part of an extended information-gathering process. We show that these three relationships can be mined using a combination of textual and link information, and we provide three scoring mechanisms that correspond to them: SeekRel, FactRel and SurfRel. We build a set of capacitated subnetworks, each corresponding to a particular keyword, and compute scores from flows on these subnetworks. The capacities of the links are derived from the hub and authority values of the nodes they connect, following the work of Kleinberg (1998) on assigning authority to pages in hyperlinked environments. We evaluated our scoring mechanisms by running experiments on four data sets taken from the Web. We present user evaluations of the relevance of the top results returned by our scoring mechanisms and compare them with the top results returned by Google's Similar Pages feature and by the Companion algorithm (Dean and Henzinger, 1999).
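The flow-based scoring idea can be illustrated with a minimal sketch. It is not the SeekRel, FactRel or SurfRel definition, which the abstract does not give. It assumes a hypothetical capacity rule (hub score of the source page times authority score of the target page), a plain power-iteration version of hub and authority computation, and a standard Edmonds-Karp maximum flow. The function names `hits`, `max_flow` and `flow_score` are illustrative only.

```python
from collections import deque


def hits(edges, iterations=50):
    """Hub and authority scores by power iteration (Kleinberg-style)."""
    nodes = sorted({n for edge in edges for n in edge})
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority of v: sum of the hub scores of pages linking to v.
        auth = {n: 0.0 for n in nodes}
        for u, v in edges:
            auth[v] += hub[u]
        # Hub of u: sum of the authority scores of pages u links to.
        hub = {n: 0.0 for n in nodes}
        for u, v in edges:
            hub[u] += auth[v]
        # Normalize both vectors to unit Euclidean length.
        a_norm = sum(x * x for x in auth.values()) ** 0.5 or 1.0
        h_norm = sum(x * x for x in hub.values()) ** 0.5 or 1.0
        auth = {n: x / a_norm for n, x in auth.items()}
        hub = {n: x / h_norm for n, x in hub.items()}
    return hub, auth


def max_flow(capacity, source, sink):
    """Edmonds-Karp maximum flow; capacity maps (u, v) -> float."""
    if source == sink:
        return 0.0
    residual, adj = {}, {}
    for (u, v), c in capacity.items():
        residual[(u, v)] = residual.get((u, v), 0.0) + c
        residual.setdefault((v, u), 0.0)
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    total = 0.0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v in adj.get(u, ()):
                if v not in parent and residual[(u, v)] > 1e-12:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return total
        # Walk back from the sink to collect the path edges.
        path, v = [], sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(residual[e] for e in path)
        for u, v in path:
            residual[(u, v)] -= bottleneck
            residual[(v, u)] += bottleneck
        total += bottleneck


def flow_score(edges, page, candidate):
    """Hypothetical relatedness score: max flow from page to candidate."""
    hub, auth = hits(edges)
    # Assumed capacity rule, not taken from the paper.
    capacity = {(u, v): hub[u] * auth[v] for u, v in edges}
    return max_flow(capacity, page, candidate)


# Toy keyword subnetwork: page "a" links to "b" and "c", which both link to "d".
links = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]
print(flow_score(links, "a", "d"))
```

In this toy graph, flow can reach "d" from "a" along two link paths, so the score is positive, while the score in the reverse direction is zero because no links lead from "d" back to "a".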
REFERENCES
[1] Altavista. http://www.altavista.com/.
[3] N. J. Belkin, R. N. Oddy, and H. M. Brooks. ASK for information retrieval: Part I. Background and theory. J. Doc., 38(2):61-71, 1982.
[4] N. J. Belkin, R. N. Oddy, and H. M. Brooks. ASK for information retrieval: Part II. Results of a design study. J. Doc., 38(3):145-164, 1982.
[7] Soumen Chakrabarti, Byron Dom, Prabhakar Raghavan, Sridhar Rajagopalan, David Gibson, Jon Kleinberg. Automatic resource compilation by analyzing hyperlink structure and associated text. Proceedings of the seventh international conference on World Wide Web 7, p.65-74, April 1998, Brisbane, Australia.
[9] M. de Kunder. The size of the World Wide Web. http://www.worldwidewebsize.com/. Retrieved on 29th February 2008.
[11] C. Fellbaum, editor. Wordnet: An electronic lexical database. Bradford Books, 1998.
[12] A. J. Ferrari, D. Gourley, K. Johnson, F. C. Knabe, D. Tunkelang, and J. S. Walter. Hierarchical data-driven navigation system and method for information retrieval. U.S. Patent number 7,035,864, April 2006.
[13] Taher H. Haveliwala, Aristides Gionis, Dan Klein, Piotr Indyk. Evaluating strategies for similarity search on the web. Proceedings of the 11th international conference on World Wide Web, May 07-11, 2002, Honolulu, Hawaii, USA. doi:10.1145/511446.511502
[19] S. Lawrence and C. L. Giles. Accessibility of information on the Web. Nature, 400:107-109, 1999.
[23] Wangzhong Lu, Jeannette Janssen, Evangelos Milios, Nathalie Japkowicz. Node similarity in networked information spaces. Proceedings of the 2001 conference of the Centre for Advanced Studies on Collaborative research, p.11, November 05-07, 2001, Toronto, Ontario, Canada.
[25] Nutch. http://lucene.apache.org/nutch/.
[26] James Pitkow, Peter Pirolli. Life, death, and lawfulness on the electronic frontier. Proceedings of the SIGCHI conference on Human factors in computing systems, p.383-390, March 22-27, 1997, Atlanta, Georgia, United States. doi:10.1145/258549.258805
[30] A. Tombros and Z. Ali. Factors affecting Web page similarity. In 27th European Conference on Information Retrieval (ECIR), 2005.
[31] Wensi Xi, Edward A. Fox, Weiguo Fan, Benyu Zhang, Zheng Chen, Jun Yan, Dong Zhuang. SimFusion: measuring similarity using unified relationship matrix. Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval, August 15-19, 2005, Salvador, Brazil. doi:10.1145/1076034.1076059
[32] Yahoo! Content analysis Web services: Term extraction. http://developer.yahoo.com/search/content/V1/termExtraction.html.