ABSTRACT
In this paper, we investigate the difference between Wikipedia and Web link structure with respect to their value as indicators of the relevance of a page for a given topic of request. Our experimental evidence comes from two IR test collections: the .GOV collection used at the TREC Web tracks and the Wikipedia XML Corpus used at INEX. We first perform a comparative analysis of the Wikipedia and .GOV link structures and then investigate the value of link evidence for improving search on Wikipedia and on the .GOV domain. Our main findings are as follows. First, the Wikipedia link structure is similar to that of the Web, but more densely linked. Second, Wikipedia's outlinks behave similarly to inlinks, and both are good indicators of relevance, whereas on the Web the inlinks are more important. Third, when incorporating link evidence in the retrieval model, global link evidence fails for Wikipedia, and we have to take the local context into account.