Is Wikipedia link structure different?
Source Web Search and Web Data Mining archive
Proceedings of the Second ACM International Conference on Web Search and Data Mining
Barcelona, Spain
SESSION: Web ranking
Pages 232-241  
Year of Publication: 2009
ISBN: 978-1-60558-390-7
Authors
Jaap Kamps  University of Amsterdam
Marijn Koolen  University of Amsterdam
Sponsors
SIGMOD: ACM Special Interest Group on Management of Data
Google
SIGIR: ACM Special Interest Group on Information Retrieval
SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web
Yahoo! Research
Microsoft
Nokia
SIGKDD: ACM Special Interest Group on Knowledge Discovery in Data
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1498759.1498831

ABSTRACT

In this paper, we investigate the difference between Wikipedia and Web link structure with respect to their value as indicators of the relevance of a page for a given topic of request. Our experimental evidence is from two IR test collections: the .GOV collection used at the TREC Web tracks and the Wikipedia XML Corpus used at INEX. We first perform a comparative analysis of Wikipedia and .GOV link structure and then investigate the value of link evidence for improving search on Wikipedia and on the .GOV domain. Our main findings are as follows. First, Wikipedia link structure is similar to that of the Web, but more densely linked. Second, Wikipedia's outlinks behave similarly to inlinks and both are good indicators of relevance, whereas on the Web the inlinks are more important. Third, when link evidence is incorporated in the retrieval model, global link evidence fails for Wikipedia and the local context has to be taken into account.
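
The distinction between global and local link evidence in the third finding can be illustrated with a minimal sketch. This is not the retrieval model from the paper: the function name `indegrees`, the toy link graph, and the set of top-ranked results are all hypothetical, and "local" is assumed here to mean links whose source and target both fall inside the result set for one query.

```python
from collections import defaultdict


def indegrees(links, restrict_to=None):
    """Count inlinks per page from an iterable of (source, target) pairs.

    If restrict_to is given, only links whose source AND target are both
    in that set are counted (an assumed notion of "local" link evidence).
    """
    counts = defaultdict(int)
    for source, target in links:
        if restrict_to is not None and (
            source not in restrict_to or target not in restrict_to
        ):
            continue
        counts[target] += 1
    return dict(counts)


# Hypothetical toy link graph: "hub" is heavily linked collection-wide.
links = [
    ("a", "hub"), ("b", "hub"), ("c", "hub"), ("d", "hub"),
    ("a", "b"), ("c", "b"), ("b", "a"),
]

# Hypothetical top-ranked pages for one query (from a text-only run).
top_results = {"a", "b", "c"}

global_in = indegrees(links)                          # whole collection
local_in = indegrees(links, restrict_to=top_results)  # within the result set
```

In this toy graph the page "hub" has the highest global indegree regardless of the query, while the local counts only reward pages that are linked from other top-ranked results.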


