Wikipedia pages as entry points for book search
Source: Web Search and Web Data Mining
Proceedings of the Second ACM International Conference on Web Search and Data Mining
Barcelona, Spain
SESSION: Web search
Pages 44-53  
Year of Publication: 2009
ISBN:978-1-60558-390-7
Authors
Marijn Koolen  University of Amsterdam, The Netherlands
Gabriella Kazai  Microsoft Research, Cambridge, UK
Nick Craswell  Microsoft Research, Cambridge, UK
Sponsors
SIGMOD: ACM Special Interest Group on Management of Data
Google
SIGIR: ACM Special Interest Group on Information Retrieval
SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web
Yahoo! Research
Microsoft
Nokia
SIGKDD: ACM Special Interest Group on Knowledge Discovery in Data
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1498759.1498807

ABSTRACT

A lot of the world's knowledge is stored in books, which, as a result of recent mass-digitisation efforts, are increasingly available online. Search engines, such as Google Books, provide mechanisms for searchers to enter this vast knowledge space using queries as entry points. In this paper, we view Wikipedia as a summary of this world knowledge and aim to use this resource to guide users to relevant books. Thus, we investigate possible ways of using Wikipedia as an intermediary between the user's query and a collection of books being searched. We experiment with traditional query expansion techniques, exploiting Wikipedia articles as rich sources of information that can augment the user's query. We then propose a novel approach based on link distance in an extended Wikipedia graph: we associate books with Wikipedia pages that cite these books and use the link distance between these nodes and the pages that match the user query as an estimation of a book's relevance to the query. Our results show that a) classical query expansion using terms extracted from query pages leads to increased precision, and b) link distance between query and book pages in Wikipedia provides a good indicator of relevance that can boost the retrieval score of relevant books in the result ranking of a book search engine.
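
The link-distance idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the adjacency-dictionary graph, the `citations` mapping from books to citing pages, the function names, and the `boost / (1 + distance)` scoring rule are all assumptions made for illustration. The sketch runs a breadth-first search from the pages that match the query, takes the shortest distance to any page citing a book, and adds a distance-dependent bonus to that book's baseline retrieval score.

```python
from collections import deque


def link_distances(graph, query_pages):
    """Multi-source BFS: fewest links from any query page to each reachable page.

    `graph` is a hypothetical adjacency dict {page: [linked pages]}.
    """
    dist = {page: 0 for page in query_pages}
    queue = deque(dist)
    while queue:
        page = queue.popleft()
        for neighbour in graph.get(page, ()):
            if neighbour not in dist:
                dist[neighbour] = dist[page] + 1
                queue.append(neighbour)
    return dist


def book_scores(graph, citations, query_pages, base_scores, boost=1.0):
    """Boost each book's base score by its closest citing page's link distance.

    The boost / (1 + d) form is an illustrative assumption, not the paper's formula.
    Books whose citing pages are unreachable keep their base score.
    """
    dist = link_distances(graph, query_pages)
    scores = {}
    for book, base in base_scores.items():
        reachable = [dist[p] for p in citations.get(book, ()) if p in dist]
        if reachable:
            scores[book] = base + boost / (1 + min(reachable))
        else:
            scores[book] = base
    return scores


# Toy data: page A matches the query; A links to B, B links to C.
graph = {"A": ["B"], "B": ["C"], "C": []}
citations = {"book1": ["C"], "book2": ["Z"], "book3": ["A"]}
base_scores = {"book1": 1.0, "book2": 1.0, "book3": 1.0}
scores = book_scores(graph, citations, ["A"], base_scores)
```

In this toy example, the book cited directly on the query page receives the largest bonus, the book cited two links away receives a smaller one, and the book cited only on an unreachable page is left unchanged.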


