ABSTRACT
Much of the world's knowledge is stored in books, which, as a result of recent mass-digitisation efforts, are increasingly available online. Search engines such as Google Books let searchers enter this vast knowledge space using queries as entry points. In this paper, we view Wikipedia as a summary of this world knowledge and aim to use it to guide users to relevant books. We therefore investigate ways of using Wikipedia as an intermediary between the user's query and the collection of books being searched. We first experiment with traditional query expansion techniques, exploiting Wikipedia articles as rich sources of information that can augment the user's query. We then propose a novel approach based on link distance in an extended Wikipedia graph: we associate books with the Wikipedia pages that cite them, and use the link distance between these pages and the pages that match the user's query as an estimate of a book's relevance to the query. Our results show that a) classical query expansion using terms extracted from query pages leads to increased precision, and b) link distance between query and book pages in Wikipedia is a good indicator of relevance that can boost the retrieval score of relevant books in the result ranking of a book search engine.
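As a rough illustration of the link-distance idea, the sketch below runs a breadth-first search from the pages matching a query and boosts each book's retrieval score according to how close its citing pages are. It is a minimal sketch, not the method evaluated in the paper: the function names, the adjacency-dictionary graph, the use of directed links, and the boost formula `weight / (1 + distance)` are all assumptions made for the example.

```python
from collections import deque


def link_distances(graph, query_pages):
    """Multi-source BFS over a link graph.

    graph: dict mapping a page title to the pages it links to
           (assumed representation; links are followed as directed).
    Returns the fewest links from any query page to each reachable page.
    """
    dist = {page: 0 for page in query_pages}
    queue = deque(dist)
    while queue:
        page = queue.popleft()
        for neighbour in graph.get(page, ()):
            if neighbour not in dist:
                dist[neighbour] = dist[page] + 1
                queue.append(neighbour)
    return dist


def rerank(base_scores, book_pages, graph, query_pages, weight=1.0):
    """Re-rank books by base score plus a link-distance boost.

    base_scores: dict book -> retrieval score from the book search engine.
    book_pages:  dict book -> Wikipedia pages that cite the book.
    The boost weight / (1 + d) is a hypothetical choice; books whose
    citing pages are unreachable from the query pages get no boost.
    """
    dist = link_distances(graph, query_pages)
    boosted = {}
    for book, score in base_scores.items():
        reachable = [dist[p] for p in book_pages.get(book, ()) if p in dist]
        boost = weight / (1 + min(reachable)) if reachable else 0.0
        boosted[book] = score + boost
    return sorted(boosted, key=boosted.get, reverse=True)


# Toy data, invented for illustration only.
toy_graph = {"Q": ["A"], "A": ["B"], "B": []}
toy_book_pages = {"book1": ["B"], "book2": ["A"], "book3": ["Z"]}
toy_scores = {"book1": 1.0, "book2": 1.0, "book3": 1.2}
ranking = rerank(toy_scores, toy_book_pages, toy_graph, ["Q"])
```

In the toy data, the book cited one link away from the query page overtakes a book with a higher base score that is not cited by any reachable page, which is the kind of re-ordering a link-distance boost is meant to produce.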