Improving web site search using web server logs
Source: IBM Centre for Advanced Studies Conference
Proceedings of the 2006 conference of the Center for Advanced Studies on Collaborative research
Toronto, Ontario, Canada
SESSION: Web services
Article No. 22
Year of Publication: 2006
Authors
Jin Zhou  Ryerson University, Toronto, ON, Canada
Chen Ding  Ryerson University, Toronto, ON, Canada
Dimitrios Androutsos  Ryerson University, Toronto, ON, Canada
Sponsors
IBM Toronto Lab
CAS
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1188966.1188996

ABSTRACT

Despite the success of global search engines, web site search engines still suffer from poor performance. Because a web site differs from the whole web in link structure, access patterns, and data scale, methods that improve web search do not always succeed when applied to web site search. In this paper, we propose a novel algorithm that uses web server logs to improve retrieval performance. Web server logs are grouped into sessions, and the relationships among the web pages in each session are analyzed based on their similarities. From this analysis, a new web page representation is generated. Anchor text is used to create another representation. Both are combined with the original text-based representation in web site search. Two kinds of combination methods are investigated and tested: combination of document representations and combination of ranking scores. Our experimental results show that the algorithm improves retrieval accuracy for the four retrieval models we tested: the Inference Network Model, the Okapi Model, the Cosine Similarity Model, and the TFIDF Model. The largest performance increase from web log analysis is for the TFIDF Model, and overall, the Inference Network Model with web log information achieves the best result.
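
As a rough illustration of two steps named in the abstract (grouping log entries into sessions, and combining ranking scores), the following Python sketch may help. It is not the authors' implementation: the abstract does not say how sessions are delimited or how scores are merged, so the 30-minute inactivity timeout, the min-max normalization, the linear weighting, and all names (sessionize, min_max, combine_scores, weight) are assumptions made for this example only.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Assumed heuristic: a gap of more than 30 minutes starts a new session.
# The abstract does not specify a session boundary rule.
SESSION_TIMEOUT = timedelta(minutes=30)


def sessionize(records, timeout=SESSION_TIMEOUT):
    """Group (client, timestamp, url) records into per-client sessions.

    Returns a list of sessions, each a list of URLs in request order.
    """
    by_client = defaultdict(list)
    for client, ts, url in records:
        by_client[client].append((ts, url))

    sessions = []
    for hits in by_client.values():
        hits.sort()  # order each client's requests by timestamp
        current = [hits[0][1]]
        for (prev_ts, _), (ts, url) in zip(hits, hits[1:]):
            if ts - prev_ts > timeout:
                sessions.append(current)
                current = []
            current.append(url)
        sessions.append(current)
    return sessions


def min_max(scores):
    """Rescale a {page: score} mapping to the range [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {page: 0.0 for page in scores}
    return {page: (s - lo) / (hi - lo) for page, s in scores.items()}


def combine_scores(text_scores, log_scores, weight=0.5):
    """Linearly combine two rankings; `weight` is the share given to log_scores.

    Pages missing from one ranking contribute 0.0 for that ranking.
    """
    text_n, log_n = min_max(text_scores), min_max(log_scores)
    pages = set(text_n) | set(log_n)
    return {
        p: (1 - weight) * text_n.get(p, 0.0) + weight * log_n.get(p, 0.0)
        for p in pages
    }
```

In this sketch, the sessions returned by sessionize would feed whatever page-relationship analysis produces the log-based representation, and combine_scores stands in for the "combination of ranking scores" variant. The alternative the abstract mentions, combining document representations before ranking, is not shown.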


