Improving web site search using web server logs
Source: IBM Centre for Advanced Studies Conference
Proceedings of the 2006 conference of the Center for Advanced Studies on Collaborative research
Toronto, Ontario, Canada
SESSION: Web services
Article No. 22
Year of Publication: 2006
ABSTRACT
Despite the success of global search engines, web site search engines still suffer from poor performance. Because a web site differs from the whole web in link structure, access patterns, and data scale, methods that improve web search do not always succeed when applied to web site search. In this paper, we propose a novel algorithm that improves retrieval performance by using web server logs. Web server logs are grouped into sessions, and the relationships among the web pages in each session are analyzed based on their similarities. From this analysis, a new web page representation is generated. Anchor text is used to create another representation. Both are combined with the original text-based representation in web site search. Two kinds of combination methods are investigated and tested: combination of document representations and combination of ranking scores. Our experimental results show that our algorithm can improve retrieval accuracy for the four retrieval models we tested: the Inference Network Model, the Okapi Model, the Cosine Similarity Model, and the TFIDF Model. The largest performance increase from web log analysis is for the TFIDF model, and overall, the inference network model with web log information achieves the best result.
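The abstract does not say how sessions are delimited or how ranking scores are combined, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes a 30-minute inactivity timeout for splitting sessions and a simple linear interpolation for combining a text-based score with a log-based score. The names `group_into_sessions`, `combine_scores`, `SESSION_GAP_SECONDS`, and `weight` are hypothetical.

```python
from collections import defaultdict

# Assumed inactivity timeout; the abstract gives no value.
SESSION_GAP_SECONDS = 30 * 60


def group_into_sessions(log_entries, gap=SESSION_GAP_SECONDS):
    """Group (client_id, timestamp_seconds, url) tuples into sessions.

    A new session starts whenever a client is idle for longer than `gap`.
    Returns a list of sessions, each a list of URLs in visit order.
    """
    by_client = defaultdict(list)
    for client_id, timestamp, url in log_entries:
        by_client[client_id].append((timestamp, url))

    sessions = []
    for client_id in sorted(by_client):
        visits = sorted(by_client[client_id])
        current = [visits[0][1]]
        for (prev_time, _), (time, url) in zip(visits, visits[1:]):
            if time - prev_time > gap:
                # Idle gap exceeded: close the current session.
                sessions.append(current)
                current = []
            current.append(url)
        sessions.append(current)
    return sessions


def combine_scores(text_scores, log_scores, weight=0.5):
    """Linearly interpolate two score dicts keyed by URL.

    `weight` is the share given to the log-based score (an assumed
    scheme); a URL missing from either dict contributes 0 there.
    """
    urls = set(text_scores) | set(log_scores)
    return {
        url: (1 - weight) * text_scores.get(url, 0.0)
        + weight * log_scores.get(url, 0.0)
        for url in urls
    }
```

For example, calling `group_into_sessions` on three requests from one client at seconds 0, 100, and 5000 would yield two sessions, since the last request follows an idle period longer than the assumed timeout. The alternative the abstract mentions, combining document representations rather than scores, would instead merge the text, anchor-text, and log-derived term vectors before ranking.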