ABSTRACT
Web usage mining is the application of data mining techniques to discover usage patterns from Web data, in order to understand and better serve the needs of Web-based applications. Web usage mining consists of three phases: preprocessing, pattern discovery, and pattern analysis. This paper describes each of these phases in detail. Given its application potential, Web usage mining has seen a rapid increase in interest from both the research and practice communities. This paper provides a detailed taxonomy of the work in this area, including research efforts as well as commercial offerings. An up-to-date survey of the existing work is also provided. Finally, the paper gives a brief overview of the WebSIFT system as an example of a prototypical Web usage mining system.