ABSTRACT
Understanding user behavior on Web sites enables site owners to make their sites more usable, ultimately helping users achieve their goals more quickly. Accordingly, researchers have devised methods for categorizing user sessions in the hope of revealing user interests. These techniques build user profiles by combining users' navigation paths with other data features, such as page viewing time, hyperlink structure, and page content. Previously, we presented complex techniques for combining many of these data features to cluster user profiles. In this paper, we introduce a user study and a systematic evaluation of these data features and their associated weighting schemes. We present the results of our study, including accuracy measures for a number of clustering approaches, and offer recommendations for Web analysts. While further investigation over more sites is needed to settle definitively on a robust scheme, we have characterized this analytic space.
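To make the idea of a weighting scheme concrete, the following is a minimal sketch, not the authors' implementation. It assumes a session is a list of (page index, seconds viewed) pairs and compares two hypothetical schemes: "uniform", where each page visit counts once, and "time", where each visit is weighted by viewing time. The function names `build_profile` and `cosine` and the sample sessions are illustrative assumptions only.

```python
from math import sqrt

def build_profile(session, n_pages, weighting="time"):
    """Turn a session into a unit-length profile vector over pages.

    session: list of (page_index, seconds_viewed) pairs (assumed format).
    weighting: "time" weights each visit by viewing time;
               "uniform" counts each visit once.
    """
    profile = [0.0] * n_pages
    for page, seconds in session:
        profile[page] += seconds if weighting == "time" else 1.0
    norm = sqrt(sum(x * x for x in profile))
    return [x / norm for x in profile] if norm else profile

def cosine(a, b):
    """Cosine similarity of two unit-length vectors (a plain dot product)."""
    return sum(x * y for x, y in zip(a, b))

# Hypothetical sessions over a 3-page site.
s1 = [(0, 60), (1, 5)]   # long stay on page 0, brief visit to page 1
s2 = [(0, 5), (1, 60)]   # same pages, opposite emphasis
s3 = [(0, 50), (2, 10)]  # long stay on page 0, brief visit to page 2

uniform_sim = cosine(build_profile(s1, 3, "uniform"),
                     build_profile(s2, 3, "uniform"))
time_sim_12 = cosine(build_profile(s1, 3, "time"),
                     build_profile(s2, 3, "time"))
time_sim_13 = cosine(build_profile(s1, 3, "time"),
                     build_profile(s3, 3, "time"))
```

Under the uniform scheme, sessions `s1` and `s2` look identical because they visit the same pages. Under the time scheme they look dissimilar, and `s1` is instead close to `s3`, which shares its emphasis on page 0. A clustering algorithm fed these profiles would therefore group sessions differently depending on the weighting chosen, which is the kind of effect a systematic evaluation of weighting schemes would need to measure.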
INDEX TERMS
Primary Classification:
H. Information Systems > H.3 INFORMATION STORAGE AND RETRIEVAL > H.3.3 Information Search and Retrieval
Subjects: Clustering
Additional Classification:
H. Information Systems > H.5 INFORMATION INTERFACES AND PRESENTATION (I.7) > H.5.4 Hypertext/Hypermedia
Subjects: Navigation
I. Computing Methodologies > I.5 PATTERN RECOGNITION > I.5.3 Clustering
Subjects: Similarity measures
General Terms: Design, Experimentation, Performance, Theory
Keywords: World Wide Web, classification, clustering, data mining, user categorization, user patterns, user profile, user study, web mining