ABSTRACT
The primary goal of Web usage mining is the discovery of patterns in the navigational behavior of Web users. Standard approaches, such as clustering of user sessions and discovering association rules or frequent navigational paths, do not generally provide the ability to automatically characterize or quantify the unobservable factors that lead to common navigational patterns. It is, therefore, necessary to develop techniques that can automatically discover hidden semantic relationships among users as well as between users and Web objects. Probabilistic Latent Semantic Analysis (PLSA) is particularly useful in this context, since it can uncover latent semantic associations among users and pages based on the co-occurrence patterns of these pages in user sessions. In this paper, we develop a unified framework for the discovery and analysis of Web navigational patterns based on PLSA. We show the flexibility of this framework in characterizing various relationships among users and Web objects. Since these relationships are measured in terms of probabilities, we are able to use probabilistic inference to perform a variety of analysis tasks, such as user segmentation and page classification, as well as predictive tasks, such as collaborative recommendation. We demonstrate the effectiveness of our approach through experiments performed on real-world data sets.
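As a rough illustration of the kind of model the abstract refers to, the sketch below fits a standard PLSA aspect model, P(u, p) = sum over z of P(z) P(u|z) P(p|z), to a small session-by-page count matrix using EM, and then applies Bayes' rule to obtain P(z|u) for a session, which is one plausible basis for segmentation. It is not the authors' implementation: the function names, the toy counts, the number of latent factors, and the iteration count are all assumptions made for the example.

```python
import math
import random


def plsa(counts, n_factors, n_iter=50, seed=0):
    """Fit P(z), P(u|z), P(p|z) by EM on a session-by-page count matrix."""
    rng = random.Random(seed)
    n_sessions, n_pages = len(counts), len(counts[0])

    def random_distribution(n):
        weights = [rng.random() + 1e-3 for _ in range(n)]
        total = sum(weights)
        return [w / total for w in weights]

    p_z = [1.0 / n_factors] * n_factors
    p_u_z = [random_distribution(n_sessions) for _ in range(n_factors)]
    p_p_z = [random_distribution(n_pages) for _ in range(n_factors)]

    for _ in range(n_iter):
        new_z = [0.0] * n_factors
        new_u = [[0.0] * n_sessions for _ in range(n_factors)]
        new_p = [[0.0] * n_pages for _ in range(n_factors)]
        for u in range(n_sessions):
            for p in range(n_pages):
                n_up = counts[u][p]
                if n_up == 0:
                    continue
                # E-step: posterior P(z | u, p) for this observed pair.
                joint = [p_z[z] * p_u_z[z][u] * p_p_z[z][p]
                         for z in range(n_factors)]
                total = sum(joint)
                if total == 0.0:
                    continue  # defensive guard against numerical underflow
                for z in range(n_factors):
                    weight = n_up * joint[z] / total
                    new_z[z] += weight
                    new_u[z][u] += weight
                    new_p[z][p] += weight
        # M-step: renormalize the expected counts into probabilities.
        grand_total = sum(new_z)
        for z in range(n_factors):
            if new_z[z] > 0.0:
                p_u_z[z] = [x / new_z[z] for x in new_u[z]]
                p_p_z[z] = [x / new_z[z] for x in new_p[z]]
            p_z[z] = new_z[z] / grand_total
    return p_z, p_u_z, p_p_z


def log_likelihood(counts, p_z, p_u_z, p_p_z):
    """Log-likelihood of the observed counts under the fitted model."""
    ll = 0.0
    for u, row in enumerate(counts):
        for p, n_up in enumerate(row):
            if n_up > 0:
                mix = sum(p_z[z] * p_u_z[z][u] * p_p_z[z][p]
                          for z in range(len(p_z)))
                ll += n_up * math.log(mix)
    return ll


def session_factor_posterior(p_z, p_u_z, u):
    """P(z | u) by Bayes' rule; the largest entry suggests a segment for u."""
    joint = [p_z[z] * p_u_z[z][u] for z in range(len(p_z))]
    total = sum(joint)
    return [j / total for j in joint]


# Hypothetical toy data: rows are sessions, columns are page-visit counts.
sessions = [[3, 2, 0, 0], [2, 3, 0, 0], [0, 0, 4, 1], [0, 0, 1, 4]]
p_z, p_u_z, p_p_z = plsa(sessions, n_factors=2)
print(session_factor_posterior(p_z, p_u_z, 0))
```

Because every quantity is a probability, the same fitted parameters can be recombined in other ways, for example ranking pages by P(p|z) to characterize a factor. The pure-Python loops favor readability over speed; a practical version would likely use a sparse-matrix library.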