Web usage mining based on probabilistic latent semantic analysis
Source: International Conference on Knowledge Discovery and Data Mining
Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining
Seattle, WA, USA
SESSION: Research track papers
Pages: 197 - 205  
Year of Publication: 2004
ISBN: 1-58113-888-1
Authors
Xin Jin  DePaul University, Chicago, IL
Yanzan Zhou  DePaul University, Chicago, IL
Bamshad Mobasher  DePaul University, Chicago, IL
Sponsors
SIGMOD: ACM Special Interest Group on Management of Data
SIGKDD: ACM Special Interest Group on Knowledge Discovery in Data
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1014052.1014076

ABSTRACT

The primary goal of Web usage mining is the discovery of patterns in the navigational behavior of Web users. Standard approaches, such as clustering of user sessions and discovering association rules or frequent navigational paths, do not generally provide the ability to automatically characterize or quantify the unobservable factors that lead to common navigational patterns. It is, therefore, necessary to develop techniques that can automatically discover hidden semantic relationships among users as well as between users and Web objects. Probabilistic Latent Semantic Analysis (PLSA) is particularly useful in this context, since it can uncover latent semantic associations among users and pages based on the co-occurrence patterns of these pages in user sessions. In this paper, we develop a unified framework for the discovery and analysis of Web navigational patterns based on PLSA. We show the flexibility of this framework in characterizing various relationships among users and Web objects. Since these relationships are measured in terms of probabilities, we are able to use probabilistic inference to perform a variety of analysis tasks, such as user segmentation and page classification, as well as predictive tasks such as collaborative recommendations. We demonstrate the effectiveness of our approach through experiments performed on real-world data sets.
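
The abstract describes PLSA only at a high level, so the following is a minimal illustrative sketch rather than the authors' implementation. It assumes the standard symmetric PLSA parameterization, P(s, p) = sum over z of P(z) P(s|z) P(p|z), fitted by EM to a small session-by-page count matrix. The function name `plsa`, its parameters, and the toy matrix are hypothetical and chosen only for illustration.

```python
import random


def plsa(counts, n_factors, n_iter=50, seed=0):
    """Fit a symmetric PLSA model by EM (illustrative sketch, not the paper's code).

    counts: list of lists, counts[s][p] = how often page p occurs in session s.
    Returns (p_z, p_s_given_z, p_p_given_z), where
      p_z[z]            approximates P(z),
      p_s_given_z[z][s] approximates P(session s | z),
      p_p_given_z[z][p] approximates P(page p | z).
    """
    rng = random.Random(seed)
    n_s, n_p = len(counts), len(counts[0])

    def normalize(values):
        total = sum(values)
        return [v / total for v in values]

    # Random positive initialization; each distribution sums to 1.
    p_z = [1.0 / n_factors] * n_factors
    p_s_z = [normalize([rng.random() + 0.1 for _ in range(n_s)])
             for _ in range(n_factors)]
    p_p_z = [normalize([rng.random() + 0.1 for _ in range(n_p)])
             for _ in range(n_factors)]

    for _ in range(n_iter):
        acc_z = [0.0] * n_factors
        acc_s = [[0.0] * n_s for _ in range(n_factors)]
        acc_p = [[0.0] * n_p for _ in range(n_factors)]

        for s in range(n_s):
            for p in range(n_p):
                n = counts[s][p]
                if n == 0:
                    continue
                # E-step: posterior P(z | s, p) is proportional to
                # P(z) * P(s|z) * P(p|z).
                joint = [p_z[z] * p_s_z[z][s] * p_p_z[z][p]
                         for z in range(n_factors)]
                total = sum(joint)
                for z in range(n_factors):
                    weight = n * joint[z] / total
                    acc_z[z] += weight
                    acc_s[z][s] += weight
                    acc_p[z][p] += weight

        # M-step: re-estimate each distribution from the expected counts.
        grand_total = sum(acc_z)
        for z in range(n_factors):
            if acc_z[z] > 0:
                p_s_z[z] = [v / acc_z[z] for v in acc_s[z]]
                p_p_z[z] = [v / acc_z[z] for v in acc_p[z]]
            p_z[z] = acc_z[z] / grand_total

    return p_z, p_s_z, p_p_z


# Toy data: sessions 0-1 mostly visit pages 0-1, sessions 2-3 visit pages 2-3.
toy_counts = [
    [3, 2, 0, 0],
    [2, 3, 0, 0],
    [0, 0, 3, 2],
    [0, 0, 2, 3],
]
p_z, p_s_given_z, p_p_given_z = plsa(toy_counts, n_factors=2, n_iter=30)
```

Under these assumptions, the rows of `p_p_given_z` can be read as page profiles for each latent factor, and the rows of `p_s_given_z` give a soft grouping of sessions. EM converges only to a local optimum, so results depend on the seed.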


