Personalization from incomplete data: what you don't know can hurt
Full text: PDF (811 KB)
Source: International Conference on Knowledge Discovery and Data Mining
Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining
San Francisco, California
Pages: 154-163
Year of Publication: 2001
ISBN: 1-58113-391-X
Authors
Balaji Padmanabhan, The Wharton School, University of Pennsylvania, Philadelphia, PA
Zhiqiang Zheng, The Wharton School, University of Pennsylvania, Philadelphia, PA
Steven O. Kimbrough, The Wharton School, University of Pennsylvania, Philadelphia, PA
Sponsors
SIGMOD: ACM Special Interest Group on Management of Data
AAAI: American Association for Artificial Intelligence
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/502512.502535

ABSTRACT

Clickstream data collected at any web site (site-centric data) is inherently incomplete, since it does not capture users' browsing behavior across sites (user-centric data). Hence, models learned from such data may be subject to limitations, the nature of which has not been well studied. Understanding the limitations is particularly important since most current personalization techniques are based on site-centric data only. In this paper, we empirically examine the implications of learning from incomplete data in the context of two specific problems: (a) predicting if the remainder of any given session will result in a purchase and (b) predicting if a given user will make a purchase at any future session. For each of these problems we present new algorithms for fast and accurate data preprocessing of clickstream data. Based on a comprehensive experiment on user-level clickstream data gathered from 20,000 users' browsing behavior, we demonstrate that models built on user-centric data outperform models built on site-centric data for both prediction tasks.
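
The distinction between site-centric and user-centric data, and the label for prediction problem (a), can be illustrated with a minimal sketch. It is not the authors' preprocessing algorithm: the `Click` record, its field names, the helper functions, and the example domains are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Click:
    """One hypothetical clickstream record (schema assumed, not from the paper)."""
    user: str
    session: int
    site: str
    purchase: bool  # True if this click completes a purchase


def site_centric(clicks, site):
    """Keep only the clicks that a single site could observe in its own logs."""
    return [c for c in clicks if c.site == site]


def distinct_sites(clicks):
    """A simple cross-site feature: how many different sites appear in the data."""
    return len({c.site for c in clicks})


def remainder_has_purchase(session_clicks, k):
    """Label for problem (a): does any click after the first k end in a purchase?"""
    return any(c.purchase for c in session_clicks[k:])


# Hypothetical user-centric session: one user moving between two sites.
user_centric = [
    Click("u1", 1, "books.example", False),
    Click("u1", 1, "shop.example", False),
    Click("u1", 1, "books.example", False),
    Click("u1", 1, "shop.example", True),
]

# The same session as recorded by shop.example alone.
seen_by_shop = site_centric(user_centric, "shop.example")

print(distinct_sites(user_centric), distinct_sites(seen_by_shop))
print(remainder_has_purchase(seen_by_shop, 1))
```

In this toy session the site-centric view keeps two of the four clicks and can no longer show that the user was comparing sites, which is the kind of information a model built on site-centric data never sees.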


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

1
 
2
Aggarwal, C.C., Sun, Z., and Yu, P.S., 1998, Online Generation of Profile Association Rules. In Proceedings of the Fourth International Conference on Knowledge Discovery and Data Mining.
 
3
Ansari, S., 2000. Integrating E-Commerce and Data Mining: Architecture and Challenges, Web-KDD, Aug., 2000.
 
4
Brodley, C., and Kohavi, R., 2000, Peel the Onion, KDD-CUP 2000, Boston, 2000.
 
5
Chan, P.K., 1999. A Non-Invasive Learning Approach to Building Web User Profiles. In Proceedings WebKDD 1999.
6
 
7
Johnson, E., Moe, W., Fader, P., Bellman, S., and Lohse, J., 2000, On the Depth and Dynamics of Online Search Behavior, Wharton School Working Paper #00-014, June, 2000.
 
8
Khabaza, T., 2001, "As E-asy as Falling Off a Web Log, Data mining Hits the Web", SPSS Data Mining Magazine, January.
 
9
Kimbrough, S., Padmanabhan, B., and Zheng, Z., 2000, On Usage Metric for Determining Authoritative Sites, In the Proceedings of WITS 2000, Brisbane, Australia.
 
10
Korgaonkar, P., and Wolin, L.D., 1999, A Multivariate analysis of Web usage, J. of Advertising Research, 39, pp 53-68.
 
11
 
12
Mobasher, B., Dai, H., 2000, Discovery of Aggregate Usage Profiles for Web Personalization, Web-KDD, Aug., 2000.
 
13
Mobasher, B., Cooley, R., Srivastava, J., 1999, Automatic Personalization Based on Web Usage Mining, Technical Report of DePaul University, TR 99-010.
 
14
Moe, W., and Fader, P., 2000, Which Visits Lead to Purchases? Dynamic Conversion Behavior at e-Commerce Sites, The Wharton School, Working Paper #00-023. Aug. 2000 (A)
 
15
Moe, W., and Fader, P., 2000, Capturing Evolving Visit Behavior in Clickstream Data, The Wharton School, Working Paper #00-003, Aug. 2000 (B).
 
16
 
17
Nasraoui, O., Frigui, H., Joshi, A., Krishnapuram, R., 1999, Mining Web Access Logs Using Relational Competitive Fuzzy Clustering, In the Proceedings of the Eighth International Fuzzy Systems Association World Congress, Taipei, August, 1999.
 
18
Padmanabhan, B., Zheng, Z., Kimbrough, S., 2001, A Comparison of Site-Centric and User-Centric Data Mining Approaches to Predicting Session-Level Purchase Behavior on the Web, The Wharton School OPIM Dept Working Paper 01-01-03.
 
19
Park, Y., Fader, P., 2000, Modeling Browsing Behavior at Multiple Sites, In the Proceedings of Informs Marketing Science Conference, Los Angeles, June 2000.
 
20
Perkowitz, M., Etzioni, O., 1997, Adaptive web sites: an AI challenge, In Proceedings of the 15th International Joint Conference on Artificial Intelligence.
 
21
 
22
 
23
 
24
Sen, S., Padmanabhan, B., Tuzhilin, A., White, N., and Stein, R., 1998. The Identification and Satisfaction Of Consumer Analysis-Driven Information Needs Of Marketers on The WWW, European Journal Of Marketing (32:7/8), pp. 688-702.
 
25
Theusinger, C., Huber, K., 2000. Analyzing the Footsteps of Your Customers, Web-KDD 2000.
26

