ABSTRACT
Demographic information plays an important role in personalized web applications. However, personal data such as age and gender is usually not easy to obtain. In this paper, we make a first attempt to predict users' gender and age from their Web browsing behavior, in which Webpage view information is treated as a hidden variable that propagates demographic information between different users. Our approach has three main steps. First, learning from Webpage click-through data, Webpages are associated with the age and gender tendencies of users whose demographics are known, through a discriminative model. Second, the age and gender of users whose demographics are unknown are predicted from the demographic information of the associated Webpages through a Bayesian framework. Third, based on the observation that Webpages visited by similar users may be associated with similar demographic tendencies, and that users with similar demographic information tend to visit similar Webpages, a smoothing component is employed to overcome the data sparseness of the web click-through log. Experiments are conducted on a real web click-through log to demonstrate the effectiveness of the proposed approach. The experimental results show that the proposed algorithm achieves improvements of up to 30.4% on gender prediction and 50.3% on age prediction in terms of macro F1, compared to baseline algorithms.
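The second step can be illustrated with a minimal sketch, which is not the paper's actual model. It assumes that each Webpage already carries a strictly positive tendency P(class | page) from the first step, that page visits are conditionally independent given the class, and that a class prior is available. The function name, page names, and all numbers are hypothetical, and the smoothing step is omitted.

```python
import math

def predict_user_class(visited_pages, page_tendency, prior):
    """Combine per-page class tendencies into a posterior over classes.

    Assumption (not from the paper): pages are conditionally independent
    given the class, so P(c | user) is proportional to
    P(c) * prod_over_pages( P(c | page) / P(c) ).
    All probabilities must be strictly positive.
    """
    log_scores = {}
    for c in prior:
        score = math.log(prior[c])
        for page in visited_pages:
            if page not in page_tendency:
                continue  # unseen page: contributes no evidence
            score += math.log(page_tendency[page][c]) - math.log(prior[c])
        log_scores[c] = score
    # Normalize in a numerically stable way.
    top = max(log_scores.values())
    unnorm = {c: math.exp(s - top) for c, s in log_scores.items()}
    total = sum(unnorm.values())
    return {c: v / total for c, v in unnorm.items()}

# Hypothetical page tendencies and a uniform prior.
page_tendency = {
    "sports.example/page": {"male": 0.8, "female": 0.2},
    "recipes.example/page": {"male": 0.4, "female": 0.6},
}
prior = {"male": 0.5, "female": 0.5}
posterior = predict_user_class(
    ["sports.example/page", "recipes.example/page"], page_tendency, prior
)
```

With these made-up inputs the evidence from the two pages is multiplied, so the page with the stronger tendency dominates the posterior.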