ABSTRACT
In analyzing data from social and communication networks, we encounter the problem of classifying objects that have an explicit link structure among them. We study the problem of inferring the classification of all the objects from a labeled subset, using only the link-based information among the objects. We abstract this as a labeling problem on multigraphs with weighted edges. We present two classes of algorithms, based on local and global similarities. We then focus on multigraphs induced by blog data and apply our general algorithms to infer labels such as age, gender, and location associated with each blog, based only on the link structure among the blogs. We perform a comprehensive set of experiments with real, large-scale blog data sets and show that significant accuracy is possible with little or no non-link information, and that our methods scale to millions of nodes and edges.
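To make the labeling problem concrete, the following is a minimal sketch of one simple local scheme: each unlabeled node repeatedly takes the label with the largest total edge weight among its already-labeled neighbors. This is a generic weighted neighbor vote and is not claimed to be the paper's algorithm. The function name `propagate_labels`, the undirected treatment of links, the summing of parallel edge weights, and the toy data are all assumptions made for illustration.

```python
from collections import defaultdict


def propagate_labels(edges, seed_labels, max_rounds=10):
    """Label nodes by a weighted vote of their labeled neighbors.

    edges: iterable of (u, v, weight); repeated pairs are allowed (multigraph).
    seed_labels: dict mapping node -> label for the known subset.
    Hypothetical helper for illustration; not the paper's method.
    """
    # Assumption: links are treated as undirected, and parallel edges
    # between the same pair of nodes are collapsed by summing weights.
    adj = defaultdict(lambda: defaultdict(float))
    for u, v, w in edges:
        adj[u][v] += w
        adj[v][u] += w

    labels = dict(seed_labels)
    for _ in range(max_rounds):
        updates = {}
        for node in adj:
            if node in labels:
                continue  # seed labels and earlier assignments stay fixed
            votes = defaultdict(float)
            for nbr, w in adj[node].items():
                if nbr in labels:
                    votes[labels[nbr]] += w
            if votes:
                # Ties go to the smallest label in sorted order.
                updates[node] = max(sorted(votes), key=votes.get)
        if not updates:
            break  # no unlabeled node has a labeled neighbor
        labels.update(updates)  # apply the whole round at once
    return labels


# Toy multigraph: two parallel a-b links, then a chain to e.
edges = [
    ("a", "b", 1.0), ("a", "b", 1.0),
    ("b", "c", 0.5), ("c", "d", 2.0), ("d", "e", 1.0),
]
seeds = {"a": "teen", "e": "adult"}
result = propagate_labels(edges, seeds)
```

In this toy graph, node `c` is reached from both sides in the second round and takes the label carried by the heavier edge. Nodes with no path to any labeled node are left out of the result.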
CITED BY
Haizheng Zhang, John Yen, C. Lee Giles, Bamshad Mobasher, Myra Spiliopoulou, Jaideep Srivastava, Olfa Nasraoui, Andrew McCallum. WebKDD/SNAKDD 2007: web mining and social network analysis post-workshop report. ACM SIGKDD Explorations Newsletter, v.9 n.2, December 2007.