|
ABSTRACT
Information diffusion, viral marketing, and collective classification all attempt to model and exploit the relationships in a network to make inferences about the labels of nodes. A variety of techniques have been introduced and methods that combine attribute information and neighboring label information have been shown to be effective for collective labeling of the nodes in a network. However, in part because of the correlation between node labels that the techniques exploit, it is easy to find cases in which, once a misclassification is made, incorrect information propagates throughout the network. This problem can be mitigated if the system is allowed to judiciously acquire the labels for a small number of nodes. Unfortunately, under relatively general assumptions, determining the optimal set of labels to acquire is intractable. Here we propose an acquisition method that learns the cases when a given collective classification algorithm makes mistakes, and suggests acquisitions to correct those mistakes. We empirically show on both real and synthetic datasets that this method significantly outperforms a greedy approximate inference approach, a viral marketing approach, and approaches based on network structural measures such as node degree and network clustering. In addition to significantly improving accuracy with just a small amount of labeled data, our method is tractable on large networks.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
M. Bilgic and L. Getoor. VOILA: Efficient feature-value acquisition for classification. In AAAI, 2007.
|
 |
2
|
Soumen Chakrabarti , Byron Dom , Piotr Indyk, Enhanced hypertext categorization using hyperlinks, Proceedings of the 1998 ACM SIGMOD international conference on Management of data, p.307-318, June 01-04, 1998, Seattle, Washington, United States
|
| |
3
|
D. A. Cohn, Z. Ghahramani, and M. I. Jordan. Active learning with statistical models. Journal of Artificial Intelligence Research, 4:129--145, 1996.
|
 |
4
|
C. Lee Giles , Kurt D. Bollacker , Steve Lawrence, CiteSeer: an automatic citation indexing system, Proceedings of the third ACM conference on Digital libraries, p.89-98, June 23-26, 1998, Pittsburgh, Pennsylvania, United States
[doi> 10.1145/276675.276685]
|
| |
5
|
W. R. Gilks, S. Richardson, and D. J. Spiegelhalter. Markov Chain Monte Carlo in Practice. Interdisciplinary Statistics. Chapman & Hall/CRC, 1996.
|
| |
6
|
R. A. Howard. Information value theory. IEEE Transactions on Systems Science and Cybernetics, 2(1):22--26, 1966.
|
 |
7
|
|
| |
8
|
M. I. Jordan, Z. Ghahramani, T. S. Jaakkola, and L. K. Saul. An introduction to variational methods for graphical models. Machine Learning, 1999.
|
 |
9
|
|
| |
10
|
A. Krause and C. Guestrin. Optimal nonmyopic value of information in graphical models - efficient algorithms and theoretical limits. In IJCAI, 2005.
|
 |
11
|
|
 |
12
|
|
| |
13
|
Q. Lu and L. Getoor. Link based classification. In ICML, 2003.
|
| |
14
|
S. Macskassy and F. Provost. A simple relational classifier. In Workshop on Multi-Relational Data Mining in conj. with KDD (MRDM), 2003.
|
| |
15
|
|
| |
16
|
|
| |
17
|
|
| |
18
|
L. McDowell, K. M. Gupta, and D. W. Aha. Cautious inference in collective classification. In AAAI, 2007.
|
| |
19
|
J. Neville and D. Jensen. Iterative classification in relational data. In SRL Workshop in AAAI, 2000.
|
| |
20
|
M. E. J. Newman. Mixing patterns in networks. Physical Review E, 67(2):026126, Feb 2003.
|
 |
21
|
|
| |
22
|
Matthew J. Rattigan , Marc Maier , David Jensen Bin Wu , Xin Pei , JianBin Tan , Yi Wang, Exploiting Network Structure for Active Inference in Collective Classification, Proceedings of the Seventh IEEE International Conference on Data Mining Workshops, p.429-434, October 28-31, 2007
[doi> 10.1109/ICDMW.2007.41]
|
 |
23
|
|
| |
24
|
P. Sen, G. M. Namata, M. Bilgic, L. Getoor, B. Gallagher, and T. Eliassi-Rad. Collective classification in network data. Technical Report CS-TR-4905, University of Maryland, College Park, 2008.
|
 |
25
|
|
| |
26
|
B. Taskar, P. Abbeel, and D. Koller. Discriminative probabilistic models for relational data. In UAI, 2002.
|
| |
27
|
|
| |
28
|
J. Yedidia, W.T.Freeman, and Y. Weiss. Generalized belief propagation. In NIPS, 2000.
|
 |
29
|
|
CITED BY
|
|
Kensuke Onuma , Hanghang Tong , Christos Faloutsos, TANGENT: a novel, 'Surprise me', recommendation algorithm, Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, June 28-July 01, 2009, Paris, France
|
|