| Using ghost edges for classification in sparsely labeled networks |
| Full text |
Mov
(20:47),
Pdf
(849 KB)
|
Source
|
International Conference on Knowledge Discovery and Data Mining
archive
Proceeding of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining
table of contents
Las Vegas, Nevada, USA
SESSION: Research papers
table of contents
Pages 256-264
Year of Publication: 2008
ISBN:978-1-60558-193-4
|
|
Authors
|
|
Brian Gallagher
|
Lawrence Livermore National Laboratory, Livermore, CA, USA
|
|
Hanghang Tong
|
Carnegie Mellon University, Pittsburgh, PA, USA
|
|
Tina Eliassi-Rad
|
Lawrence Livermore National Laboratory, Livermore, CA, USA
|
|
Christos Faloutsos
|
Carnegie Mellon University, Pittsburgh, PA, USA
|
|
| Sponsors |
|
| Publisher |
|
| Bibliometrics |
Downloads (6 Weeks): 25, Downloads (12 Months): 272, Citation Count: 2
|
|
|
ABSTRACT
We address the problem of classification in partially labeled networks (a.k.a. within-network classification) where observed class labels are sparse. Techniques for statistical relational learning have been shown to perform well on network classification tasks by exploiting dependencies between class labels of neighboring nodes. However, relational classifiers can fail when unlabeled nodes have too few labeled neighbors to support learning (during training phase) and/or inference (during testing phase). This situation arises in real-world problems when observed labels are sparse. In this paper, we propose a novel approach to within-network classification that combines aspects of statistical relational learning and semi-supervised learning to improve classification performance in sparse networks. Our approach works by adding "ghost edges" to a network, which enable the flow of information from labeled to unlabeled nodes. Through experiments on real-world data sets, we demonstrate that our approach performs well across a range of conditions where existing approaches, such as collective classification and semi-supervised learning, fail. On all tasks, our approach improves area under the ROC curve (AUC) by up to 15 points over existing approaches. Furthermore, we demonstrate that our approach runs in time proportional to L • E, where L is the number of labeled nodes and E is the number of edges.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
|
 |
2
|
Soumen Chakrabarti , Byron Dom , Piotr Indyk, Enhanced hypertext categorization using hyperlinks, Proceedings of the 1998 ACM SIGMOD international conference on Management of data, p.307-318, June 01-04, 1998, Seattle, Washington, United States
|
| |
3
|
F. Chung. Spectral graph theory. Number 92 in CBMS Regional Conference Series in Mathematics. American Mathematical Sociaty, 1997.
|
| |
4
|
W. W. Cohen. Enron email data set. http://www.cs.cmu.edu/ enron/, 2004.
|
| |
5
|
|
| |
6
|
S. Geman and D. Geman. Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 6:721--741, 1984.
|
| |
7
|
|
| |
8
|
D. Jensen. Proximity HEP-TH database. http://kdl.cs.umass.edu/data/hepth/hepth-info.html, 2003.
|
| |
9
|
V. Krebs. Books about U.S. Politics. http://www.orgnet.com/, 2004.
|
| |
10
|
Q. Lu and L. Getoor. Link-based classification. In ICML, pages 496--503, 2003.
|
| |
11
|
S. Macskassy and F. Provost. A simple relational classifier. In Notes of the 2nd Workshop on Multi-relational Data Mining at KDD, 2003.
|
| |
12
|
|
| |
13
|
S. A. Macskassy. Improving learning in networked data by combining explicit and mined links. In AAAI, pages 590--595, 2007.
|
| |
14
|
L. McDowell, K. M. Gupta, and D. W. Aha. Cautious inference in collective classification. In AAAI, pages 596--601, 2007.
|
| |
15
|
|
 |
16
|
|
| |
17
|
P. Sen, G. Namata, M. Bilgic, L. Getoor, B. Gallagher, and T. Eliassi-Rad. Collective classification in network data. AI Magazine (Special Issue on AI and Networks), forthcoming.
|
| |
18
|
B. Taskar, P. Abbeel, and D. Koller. Discriminative probabilistic models for relational data. In UAI, pages 485--492, 2002.
|
| |
19
|
|
| |
20
|
|
| |
21
|
|
| |
22
|
X. Zhu. Semi-supervised learning literature survey. Technical Report 1530, Department of Computer Sciences, University of Wisconsin, Madison, 2005.
|
| |
23
|
X. Zhu, Z. Ghahramani, and J. D. Lafferty. Semi-supervised learning using gaussian fields and harmonic functions. In ICML, pages 912--919, 2003.
|
|