ACM Home Page
Please provide us with feedback. Feedback
Using ghost edges for classification in sparsely labeled networks
Full text MovMov (20:47),  PdfPdf (849 KB)
Source
International Conference on Knowledge Discovery and Data Mining archive
Proceeding of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining table of contents
Las Vegas, Nevada, USA
SESSION: Research papers table of contents
Pages 256-264  
Year of Publication: 2008
ISBN:978-1-60558-193-4
Authors
Brian Gallagher  Lawrence Livermore National Laboratory, Livermore, CA, USA
Hanghang Tong  Carnegie Mellon University, Pittsburgh, PA, USA
Tina Eliassi-Rad  Lawrence Livermore National Laboratory, Livermore, CA, USA
Christos Faloutsos  Carnegie Mellon University, Pittsburgh, PA, USA
Sponsors
ACM: Association for Computing Machinery
SIGKDD: ACM Special Interest Group on Knowledge Discovery in Data
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 25,   Downloads (12 Months): 272,   Citation Count: 2
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1401890.1401925
What is a DOI?

ABSTRACT

We address the problem of classification in partially labeled networks (a.k.a. within-network classification) where observed class labels are sparse. Techniques for statistical relational learning have been shown to perform well on network classification tasks by exploiting dependencies between class labels of neighboring nodes. However, relational classifiers can fail when unlabeled nodes have too few labeled neighbors to support learning (during training phase) and/or inference (during testing phase). This situation arises in real-world problems when observed labels are sparse.

In this paper, we propose a novel approach to within-network classification that combines aspects of statistical relational learning and semi-supervised learning to improve classification performance in sparse networks. Our approach works by adding "ghost edges" to a network, which enable the flow of information from labeled to unlabeled nodes. Through experiments on real-world data sets, we demonstrate that our approach performs well across a range of conditions where existing approaches, such as collective classification and semi-supervised learning, fail. On all tasks, our approach improves area under the ROC curve (AUC) by up to 15 points over existing approaches. Furthermore, we demonstrate that our approach runs in time proportional to LE, where L is the number of labeled nodes and E is the number of edges.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
2
 
3
F. Chung. Spectral graph theory. Number 92 in CBMS Regional Conference Series in Mathematics. American Mathematical Sociaty, 1997.
 
4
W. W. Cohen. Enron email data set. http://www.cs.cmu.edu/ enron/, 2004.
 
5
 
6
S. Geman and D. Geman. Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 6:721--741, 1984.
 
7
 
8
D. Jensen. Proximity HEP-TH database. http://kdl.cs.umass.edu/data/hepth/hepth-info.html, 2003.
 
9
V. Krebs. Books about U.S. Politics. http://www.orgnet.com/, 2004.
 
10
Q. Lu and L. Getoor. Link-based classification. In ICML, pages 496--503, 2003.
 
11
S. Macskassy and F. Provost. A simple relational classifier. In Notes of the 2nd Workshop on Multi-relational Data Mining at KDD, 2003.
 
12
 
13
S. A. Macskassy. Improving learning in networked data by combining explicit and mined links. In AAAI, pages 590--595, 2007.
 
14
L. McDowell, K. M. Gupta, and D. W. Aha. Cautious inference in collective classification. In AAAI, pages 596--601, 2007.
 
15
16
 
17
P. Sen, G. Namata, M. Bilgic, L. Getoor, B. Gallagher, and T. Eliassi-Rad. Collective classification in network data. AI Magazine (Special Issue on AI and Networks), forthcoming.
 
18
B. Taskar, P. Abbeel, and D. Koller. Discriminative probabilistic models for relational data. In UAI, pages 485--492, 2002.
 
19
 
20
 
21
 
22
X. Zhu. Semi-supervised learning literature survey. Technical Report 1530, Department of Computer Sciences, University of Wisconsin, Madison, 2005.
 
23
X. Zhu, Z. Ghahramani, and J. D. Lafferty. Semi-supervised learning using gaussian fields and harmonic functions. In ICML, pages 912--919, 2003.


Collaborative Colleagues:
Brian Gallagher: colleagues
Hanghang Tong: colleagues
Tina Eliassi-Rad: colleagues
Christos Faloutsos: colleagues