Heterogeneous source consensus learning via decision propagation and negotiation
Source
International Conference on Knowledge Discovery and Data Mining
Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Paris, France
SESSION: Research track papers
Pages 339-348  
Year of Publication: 2009
ISBN:978-1-60558-495-9
Authors
Jing Gao  University of Illinois, Urbana-Champaign, Urbana, IL, USA
Wei Fan  IBM TJ Watson Research Center, Hawthorne, NY, USA
Yizhou Sun  University of Illinois, Urbana-Champaign, Urbana, IL, USA
Jiawei Han  University of Illinois, Urbana-Champaign, Urbana, IL, USA
Sponsors
ACM: Association for Computing Machinery
SIGKDD: ACM Special Interest Group on Knowledge Discovery in Data
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1557019.1557061

ABSTRACT

Nowadays, enormous amounts of data are continuously generated not only at massive scale, but also from different, sometimes conflicting, views. Therefore, it is important to consolidate different concepts for intelligent decision making. For example, to predict the research areas of some people, the best results are usually achieved by combining and consolidating predictions obtained from the publication network, the co-authorship network and the textual content of their publications. Multiple supervised and unsupervised hypotheses can be drawn from these information sources, and negotiating their differences and consolidating their decisions usually yields a much more accurate model due to the diversity and heterogeneity of these models. In this paper, we address the problem of "consensus learning" among competing hypotheses, which rely either on outside knowledge (supervised learning) or on internal structure (unsupervised clustering). We argue that consensus learning is an NP-hard problem and thus propose to solve it with an efficient heuristic method. We construct a belief graph to first propagate predictions from the supervised models to the unsupervised ones, and then negotiate and reach consensus among them. Their final decision is further consolidated by calculating each model's weight based on its degree of consistency with the other models. Experiments are conducted on the 20 Newsgroups data, Cora research papers, DBLP author-conference network, and Yahoo! Movies datasets, and the results show that the proposed method improves the classification accuracy and the clustering quality measure (NMI) over the best base model by up to 10%. Furthermore, it runs in time proportional to the number of instances, which makes it efficient for large-scale data sets.
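The abstract describes the method only at a high level. As a rough illustration, here is a minimal Python sketch of the three steps it names (propagation from supervised to unsupervised models, negotiation, and consistency-based model weighting) on toy label lists. Everything concrete is an assumption, not the paper's formulation. Group nodes are (model, label) pairs, and edges are weighted by shared-instance counts. Classifier groups are clamped during propagation and anchored by a hypothetical `alpha` during negotiation. A model's weight is its argmax agreement rate with the other models. The names `build_groups`, `argmax`, `consensus`, `prop_iters`, `neg_iters` and `alpha` are all hypothetical.

```python
def build_groups(labelings):
    """Map each (model index, label) pair to the set of instance ids carrying that label."""
    groups = {}
    for m, labels in enumerate(labelings):
        for i, lab in enumerate(labels):
            groups.setdefault((m, lab), set()).add(i)
    return groups


def argmax(vec):
    return max(range(len(vec)), key=vec.__getitem__)


def consensus(classifiers, clusterings, n_classes, prop_iters=10, neg_iters=10, alpha=0.5):
    """Hypothetical consensus sketch (not the paper's exact algorithm).

    classifiers: list of label lists with values in range(n_classes).
    clusterings: list of cluster-id lists (ids are arbitrary and need not match classes).
    Returns (final labels, per-model weights).
    """
    labelings = classifiers + clusterings
    n_sup, n_models, n = len(classifiers), len(labelings), len(labelings[0])
    groups = build_groups(labelings)

    # Assumed belief graph over groups: edges link groups of *different* models,
    # weighted by how many instances they share.
    edges = {a: [(b, len(groups[a] & groups[b])) for b in groups
                 if b[0] != a[0] and groups[a] & groups[b]]
             for a in groups}

    # Classifier groups start one-hot on their predicted class; cluster groups start uniform.
    prior = {k: [1.0 if c == k[1] else 0.0 for c in range(n_classes)] if k[0] < n_sup
             else [1.0 / n_classes] * n_classes
             for k in groups}
    belief = dict(prior)

    def neighbour_avg(k):
        # Overlap-weighted average of the neighbouring groups' current beliefs.
        total = sum(w for _, w in edges[k])
        if total == 0:
            return belief[k]
        return [sum(w * belief[b][c] for b, w in edges[k]) / total for c in range(n_classes)]

    # Phase 1 (propagation): classifier groups stay clamped; cluster groups absorb labels.
    for _ in range(prop_iters):
        belief = {k: prior[k] if k[0] < n_sup else neighbour_avg(k) for k in groups}

    # Phase 2 (negotiation): classifier groups may now move, anchored to their prior by alpha.
    for _ in range(neg_iters):
        belief = {k: [alpha * p + (1 - alpha) * q for p, q in zip(prior[k], neighbour_avg(k))]
                  if k[0] < n_sup else neighbour_avg(k)
                  for k in groups}

    # Each instance inherits, per model, the belief of the group it belongs to.
    inst = [[belief[(m, labs[i])] for i in range(n)] for m, labs in enumerate(labelings)]

    # Model weight = fraction of instances where the model agrees with the sum of the others.
    weights = []
    for m in range(n_models):
        agree = 0
        for i in range(n):
            others = [sum(inst[o][i][c] for o in range(n_models) if o != m)
                      for c in range(n_classes)]
            agree += argmax(inst[m][i]) == argmax(others)
        weights.append(agree / n)

    # Final decision: weighted average of the per-model instance beliefs.
    total_w = sum(weights) or 1.0
    final = [argmax([sum(weights[m] * inst[m][i][c] for m in range(n_models)) / total_w
                     for c in range(n_classes)])
             for i in range(n)]
    return final, weights
```

For one classifier and one clustering that agree up to a renaming of cluster IDs, `consensus([[0, 0, 1, 1]], [[9, 9, 4, 4]], n_classes=2)` returns `([0, 0, 1, 1], [1.0, 1.0])`. Working at the group level rather than the instance level keeps this sketch's graph small (one node per model label). That is one plausible design choice; the paper's own construction and complexity argument may differ.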

