Learning from labeled features using generalized expectation criteria
Source
Annual ACM Conference on Research and Development in Information Retrieval
Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
Singapore, Singapore
SESSION: Learning models for IR
Pages 595-602
Year of Publication: 2008
ISBN: 978-1-60558-164-4
Authors
Gregory Druck, University of Massachusetts, Amherst, MA, USA
Gideon Mann, Google, Inc., New York, NY, USA
Andrew McCallum, University of Massachusetts, Amherst, MA, USA
ABSTRACT
It is difficult to apply machine learning to new domains because labeled problem instances are often lacking. In this paper, we address this problem by leveraging domain knowledge in the form of affinities between input features and classes. For example, in a baseball vs. hockey text classification problem, even without any labeled data, we know that the presence of the word "puck" is a strong indicator of hockey. We refer to this type of domain knowledge as a labeled feature. We propose a method for training discriminative probabilistic models with labeled features and unlabeled instances. Unlike previous approaches that use labeled features to create labeled pseudo-instances, we use labeled features directly to constrain the model's predictions on unlabeled instances. We express these soft constraints using generalized expectation (GE) criteria: terms in a parameter estimation objective function that express preferences on values of a model expectation. We train multinomial logistic regression models using GE criteria, but the method we develop is applicable to other discriminative probabilistic models. The complete objective function also includes a Gaussian prior on parameters, which encourages generalization by spreading parameter weight to unlabeled features. Experimental results on text classification data sets show that this method outperforms heuristic approaches to training classifiers with labeled features. Experiments with human annotators show that it is more beneficial to spend limited annotation time labeling features rather than labeling instances. For example, after only one minute of labeling features, we can achieve 80% accuracy on the ibm vs. mac text classification problem using GE-FL, whereas ten minutes of labeling documents results in an accuracy of only 77%.
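The objective described in the abstract can be illustrated with a small sketch. It is not the authors' implementation. It assumes that each GE term is a KL divergence between a user-supplied reference class distribution for a labeled feature and the model's average predicted class distribution over the unlabeled documents containing that feature; the abstract only says the terms "express preferences on values of a model expectation", so the choice of KL divergence here is an assumption. The function names, the three-word vocabulary, the 0.9/0.1 reference distributions, and the prior variance are all hypothetical.

```python
import numpy as np

def softmax(scores):
    """Row-wise softmax for a multinomial logistic regression model."""
    shifted = scores - scores.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def ge_objective(weights, X, labeled_features, sigma2):
    """Value to minimise: GE penalty terms plus a Gaussian prior term.

    weights          : (n_features, n_classes) model parameters
    X                : (n_docs, n_features) unlabeled documents (counts or 0/1)
    labeled_features : dict mapping a feature index to a reference class
                       distribution (assumed to contain no zero entries)
    sigma2           : variance of the zero-mean Gaussian prior (hypothetical value)
    """
    probs = softmax(X @ weights)
    penalty = 0.0
    for k, ref in labeled_features.items():
        has_feature = X[:, k] > 0
        if not has_feature.any():
            continue  # no unlabeled document contains this feature
        # Model expectation: mean predicted class distribution over
        # the unlabeled documents that contain feature k.
        expected = probs[has_feature].mean(axis=0)
        # Assumed score function: KL(reference || model expectation).
        penalty += np.sum(ref * (np.log(ref) - np.log(expected)))
    prior = np.sum(weights ** 2) / (2.0 * sigma2)
    return penalty + prior

# Toy vocabulary: 0 = "puck", 1 = "bat", 2 = "game"; classes: hockey, baseball.
X = np.array([[1, 0, 1],
              [0, 1, 1],
              [1, 0, 0],
              [0, 1, 0]])
labeled = {0: np.array([0.9, 0.1]),   # "puck" mostly indicates hockey
           1: np.array([0.1, 0.9])}   # "bat" mostly indicates baseball

uniform = ge_objective(np.zeros((3, 2)), X, labeled, sigma2=10.0)
aligned = ge_objective(np.array([[1.0, -1.0], [-1.0, 1.0], [0.0, 0.0]]),
                       X, labeled, sigma2=10.0)
print(uniform, aligned)
```

In this toy setting, weights that agree with the labeled features give a lower objective value than all-zero weights, even though no document carries a label. A full training loop would minimise this function with a gradient-based optimizer, which is omitted here.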