ABSTRACT
We present a new semi-supervised training procedure for conditional random fields (CRFs) that can be used to train sequence segmentors and labelers from a combination of labeled and unlabeled training data. Our approach is based on extending the minimum entropy regularization framework to the structured prediction case, yielding a training objective that combines unlabeled conditional entropy with labeled conditional likelihood. Although the training objective is no longer concave, it can still be used to improve an initial model (e.g., one obtained from supervised training) by iterative ascent. We apply our new training algorithm to the problem of identifying gene and protein mentions in biological texts, and show that incorporating unlabeled data improves the performance of the supervised CRF in this case.
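The abstract states the objective only in words. As a minimal sketch, assuming it takes the form "labeled conditional log-likelihood minus gamma times the summed conditional entropy of the unlabeled sequences" (with no prior term on the weights), the Python below evaluates that objective on a toy linear-chain CRF small enough to enumerate every label sequence, then improves a starting model by plain gradient ascent. The function names (`posterior`, `objective`, `ascent_step`), the emission/transition parameterization, the weight `gamma`, and the use of central finite differences for the gradient are all illustrative assumptions, not the paper's implementation; a practical tagger would use dynamic programming rather than enumeration.

```python
import itertools
import math

# Illustrative sketch only (hypothetical names): a toy linear-chain CRF whose
# weights live in one flat list, with n_labels * n_obs emission weights
# followed by n_labels * n_labels transition weights.


def unpack(theta, n_labels, n_obs):
    """Split flat weights into emissions W[label][obs] and transitions T[prev][cur]."""
    k = n_labels * n_obs
    W = [theta[i * n_obs:(i + 1) * n_obs] for i in range(n_labels)]
    T = [theta[k + i * n_labels:k + (i + 1) * n_labels] for i in range(n_labels)]
    return W, T


def score(x, y, W, T):
    """Unnormalized log-score of label sequence y for observation sequence x."""
    s = W[y[0]][x[0]]
    for t in range(1, len(x)):
        s += T[y[t - 1]][y[t]] + W[y[t]][x[t]]
    return s


def posterior(x, theta, n_labels, n_obs):
    """p(y | x) for every label sequence y, by brute-force enumeration (toy sizes only)."""
    W, T = unpack(theta, n_labels, n_obs)
    ys = list(itertools.product(range(n_labels), repeat=len(x)))
    scores = [score(x, y, W, T) for y in ys]
    m = max(scores)  # log-sum-exp shift for numerical stability
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return {y: math.exp(s - log_z) for y, s in zip(ys, scores)}


def entropy(p):
    """Shannon entropy, in nats, of a distribution stored as a dict."""
    return -sum(q * math.log(q) for q in p.values() if q > 0.0)


def objective(theta, labeled, unlabeled, n_labels, n_obs, gamma):
    """Assumed form: labeled log-likelihood minus gamma * unlabeled conditional entropy."""
    ll = sum(math.log(posterior(x, theta, n_labels, n_obs)[tuple(y)])
             for x, y in labeled)
    h = sum(entropy(posterior(x, theta, n_labels, n_obs)) for x in unlabeled)
    return ll - gamma * h


def ascent_step(theta, f, lr=0.01, eps=1e-6):
    """One gradient-ascent step on f; the gradient uses central finite differences."""
    grad = []
    for i in range(len(theta)):
        up, down = list(theta), list(theta)
        up[i] += eps
        down[i] -= eps
        grad.append((f(up) - f(down)) / (2 * eps))
    return [t + lr * g for t, g in zip(theta, grad)]


# Hypothetical toy data: 2 labels, 3 observation symbols, sequences of length 3.
n_labels, n_obs, gamma = 2, 3, 0.5
labeled = [((0, 1, 2), (0, 0, 1)), ((2, 2, 0), (1, 1, 0))]
unlabeled = [(0, 2, 1), (1, 1, 2)]


def f(theta):
    return objective(theta, labeled, unlabeled, n_labels, n_obs, gamma)


theta = [0.0] * (n_labels * n_obs + n_labels * n_labels)
values = [f(theta)]
for _ in range(20):
    theta = ascent_step(theta, f)
    values.append(f(theta))
print(f"objective: {values[0]:.4f} -> {values[-1]:.4f}")
```

Setting `gamma = 0` reduces the objective to ordinary supervised CRF log-likelihood, which is one way to obtain the initial model the abstract refers to. With `gamma > 0` the entropy term makes the objective non-concave, so small ascent steps improve the current model but can only be expected to reach a local optimum.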
CITED BY 6

Gideon S. Mann, Andrew McCallum, Simple, robust, scalable semi-supervised learning via expectation regularization, Proceedings of the 24th International Conference on Machine Learning, p.593-600, June 20-24, 2007, Corvallis, Oregon