ABSTRACT
We present novel semi-supervised boosting algorithms that incrementally build linear combinations of weak classifiers through generic functional gradient descent, using both labeled and unlabeled training data. Our approach extends the information regularization framework to boosting. This yields loss functions that combine the log loss on labeled data with information-theoretic measures that encode the unlabeled data. Although the information-theoretic regularization terms make the optimization non-convex, we propose simple sequential gradient descent optimization algorithms. On synthetic, benchmark, and real-world tasks they obtain markedly improved results over supervised boosting algorithms that use the labeled data alone, and over a state-of-the-art semi-supervised boosting algorithm.
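The general recipe described above is functional gradient descent on a loss that adds an unlabeled-data regularizer to the labeled log loss. Below is a minimal NumPy sketch of that recipe under stated assumptions; it is not the authors' algorithm. In particular, it swaps the information-regularization term for a simpler per-point prediction-entropy penalty (entropy minimization). It also fits least-squares regression stumps to the negative functional gradient and picks each step size from a fixed grid. All function names and the weight `lam` are hypothetical.

```python
import numpy as np


def sigmoid(z):
    # Overflow-free logistic function via tanh
    return 0.5 * (1.0 + np.tanh(0.5 * z))


def binary_entropy_from_logit(F):
    # H(sigmoid(F)) in nats; H is symmetric in F, so evaluate it on a = -|F| <= 0
    a = -np.abs(F)
    e = np.exp(a)  # lies in (0, 1], so no overflow
    return np.log1p(e) - a * e / (1.0 + e)


def objective(F, y_l, lam):
    # Log loss on the labeled points (the first len(y_l) entries of F)
    # plus lam times the summed prediction entropy on the unlabeled points.
    # ASSUMPTION: entropy stands in for the paper's information regularizer.
    n_l = len(y_l)
    log_loss = np.logaddexp(0.0, -y_l * F[:n_l]).sum()
    return log_loss + lam * binary_entropy_from_logit(F[n_l:]).sum()


def fit_stump(X, r):
    # Least-squares regression stump fitted to targets r.
    # Returns (feature, threshold, left_value, right_value); x <= threshold goes left.
    n, d = X.shape
    best_score, best = -np.inf, (0, np.inf, 0.0, 0.0)
    for j in range(d):
        order = np.argsort(X[:, j], kind="stable")
        xs, cs = X[order, j], np.cumsum(r[order])
        total = cs[-1]
        for k in range(1, n):
            if xs[k] == xs[k - 1]:
                continue  # no threshold separates equal feature values
            sl, sr = cs[k - 1], total - cs[k - 1]
            # Minimising squared error is equivalent to maximising this score
            score = sl ** 2 / k + sr ** 2 / (n - k)
            if score > best_score:
                best_score = score
                best = (j, 0.5 * (xs[k - 1] + xs[k]), sl / k, sr / (n - k))
    return best


def stump_output(stump, X):
    j, thr, left, right = stump
    return np.where(X[:, j] <= thr, left, right)


def semi_supervised_boost(X_l, y_l, X_u, n_rounds=30, lam=0.5):
    # y_l holds labels in {-1, +1}; X_u holds the unlabeled inputs
    X = np.vstack([X_l, X_u])
    n_l = len(y_l)
    F = np.zeros(len(X))
    ensemble, history = [], [objective(F, y_l, lam)]
    step_grid = np.geomspace(1e-3, 1e3, 25)
    for _ in range(n_rounds):
        p_u = sigmoid(F[n_l:])
        # Negative functional gradient of the objective at every training point:
        # y*sigmoid(-y*F) for labeled points, lam*F*p*(1-p) for unlabeled points
        r = np.concatenate([y_l * sigmoid(-y_l * F[:n_l]),
                            lam * F[n_l:] * p_u * (1.0 - p_u)])
        stump = fit_stump(X, r)
        h = stump_output(stump, X)
        # Crude line search over a fixed grid; keep the step only if it lowers the loss
        losses = [objective(F + a * h, y_l, lam) for a in step_grid]
        i = int(np.argmin(losses))
        if losses[i] >= history[-1]:
            break
        F = F + step_grid[i] * h
        ensemble.append((step_grid[i], stump))
        history.append(losses[i])
    return ensemble, history


def predict(ensemble, X):
    F = np.zeros(len(X))
    for a, stump in ensemble:
        F += a * stump_output(stump, X)
    return np.where(F >= 0, 1, -1)
```

At F = 0 the entropy gradient vanishes, so the first stumps are driven by the labeled points alone. The unlabeled term then pushes predictions away from the decision boundary. Because that term makes the objective non-convex, the loop accepts a step only when it lowers the objective. Even so, the loop can settle in a poor local optimum if early stumps put the boundary through a dense unlabeled region.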