ABSTRACT
Multi-view algorithms, such as co-training and co-EM, utilize unlabeled data when the available attributes can be split into independent and compatible subsets. Co-EM outperforms co-training for many problems, but it requires the underlying learner to estimate class probabilities, and to learn from probabilistically labeled data. Therefore, co-EM has so far only been studied with naive Bayesian learners. We cast linear classifiers into a probabilistic framework and develop a co-EM version of the Support Vector Machine. We conduct experiments on text classification problems and compare the family of semi-supervised support vector algorithms under different conditions, including violations of the assumptions underlying multi-view learning. For some problems, such as course web page classification, we observe the most accurate results reported so far.
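The abstract's two requirements on the base learner (it must estimate class probabilities, and it must learn from probabilistically labeled data) can be illustrated with a small co-EM loop. The sketch below is not the co-EM Support Vector Machine developed in the paper: as an assumption made purely for illustration, it substitutes a logistic regression trained on soft targets, which is a linear classifier that meets both requirements directly. The function names, the toy Gaussian data, and all hyperparameters are hypothetical.

```python
import math
import random


def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(model, x):
    # Class probability from a linear decision value w.x + b.
    w, b = model
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def fit_soft(X, targets, epochs=200, lr=0.5, l2=0.01):
    # Logistic regression on soft targets in [0, 1], batch gradient descent.
    # A target of 0.7 acts like a positive example of weight 0.7 plus a
    # negative example of weight 0.3.
    d, n = len(X[0]), len(X)
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for x, t in zip(X, targets):
            err = predict_proba((w, b), x) - t  # cross-entropy gradient wrt w.x + b
            for j in range(d):
                gw[j] += err * x[j]
            gb += err
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def co_em(L1, L2, y, U1, U2, rounds=5):
    # Each view's classifier is retrained on the labeled data plus the
    # unlabeled data, probabilistically labeled by the other view.
    model2 = fit_soft(L2, y)  # start from labeled data only
    model1 = None
    for _ in range(rounds):
        soft = [predict_proba(model2, x) for x in U2]
        model1 = fit_soft(L1 + U1, y + soft)
        soft = [predict_proba(model1, x) for x in U1]
        model2 = fit_soft(L2 + U2, y + soft)
    return model1, model2


def make_views(n_per_class, rng, shift=2.0):
    # Toy data (assumption): two 2-d views, independent given the class.
    X1, X2, y = [], [], []
    for label, sign in ((1.0, 1.0), (0.0, -1.0)):
        for _ in range(n_per_class):
            X1.append([rng.gauss(sign * shift, 1.0) for _ in range(2)])
            X2.append([rng.gauss(sign * shift, 1.0) for _ in range(2)])
            y.append(label)
    return X1, X2, y


rng = random.Random(0)
L1, L2, y = make_views(3, rng)      # 6 labeled examples
U1, U2, _ = make_views(50, rng)     # 100 unlabeled examples
T1, T2, t = make_views(100, rng)    # held-out test set
m1, m2 = co_em(L1, L2, y, U1, U2)

# Combine the two views by averaging their class probabilities.
pred = [(predict_proba(m1, a) + predict_proba(m2, b)) / 2 >= 0.5
        for a, b in zip(T1, T2)]
acc = sum(p == (ti == 1.0) for p, ti in zip(pred, t)) / len(t)
print(f"combined-view accuracy: {acc:.2f}")
```

The toy views are independent given the class and each is sufficient on its own, so the data satisfies the independence and compatibility assumptions the abstract names. Correlating the two views in `make_views` would be one way to probe a violation of those assumptions.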
CITED BY 5
Ulf Brefeld, Thomas Gärtner, Tobias Scheffer, Stefan Wrobel, Efficient co-regularised least squares regression, Proceedings of the 23rd international conference on Machine learning, p.137-144, June 25-29, 2006, Pittsburgh, Pennsylvania