ABSTRACT
This paper presents a boosting-based algorithm for learning a bipartite ranking function (BRF) with partially labeled data. Until now, attempts to build a BRF from such data have been made in a transductive setting, in which the test points are given to the method in advance as unlabeled data. The proposed approach is a semi-supervised inductive ranking algorithm which, as opposed to transductive algorithms, is able to infer an ordering on new examples that were not used for its training. We evaluate our approach on the TREC-9 Ohsumed and Reuters-21578 data collections, comparing against two semi-supervised classification algorithms in terms of area under the ROC curve (AUC), uninterpolated average precision (AUP), mean precision@50 (TP) and Precision-Recall (PR) curves. In the most interesting cases, where irrelevant examples outnumber relevant ones, our method produces statistically significant improvements with respect to these ranking measures.
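As an illustration of the ranking measures named above, the following minimal sketch computes AUC, uninterpolated average precision and precision@k for a single scored list with binary relevance labels. It assumes the standard textbook definitions of these measures; the function names and the toy data are hypothetical and are not taken from the paper's experimental code.

```python
from typing import List, Sequence


def rank_by_score(scores: Sequence[float], labels: Sequence[int]) -> List[int]:
    """Return the labels reordered by decreasing score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [labels[i] for i in order]


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Fraction of (relevant, irrelevant) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one relevant and one irrelevant example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def precision_at_k(scores: Sequence[float], labels: Sequence[int], k: int) -> float:
    """Proportion of relevant examples among the k highest-scored ones."""
    ranked = rank_by_score(scores, labels)
    return sum(ranked[:k]) / k


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Mean of the precision values measured at the rank of each relevant example."""
    ranked = rank_by_score(scores, labels)
    hits, total = 0, 0.0
    for rank, y in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


# Toy data (hypothetical): 1 = relevant, 0 = irrelevant.
scores = [0.9, 0.8, 0.7, 0.6, 0.5]
labels = [1, 0, 1, 0, 0]

print(auc(scores, labels))                # 5 of 6 pairs correctly ordered
print(precision_at_k(scores, labels, 2))  # 1 relevant among the top 2
print(average_precision(scores, labels))  # mean of 1/1 and 2/3
```

All three measures depend only on the ordering induced by the scores, which is why they apply to a ranking function as well as to a classifier's real-valued outputs.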