| Efficient bandit algorithms for online multiclass prediction |
| Full text |
Pdf
(482 KB)
|
| Source
|
ICML; Vol. 307
archive
Proceedings of the 25th international conference on Machine learning
table of contents
Helsinki, Finland
Pages 440-447
Year of Publication: 2008
ISBN:978-1-60558-205-4
|
|
Authors
|
|
| Sponsors |
|
| Publisher |
|
| Bibliometrics |
Downloads (6 Weeks): 8, Downloads (12 Months): 46, Citation Count: 1
|
|
|
ABSTRACT
This paper introduces the Banditron, a variant of the Perceptron [Rosenblatt, 1958], for the multiclass bandit setting. The multiclass bandit setting models a wide range of practical supervised learning applications where the learner only receives partial feedback (referred to as "bandit" feedback, in the spirit of multi-armed bandit models) with respect to the true label (e.g. in many web applications users often only provide positive "click" feedback which does not necessarily fully disclose a true label). The Banditron has the ability to learn in a multiclass classification setting with the "bandit" feedback which only reveals whether or not the prediction made by the algorithm was correct or not (but does not necessarily reveal the true label). We provide (relative) mistake bounds which show how the Banditron enjoys favorable performance, and our experiments demonstrate the practicality of the algorithm. Furthermore, this paper pays close attention to the important special case when the data is linearly separable --- a problem which has been exhaustively studied in the full information setting yet is novel in the bandit setting.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
|
| |
2
|
P. Auer, N. Cesa-Bianchi, Y. Freund, and R. E. Schapire. Gambling in a rigged casino: the adversarial multi-armed bandit problem. In Proceedings of the 36th Annual FOCS, 1998.
|
| |
3
|
|
| |
4
|
|
| |
5
|
R. O. Duda and P. E. Hart. Pattern Classification and Scene Analysis. Wiley, 1973.
|
| |
6
|
A. Elisseeff and J. Weston. A kernel method for multi-labeled classification. In Advances in Neural Information Processing Systems 14, 2001.
|
 |
7
|
Michael Fink , Shai Shalev-Shwartz , Yoram Singer , Shimon Ullman, Online multiclass learning by interclass hypothesis sharing, Proceedings of the 23rd international conference on Machine learning, p.313-320, June 25-29, 2006, Pittsburgh, Pennsylvania
[doi> 10.1145/1143844.1143884]
|
| |
8
|
|
| |
9
|
|
| |
10
|
|
| |
11
|
R. D. Kleinberg. Nearly tight bounds for the continuumarmed bandit problem. NIPS, 2004.
|
| |
12
|
J. Langford and T. Zhang. The epoch-greedy algorithm for contextual multi-armed bandits. NIPS, 2007.
|
| |
13
|
|
| |
14
|
|
| |
15
|
|
| |
16
|
V. N. Vapnik. Statistical Learning Theory. Wiley, 1998.
|
| |
17
|
J. Weston and C. Watkins. Support vector machines for multi-class pattern recognition. In Proceedings of the Seventh European Symposium on Artificial Neural Networks, April 1999.
|
|