ACM Home Page
Please provide us with feedback. Feedback
Efficient bandit algorithms for online multiclass prediction
Full text PdfPdf (482 KB)
Source ICML; Vol. 307 archive
Proceedings of the 25th international conference on Machine learning table of contents
Helsinki, Finland
Pages 440-447  
Year of Publication: 2008
ISBN:978-1-60558-205-4
Authors
Sham M. Kakade  Toyota Technological Institute, Chicago, Illinois
Shai Shalev-Shwartz  Toyota Technological Institute, Chicago, Illinois
Ambuj Tewari  Toyota Technological Institute, Chicago, Illinois
Sponsors
: Yahoo!
: Xerox
IBM : IBM
: NSF
Microsoft Research : Microsoft Research
: Machine Learning Journal/Springer
: Pascal
: University of Helsinki
: Federation of Finnish Learned Societies
: Intel Corporation
: Google
: Helsinki Institute for Information Technology
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 8,   Downloads (12 Months): 46,   Citation Count: 1
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1390156.1390212
What is a DOI?

ABSTRACT

This paper introduces the Banditron, a variant of the Perceptron [Rosenblatt, 1958], for the multiclass bandit setting. The multiclass bandit setting models a wide range of practical supervised learning applications where the learner only receives partial feedback (referred to as "bandit" feedback, in the spirit of multi-armed bandit models) with respect to the true label (e.g. in many web applications users often only provide positive "click" feedback which does not necessarily fully disclose a true label). The Banditron has the ability to learn in a multiclass classification setting with the "bandit" feedback which only reveals whether or not the prediction made by the algorithm was correct or not (but does not necessarily reveal the true label). We provide (relative) mistake bounds which show how the Banditron enjoys favorable performance, and our experiments demonstrate the practicality of the algorithm. Furthermore, this paper pays close attention to the important special case when the data is linearly separable --- a problem which has been exhaustively studied in the full information setting yet is novel in the bandit setting.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
 
2
P. Auer, N. Cesa-Bianchi, Y. Freund, and R. E. Schapire. Gambling in a rigged casino: the adversarial multi-armed bandit problem. In Proceedings of the 36th Annual FOCS, 1998.
 
3
 
4
 
5
R. O. Duda and P. E. Hart. Pattern Classification and Scene Analysis. Wiley, 1973.
 
6
A. Elisseeff and J. Weston. A kernel method for multi-labeled classification. In Advances in Neural Information Processing Systems 14, 2001.
7
 
8
 
9
 
10
 
11
R. D. Kleinberg. Nearly tight bounds for the continuumarmed bandit problem. NIPS, 2004.
 
12
J. Langford and T. Zhang. The epoch-greedy algorithm for contextual multi-armed bandits. NIPS, 2007.
 
13
 
14
 
15
 
16
V. N. Vapnik. Statistical Learning Theory. Wiley, 1998.
 
17
J. Weston and C. Watkins. Support vector machines for multi-class pattern recognition. In Proceedings of the Seventh European Symposium on Artificial Neural Networks, April 1999.


Collaborative Colleagues:
Sham M. Kakade: colleagues
Shai Shalev-Shwartz: colleagues
Ambuj Tewari: colleagues