The offset tree for learning with partial labels
Source
International Conference on Knowledge Discovery and Data Mining
Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Paris, France
SESSION: Research track papers
Pages 129-138
Year of Publication: 2009
ISBN: 978-1-60558-495-9
Authors
Alina Beygelzimer  IBM Research, Hawthorne, NY, USA
John Langford  Yahoo! Research, New York, NY, USA
Sponsors
ACM: Association for Computing Machinery
SIGKDD: ACM Special Interest Group on Knowledge Discovery in Data
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1557019.1557040

ABSTRACT

We present an algorithm, called the Offset Tree, for learning to make decisions in situations where the payoff of only one choice is observed, rather than all choices. The algorithm reduces this setting to binary classification, allowing one to reuse any existing, fully supervised binary classification algorithm in this partial information setting. We show that the Offset Tree is an optimal reduction to binary classification. In particular, it has regret at most (k-1) times the regret of the binary classifier it uses (where k is the number of choices), and no reduction to binary classification can do better. This reduction is also computationally optimal, both at training and test time, requiring just O(log₂ k) work to train on an example or make a prediction.

Experiments with the Offset Tree show that it generally performs better than several alternative approaches.
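
The O(log₂ k) prediction cost quoted in the abstract can be illustrated with a small sketch. The abstract does not spell out the tree layout, so the code below assumes a balanced binary tree over the k choices with one binary classifier at each internal node. The names `internal_nodes` and `predict`, the `(lo, hi)` node keys, and the stub classifiers are all hypothetical and used only for illustration. The training step, which is what handles the partially observed payoffs, is not shown.

```python
import math
from typing import Callable, Dict, List, Tuple

# Assumed interface (not from the paper): a binary classifier maps an
# input x to 0 ("go to the left subtree") or 1 ("go to the right subtree").
BinaryClassifier = Callable[[object], int]
Node = Tuple[int, int]  # a node covers the choices lo, lo+1, ..., hi-1


def internal_nodes(k: int) -> List[Node]:
    """List the internal nodes of a balanced binary tree over k choices."""
    nodes: List[Node] = []
    stack: List[Node] = [(0, k)]
    while stack:
        lo, hi = stack.pop()
        if hi - lo > 1:  # more than one choice left, so this node must decide
            nodes.append((lo, hi))
            mid = (lo + hi) // 2
            stack.append((lo, mid))
            stack.append((mid, hi))
    return nodes


def predict(x: object, k: int,
            classifiers: Dict[Node, BinaryClassifier]) -> Tuple[int, int]:
    """Walk from the root to a leaf.

    Returns the chosen option and the number of binary classifier calls made.
    """
    lo, hi = 0, k
    calls = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        go_right = classifiers[(lo, hi)](x)
        calls += 1
        if go_right:
            lo = mid
        else:
            hi = mid
    return lo, calls


if __name__ == "__main__":
    k = 5
    # Stub classifiers that always answer "right"; real ones would be trained.
    stubs = {node: (lambda x: 1) for node in internal_nodes(k)}
    choice, calls = predict(None, k, stubs)
    print(choice, calls, math.ceil(math.log2(k)), len(internal_nodes(k)))
```

Under this assumed layout a tree over k choices has k-1 internal nodes, and a prediction never calls more than ⌈log₂ k⌉ of them, which matches the per-prediction cost stated in the abstract.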


