The offset tree for learning with partial labels
Source: International Conference on Knowledge Discovery and Data Mining
Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining
Paris, France
SESSION: Research track papers
Pages 129-138
Year of Publication: 2009
ISBN: 978-1-60558-495-9
ABSTRACT
We present an algorithm, called the Offset Tree, for learning to make decisions in situations where the payoff of only one choice is observed, rather than the payoffs of all choices. The algorithm reduces this setting to binary classification, allowing one to reuse any existing, fully supervised binary classification algorithm in this partial information setting. We show that the Offset Tree is an optimal reduction to binary classification. In particular, it has regret at most (k-1) times the regret of the binary classifier it uses (where k is the number of choices), and no reduction to binary classification can do better. This reduction is also computationally optimal, both at training and test time, requiring just O(log₂ k) work to train on an example or make a prediction. Experiments with the Offset Tree show that it generally performs better than several alternative approaches.
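The O(log₂ k) prediction cost can be illustrated with a minimal sketch, assuming the k choices sit at the leaves of a balanced binary tree with one binary classifier per internal node. The names `internal_nodes`, `predict`, and the `(lo, hi)` node encoding are hypothetical, and the training rule is omitted because the abstract does not specify it; this is not presented as the paper's implementation.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Assumption: a binary classifier maps features to 0 (go left) or 1 (go right).
BinaryClassifier = Callable[[Sequence[float]], int]
Node = Tuple[int, int]  # half-open range [lo, hi) of choices under a node


def internal_nodes(k: int) -> List[Node]:
    """List the ranges of all internal nodes of a balanced tree over k choices."""
    nodes, stack = [], [(0, k)]
    while stack:
        lo, hi = stack.pop()
        if hi - lo > 1:  # a range with a single choice is a leaf
            nodes.append((lo, hi))
            mid = (lo + hi) // 2
            stack.extend([(lo, mid), (mid, hi)])
    return nodes


def predict(x: Sequence[float], k: int,
            classifiers: Dict[Node, BinaryClassifier]) -> Tuple[int, int]:
    """Walk from the root to a leaf; return (chosen index, classifier calls)."""
    lo, hi, calls = 0, k, 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        go_right = classifiers[(lo, hi)](x)
        calls += 1
        if go_right:
            lo = mid
        else:
            hi = mid
    return lo, calls


if __name__ == "__main__":
    k = 8
    nodes = internal_nodes(k)
    # Placeholder classifiers that always answer "right"; real ones would be learned.
    always_right = {n: (lambda x: 1) for n in nodes}
    print(len(nodes), predict([0.0], k, always_right))
```

A binary tree with k leaves has k-1 internal nodes, and each prediction evaluates at most ceil(log₂ k) of them, one per level on the path from the root to the chosen leaf.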