ACM Home Page
Please provide us with feedback. Feedback
Approximation algorithms for budgeted learning problems
Full text PdfPdf (296 KB)
Source
Annual ACM Symposium on Theory of Computing archive
Proceedings of the thirty-ninth annual ACM symposium on Theory of computing table of contents
San Diego, California, USA
SESSION: Session 3A table of contents
Pages: 104 - 113  
Year of Publication: 2007
ISBN:978-1-59593-631-8
Authors
Sudipto Guha  University of Pennsylvania, Philadelphia, PA
Kamesh Munagala  Duke University, Durham, NC
Sponsors
ACM: Association for Computing Machinery
SIGACT: ACM Special Interest Group on Algorithms and Computation Theory
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 11,   Downloads (12 Months): 123,   Citation Count: 4
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1250790.1250807
What is a DOI?

ABSTRACT

We present the first approximation algorithms for a large class of budgeted learning problems. One classicexample of the above is the budgeted multi-armed bandit problem. In this problem each arm of the bandithas an unknown reward distribution on which a prior isspecified as input. The knowledge about the underlying distribution can be refined in the exploration phase by playing the arm and observing the rewards. However, there is a budget on the total number of plays allowed during exploration. After this exploration phase,the arm with the highest (posterior) expected reward is hosen for exploitation. The goal is to design the adaptive exploration phase subject to a budget constraint on the number of plays, in order to maximize the expected reward of the arm chosen for exploitation. While this problem is reasonably well understood in the infinite horizon discounted reward setting, the budgeted version of the problem is NP-Hard. For this problem and several generalizations, we provide approximate policies that achieve a reward within constant factor of the reward optimal policy. Our algorithms use a novel linear program rounding technique based on stochastic packing.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
[1] Q. An, H. Li, X. Liao, and L. Carin. Active feature acquisition with POMDP models. Submitted to Pattern Recognition Letters, 2006.
 
2
 
3
[3] D. Bertsimas, D. Gamarnik, and J. Tsitsiklis. Performance of multiclass markovian queueing networks via piecewise linear Lyapunov functions. Annals of Applied Probability, 11(4):1384-1428, 2002.
 
4
 
5
 
6
[6] D. Bertsimas, I. Paschalidis, and J. N. Tsitsiklis. Optimization of multiclass queueing networks: Polyhedral and nonlinear characterizations of achievable performance. Annals of Applied Probability, 4(1):43-75, 1994.
 
7
[7] M. Charikar, C. Chekuri, and M. Pál. Sampling bounds for stochastic optimization. In APPROX-RANDOM, pages 257-269, 2005.
 
8
[8] D. A. Cohn, Z. Ghahramani, and M. I. Jordan. Active learning with statistical models. In G. Tesauro, D. Touretzky, and T. Leen, editors, Advances in Neural Information Processing Systems, volume 7, pages 705-712. The MIT Press, 1995.
 
9
[9] V. Conitzer and T. Sandholm. Definition and complexity of some basic metareasoning problems. In IJCAI, pages 1099-1106, 2003.
 
10
 
11
 
12
 
13
[13] J. C. Gittins and D. M. Jones. A dynamic allocation index for the sequential design of experiments. Progress in statistics (European Meeting of Statisticians), 1972.
14
 
15
 
16
[16] S. Guha and K. Munagala. Model driven optimization using adaptive probes. Proc. ACM-SIAM Symp. on Discrete Algorithms, 2007.
 
17
[17] S. Guha and K. Munagala. Approximation Algorithms for Budgeted Learning Problems. Available at http://www.cs.duke.edu/~kamesh/bandits.pdf.
 
18
[18] S. Guha, K. Munagala, and S. Sarkar. Jointly optimal probing and transmission strategies for multi-channel wireless systems. Submitted to IEEE Transactions on Information Theory, 2007. Available at http://www.cs.duke.edu/~kamesh/partialinfo.pdf.
 
19
 
20
[20] M. J. Kearns, Y. Mansour, and A. Y. Ng. Approximate planning in large POMDPs via reusable trajectories. In NIPS, pages 1001-1007, 1999.
 
21
 
22
 
23
[23] A. Krause and C. Guestrin. Near-optimal nonmyopic value of information in graphical models. Proc. 21st Conf. on Uncertainty in Artificial Intelligence, 2005.
 
24
 
25
26
 
27
 
28
 
29
[29] J. Schneider and A. Moore. Active learning in discrete input spaces. The 34th Interface Symposium, Montreal, Quebec, 2002.
 
30
 
31
 
32
[32] J. N. Tsitsiklis. A short proof of the Gittins index theorem. Annals of Applied Probability, 4(1):194-199, 1994.
 
33
[33] P. Whittle. Restless bandits: activity allocation in a changing world. Journal of applied probability, 25(A):287-298, 1988.


Collaborative Colleagues:
Sudipto Guha: colleagues
Kamesh Munagala: colleagues