Curriculum learning
Source: ACM International Conference Proceeding Series; Vol. 382
Proceedings of the 26th Annual International Conference on Machine Learning
Montreal, Quebec, Canada
Pages 41-48  
Year of Publication: 2009
ISBN: 978-1-60558-516-1
Authors
Yoshua Bengio  U. Montreal, Montreal, Canada
Jérôme Louradour  U. Montreal, Montreal, Canada and A2iA SA, Paris, France
Ronan Collobert  NEC Laboratories America, Princeton, NJ
Jason Weston  NEC Laboratories America, Princeton, NJ
Sponsors
MITACS
NSF
Microsoft Research
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1553374.1553380

ABSTRACT

Humans and animals learn much better when the examples are not randomly presented but organized in a meaningful order which illustrates gradually more concepts, and gradually more complex ones. Here, we formalize such training strategies in the context of machine learning, and call them "curriculum learning". In the context of recent research studying the difficulty of training in the presence of non-convex training criteria (for deep deterministic and stochastic neural networks), we explore curriculum learning in various set-ups. The experiments show that significant improvements in generalization can be achieved. We hypothesize that curriculum learning has both an effect on the speed of convergence of the training process to a minimum and, in the case of non-convex criteria, on the quality of the local minima obtained: curriculum learning can be seen as a particular form of continuation method (a general strategy for global optimization of non-convex functions).
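
The abstract describes the idea only at a high level, so the following is a minimal sketch rather than the authors' procedure. It assumes a difficulty score is available for every example and that training proceeds in stages over a pool that grows from the easiest examples to the full set. The names `curriculum_stages`, `difficulty`, and `n_stages` are hypothetical and do not come from the paper.

```python
from typing import Callable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def curriculum_stages(
    examples: Sequence[T],
    difficulty: Callable[[T], float],
    n_stages: int,
) -> Iterator[List[T]]:
    """Yield training pools that grow from easy examples to the full set.

    Assumption: `difficulty` is a user-supplied scoring function; how to
    define difficulty for a given task is not specified here.
    """
    if n_stages < 1:
        raise ValueError("n_stages must be at least 1")
    ordered = sorted(examples, key=difficulty)  # easiest first
    for stage in range(1, n_stages + 1):
        # Ceiling division, so the final stage always covers every example.
        cutoff = -(-len(ordered) * stage // n_stages)
        yield ordered[:cutoff]


if __name__ == "__main__":
    # Toy data: sentence length in words stands in for difficulty.
    sentences = ["a b c d", "a", "a b c", "a b"]
    for pool in curriculum_stages(sentences, lambda s: len(s.split()), 2):
        print(pool)  # a training step over `pool` would go here
```

In this sketch the learner would be trained on each pool in turn, so early updates see only easy examples and later ones see the whole distribution. A random ordering corresponds to the special case `n_stages=1`.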

