|
ABSTRACT
This paper deals with learnability of concept classes defined by neural networks, showing the hardness of PAC-learning (in the complexity, not merely information-theoretic sense) for networks with a particular class of activation. The obstruction lies not with the VC dimension, which is known to grow slowly; instead, the result follows the fact that the loading problem is NP-complete. (The complexity scales badly with input dimension; the loading problem is polynomial-time if the input dimension is constant.) Similar and well-known theorems had already been proved by Megiddo and by Blum and Rivest, for binary-threshold networks. It turns out the general problem for continuous sigmoidal-type functions, as used in practical applications involving steepest descent, is not NP-hard—there are “sigmoidals” for which the problem is in fact trivial—so it is an open question to determine what properties of the activation function cause difficulties. Ours is the first result on the hardness of loading networks which do not consist of binary neurons; we employ a piecewise-linear activation function that has been used in the neural network literature. Our theoretical results lend further justification to the use of incremental (architecture-changing) techniques for training networks.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
Albertini, F., E.D. Sontag, and V. Maillot, "Uniqueness of weights for neural networks," in Artificial Neural Networks for Speech and Vision (R. Mammone, ed.), Chapman and Hall, London, 1993, pp. 115-125.
|
| |
2
|
|
| |
3
|
Batruni, R., "A multilayer neural network with piecewise-linear structure and back-propagation learning," IEEE Transactions on Neural Networks 2(1991): 395-403.
|
| |
4
|
|
| |
5
|
|
| |
6
|
Brown, J., M. Garber, and S. Vanable, "Artificial neural network on a SIMD architecture," in Proc. 2nd Symposium on the Frontier of Massively Parallel Computation, Fairfax, VA, 1988, pp. 43-47.
|
 |
7
|
Christian Darken , Michael Donahue , Leonid Gurvits , Eduardo Sontag, Rate of approximation results motivated by robust neural network learning, Proceedings of the sixth annual conference on Computational learning theory, p.303-309, July 26-28, 1993, Santa Cruz, California, United States
[doi> 10.1145/168304.168357]
|
| |
8
|
|
| |
9
|
DasGupta, B., Siegelmann, H. T., and Sontag, E., "On the Complexity of Training Neural Networks with Continuous Activation Functions", Tech Report 4~ 93-61, Department of Computer Science, University of Minnesota, September, 1993.
|
| |
10
|
|
 |
11
|
|
| |
12
|
|
| |
13
|
Jones, K.L., "A simple lemma on greedy approximation in Hilbert space and convergence rates for projection pursuit regression and neural network training," Annals of Statistics, 20(1992): 608-613.
|
| |
14
|
|
| |
15
|
|
 |
16
|
M. Kearns , M. Li , L. Pitt , L. Valiant, On the learnability of Boolean formulae, Proceedings of the nineteenth annual ACM conference on Theory of computing, p.285-295, January 1987, New York, New York, United States
[doi> 10.1145/28395.28426]
|
| |
17
|
|
| |
18
|
Lippmann, R., "An introduction to computing with neural nets," IEEE Acoustics, Speech, and Signal Processing Magazine, 1987, pp. 4-22.
|
 |
19
|
|
 |
20
|
|
| |
21
|
Megiddo, M., "On the complexity of polyhedral separability," Discrete Computational Geometry, 3(1988): 325-337.
|
| |
22
|
Muroga, S., Threshold Logic and its Applications, John Wiley & Sons Inc., 1971.
|
| |
23
|
|
| |
24
|
|
| |
25
|
Sontag, E.D. and H.J. Sussmann, "Backpropagation can give rise to spurious local minima even for networks without hidden layers," Complex Systems 3(1989): 91-106.
|
| |
26
|
|
| |
27
|
|
| |
28
|
Zhang, X-D., "Complexity of neural network learning in the real number model," preprint, Comp. Sci. Dept., U. Mass., 1992.
|
|