The Complexity of Learning Neural Networks
The empirical successes of neural networks currently lack rigorous theoretical explanation. A first step might be to show that data generated by neural networks with a single hidden layer, smooth activation functions, and benign input distributions can be learned efficiently. We demonstrate a surprisingly general lower bound: for inputs drawn from any log-concave distribution, we construct a family of functions that is hard to learn, in the sense that any statistical query algorithm (including all known variants of stochastic gradient descent with any loss function, for any model architecture) needs an exponential number of queries, even with tolerance inversely proportional to the input dimensionality. Moreover, this hard family of functions is realizable by neural networks using any activation function from a large class (including sigmoid and ReLU), with a small number of units (sublinear in the dimension) in the single hidden layer and a sum gate as the output. Joint work with Le Song, Santosh Vempala, and Bo Xie.
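The architecture class in the statement (a single hidden layer feeding a sum gate) can be sketched as below. This is only an illustration of the network shape; the weights are random placeholders, not the hard-to-learn family constructed in the result.

```python
import math
import random

def one_hidden_layer_net(weights, x, activation=math.tanh):
    """f(x) = sum_i activation(w_i . x): one hidden layer of `activation`
    units whose outputs feed an (unweighted) sum gate.

    `weights` is a list of k weight vectors, one per hidden unit; the
    values here are arbitrary, chosen only to make the sketch runnable.
    """
    return sum(
        activation(sum(wi * xi for wi, xi in zip(w, x)))
        for w in weights
    )

random.seed(0)
d = 8  # input dimension
k = 3  # hidden units; the construction uses k sublinear in d
weights = [[random.gauss(0, 1) for _ in range(d)] for _ in range(k)]
x = [random.gauss(0, 1) for _ in range(d)]  # stand-in for a log-concave draw
y = one_hidden_layer_net(weights, x)
```

Replacing `math.tanh` with `lambda z: max(0.0, z)` gives the ReLU variant; the lower bound applies to a large family of such activations.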
- ARC Talks and Events