Columbia Data Science Institute

Data Science Institute Colloquium Series Event: What Can Deep Learning Learn from Linear Regression

OPEN TO ALL

ABSTRACT

When training large-scale deep neural networks for pattern recognition, hundreds of hours on clusters of GPUs are required to achieve state-of-the-art performance. Improved optimization algorithms could potentially enable faster industrial prototyping and make training contemporary models more accessible.

In this talk, I will attempt to distill the key difficulties in optimizing large, deep neural networks for pattern recognition. In particular, I will emphasize that many of the popularized notions of what makes these problems “hard” are not true impediments at all. I will show that it is not only easy to globally optimize neural networks, but that such global optimization remains easy when fitting completely random data.

I will argue instead that the source of difficulty in deep learning is a lack of understanding of generalization, that is, of how models behave on new and unseen data. By appealing to standard concepts from linear regression, I will describe why certain popular theories of generalization fail to explain the success of large neural nets. I will close with some possible approaches to patching this theory and guiding the engineering of deep learning models with enormous capacity.