Principles of Deep Learning Theory

September 10, 2021

Over the long pandemic lockdown, I taught some folks Python, Docker, and Kubernetes. I learned some banjo. Lost some weight. Baked a lot of sourdough bread. Joined TetraScience and started learning how to build analytics applets with Streamlit. What did you do?

Dan Roberts, Sho Yaida, and Boris Hanin wrote a textbook called Principles of Deep Learning Theory (due out from Cambridge University Press in 2022, but available in .pdf form at the link) that rigorously explains how deep neural networks (DNNs) work from first principles. Along the way, the collaborators show how a DNN's aspect ratio (its depth relative to its width) governs its performance, and they advance a theoretical framework and new abstractions for categorizing DNNs and predicting their information flow and representational capacity. They use this framework to prescribe methods for extending DNNs to arbitrary depths and for solving challenges like the vanishing-gradient problem.
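If you want a rough feel for what "aspect ratio" means here, the sketch below is my own toy illustration, not anything from the book: over an ensemble of random initializations, a very wide network's outputs look Gaussian, and it's the depth-to-width ratio that controls how far a realistic finite network strays from that idealized limit. The widths, depth, input, and sample count are arbitrary choices for the demo.

```python
# Toy sketch (not from the book): sample the scalar output of many independently
# initialized tanh MLPs at a fixed input. In the very-wide limit the output
# distribution is Gaussian; deviations grow with the depth-to-width ratio.
import numpy as np

rng = np.random.default_rng(0)

def sample_outputs(x, width, depth, n_samples=5000):
    """Scalar outputs of n_samples independently initialized tanh MLPs
    of the given width and depth, all evaluated at the same input x."""
    outputs = np.empty(n_samples)
    for s in range(n_samples):
        h = x
        for _ in range(depth):
            # Standard 1/sqrt(fan_in) Gaussian initialization.
            W = rng.normal(0.0, 1.0 / np.sqrt(h.size), size=(width, h.size))
            h = np.tanh(W @ h)
        w_out = rng.normal(0.0, 1.0 / np.sqrt(width), size=width)
        outputs[s] = w_out @ h
    return outputs

x = np.ones(4)
depth = 4
for width in (4, 16, 64):
    out = sample_outputs(x, width, depth)
    # Excess kurtosis is ~0 for a Gaussian; its magnitude shrinks as the
    # network gets wider relative to its depth.
    kurtosis = np.mean((out - out.mean()) ** 4) / out.var() ** 2 - 3.0
    print(f"width={width:3d}  depth/width={depth / width:.2f}  "
          f"excess kurtosis={kurtosis:+.2f}")
```

Running it, the excess-kurtosis column should drift toward zero as the width grows at fixed depth; that vanishing non-Gaussianity in the wide limit is the baseline that a finite-width, finite-depth theory has to correct.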

Yaida explains this very elegantly in a Facebook AI Research blog post, published in mid-June. To summarize: DNNs are now being used intensively for many kinds of practical work. They're of great interest to biopharma research, where they're being applied in areas as pragmatic as predicting when equipment failures will occur (thus enabling continuous manufacturing with near-zero waste), and as open-ended as searching for hits and optimizing leads across vast virtual databases of candidate small molecules -- automating the fundamental, effortful work of identifying therapeutics of likely value, with far less human labor and physical experimentation. On the bleeding edge of research, meanwhile, machine learning is pushing past the (probably structural) limits of human brainpower and intuition. The well-known AlphaFold project, for example, is advancing the use of deep learning to rapidly and accurately predict 3D protein structure from sequence data.

But training DNNs effectively is still largely a black art, because their behavior differs from that of abstract theoretical models; the process can thus be time-consuming and inefficient. And validating the competence of trained models with precision has been effectively impossible because (before Roberts et al.'s pandemic project, anyway) there was no way to rederive the function a DNN is computing from its trained behavior, or to predict it in detail from knowledge of the network's initial state and training arc.

Now this is no longer quite so impossible, or so the mathematicians and physicists reviewing the book are hoping. To quote from Yaida's blog: "The book presents an appealing approach to machine learning based on expansions familiar in theoretical physics," said Eva Silverstein, a Professor of Physics at Stanford University. "It will be exciting to see how far these methods go in understanding and improving AI." Yaida's own assertion, central to the blog post, is that this work opens the door to what may be a period of rapid, systematic improvement in DNN technology, based on firm principles of physics and math rather than trial and error. It may also (or so we hope) influence improvements to DNN toolkits that make it easier for scientists with deep knowledge of biological problems to design, train, and apply DNNs to help find solutions.

Obviously, we're all going to need to read the book as our next pandemic activity.