Neural Collapse research seeks to advance mathematical understanding of deep learning

Led by Prof. Qing Qu, the project could influence the application of deep learning in areas such as machine learning, optimization, signal and image processing, and computer vision.

Prof. Qing Qu leads a new research project that aims to develop a rigorous mathematical theory that could explain a mysterious phenomenon known as “Neural Collapse,” which appears in deep learning models. These models, from AlexNet to large language models (LLMs), analyze data by recognizing complex patterns in images, text, sounds, and more to generate accurate insights and predictions.

“In the past decade, we have witnessed unprecedented success of deep learning across various domains in engineering and science,” Qu said. “However, the theoretical understanding of their success has remained elusive.”

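Recently, researchers discovered and characterized a mysterious yet elegant mathematical structure in the features and classifiers that deep networks learn. This structure, known as Neural Collapse, emerges late in training. The last-layer features of each class collapse toward their class mean, the class means spread out into a maximally separated, symmetric arrangement (a simplex equiangular tight frame), and the classifier aligns with those means. Neural Collapse appears across a variety of network architectures, datasets, and data domains, and it could help describe the mathematical foundations of deep learning.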

“We’re leveraging the symmetry of Neural Collapse to develop a rigorous mathematical theory to explain when and why it happens,” Qu said. “This simple mathematical structure of the last-layer classifiers and features can lead to profound insights on network training, generalization, and robustness.”
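In the standard description of Neural Collapse, the centered class means of the last-layer features, and the classifier weights, converge to a simplex equiangular tight frame (ETF): K equal-length vectors that sum to zero, with every pair separated by the same angle (cosine of -1/(K-1)). The short sketch below, which assumes NumPy and uses a hypothetical helper named `simplex_etf`, builds such a frame and checks that symmetry. It illustrates only the target geometry and is not code from the project.

```python
import numpy as np


def simplex_etf(num_classes: int, dim: int, seed: int = 0) -> np.ndarray:
    """Hypothetical helper: return a dim x num_classes matrix whose columns form a simplex ETF."""
    if num_classes < 2 or dim < num_classes:
        raise ValueError("this sketch assumes num_classes >= 2 and dim >= num_classes")
    rng = np.random.default_rng(seed)
    # Random partial orthogonal matrix U (dim x K) with U^T U = I_K.
    u, _ = np.linalg.qr(rng.standard_normal((dim, num_classes)))
    k = num_classes
    # Centering matrix I - (1/K) 11^T removes the common mean direction.
    centering = np.eye(k) - np.ones((k, k)) / k
    # The scale factor makes every column unit-norm.
    return np.sqrt(k / (k - 1)) * u @ centering


M = simplex_etf(num_classes=4, dim=8)
gram = M.T @ M  # pairwise inner products of the class vectors
print(np.round(gram, 3))  # diagonal ≈ 1, off-diagonal ≈ -1/(K-1) = -1/3
```

The Gram matrix makes the symmetry explicit: every class vector looks the same relative to every other, which is the kind of structure a theory of when and why Neural Collapse occurs can exploit.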


Qu plans to extend this research beyond the final layer to understand what happens across all the layers of a deep network.

“Right now, we’re mainly focused on the final layer, but if we want to really understand how deep learning functions, we need to understand what each layer of the network is doing,” Qu said. “Based upon our convincing preliminary investigations, we believe that each layer is performing feature compression and linear discrimination of the data. Moreover, we conjecture that such a phenomenon might also happen in LLMs and foundation models, and the study can have profound implications for understanding these models.”
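One common way to put a number on “feature compression” in the Neural Collapse literature is the within-class variability metric often called NC1: the trace of the within-class covariance multiplied by the pseudo-inverse of the between-class covariance, divided by the number of classes K. Smaller values mean the features of each class are tighter relative to the spread between classes. The sketch below assumes NumPy, balanced classes, and a hypothetical function name `nc1_metric`; the synthetic features at decreasing noise levels are made-up stand-ins for progressively deeper layers, not data from the project.

```python
import numpy as np


def nc1_metric(features: np.ndarray, labels: np.ndarray) -> float:
    """Hypothetical helper: Tr(Sigma_W @ pinv(Sigma_B)) / K for features of shape (n, d)."""
    classes = np.unique(labels)
    k = len(classes)
    dim = features.shape[1]
    global_mean = features.mean(axis=0)
    sigma_w = np.zeros((dim, dim))  # within-class covariance
    sigma_b = np.zeros((dim, dim))  # between-class covariance
    for c in classes:
        fc = features[labels == c]
        mu_c = fc.mean(axis=0)
        centered = fc - mu_c
        sigma_w += centered.T @ centered
        diff = (mu_c - global_mean)[:, None]
        sigma_b += diff @ diff.T
    sigma_w /= features.shape[0]  # average over all samples
    sigma_b /= k  # average over classes
    return float(np.trace(sigma_w @ np.linalg.pinv(sigma_b)) / k)


# Synthetic stand-ins for "layers": the same class means with shrinking within-class noise.
rng = np.random.default_rng(0)
class_means = 3.0 * rng.standard_normal((3, 5))
labels = np.repeat(np.arange(3), 100)
for noise in (1.0, 0.1, 0.01):
    feats = class_means[labels] + noise * rng.standard_normal((300, 5))
    print(f"noise={noise}: NC1 ≈ {nc1_metric(feats, labels):.2e}")
```

Tracking a quantity like this layer by layer in a trained network is one way to test whether each layer progressively compresses within-class variation, as the preliminary findings suggest.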

The project, Principles for Optimization, Generalization, and Transferability via Deep Neural Collapse, is funded by the National Science Foundation. It is a collaboration with Dr. Yutong Wang, an ECE Research Fellow at U-M; Prof. Zhihui Zhu of Ohio State University; and Prof. Jeremias Sulam of Johns Hopkins University.