Machine learning research of interest to physicists can be broadly divided into two categories: using machine learning tools to solve physics problems, and using ideas from physics to improve machine learning techniques.
An example of the former is the transformer neural network architecture behind large language models such as ChatGPT. Its ability to efficiently learn long-range correlations in data is also useful in variational methods for finding ground states of strongly correlated quantum many-body systems. Two papers demonstrating this approach were published in Physical Review B and Physical Review Letters earlier this year.
An example of the latter is the class of popular image generation tools such as DALL-E and Stable Diffusion (which I wrote about previously), which are based on time-reversing a diffusion process to generate desired samples from noise. This approach is heavily inspired by techniques from non-equilibrium statistical mechanics published in Physical Review E in 1997.
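To make the idea of time-reversing a diffusion process concrete, here is a minimal toy sketch in Python. It is not how DALL-E or Stable Diffusion are implemented: those tools learn the score function (the gradient of the log-density of the noised data) with a neural network, whereas here I assume a one-dimensional Gaussian "data" distribution so that the score is known in closed form. The function names, the Ornstein-Uhlenbeck forward process, and all parameter values are my own illustrative choices.

```python
import numpy as np

def marginal_stats(t, mu0, sigma0):
    """Mean and variance at time t of the forward process dx = -x dt + sqrt(2) dW,
    started from Gaussian data with mean mu0 and standard deviation sigma0."""
    mean = mu0 * np.exp(-t)
    var = sigma0**2 * np.exp(-2.0 * t) + 1.0 - np.exp(-2.0 * t)
    return mean, var

def score(x, t, mu0, sigma0):
    """Exact score, d/dx log p_t(x), of the noised distribution.
    In a real generative model this is what the neural network approximates."""
    mean, var = marginal_stats(t, mu0, sigma0)
    return -(x - mean) / var

def sample_by_reverse_diffusion(n_samples=20000, mu0=2.0, sigma0=0.5,
                                T=5.0, n_steps=1000, seed=0):
    """Integrate the time-reversed diffusion with Euler-Maruyama steps."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    # Start from pure noise: at large T the forward process is close to N(0, 1).
    x = rng.standard_normal(n_samples)
    for k in range(n_steps):
        t = T - k * dt  # forward-process time, running from T down to dt
        # Reverse-time drift: minus the forward drift plus g^2 times the score (g^2 = 2).
        drift = x + 2.0 * score(x, t, mu0, sigma0)
        x = x + drift * dt + np.sqrt(2.0 * dt) * rng.standard_normal(n_samples)
    return x

samples = sample_by_reverse_diffusion()
print(f"sample mean = {samples.mean():.3f}, sample std = {samples.std():.3f}")
```

Starting from unit-variance noise, the reverse process should yield samples whose mean and standard deviation are close to the assumed data values (2.0 and 0.5 here), up to discretization and sampling error.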
Another pressing issue in machine learning and AI is how to understand the emergent properties of large language models as their size or training time is scaled up. This is a problem that physicists are well-placed to tackle using techniques from statistical physics, random matrix theory, and the theory of phase transitions, which have recently been applied to shallow neural network models in a few different studies:
Memorizing without overfitting: Bias, variance, and interpolation in overparameterized models
Learning through atypical phase transitions in overparameterized neural networks
Grokking phase transitions in learning local rules with gradient descent
Droplets of Good Representations: Grokking as a First Order Phase Transition in Two Layer Networks
I'm sure we'll see a growing number of theoretical physicists becoming involved in this exciting area of research in the coming years.