Friday, October 28, 2022

Learning Diffusion

Techniques for efficiently evaluating derivatives of functions underlie numerous large-scale applications of machine learning algorithms, including computer vision and natural language processing. Machine learning algorithms are trained by minimizing a cost function, typically using some form of gradient descent. For example, gradients of the cost function of artificial neural networks can be efficiently computed using backpropagation, which has been discovered (and re-discovered) several times since the 1960s.
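
To make this concrete, here is a minimal sketch of that training loop in JAX: a scalar cost function whose gradient is computed automatically (by backpropagation under the hood) and then fed into plain gradient descent. The least-squares toy problem and all parameter values below are my own illustrative choices, not anything from a real application.

```python
# Minimal sketch: gradient descent on a cost function, with the gradient
# supplied by reverse-mode automatic differentiation (jax.grad).
import jax
import jax.numpy as jnp

# Toy data: noisy samples of y = 2x + 1, to be fit by y = w*x + b.
key = jax.random.PRNGKey(0)
x = jnp.linspace(-1.0, 1.0, 50)
y = 2.0 * x + 1.0 + 0.1 * jax.random.normal(key, x.shape)

def cost(params):
    w, b = params
    return jnp.mean((w * x + b - y) ** 2)

grad_cost = jax.grad(cost)          # gradient via backpropagation

params = jnp.array([0.0, 0.0])
for _ in range(500):
    params = params - 0.1 * grad_cost(params)   # plain gradient descent

print(params)   # should approach [2, 1]
```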

Backpropagation is a special case of automatic differentiation, which can be applied to any function specified by a computer program, including those involving loops, conditional statements, and recursion. Reverse-mode automatic differentiation is highly efficient when the function of interest has a small number of outputs (dependent variables) and many inputs (independent variables).
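
A small sketch of why that regime is exactly where reverse mode shines: a single call to jax.grad returns the full gradient of a scalar function of 100,000 inputs, at a cost comparable to one extra evaluation of the function, whereas forward-mode differentiation (or finite differences) would need roughly one pass per input. The function itself is just a placeholder.

```python
# One output, many inputs: reverse mode gives the whole gradient in one pass.
import jax
import jax.numpy as jnp

def f(x):
    # scalar output; loops and conditionals would also be fine
    return jnp.sum(jnp.sin(x) ** 2) / x.shape[0]

x = jnp.linspace(0.0, 10.0, 100_000)   # 100,000 independent variables
g = jax.grad(f)(x)                      # full gradient, same length as x
print(g.shape)                          # (100000,)
```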

Automatic differentiation is particularly promising for variational problems in physics, where one seeks a configuration in a high-dimensional phase space that minimizes the total energy. In the past, such problems would be solved by using hard-won physical insight to come up with a simple trial function (such as a Gaussian with a variable width and centre), evaluating the cost function gradients by hand, and then numerically performing the gradient descent. Nowadays this procedure could be replaced by a deep neural network, making the physicist's job a lot easier (or redundant for this kind of problem?).
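
As a toy illustration (my own, not a problem from any paper): the classic variational calculation for the 1D harmonic oscillator ground state with a Gaussian trial function of variable centre and width, except that the energy gradient with respect to the trial parameters comes from automatic differentiation rather than being derived by hand.

```python
# Variational minimisation of the ground-state energy of a 1D harmonic
# oscillator (H = -1/2 d^2/dx^2 + 1/2 x^2, with hbar = m = omega = 1),
# using a Gaussian trial function with variable centre mu and width sigma.
import jax
import jax.numpy as jnp

x = jnp.linspace(-8.0, 8.0, 2001)
dx = x[1] - x[0]

def energy(params):
    mu, log_sigma = params
    sigma = jnp.exp(log_sigma)                    # keep the width positive
    psi = jnp.exp(-((x - mu) ** 2) / (4 * sigma**2))
    dpsi = -(x - mu) / (2 * sigma**2) * psi       # analytic derivative of the trial function
    kinetic = 0.5 * jnp.sum(dpsi**2) * dx
    potential = 0.5 * jnp.sum(x**2 * psi**2) * dx
    norm = jnp.sum(psi**2) * dx
    return (kinetic + potential) / norm

grad_energy = jax.grad(energy)                    # no hand-derived gradients needed

params = jnp.array([1.5, jnp.log(2.0)])           # deliberately poor starting guess
for _ in range(2000):
    params = params - 0.05 * grad_energy(params)

print(energy(params))   # should approach the exact ground-state energy 0.5
```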

Another important class of machine learning models is generative models, which are trained to generate samples from some unknown or complicated probability distribution. This becomes challenging in very high-dimensional spaces, for example in problems such as text-to-image conversion. Generators based on diffusion models have attracted enormous interest this year, sparked by the success of DALL-E 2 and subsequent open-source alternatives including stable diffusion, which is quite awesome. You can play with it online here, or even install it on your own computer!

The seminal paper introducing diffusion models was directly inspired by the physics of diffusion. The approach taken by these models is to gradually corrupt the training data into a simple-to-model distribution (such as a uniform or Gaussian distribution) via diffusion, and then train a neural network to reverse the process and reproduce the original data (see Fig. 1 of the paper). After the neural network is trained, new samples drawn from the simple-to-model distribution can be rapidly transformed to the learned distribution by reversing the diffusion. In effect, the problem of estimating and generating samples from the underlying probability distribution is reduced to fitting a function to carry out the reverse diffusion process.
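
To make the forward half of this concrete, here is a minimal DDPM-style sketch. The linear noise schedule and the `denoiser` placeholder are my own illustrative assumptions, not taken from the paper: the data is corrupted in closed form towards a Gaussian, and the network would be trained to predict the noise that was added, which is what allows the process to be run in reverse at sampling time.

```python
# Sketch of the forward (noising) process of a diffusion model and a
# simplified noise-prediction training loss. Schedule values and the
# `denoiser` signature are illustrative placeholders.
import jax
import jax.numpy as jnp

T = 1000
betas = jnp.linspace(1e-4, 0.02, T)               # noise schedule
alphas_bar = jnp.cumprod(1.0 - betas)             # cumulative product \bar{alpha}_t

def q_sample(x0, t, eps):
    """Sample x_t ~ q(x_t | x_0) in closed form."""
    a = alphas_bar[t]
    return jnp.sqrt(a) * x0 + jnp.sqrt(1.0 - a) * eps

def loss(params, denoiser, x0, t, key):
    """Simplified training objective: predict the noise that was added."""
    eps = jax.random.normal(key, x0.shape)
    xt = q_sample(x0, t, eps)
    eps_pred = denoiser(params, xt, t)            # the neural network goes here
    return jnp.mean((eps - eps_pred) ** 2)
```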

It is interesting to speculate on possible applications of these cutting-edge machine learning techniques to physics. For example, in the quantum computing context there is the problem of sampling measurement outcomes from a specified (noisy) quantum circuit.

