Tuesday, October 31, 2023

Physics meets machine learning and AI

Machine learning research of interest to physicists can be broadly divided into two categories: using machine learning tools to solve physics problems, and using ideas from physics to improve machine learning techniques.

An example of the former is the transformer neural network architecture behind large language models such as ChatGPT. Its ability to efficiently learn long-range correlations in data also makes it useful for variational methods that find the ground states of strongly correlated quantum many-body systems. Two papers demonstrating this approach were published in Physical Review B and Physical Review Letters earlier this year.

Popular image generation tools such as Dall-E and Stable Diffusion (which I wrote about previously) are based on time-reversing a diffusion process to generate desired samples from noise. This approach is heavily inspired by techniques from non-equilibrium statistical mechanics published in Physical Review E in 1997.
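
A toy version of "time-reversing a diffusion process" fits in a few lines. The sketch below is not how DALL-E or Stable Diffusion are implemented. It assumes a one-dimensional Gaussian "data" distribution (the hypothetical `MU` and `SIGMA`), noises it with an Ornstein-Uhlenbeck process $dx = -\tfrac{1}{2}x\,dt + dW$, and then integrates the reverse-time stochastic differential equation backwards from noise using the exact score $\partial_x \log p_t(x)$. Real generative models replace this exact score with a trained neural network.

```python
import numpy as np

# Hypothetical toy "data" distribution: a 1D Gaussian with mean MU and width SIGMA.
MU, SIGMA = 3.0, 0.5

def marginal(t):
    """Mean and variance of p_t under the forward process dx = -x/2 dt + dW."""
    decay = np.exp(-t)
    return MU * np.exp(-t / 2), SIGMA**2 * decay + (1.0 - decay)

def score(x, t):
    """Exact score d/dx log p_t(x); a diffusion model would learn this instead."""
    mean, var = marginal(t)
    return -(x - mean) / var

def reverse_sample(n=20000, T=5.0, steps=500, seed=0):
    """Generate samples by integrating the reverse-time SDE from t = T down to t = 0."""
    rng = np.random.default_rng(seed)
    mean_T, var_T = marginal(T)
    x = rng.normal(mean_T, np.sqrt(var_T), size=n)   # almost pure noise at t = T
    dt = T / steps
    for k in range(steps):
        t = T - k * dt
        # Reverse SDE: dx = [f(x) - g^2 * score] dt + g dW with f(x) = -x/2, g = 1,
        # stepped backwards in time with the Euler-Maruyama scheme.
        x = x + (0.5 * x + score(x, t)) * dt + np.sqrt(dt) * rng.normal(size=n)
    return x

samples = reverse_sample()
print(samples.mean(), samples.std())   # close to MU and SIGMA
```

Because the score is exact here, the only errors come from the finite time step and sampling noise.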

Another pressing issue in machine learning and AI is how to understand the emergent properties of large language models as their size or training time is scaled up. This is a problem that physicists are well placed to tackle using techniques from statistical physics, random matrix theory, and the theory of phase transitions, which have recently been applied to shallow neural network models in a few different studies:

Memorizing without overfitting: Bias, variance, and interpolation in overparameterized models

Learning through atypical phase transitions in overparameterized neural networks

Grokking phase transitions in learning local rules with gradient descent

Droplets of Good Representations: Grokking as a First Order Phase Transition in Two Layer Networks

I'm sure we'll see a growing number of theoretical physicists becoming involved in this exciting area of research in the coming years.



Tuesday, October 10, 2023

From graphene to borophene

In the beginning there was graphene, and graphene had some very remarkable properties which have attracted enormous interest over the years. For the unfamiliar, graphene is a two-dimensional sheet of carbon atoms arranged in a honeycomb lattice structure. The tight binding energy band structure has the peculiar property that its conduction and valence bands touch at the corners of the Brillouin zone. These corners are known as Dirac points because the low energy electronic degrees of freedom are governed by an effective Dirac equation,

$$ i \partial_t \psi = v_F (\boldsymbol{p} \cdot \boldsymbol{\hat{\sigma}} ) \psi,$$

where $\boldsymbol{\hat{\sigma}}$ are the Pauli matrices, $\boldsymbol{p}$ is the in-plane momentum, and $v_F$ is the Fermi velocity, which acts as an effective speed of light.
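
The Dirac points can be checked numerically with the standard nearest-neighbour tight-binding model, $E_\pm(\boldsymbol{k}) = \pm t\,|\sum_j e^{i\boldsymbol{k}\cdot\boldsymbol{\delta}_j}|$, where $\boldsymbol{\delta}_j$ are the three vectors joining an A site to its B neighbours. This sketch assumes hopping $t = 1$, carbon-carbon distance $a = 1$ and $\hbar = 1$; in these units the slope at the zone corner should match the textbook Fermi velocity $v_F = 3ta/2$.

```python
import numpy as np

T_HOP = 1.0   # assumed nearest-neighbour hopping t (hbar = 1, carbon-carbon distance a = 1)
# Vectors from an A site to its three B neighbours.
DELTAS = np.array([[0.5, np.sqrt(3) / 2],
                   [0.5, -np.sqrt(3) / 2],
                   [-1.0, 0.0]])

def bands(k):
    """Valence and conduction energies -t|f(k)|, +t|f(k)| of the honeycomb lattice."""
    f = np.exp(1j * (DELTAS @ k)).sum()
    return -T_HOP * abs(f), T_HOP * abs(f)

K = np.array([2 * np.pi / 3, 2 * np.pi / (3 * np.sqrt(3))])   # a Brillouin zone corner

print(bands(K))                       # both bands touch at (numerically) zero energy
q = 1e-4
slope = bands(K + np.array([q, 0.0]))[1] / q
print(slope)                          # close to v_F = 3 t a / 2
```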

The role of spin in graphene's effective Dirac equation is played by the sublattice degree of freedom: the underlying honeycomb lattice has two inequivalent sublattices, termed A and B. This sublattice degree of freedom is known as pseudospin. Because of this Dirac equation description, electrons in graphene can emulate a variety of interesting phenomena from high energy physics, such as the Klein paradox.

This interesting Dirac physics is not specific to graphene, but emerges in any periodic potential with a honeycomb lattice structure, including photonic systems. In this case, the electronic wavefunction is replaced by the optical field envelope, and the effective potential can be controlled by modulating the local refractive index. For example, in the case of semiconductor microcavities, the potential modulation can be created by selective etching of the cavity to form a honeycomb structure. The resulting photonic band structure can be observed experimentally by measuring the energy-resolved photoluminescence spectrum from the cavity, which reveals a neat Dirac cone structure where the two energy bands cross.

One of the interesting properties of a Dirac cone is its emergent rotational symmetry. Even though the lattice potential is inhomogeneous and breaks continuous rotational symmetry, the energy eigenvalues in the vicinity of the Dirac cone are invariant under rotations. This in-plane rotational symmetry leads to a conserved total angular momentum $J$, which is the sum of the usual orbital angular momentum and a pseudospin angular momentum associated with the sublattice degree of freedom. Writing the momentum in polar form, $\boldsymbol{p} = p(\cos\varphi, \sin\varphi)$, and rewriting the effective Dirac Hamiltonian in terms of the pseudospin raising and lowering operators $\hat{\sigma}_{\pm} = (\hat{\sigma}_x \pm i \hat{\sigma}_y)/2$,

$$ i \partial_t \psi = v_F\, p \left( e^{-i \varphi} \hat{\sigma}_+ + e^{i \varphi} \hat{\sigma}_- \right) \psi,$$

we see that a flip of the pseudospin must be accompanied by a change in the orbital angular momentum, corresponding to the angular phase winding terms $\exp(\pm i \varphi)$. Therefore, if a Dirac cone is excited with a pseudospin-up state, one can measure a phase vortex in the pseudospin-down field component. This pseudospin-mediated vortex generation has been observed using photonic waveguide lattices.
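
The spin-flip selection rule can be seen with a few lines of linear algebra. This sketch uses the convention $\hat{\sigma}_\pm = (\hat{\sigma}_x \pm i\hat{\sigma}_y)/2$, sets $v_F = \hbar = 1$, and orders the basis as (pseudospin up, pseudospin down); these are assumptions for illustration. It evolves a pseudospin-up state for each momentum direction $\varphi$ on a circle of fixed $|\boldsymbol{p}|$ and counts how many times the phase of the pseudospin-down amplitude winds.

```python
import numpy as np

# Pauli matrices in the (pseudospin up, pseudospin down) basis.
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]])
sp = (sx + 1j * sy) / 2   # raising operator: |down> -> |up>
sm = (sx - 1j * sy) / 2   # lowering operator: |up> -> |down>

def dirac_h(p, phi, vF=1.0):
    """Dirac Hamiltonian v_F p.sigma for momentum magnitude p at polar angle phi."""
    return vF * p * (np.cos(phi) * sx + np.sin(phi) * sy)

def evolve(h, psi, t):
    """Apply exp(-i h t) to psi using the eigendecomposition of the Hermitian h."""
    e, v = np.linalg.eigh(h)
    return v @ (np.exp(-1j * e * t) * (v.conj().T @ psi))

def winding(values):
    """Net number of 2*pi phase windings around a closed loop of complex samples."""
    dphase = np.angle(np.roll(values, -1) / values)
    return int(round(dphase.sum() / (2 * np.pi)))

p, t, phi0 = 0.5, 1.0, 0.7
# Check the raising/lowering rewrite of p.sigma (with v_F = 1).
print(np.allclose(dirac_h(p, phi0),
                  p * (np.exp(-1j * phi0) * sp + np.exp(1j * phi0) * sm)))

up = np.array([1, 0], dtype=complex)
phis = np.linspace(0, 2 * np.pi, 360, endpoint=False)
down_amp = np.array([evolve(dirac_h(p, f), up, t)[1] for f in phis])
print(winding(down_amp))   # charge-1 winding in the pseudospin-down component
```

A momentum-space winding $e^{i\ell\varphi}$ Fourier transforms into a real-space phase vortex of the same charge $\ell$.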

Around 2011, multiple groups proposed various generalizations of graphene to higher-order conical intersections (reviewed here). The effective Hamiltonian at a higher-order conical intersection can be obtained by replacing the spin-½ Pauli operators in the Dirac equation with spin-s matrices. The resulting band structures similarly have intersecting conical bands, with energies determined by the spin projection parallel to the momentum.

There is a qualitative difference between intersections with integer and half-integer pseudospin. For integer s, the spin projection can vanish, giving a flat band with zero energy for all momenta. However, the first studies of higher-order conical intersections were limited to tight binding models that seemed quite difficult to implement in practice, requiring, for example, laser-assisted hopping or fine-tuned multilayer structures.
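
A quick way to see the integer versus half-integer distinction is to diagonalize $v\,\boldsymbol{p}\cdot\boldsymbol{\hat{S}}$ for spin-s matrices built from the ladder-operator formula. The sketch assumes $\hbar = 1$ and a generic velocity `v`. Note that the Pauli matrices equal $2\boldsymbol{\hat{S}}$ for $s = 1/2$, so graphene's case differs only by a rescaled velocity. The eigenvalues come out as $v\,|\boldsymbol{p}|\,m$ with $m = -s, \dots, s$, so a zero-energy band exists only for integer s.

```python
import numpy as np

def spin_matrices(s):
    """Spin-s matrices (S_x, S_y, S_z) in the basis m = s, s-1, ..., -s (hbar = 1)."""
    m = s - np.arange(int(round(2 * s)) + 1)
    # <m+1|S_+|m> = sqrt(s(s+1) - m(m+1)) lies on the first superdiagonal.
    s_plus = np.diag(np.sqrt(s * (s + 1) - m[1:] * (m[1:] + 1)), k=1).astype(complex)
    s_x = (s_plus + s_plus.conj().T) / 2
    s_y = (s_plus - s_plus.conj().T) / 2j
    return s_x, s_y, np.diag(m).astype(complex)

def cone_energies(s, p, phi, v=1.0):
    """Band energies at momentum p (polar angle phi): eigenvalues of v p.S."""
    s_x, s_y, _ = spin_matrices(s)
    return np.linalg.eigvalsh(v * p * (np.cos(phi) * s_x + np.sin(phi) * s_y))

for s in (0.5, 1, 1.5, 2):
    e = cone_energies(s, p=1.0, phi=0.3)
    flat = np.any(np.abs(e) < 1e-9)
    print(s, np.round(e, 6), "zero-energy flat band" if flat else "no flat band")
```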

Around the same time, different groups realized that the s=1 conical intersection could be observed using a relatively simple square lattice structure known as the Lieb lattice, which is obtained by removing one quarter of the sites from an ordinary square lattice. Starting from a tight binding model, one can show that the band structure close to the Brillouin zone corners is described by a spin-1 variant of the Dirac equation. This band structure is interesting because it has conical bands with vanishing effective mass intersecting a flat band with infinite effective mass.
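
These Lieb lattice claims can be checked with a three-band nearest-neighbour tight-binding model. This is a minimal sketch with an assumed hopping `T_HOP` $= t$ between corner (A) and edge-centre (B, C) sites only, lattice constant $a = 1$ and $\hbar = 1$. Its Bloch Hamiltonian has eigenvalues $0$ and $\pm 2t\sqrt{\cos^2(k_x/2)+\cos^2(k_y/2)}$, so the middle band is exactly flat and the outer bands meet it conically at the corner $M = (\pi, \pi)$.

```python
import numpy as np

T_HOP = 1.0   # assumed nearest-neighbour hopping; lattice constant a = 1

def lieb_h(kx, ky):
    """Bloch Hamiltonian of the Lieb lattice in the (A corner, B edge, C edge) basis."""
    bx, by = 2 * T_HOP * np.cos(kx / 2), 2 * T_HOP * np.cos(ky / 2)
    return np.array([[0, bx, by],
                     [bx, 0, 0],
                     [by, 0, 0]])

# The middle band should vanish at every momentum in the Brillouin zone.
rng = np.random.default_rng(1)
ks = rng.uniform(-np.pi, np.pi, size=(1000, 2))
middle = [np.linalg.eigvalsh(lieb_h(*k))[1] for k in ks]
print(max(abs(x) for x in middle))        # numerically zero everywhere

# Near M = (pi, pi) the upper band rises linearly, with slope t in these units.
q = 1e-4
upper = np.linalg.eigvalsh(lieb_h(np.pi + q, np.pi))[2]
print(upper / q)
```

Near $M$ this three-band Hamiltonian is unitarily equivalent to the spin-1 form $\boldsymbol{p}\cdot\boldsymbol{\hat{S}}$, although it is not written in that basis here.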

One of the important consequences of the flat band is that waves in this band do not propagate; they remain localized. This nondiffracting property of flat band states was observed in 2015 by two groups (papers here and here). A second interesting observable feature of the Lieb lattice is that different pseudospin states can have differing dynamics, depending on the magnitude of the initial pseudospin. Initial states with pseudospin plus or minus one partially excite the flat band, splitting a beam into a rapidly expanding conical diffraction component and a residual flat band component. In these conical diffraction experiments, it's also possible to observe pseudospin-mediated generation of charge-two phase vortices.
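
Both effects can be read off from the effective spin-1 Hamiltonian $\boldsymbol{p}\cdot\boldsymbol{\hat{S}}$. In this sketch the velocity and $\hbar$ are set to 1 and the basis is ordered $m = +1, 0, -1$; these are assumptions, and the full Lieb lattice is not modelled. It computes the fraction of an input pseudospin state that lands in the zero-energy (flat band) eigenvector, and the phase winding picked up when pseudospin $+1$ is converted into $-1$.

```python
import numpy as np

# Spin-1 matrices in the basis m = +1, 0, -1 (hbar = 1).
r2 = np.sqrt(2)
Sx = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=complex) / r2
Sy = np.array([[0, -1j, 0], [1j, 0, -1j], [0, 1j, 0]]) / r2

def h1(p, phi):
    """Spin-1 conical Hamiltonian p.S (velocity set to 1) at polar angle phi."""
    return p * (np.cos(phi) * Sx + np.sin(phi) * Sy)

def flat_band_weight(psi, phi):
    """Fraction of psi projected onto the zero-energy (flat band) eigenvector."""
    e, v = np.linalg.eigh(h1(1.0, phi))
    flat = v[:, np.argmin(np.abs(e))]
    return abs(np.vdot(flat, psi)) ** 2

def evolve(h, psi, t):
    """Apply exp(-i h t) to psi via the eigendecomposition of the Hermitian h."""
    e, v = np.linalg.eigh(h)
    return v @ (np.exp(-1j * e * t) * (v.conj().T @ psi))

def winding(values):
    """Net number of 2*pi phase windings around a closed loop of complex samples."""
    return int(round(np.angle(np.roll(values, -1) / values).sum() / (2 * np.pi)))

plus1 = np.array([1, 0, 0], dtype=complex)
zero = np.array([0, 1, 0], dtype=complex)
print(flat_band_weight(plus1, 0.4), flat_band_weight(zero, 0.4))  # half vs none

# Converting pseudospin +1 into -1 changes the orbital angular momentum by 2.
phis = np.linspace(0, 2 * np.pi, 360, endpoint=False)
amp = np.array([evolve(h1(1.0, f), plus1, 1.0)[2] for f in phis])
print(winding(amp))   # charge-2 winding
```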

What about higher values of the pseudospin s? It gets more challenging. Usually we design conical intersections within a weak coupling (tight binding) approximation, in which coupling to second-nearest and more distant neighbours is assumed to be zero. This requires a lattice that is sufficiently deep or has a large separation between sites. But minimizing second-neighbour coupling in this manner also weakens the nearest-neighbour coupling, reducing the overall energy bandwidth and making it more difficult to resolve the different bands at the conical intersection. This problem naturally gets worse the more intersecting bands one has. So while there have been many studies of Dirac cones and Lieb lattices, extending these ideas to higher pseudospin is more challenging: feasible proposals are scarce, and most have required complicated fine-tuned models that are difficult to implement.
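
A back-of-the-envelope model makes this trade-off quantitative. Assume (hypothetically) that the coupling between two sites decays exponentially with their separation, $J(d) = J_0 e^{-d/\xi}$, as is roughly the case for evanescently coupled waveguides, and that second neighbours are $\sqrt{2}$ times farther away than nearest neighbours, as for the diagonal sites of a Lieb-type lattice. Demanding $t_2/t_1 = \epsilon$ fixes $d/\xi = \ln(1/\epsilon)/(\sqrt{2}-1)$, so the nearest-neighbour coupling, and with it the bandwidth, scales as $t_1/J_0 = \epsilon^{1/(\sqrt{2}-1)} \approx \epsilon^{2.4}$.

```python
import numpy as np

def couplings_for_target(eps, ratio, j0=1.0):
    """
    Toy model (assumption): coupling decays as J(d) = j0 * exp(-d / xi), and second
    neighbours sit `ratio` times farther away than nearest neighbours.
    Choosing the spacing so that t2 / t1 = eps fixes d / xi = ln(1/eps) / (ratio - 1).
    Returns the resulting (t1, t2) in units of j0.
    """
    d_over_xi = np.log(1.0 / eps) / (ratio - 1.0)
    return j0 * np.exp(-d_over_xi), j0 * np.exp(-ratio * d_over_xi)

for eps in (0.1, 0.03, 0.01):
    t1, t2 = couplings_for_target(eps, ratio=np.sqrt(2))
    print(f"t2/t1 = {t2 / t1:.2f} -> t1 = {t1:.1e} j0")
```

Each tenfold improvement in $t_2/t_1$ costs a factor of roughly 250 in $t_1$ under these assumptions.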

One way to overcome these problems is to consider conical intersections that are protected by permutation symmetries. The idea is to associate a conical intersection with a permutation symmetry matrix. By re-interpreting the symmetry as an adjacency matrix of a graph, one can embed the degeneracy into a periodic lattice. Detuning the wavevector away from p=0 breaks the symmetry, lifting the degeneracy, which produces a conical intersection in the dispersion relation. This approach can be used to systematically create conical intersections of a desired order. More importantly, the resulting lattices typically involve symmetric and relatively close-packed structures, giving rise to larger bandwidths!

For the case of a five-fold degeneracy corresponding to pseudospin 2, the permutation symmetry approach generates a lattice known as chiral borophene. It can be obtained by considering as a unit cell a filled hexagon, removing one of the corners, and rotating the remaining sites. This gives rise to a lattice with broken mirror symmetry, which has two inequivalent chiral variants. The tight binding band structure has 5 intersecting bands at p=0, with a 6th band separated by a large gap.

The effective Hamiltonian describing the band structure close to $p=0$ is a little more complicated than the usual Dirac Hamiltonian,

$$ i \partial_t \psi = \left( c_0 \boldsymbol{p} \cdot \boldsymbol{\hat{S}} + c_1 \boldsymbol{p} \cdot \{ \boldsymbol{\hat{S}}, \hat{S}_z^2 \} - c_2 \hat{1} \right) \psi, $$

with a second term proportional to an anticommutator involving the spin-2 matrices. This term appears because spin-2 matrices permit additional nontrivial terms that respect the rotational symmetry. Its effect is to control the relative opening angle between the pairs of conical bands. When the lattice is excited by a state with pseudospin 2, conservation of total angular momentum means that phase vortices with charge up to 4 can be generated by post-selecting on different output pseudospin states, as shown in the simulation results published here.
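
The effective Hamiltonian is easy to explore numerically. In this sketch the coefficients `c0`, `c1`, `c2` are arbitrary illustrative values, not fitted to chiral borophene; the basis is ordered $m = 2, \dots, -2$ and $\hbar = 1$. It checks that the spectrum does not depend on the momentum direction, that $c_1$ moves the ratio of the two cone slopes away from the value 2 it has when $c_1 = 0$, and that converting pseudospin $+2$ into $-2$ produces a charge-4 phase winding. Within this linear-in-$p$ model the middle band stays pinned at $-c_2$, so any dispersion of the middle band must come from higher-order terms.

```python
import numpy as np

def spin2_matrices():
    """Spin-2 matrices (S_x, S_y, S_z) in the basis m = 2, 1, 0, -1, -2 (hbar = 1)."""
    m = np.arange(2, -3, -1).astype(float)
    s_plus = np.diag(np.sqrt(6 - m[1:] * (m[1:] + 1)), k=1).astype(complex)
    s_x = (s_plus + s_plus.conj().T) / 2
    s_y = (s_plus - s_plus.conj().T) / 2j
    return s_x, s_y, np.diag(m).astype(complex)

SX, SY, SZ = spin2_matrices()
SZ2 = SZ @ SZ

def anti(a, b):
    """Anticommutator {a, b}."""
    return a @ b + b @ a

def h_eff(p, phi, c0=1.0, c1=0.0, c2=0.0):
    """c0 p.S + c1 p.{S, S_z^2} - c2 for momentum magnitude p at polar angle phi."""
    px, py = p * np.cos(phi), p * np.sin(phi)
    return (c0 * (px * SX + py * SY)
            + c1 * (px * anti(SX, SZ2) + py * anti(SY, SZ2))
            - c2 * np.eye(5))

def evolve(h, psi, t):
    """Apply exp(-i h t) to psi via the eigendecomposition of the Hermitian h."""
    e, v = np.linalg.eigh(h)
    return v @ (np.exp(-1j * e * t) * (v.conj().T @ psi))

def winding(values):
    """Net number of 2*pi phase windings around a closed loop of complex samples."""
    return int(round(np.angle(np.roll(values, -1) / values).sum() / (2 * np.pi)))

# Ratio of the two cone slopes: 2 for c1 = 0, shifted by the anticommutator term.
for c1 in (0.0, 0.2):
    e = np.linalg.eigvalsh(h_eff(1.0, 0.3, c1=c1))
    print(c1, np.round(e, 4), e[4] / e[3])

# Pseudospin +2 in, -2 out: the amplitude winds four times as phi goes around p = 0.
plus2 = np.eye(5, dtype=complex)[0]
phis = np.linspace(0, 2 * np.pi, 360, endpoint=False)
amp = np.array([evolve(h_eff(1.0, f, c1=0.2, c2=0.5), plus2, 1.0)[4] for f in phis])
print(winding(amp))
```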

The middle band in chiral borophene is not very flat; it actually has considerable dispersion over the Brillouin zone, even in the nearest-neighbour tight binding model. Flat dispersion only occurs along high-symmetry lines, which corresponds to the existence of non-diffracting line states. Similar to the case of the Lieb lattice, these non-diffracting states can be excited using an input with a staggered phase profile. The diffraction of such states is strongly suppressed compared to a similar input with a flat phase profile, as you can see in these experimental results.
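
The staggered-versus-flat comparison can be reproduced in the Lieb lattice, which is much simpler to code than chiral borophene. This sketch assumes open boundaries, `L` = 16 unit cells per side, a single hopping `T_HOP` and $\hbar = 1$. It launches two inputs on the four edge-centre sites around one plaquette: alternating signs $+1, -1, +1, -1$, which form an exact zero-energy compact localized state, and equal signs. It then reports how much intensity remains on those four sites after a fixed time.

```python
import numpy as np

L = 16          # unit cells per side, open boundaries (illustrative choice)
T_HOP = 1.0     # assumed nearest-neighbour hopping; hbar = 1

def idx(i, j, s):
    """Site index in cell (i, j); s = 0 corner A, 1 edge-centre B (+x), 2 edge-centre C (+y)."""
    return 3 * (i * L + j) + s

n_sites = 3 * L * L
H = np.zeros((n_sites, n_sites))
for i in range(L):
    for j in range(L):
        bonds = [(idx(i, j, 0), idx(i, j, 1)), (idx(i, j, 0), idx(i, j, 2))]
        if i + 1 < L:
            bonds.append((idx(i, j, 1), idx(i + 1, j, 0)))   # B links to the next A along x
        if j + 1 < L:
            bonds.append((idx(i, j, 2), idx(i, j + 1, 0)))   # C links to the next A along y
        for a, b in bonds:
            H[a, b] = H[b, a] = -T_HOP

E, V = np.linalg.eigh(H)

def remaining(psi0, t, sites):
    """Intensity left on `sites` after evolving psi0 under H for time t."""
    psi = V @ (np.exp(-1j * E * t) * (V.T @ psi0))
    return np.sum(np.abs(psi[sites]) ** 2)

i0 = j0 = L // 2
# Edge-centre sites around one plaquette, listed in order around the loop.
plaquette = [idx(i0, j0, 1), idx(i0 + 1, j0, 2), idx(i0, j0 + 1, 1), idx(i0, j0, 2)]
results = {}
for name, amps in (("staggered", [1, -1, 1, -1]), ("flat", [1, 1, 1, 1])):
    psi0 = np.zeros(n_sites)
    psi0[plaquette] = amps
    psi0 /= np.linalg.norm(psi0)
    results[name] = remaining(psi0, 6.0, plaquette)
print(results)
```

The staggered input stays put to numerical precision, while much of the flat-phase input diffracts away.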

The pseudospin-2 conical intersection occurring in lattices such as chiral borophene opens up many interesting possibilities for wave manipulation and nonlinear optics, including cascaded wave mixing between the partially flat and conical bands, generation of high-charge vortices, the introduction of strain or other perturbations to open topological band gaps, and possible analogies with the physics and propagation of gravitons (which also have spin 2). It will also be interesting to see whether this lattice can be realized as a two-dimensional electronic material.


Friday, October 6, 2023

IPS Meeting 2023

A few things I learned attending the first two days of this year's IPS Meeting, held right here at NUS:

Prof. Giovanni Vignale gave a plenary talk on bulk currents and edge accumulation in anomalous Hall systems. In the conventional quantum Hall effect, charge accumulation at the edges of the sample is driven by the bulk quantized Hall conductivity. Anomalous quantum Hall systems, on the other hand, do not show an accumulation of spin or valley densities at their edges, despite having nonzero bulk spin or valley Hall conductivities. In the case of spin Hall systems, this is because bulk electrons flip their spin when reflecting off the edge of the sample. Thus, the edges accumulate a nonzero charge density, but their spin density remains zero. Interestingly, this argument does not carry over to valley Hall systems; there, the applied electric field that drives the current also induces coupling between the valleys in the bulk. Further details can be found here.

The second plenary talk by Prof. Silvija Gradecak focused on the use of imperfect or novel materials to develop new components. A striking example given was the use of 2D materials as diffusion barriers in nanoscale metal contacts in integrated circuits, which promises the ability to further miniaturize electronic components.

Dr. Sen Mu talked about Kardar-Parisi-Zhang (KPZ) physics in the Anderson localization of two-dimensional wavepackets. The KPZ equation describes universal fluctuations, which here arise in the density of waves expanding in the presence of disorder. The same universal fluctuations appear in a variety of systems, including the spreading of coffee poured onto a napkin, which he demonstrated for us live! (arXiv preprint)

Weitao Chen discussed critical dynamics in one-dimensional disordered systems with long-range coupling. In critical systems the eigenstates exhibit multifractality, meaning that different moments of the eigenstates scale with system size with different non-integer exponents. This is a bit abstract and hard to measure directly in an experiment, but remarkably this multifractality can also be observed by exciting a single site of the lattice and measuring the time-dependent return probability! (arXiv preprint)
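
To make "different moments scale with different exponents" concrete: the moments $P_q = \sum_i |\psi_i|^{2q}$ of a normalized state scale as $P_q \propto N^{-\tau_q}$ with system size $N$, and one writes $\tau_q = D_q (q-1)$. An extended state has $D_q = 1$ for all $q$, a localized one has $D_q = 0$, and a multifractal state has a $q$-dependent $D_q$. The sketch below fits $\tau_q$ from several system sizes. Since it does not reproduce the model from the talk, it uses a hypothetical binomial cascade as a toy multifractal state whose exact answer, $\tau_q = -\log_2(p^q + (1-p)^q)$, is known.

```python
import numpy as np

def generalized_ipr(psi, q):
    """Moment P_q = sum_i |psi_i|^(2q) of a state, normalized to sum_i |psi_i|^2 = 1."""
    w = np.abs(psi) ** 2
    w = w / w.sum()
    return np.sum(w ** q)

def tau_q(states_by_size, q):
    """Fit P_q ~ N^(-tau_q) by least squares on a log-log plot over sizes N."""
    sizes = sorted(states_by_size)
    log_p = [np.log(generalized_ipr(states_by_size[n], q)) for n in sizes]
    slope, _ = np.polyfit(np.log(sizes), log_p, 1)
    return -slope

def binomial_cascade(levels, p=0.3):
    """Hypothetical toy multifractal state: sqrt of a binomial measure on 2**levels sites."""
    w = np.array([1.0])
    for _ in range(levels):
        w = np.concatenate([p * w, (1 - p) * w])
    return np.sqrt(w)

extended = {2**k: np.ones(2**k) / np.sqrt(2**k) for k in range(6, 11)}
cascade = {2**k: binomial_cascade(k) for k in range(6, 11)}
for q in (2, 3, 4):
    # Generalized dimensions D_q = tau_q / (q - 1): constant (= 1) for the extended
    # state, q-dependent for the multifractal cascade.
    print(q, tau_q(extended, q) / (q - 1), tau_q(cascade, q) / (q - 1))
```

In practice one would feed in eigenstates of the critical model computed at several system sizes instead of these toy states.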

Prof. Di Zhu, in another plenary, surveyed integrated photonics for the generation, manipulation, and detection of quantum states of light. A recurring theme was that many of the improvements required to scale up integrated quantum photonic systems can be found by looking back at the scientific literature from the 1960s! One neat example he gave was scaling up superconducting nanowire single photon detectors. Putting many of them on one chip is hard, because each coaxial microwave read-out line also conducts heat in; if you have too many, you will no longer be able to keep the chip cool enough for the detectors to work. The solution? Move from detection based on a lumped circuit model to a transmission line detector, which can (with a bit of signal processing) perform spatially resolved detection of multiple single photons. A demonstration of this idea was published this year in Physical Review Applied, after spending quite some time under peer review by the looks of it.
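
The post does not spell out the signal processing, so the following is only a hypothetical sketch of one simple scheme for a single absorption event. A photon absorbed at position $x$ on a line of length $L$ launches pulses toward both ends. The difference of their arrival times, $t_\text{left} - t_\text{right} = (2x - L)/v$, locates the absorption independently of when it happened. The function names and numbers (line length, pulse velocity $v$) are illustrative assumptions, not parameters of the published device.

```python
def arrival_times(x, length, velocity, t_absorb=0.0):
    """Times at which pulses launched at position x reach the left and right ends."""
    return t_absorb + x / velocity, t_absorb + (length - x) / velocity

def hit_position(t_left, t_right, length, velocity):
    """Invert arrival_times using t_left - t_right = (2x - length) / velocity."""
    return 0.5 * (length + velocity * (t_left - t_right))

# Illustrative numbers only (hypothetical): a 5 mm line with a 1e6 m/s pulse velocity.
LENGTH, VELOCITY = 5e-3, 1e6
tl, tr = arrival_times(1.2e-3, LENGTH, VELOCITY, t_absorb=3.7e-9)
print(hit_position(tl, tr, LENGTH, VELOCITY))   # approximately 1.2e-3 m; t_absorb cancels
```

Under this scheme a timing jitter $\delta t$ on the measured difference translates into a position uncertainty of $v\,\delta t/2$.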

There were many other interesting talks and posters that I didn't take enough notes on to write about, but it was nevertheless great to see the breadth of physics being done at the different universities and research institutes in Singapore.