Thursday, July 8, 2021

ILJU POSTECH MINDS Workshop on Topological Data Analysis and Machine Learning

As a newcomer to the field of topological data analysis (TDA), I found the ILJU POSTECH MINDS Workshop on Topological Data Analysis and Machine Learning held this week to be highly informative.

Due to other commitments I was not able to attend all the talks live, and I will miss the last day, but luckily the talks are recorded and can be viewed later. Here are summaries of the talks I have seen so far:

Paul Rosen (University of South Florida) talked about analyzing and visualizing graphs using TDA. First, TDA can provide optimized initial layouts for spring-mass visualizations of graphs, making features such as clusters and cycles easier to resolve. Second, persistent homology combined with dimensionality reduction (manifold learning) can be used to detect events or anomalies in time-varying graphs; as an example, he considered the time-varying graph of email interactions between members of a large research institute. Finally, he showed how Mapper can be used to generate simplified representations of large, complex graphs.
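For readers who, like me, are new to Mapper, here is a minimal Python sketch of the generic construction. It is not the speaker's implementation: it assumes a point cloud with Euclidean distances, a one-dimensional lens (filter) function, a cover of the lens range by overlapping intervals, and single-linkage clustering inside each interval. The function names and the parameters `n_intervals`, `overlap` and `eps` are hypothetical. Applying it to a graph, as in the talk, would require a graph distance (for example shortest-path length) instead of the Euclidean one, which is also an assumption here.

```python
from itertools import combinations

import numpy as np

def single_linkage_clusters(points, idx, eps):
    """Split the points listed in idx into groups chained by links shorter than eps."""
    parent = {i: i for i in idx}

    def find(i):
        # Follow parent links to the root, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(idx, 2):
        if np.linalg.norm(points[i] - points[j]) < eps:
            parent[find(i)] = find(j)
    groups = {}
    for i in idx:
        groups.setdefault(find(i), set()).add(i)
    return list(groups.values())

def mapper(points, lens, n_intervals=5, overlap=0.3, eps=0.5):
    """Minimal Mapper: cover the lens range with overlapping intervals,
    cluster each preimage, and join clusters that share points."""
    lo, hi = float(lens.min()), float(lens.max())
    length = (hi - lo) / n_intervals
    nodes = []
    for k in range(n_intervals):
        # Widen each interval on both sides by a fraction `overlap` of its length.
        a = lo + (k - overlap) * length
        b = lo + (k + 1 + overlap) * length
        idx = [i for i in range(len(points)) if a <= lens[i] <= b]
        if idx:
            nodes.extend(single_linkage_clusters(points, idx, eps))
    # Nerve of the cover: an edge wherever two clusters have a point in common.
    edges = [(u, v) for u, v in combinations(range(len(nodes)), 2) if nodes[u] & nodes[v]]
    return nodes, edges

# Usage: points on a circle, with the x coordinate as the lens.
theta = 2 * np.pi * np.arange(40) / 40
circle = np.column_stack([np.cos(theta), np.sin(theta)])
nodes, edges = mapper(circle, circle[:, 0])
print(len(nodes), len(edges))  # 8 8
```

The resulting graph is a single loop of eight nodes, so the simplified representation keeps the one cycle of the data while discarding the individual points.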

Gary Shiu (University of Wisconsin) presented applications of TDA to problems in physics, including quantifying the structure of the string landscape, measuring the large-scale structure of the universe, and identifying phase transitions in spin models. For large datasets in Euclidean space, the alpha complex, whose size grows polynomially with the number of data points, is much more efficient than the Vietoris-Rips complex, whose size can grow exponentially. Persistent homology is useful because it can directly identify nonlocal physical features of interest, such as vortices and filaments.
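To see why the Vietoris-Rips complex can blow up, here is a deliberately brute-force Python sketch (an illustration only; real TDA libraries build the complex far more cleverly). A set of vertices forms a simplex whenever all of its pairwise distances are at most the scale r, so if all n points lie within r of one another, every nonempty subset is a simplex and the complex has 2^n - 1 simplices. By contrast, the alpha complex is a subcomplex of the Delaunay triangulation, which for n points in fixed dimension d has O(n^⌈d/2⌉) simplices. The function name `rips_complex` and the toy points are hypothetical.

```python
import math
from itertools import combinations

def rips_complex(points, r, max_dim):
    """Brute-force Vietoris-Rips complex at scale r, up to dimension max_dim.

    Each simplex is a tuple of point indices; a simplex is included when
    every pair of its vertices is within distance r.
    """
    n = len(points)
    simplices = []
    for k in range(1, max_dim + 2):  # k vertices form a (k-1)-simplex
        for s in combinations(range(n), k):
            if all(math.dist(points[i], points[j]) <= r for i, j in combinations(s, 2)):
                simplices.append(s)
    return simplices

# Ten hypothetical points in a small square, all within distance 2 of each other.
pts = [(i / 10, (i * i % 7) / 10) for i in range(10)]
print(len(rips_complex(pts, r=2.0, max_dim=2)))  # 175 = 10 + 45 + 120
print(len(rips_complex(pts, r=2.0, max_dim=9)))  # 1023 = 2**10 - 1
```

Truncating at a maximum dimension k, as practical software does, caps the count at a polynomial of degree k + 1 in n, but that is still far larger than the alpha complex for big point clouds in low dimensions.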

Bei Wang (University of Utah) gave a talk on how TDA can be used to understand the behaviour of deep neural networks. Focusing on popular neural networks for image classification, the talk showed how TDA can be used to analyze the shapes of the activation vectors at each layer and to construct effective decision trees, revealing, for example, that the networks distinguish deer from horses by the presence of antlers. This is a clever step towards making deep neural network models more interpretable.

Moo K. Chung (University of Wisconsin-Madison) has used persistent homology to distinguish different configurations of the spike proteins of the COVID-19 virus. He discussed how the large size of the molecule and the large number of topological features require methods to simplify the information encoded in the persistence diagrams. One such method, persistence images, is based on coarse-graining the persistence diagram and is unsuitable here because it discards important information carried by a few of the features. It is preferable to use alternatives such as persistence landscapes, which retain all the information while converting it into a vector-space form, enabling different persistence diagrams to be systematically compared or fed into machine learning algorithms.
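A persistence landscape can be computed directly from a diagram: each (birth, death) pair contributes a tent function max(0, min(t - b, d - t)), and the k-th landscape function λ_k(t) is the k-th largest tent value at t. The NumPy sketch below evaluates the first few landscape functions on a grid of t values. It is my own illustration, not the speaker's code; the function name and grid are assumptions, and it only handles finite (birth, death) pairs. Unlike a persistence image, no pixel averaging is involved, so each isolated high-persistence point keeps its own tent.

```python
import numpy as np

def persistence_landscape(diagram, ts, k_max):
    """Evaluate landscape functions lambda_1..lambda_k_max on the grid ts.

    diagram: sequence of finite (birth, death) pairs.
    Returns an array of shape (k_max, len(ts)).
    """
    ts = np.asarray(ts, dtype=float)
    # One tent per diagram point: max(0, min(t - birth, death - t)).
    tents = np.array([np.maximum(0.0, np.minimum(ts - b, d - ts)) for b, d in diagram])
    tents = tents.reshape(-1, ts.size)
    # Sort tent heights in descending order at each t; row k-1 is lambda_k.
    tents = -np.sort(-tents, axis=0)
    out = np.zeros((k_max, ts.size))
    m = min(k_max, tents.shape[0])
    out[:m] = tents[:m]
    return out

# Two toy diagrams compared as flattened vectors (Euclidean distance on the grid).
ts = np.linspace(0.0, 5.0, 51)
a = persistence_landscape([(0.0, 4.0), (1.0, 3.0)], ts, k_max=3)
b = persistence_landscape([(0.0, 4.0)], ts, k_max=3)
print(np.linalg.norm(a.ravel() - b.ravel()))
```

Because every diagram becomes a fixed-size array on the same grid, the flattened landscapes can be averaged, compared, or passed to standard machine learning models.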

Naoya Tanabe (Kyoto University Hospital) talked about how persistent homology can be used to detect lung diseases in 3D CT images. The usual manual approach relies on expert analysis of 2D lung images. Deep neural networks can be used to carry out supervised classification of these images, but they require large training sets and the underlying models lack interpretability. Persistent homology is a promising alternative, as it can directly recognise the local image features used to identify the diseases of interest, namely localized clusters and voids in the 3D intensity images. This enables accurate automatic diagnosis using small training sets and simple rule-based decision trees.
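As a rough illustration of the features mentioned, the following Python sketch counts clusters (connected components) and enclosed voids (cavities) in a 3D volume thresholded at a single intensity. Persistent homology effectively tracks these counts across all thresholds at once, so this is only a single-threshold, hypothetical stand-in and not the pipeline from the talk. It assumes 26-connectivity for the foreground and 6-connectivity for the background, a standard complementary pairing in digital topology; names such as `clusters_and_voids` are made up.

```python
from collections import deque
from itertools import product

import numpy as np

# Neighbour offsets: 6 face neighbours, and all 26 surrounding voxels.
N6 = [d for d in product((-1, 0, 1), repeat=3) if sum(map(abs, d)) == 1]
N26 = [d for d in product((-1, 0, 1), repeat=3) if d != (0, 0, 0)]

def components(mask, offsets):
    """Connected components of a boolean 3D mask (BFS), as lists of voxel indices."""
    seen = np.zeros(mask.shape, dtype=bool)
    comps = []
    for start in zip(*np.nonzero(mask)):
        if seen[start]:
            continue
        seen[start] = True
        queue, comp = deque([start]), []
        while queue:
            v = queue.popleft()
            comp.append(v)
            for d in offsets:
                w = tuple(int(v[i]) + d[i] for i in range(3))
                inside = all(0 <= w[i] < mask.shape[i] for i in range(3))
                if inside and mask[w] and not seen[w]:
                    seen[w] = True
                    queue.append(w)
        comps.append(comp)
    return comps

def clusters_and_voids(volume, threshold):
    """Count clusters and enclosed voids of the region where volume >= threshold."""
    fg = np.asarray(volume) >= threshold
    n_clusters = len(components(fg, N26))
    # A void is a background component that never reaches the edge of the volume.
    n_voids = sum(
        1
        for comp in components(~fg, N6)
        if not any(v[i] in (0, fg.shape[i] - 1) for v in comp for i in range(3))
    )
    return n_clusters, n_voids

# Hypothetical test volume: a hollow cube (one cluster enclosing one void).
vol = np.zeros((7, 7, 7))
vol[1:6, 1:6, 1:6] = 1.0
vol[2:5, 2:5, 2:5] = 0.0
print(clusters_and_voids(vol, 0.5))  # (1, 1)
```

Counts like these, computed over a range of thresholds, are the kind of small, interpretable feature set that a simple rule-based decision tree can work with.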

Kelin Xia (Nanyang Technological University) surveyed his work on using TDA to characterize the shapes of complex molecules, with applications to drug design. He emphasized that for point clouds constructed from the positions of the atoms in a molecule, every feature's birth and death scale is meaningful, corresponding to characteristics such as bond lengths and ring sizes. Features are typically divided by size into two groups, corresponding to short- and long-range features of the molecule. The former provide a fingerprint of which atoms and bonds are present in the molecule, while changes in the latter are sensitive to chemical reaction dynamics. The multiscale information captured by persistent homology provides ideal features for machine learning-based drug design and discovery. He also mentioned recent work on generalizing persistent homology to hypergraphs, which describe more complex many-body interactions and are relevant to, for example, collaboration networks.
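To make the link between death scales and bond lengths concrete, here is a small Python sketch of 0-dimensional persistence for the Vietoris-Rips filtration of an atomic point cloud, computed with a union-find pass over edges sorted by length (Kruskal's algorithm). Every atom is born at scale 0, and the finite death scales are the edge lengths of a minimum spanning tree, so for a molecule they pick out nearest-neighbour distances such as bond lengths. The coordinates are hypothetical toy values in arbitrary units (a regular hexagonal ring of side 1.4 plus one extra point 1.1 away from a ring vertex), not real molecular data. Ring sizes show up in 1-dimensional homology, which this sketch does not compute.

```python
import math
from itertools import combinations

def h0_persistence(points):
    """0-dimensional persistence pairs (birth, death) of the Rips filtration.

    Every point is born at scale 0; when an edge of length L first joins two
    components, one of them dies at scale L (Kruskal / union-find).
    """
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the root, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    pairs = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            pairs.append((0.0, length))
    if points:
        pairs.append((0.0, math.inf))  # one component survives forever
    return pairs

# Hypothetical toy "molecule": a hexagonal ring of side 1.4 plus one extra point
# 1.1 away from a ring vertex (arbitrary units).
ring = [(1.4 * math.cos(k * math.pi / 3), 1.4 * math.sin(k * math.pi / 3)) for k in range(6)]
atoms = ring + [(2.5, 0.0)]
print([round(d, 3) for _, d in h0_persistence(atoms)])
# [1.1, 1.4, 1.4, 1.4, 1.4, 1.4, inf]
```

The two distinct death scales separate the shorter "bond" from the ring bonds, a miniature version of the short-range fingerprint described in the talk.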
