### Organizers

Topological and geometric ideas have already shown promises in producing novel perspectives and powerful algorithms for analyzing complex and diverse data. As the theory and foundations of topological data analysis continue to mature, we are presented with great opportunities to consolidate existing synergy as well as to establish new connections and collaborations among computational scientists, mathematicians, and statisticians, so as to form new perspectives and develop novel methodologies / algorithms for modern data analysis. This workshop, organized by the TGDA group at OSU, presents a timely platform to help achieve these goals.

Current PhD students and young researchers within 5 years of the PhD degree are qualified to apply for partial travel / accommodation support to attend the TGDA workshop.

There is also a one week summer school program that preceeds the workshop.

#### Advisory Committee

Larry A. Wasserman, Statistics, CMU

Peter Bubenik, Mathematics, U. Florida

#### Travel and Lodging Info

**Lodging**- There are several great hotel and lodging options available near the Ohio State University Campus. For a full list of options and more information go here. The MBI is located in Jennings Hall at 1735 Neil Avenue on the 3rd floor and most OSU hotels should offer shuttle transportation to get you to and from the MBI on campus for the workshop each day.**Airport**- When you arrive at John Glenn Columbus International airport-CMH you can take a taxi to your hotel (or find your hotel shuttle if offered) by going to the ground transportation area of the terminal where they offer 24-hour Airport taxi service.**Driving to MBI & Campus Parking**- If you are driving to the workshop, the closest public parking garage near the MBI is the 12th Avenue Garage. The MBI is just a short walk east from here on 12th Ave. to the Intersection of Neil Ave. where we are located in Jennings Hall on the 3rd floor. Here is a Google walking map.

We very gratefully acknowledge funding and support from our NSF TRIPODS grant, our NSF RTG grant, The Ohio State Math Research Institute MRI and the Ohio State Mathematical Biosciences Institute MBI.

### Accepted Speakers

Monday, May 21, 2018 | |
---|---|

Time | Session |

09:15 AM 09:30 AM | Opening remarks |

09:30 AM 10:30 AM | Mariel Vazquez - Modeling local reconnection events along DNA molecules In Escherichia coli DNA replication yields two interlinked circular chromosomes. Returning the chromosomes to an unlinked monomeric state is essential to cell survival. Simplification of DNA topology is mediated by enzymes, such as recombinases and topoisomerases. Recombinases act by a local reconnection event. We investigate analytically minimal pathways of unlinking by local reconnection. We introduce a Monte Carlo method to simulate local reconnection, provide a quantitative measure to distinguish among pathways and identify a most probable pathway. These results point to a universal property relevant to any local reconnection event between two sites along one or two circles. |

10:30 AM 11:00 AM | Break |

11:00 AM 12:00 PM | Ulrich Bauer - The Reeb Graph Edit Distance is Universal We consider the setting of Reeb graphs of piecewise linear functions and study distances between them that are stable, meaning that functions which are similar in the supremum norm ought to have similar Reeb graphs. We define an edit distance for Reeb graphs and prove that it is stable and universal, meaning that it provides an upper bound to any other stable distance. In contrast, via a specific construction, we show that the interleaving distance and the functional distortion distance on Reeb graphs are not universal. (Joint work with Claudia Landi and Facundo Mémoli) |

12:00 PM 02:00 PM | Lunch Break |

02:00 PM 03:00 PM | Paul Bendich - Topology and Geometry for Vehicle-tracking Systems Many systems employ sensors to interpret the environment. The target-tracking task is to gather sensor data from the environment and then to partition these data into tracks that are produced by the same target. A key challenge is to "connect-the-dots": more precisely, to take a sensor observation at a given time and associate it with a previously-existing track (or to declare that this is a new object). This can be very challenging, especially when there are multiple targets present, and when different sensors "go blind" at various times. There are a number of high-level designs for multi-target multi-sensor tracking systems, but many contemporary ones fit within the multiple hypothesis (MHT) paradigm. Typical MHTs formulate the 'connect-the-dots' problem as one of Bayesian inference, with competing multi-track hypotheses receiving scores. This talk surveys three recent efforts to integrate topological and geometric information into the formula for computing hypothesis scores. The first uses zero-dimensional persistent homology summaries of kinematic information in car tracking, and the second uses persistent homology summaries in state space to form grouping hypotheses for nautical traffic. Finally, a method using self-similarity matrices is employed to make useful cross-modal comparisons in heterogeneous sensor networks. In all three efforts, large improvements in MHT performance are observed. This talk is based on work funded by OSD, NRO, AFRL, and NSF, and it was done with many collaborators, including Nathan Borggren, Sang Chin, Jesse Clarke, Jonathan deSena, John Harer, Jay Hineman, Elizabeth Munch, Andrew Newman, Alex Pieloch, David Porter, David Rouse, Nate Strawn, Christopher J. Tralie, Adam Watkins, Michael Williams, and Peter Zulch. |

03:00 PM 03:30 PM | Break |

03:30 PM 04:30 PM | Laxmi Parida - Combinatorics, Topology and Genomics Many problems in genomics are combinatorial in nature, i.e., the interrelationship amongst the entities is at par, if not more important, than the value of the entity themselves. Graphs are the most commonly used mathematical object to model such relationships. However, often it is important to capture not just pairwise, but more general $k$-wise relationships. TDA provides a natural basis to model such interactions and, additionally, it provides mechanisms for extracting signal patterns from noisy data. I will discuss two applications of this model, one to population genomics and the other to meta-genomics. In the former, we model the problem of detecting admixture in populations and in the latter we deal with the problem of detecting false positives in microbiome data. The unifying theme is the use of topological methods -- in particular, persistent homology. I will explain the underlying mathematical models and describe the results obtained in both cases. |

04:30 PM 07:00 PM | Poster session and reception |

Tuesday, May 22, 2018 | |
---|---|

Time | Session |

09:30 AM 10:30 AM | Ezra Miller |

10:30 AM 11:00 AM | Break |

11:00 AM 12:00 PM | David Mount - Approximating Convex Bodies and Applications Recently, a series of seminal results has established the central role that convex approximation plays in solving a number of computational problems in Euclidean spaces of constant dimension. These include fundamental applications such as computing the diameter of a point set, Euclidean minimum spanning trees, bichromatic closest pairs, width kernels, and nearest neighbor searching. In each case, an approximation parameter eps > 0 is given, and the problem is to compute a solution that is within a factor of 1+eps of the exact solution. In this talk, I will present recent approaches to convex approximation. These approaches achieve efficiency by being locally sensitive to the shape and curvature of the body. This is achieved through a combination of methods, both new and classical, including Delone sets in the Hilbert metric, Macbeath regions, and John ellipsoids. We will explore the development of these methods and discuss how they can be applied to the above problems. |

12:00 PM 02:00 PM | Lunch Break |

02:00 PM 03:00 PM | Lizhen Lin - Optimization on manifolds: parallel algorithms and Bayesian optimization techniques The first part of this talk focuses on presenting some communication efficient algorithms for parallel optimization problems on manifolds. Convergence properties of such algorithms will be analyzed. The second part of the talk introduces Bayesian optimization techniques on manifolds. The methodology and computational algorithms will be discussed in details. Applications are considered for a large class of manifolds of interests. |

03:00 PM 03:30 PM | Break |

03:30 PM 03:30 PM | J. S. Marron - Object Oriented Data Analysis: Principal Nested Submanifolds Object Oriented Data Analysis is the statistical analysis of populations of complex objects. This is seen to be particularly relevant in the Big Data era, where it is argued that an even larger scale challenge is Complex Data. Data objects with a geometric structure constitute a particularly active current research area. This is illustrated using a number of examples where data objects naturally lie in manifolds and spaces with a manifold stratification. An overview of the area is given, with careful attention to vectors of angles, i.e. data objects that naturally lie on torus spaces. Prinicpal Nested Submanifolds, which are generalizations of flags, are proposed to provide new analogs of Principal Component Analysis for data visualization. Open problems as to how to weight components in a simultaneous fitting procedure are discussed. |

Wednesday, May 23, 2018 | |
---|---|

Time | Session |

09:30 AM 10:30 AM | Nina Amenta |

10:30 AM 11:00 AM | Break |

11:00 AM 12:00 PM | Edgar Lobaton - From Cyborg-Insect to Human Motion: Topological Data Analysis Applied to Time Series Topological Data Analysis has been applied to a number of problems in order to study the geometric and topological structure of point cloud data. In this talk, we discuss their application to the analysis of time series from cyborg-insect to human motion. These tools allow us to identify different type of motion behaviors from the agents, and help us discover the geometric structure of their surroundings. The mathematical framework developed allows us to derive bounds on the sampling of the data space in order to ensure recovery of the correct topological information, and also provide guarantees on the robustness of these quantities to perturbations on the data. |

06:00 PM 08:00 PM | Banquet @ The Faculty Club |

Thursday, May 24, 2018 | |
---|---|

Time | Session |

09:30 AM 10:30 AM | Samory Kpotufe - Modal-Set Estimation using kNN graphs, and Applications to Clustering Estimating the mode or modal-sets (i.e. extrema points or surfaces) of an unknown density from sample is a basic problem in data analysis. Such estimation is relevant to other problems such as clustering, outlier detection, or can simply serve to identify low-dimensional structures in high dimensional-data (e.g. point-cloud data from medical-imaging, astronomy, etc). Theoretical work on mode-estimation has largely concentrated on understanding its statistical difficulty, while less attention has been given to implementable procedures. Thus, theoretical estimators, which are often statistically optimal, are for the most part hard to implement. Furthermore for more general modal-sets (general extrema of any dimension and shape) much less is known, although various existing procedures (e.g. for manifold-denoising or density-ridge estimation) have similar practical aim. I’ll present two related contributions of independent interest: (1) practical estimators of modal-sets – based on particular subgraphs of a k-NN graph – which attain minimax-optimal rates under surprisingly general distributional conditions; (2) high-probability finite sample rates for k-NN density estimation which is at the heart of our analysis. Finally, I’ll discuss recent successful work towards the deployment of these modal-sets estimators for various clustering applications. Much of the talk is based on a series of work with collaborators S. Dasgupta, K. Chaudhuri, U. von Luxburg, and Heinrich Jiang. |

10:30 AM 11:00 AM | Break |

11:00 AM 12:00 PM | Carola Wenk - Applications of persistent homology to low-dimensional data Persistent homology is commonly applied to study the shape of high dimensional data sets. In this talk we consider two low-dimensional data scenarios in which both geometry and topology are of importance. We study 2D and 3D images of prostate cancer tissue, in which glands form cycles or voids. Treatment decisions for prostate cancer patients are largely based on the pathological analysis of the geometry and topology of these data. As a second scenario we consider the comparison of road networks. |

12:00 PM 02:00 PM | Lunch Break |

02:00 PM 03:00 PM | Peter Bubenik - Topological spaces of persistence modules and their properties Persistence modules are a central algebraic object arising in topological data analysis. Interleaving provides a natural way to measure distances between persistence modules. We consider various classes of persistence modules and the relationships between them. We also study the resulting topological spaces and their basic topological properties. |

03:00 PM 03:30 PM | Break |

03:30 PM 04:30 PM | Chao Chen - A Practical Solution for Homology Localization Computing the optimal cycle of a given homology class is a challenging problem. Existing algorithms, although polynomial, are far from practical. We propose a new algorithm based on the A-star search. A carefully designed heuristic function is proposed to guide the search for the optimal solution. Our method is the first to be practical in large scale image analysis tasks. We apply the method to cardiac image analysis, in which we use persistent homology to detect delicate muscle structures and compute their optimal representations. We believe the method can be applied to many other topology data analysis contexts, where we want to annotate the persistence diagram with additional geometric information. |

Friday, May 25, 2018 | |
---|---|

Time | Session |

09:30 AM 10:30 AM | Mauro Maggioni - Multiscale Methods for Dictionary Learning, Regression and Optimal Transport for data near low-dimensional sets We discuss a family of ideas, algorithms, and results for analyzing various new and classical problems in the analysis of high-dimensional data sets. These methods we discuss perform well when data is (nearly) intrinsically low-dimensional. They rely on the idea of performing suitable multiscale geometric decompositions of the data, and exploiting such decompositions to perform a variety of tasks in signal processing and statistical learning. In particular, we discuss the problem of dictionary learning, where one is interested in constructing, given a training set of signals, a set of vectors (dictionary) such that the signals admit a sparse representation in terms of the dictionary vectors. We then discuss the problem of regressing a function on a low-dimensional unknown manifold. For both problems we introduce a multiscale estimator, fast algorithms for constructing it, and give finite sample guarantees for its performance, and discuss its optimality. Finally, we discuss an application of these multiscale decompositions to the fast calculation of optimal transportation plans, introduce a multiscale version of optimal transportation distances, and discuss preliminary applications. These are joint works with S. Gerber, W. Liao, S. Vigogna. |

10:30 AM 11:00 AM | Break |

11:00 AM 12:00 PM | Dejan Slepcev - Asymptotics of objective functionals in semi-supervised learning We consider a family of regression problems in a semi-supervised setting. Given real-valued labels on a small subset of data the task is to recover the function on the whole data set while taking advantage of the (geometric) structure provided by the large number of unlabeled data points. We consider a random geometric graph to represent the geometry of the data set. We study objective functions which reward the regularity of the estimator function and impose or reward the agreement with the training data. In particular we consider discrete p-Laplacian and fractional Laplacian regularizations. We investigate asymptotic behavior in the limit where the number of unlabeled points increases while the number of training points remains fixed. We uncover a delicate interplay between the regularizing nature of the functionals considered and the nonlocality inherent to the graph constructions. We rigorously obtain almost optimal ranges on the scaling of the graph connectivity radius for the asymptotic consistency to hold. The talk is based on joint works with Matthew Dunlop, Andrew Stuart, and Matthew Thorpe. |

Name | Affiliation | |
---|---|---|

Amenta, Nina | amenta@cs.ucdavis.edu | Computer Science, University of California, Davis |

Bauer, Ulrich | mail@ulrich-bauer.org | Mathematics, TU M""unchen |

Bendich, Paul | bendich@math.duke.edu | none, Geometric Data Analytics, Inc. |

Brunson, Jason | jabrunso@vt.edu | Center for Quantitative Medicine, UConn Health |

Bubenik, Peter | peter.bubenik@gmail.com | Mathematics, University of Florida |

Bungula, Wako | wako-bungula@uiowa.edu | Mathematics, University of Iowa |

Chen, Chao | Computer Science, City University of New York (CUNY) | |

Dey, Tamal K. | dey.8@osu.edu | CSE, The Ohio State University |

Filip, Ioan | ioan.filip@gmail.com | Systems Biology, Columbia University |

Gakhar, Hitesh | hitesh.gakhar@gmail.com | Mathematics, Michigan State University |

Hamilton, Wesley | wham@live.unc.edu | Mathematics, University of North Carolina, Chapel Hill |

Kahle, Matthew | kahle.70@osu.edu | Mathematics, The Ohio State University |

Kpotufe, Samory | samory@princeton.edu | ORFE, Princeton University |

Kurtek, Sebastian | kurtek.1@stat.osu.edu | Statistics, The Ohio State University |

Lin, Lizhen | lizhen.lin@nd.edu | Department of Applied and Computational Mathematics and Statistics, |

Lobaton, Edgar | edgar.lobaton@ncsu.edu | Electrical and Computer Engineering, North Carolina State University |

Maggioni, Mauro | mauro.maggioni@jhu.edu | Mathematics, Johns Hopkins University |

Marron, J. S. | marron@email.unc.edu | Statistics & OR, UNC |

Memoli, Facundo | memoli@math.osu.edu | Mathematics, The Ohio State University |

Miller, Ezra | ezra@math.duke.edu | Mathematics, Duke University |

Moon, Chul | chulmoon@uga.edu | Statistics, University of Georgia |

Mount, David | mount@cs.umd.edu | Computer Science, University of Maryland |

Parida, Laxmi | parida@us.ibm.com | Computational Genomics, IBM THOMAS J. WATSON RESEARCH CENTER |

Scoville, Nicholas | nscoville@ursinus.edu | Mathematics and Computer Science, Ursinus College |

Sivakoff, David | sivakoff.2@osu.edu | Statistics and Mathematics, The Ohio State University |

Slepcev, Dejan | slepcev@math.cmu.edu | Mathematical Sciences, Carnegie Mellon University |

Solomon, Yitzchak | ysolomon@math.brown.edu | Mathematics, Brown University |

Stefanou, Anastasios | astefanou@albany.edu | Mathematics and Statistics, University at Albany (SUNY) |

Vazquez, Mariel | mariel@math.ucdavis.edu | Mathematics, University of California Davis |

Wang, Yusu | yusu@cse.ohio-state.edu | Computer Science and Engineering, The Ohio State University |

Wenk, Carola | cwenk@tulane.edu | Computer Science, Tulane University |

We consider the setting of Reeb graphs of piecewise linear functions and study distances between them that are stable, meaning that functions which are similar in the supremum norm ought to have similar Reeb graphs. We define an edit distance for Reeb graphs and prove that it is stable and universal, meaning that it provides an upper bound to any other stable distance. In contrast, via a specific construction, we show that the interleaving distance and the functional distortion distance on Reeb graphs are not universal.

(Joint work with Claudia Landi and Facundo MÃ©moli)

Many systems employ sensors to interpret the environment.

The target-tracking task is to gather sensor data from the environment and then to partition these data into tracks that are produced by the same target. A key challenge is to "connect-the-dots": more precisely, to take a sensor observation at a given time and associate it with a previously-existing track (or to declare that this is a new object). This can be very challenging, especially when there are multiple targets present, and when different sensors "go blind" at various times. There are a number of high-level designs for multi-target multi-sensor tracking systems, but many contemporary ones fit within the multiple hypothesis (MHT) paradigm. Typical MHTs formulate the 'connect-the-dots' problem as one of Bayesian inference, with competing multi-track hypotheses receiving scores.

This talk surveys three recent efforts to integrate topological and geometric information into the formula for computing hypothesis scores. The first uses zero-dimensional persistent homology summaries of kinematic information in car tracking, and the second uses persistent homology summaries in state space to form grouping hypotheses for nautical traffic. Finally, a method using self-similarity matrices is employed to make useful cross-modal comparisons in heterogeneous sensor networks. In all three efforts, large improvements in MHT performance are observed.

This talk is based on work funded by OSD, NRO, AFRL, and NSF, and it was done with many collaborators, including Nathan Borggren, Sang Chin, Jesse Clarke, Jonathan deSena, John Harer, Jay Hineman, Elizabeth Munch, Andrew Newman, Alex Pieloch, David Porter, David Rouse, Nate Strawn, Christopher J. Tralie, Adam Watkins, Michael Williams, and Peter Zulch.

Persistence modules are a central algebraic object arising in topological data analysis. Interleaving provides a natural way to measure distances between persistence modules. We consider various classes of persistence modules and the relationships between them. We also study the resulting topological spaces and their basic topological properties.

Computing the optimal cycle of a given homology class is a challenging problem. Existing algorithms, although polynomial, are far from practical. We propose a new algorithm based on the A-star search. A carefully designed heuristic function is proposed to guide the search for the optimal solution. Our method is the first to be practical in large scale image analysis tasks. We apply the method to cardiac image analysis, in which we use persistent homology to detect delicate muscle structures and compute their optimal representations. We believe the method can be applied to many other topology data analysis contexts, where we want to annotate the persistence diagram with additional geometric information.

Estimating the mode or modal-sets (i.e. extrema points or surfaces) of an unknown density from sample is a basic problem in data analysis. Such estimation is relevant to other problems such as clustering, outlier detection, or can simply serve to identify low-dimensional structures in high dimensional-data (e.g. point-cloud data from medical-imaging, astronomy, etc). Theoretical work on mode-estimation has largely concentrated on understanding its statistical difficulty, while less attention has been given to implementable procedures. Thus, theoretical estimators, which are often statistically optimal, are for the most part hard to implement. Furthermore for more general modal-sets (general extrema of any dimension and shape) much less is known, although various existing procedures (e.g. for manifold-denoising or density-ridge estimation) have similar practical aim. Iâ€™ll present two related contributions of independent interest: (1) practical estimators of modal-sets â€“ based on particular subgraphs of a k-NN graph â€“ which attain minimax-optimal rates under surprisingly general distributional conditions; (2) high-probability finite sample rates for k-NN density estimation which is at the heart of our analysis. Finally, Iâ€™ll discuss recent successful work towards the deployment of these modal-sets estimators for various clustering applications.

Much of the talk is based on a series of work with collaborators S. Dasgupta, K. Chaudhuri, U. von Luxburg, and Heinrich Jiang.

The first part of this talk focuses on presenting some communication efficient algorithms for parallel optimization problems on manifolds. Convergence properties of such algorithms will be analyzed. The second part of the talk introduces Bayesian optimization techniques on manifolds. The methodology and computational algorithms will be discussed in details. Applications are considered for a large class of manifolds of interests.

Topological Data Analysis has been applied to a number of problems in order to study the geometric and topological structure of point cloud data. In this talk, we discuss their application to the analysis of time series from cyborg-insect to human motion. These tools allow us to identify different type of motion behaviors from the agents, and help us discover the geometric structure of their surroundings. The mathematical framework developed allows us to derive bounds on the sampling of the data space in order to ensure recovery of the correct topological information, and also provide guarantees on the robustness of these quantities to perturbations on the data.

We discuss a family of ideas, algorithms, and results for analyzing various new and classical problems in the analysis of high-dimensional data sets. These methods we discuss perform well when data is (nearly) intrinsically low-dimensional. They rely on the idea of performing suitable multiscale geometric decompositions of the data, and exploiting such decompositions to perform a variety of tasks in signal processing and statistical learning. In particular, we discuss the problem of dictionary learning, where one is interested in constructing, given a training set of signals, a set of vectors (dictionary) such that the signals admit a sparse representation in terms of the dictionary vectors. We then discuss the problem of regressing a function on a low-dimensional unknown manifold. For both problems we introduce a multiscale estimator, fast algorithms for constructing it, and give finite sample guarantees for its performance, and discuss its optimality. Finally, we discuss an application of these multiscale decompositions to the fast calculation of optimal transportation plans, introduce a multiscale version of optimal transportation distances, and discuss preliminary applications. These are joint works with S. Gerber, W. Liao, S. Vigogna.

Object Oriented Data Analysis is the statistical analysis of populations of complex objects. This is seen to be particularly relevant in the Big Data era, where it is argued that an even larger scale challenge is Complex Data. Data objects with a geometric structure constitute a particularly active current research area. This is illustrated using a number of examples where data objects naturally lie in manifolds and spaces with a manifold stratification. An overview of the area is given, with careful attention to vectors of angles, i.e. data objects that naturally lie on torus spaces. Prinicpal Nested Submanifolds, which are generalizations of flags, are proposed to provide new analogs of Principal Component Analysis for data visualization. Open problems as to how to weight components in a simultaneous fitting procedure are discussed.

Recently, a series of seminal results has established the central role that convex approximation plays in solving a number of computational problems in Euclidean spaces of constant dimension. These include fundamental applications such as computing the diameter of a point set, Euclidean minimum spanning trees, bichromatic closest pairs, width kernels, and nearest neighbor searching. In each case, an approximation parameter eps > 0 is given, and the problem is to compute a solution that is within a factor of 1+eps of the exact solution.

In this talk, I will present recent approaches to convex approximation. These approaches achieve efficiency by being locally sensitive to the shape and curvature of the body. This is achieved through a combination of methods, both new and classical, including Delone sets in the Hilbert metric, Macbeath regions, and John ellipsoids. We will explore the development of these methods and discuss how they can be applied to the above problems.

Many problems in genomics are combinatorial in nature, i.e., the interrelationship amongst the entities is at par, if not more important, than the value of the entity themselves.

Graphs are the most commonly used mathematical object to model such relationships. However, often it is important to capture not just pairwise, but more general $k$-wise relationships. TDA provides a natural basis to model such interactions and, additionally, it provides mechanisms for extracting signal patterns from noisy data.

I will discuss two applications of this model, one to population genomics and the other to meta-genomics. In the former, we model the problem of detecting admixture in populations and in the latter we deal with the problem of detecting false positives in microbiome data.

The unifying theme is the use of topological methods -- in particular, persistent homology. I will explain the underlying mathematical models and describe the results obtained in both cases.

We consider a family of regression problems in a semi-supervised setting. Given real-valued labels on a small subset of data the task is to recover the function on the whole data set while taking advantage of the (geometric) structure provided by the large number of unlabeled data points. We consider a random geometric graph to represent the geometry of the data set. We study objective functions which reward the regularity of the estimator function and impose or reward the agreement with the training data. In particular we consider discrete p-Laplacian and fractional Laplacian regularizations.

We investigate asymptotic behavior in the limit where the number of unlabeled points increases while the number of training points remains fixed. We uncover a delicate interplay between the regularizing nature of the functionals considered and the nonlocality inherent to the graph constructions. We rigorously obtain almost optimal ranges on the scaling of the graph connectivity radius for the asymptotic consistency to hold. The talk is based on joint works with Matthew Dunlop, Andrew Stuart, and Matthew Thorpe.

In Escherichia coli DNA replication yields two interlinked circular chromosomes. Returning the chromosomes to an unlinked monomeric state is essential to cell survival. Simplification of DNA topology is mediated by enzymes, such as recombinases and topoisomerases. Recombinases act by a local reconnection event. We investigate analytically minimal pathways of unlinking by local reconnection. We introduce a Monte Carlo method to simulate local reconnection, provide a quantitative measure to distinguish among pathways and identify a most probable pathway. These results point to a universal property relevant to any local reconnection event between two sites along one or two circles.

Persistent homology is commonly applied to study the shape of high dimensional data sets. In this talk we consider two low-dimensional data scenarios in which both geometry and topology are of importance. We study 2D and 3D images of prostate cancer tissue, in which glands form cycles or voids. Treatment decisions for prostate cancer patients are largely based on the pathological analysis of the geometry and topology of these data. As a second scenario we consider the comparison of road networks.