## CTW: Statistics, Geometry, and Combinatorics on Stratified Spaces arising from Biological Problems

### Organizers

Stephan Huckemann
Mathematical Stochastics, Georg-August University Göttingen
Huiling Le
School of Mathematical Sciences, University of Nottingham
Ezra Miller
..., SAMSI (Statistical and Applied Mathematical Sciences Institute)
Megan Owen
Computer Science, University of Waterloo
Victor Patrangenaru
Statistics, Florida State University

Modern statistics problems, from areas such as evolutionary biology, medical imaging, and shape analysis, increasingly deal with data sampled from spaces that are singular but naturally stratified; that is, the spaces behave nicely at most points, but at certain points the smooth structure becomes degenerate, such as when the space is composed of two or more intersecting smooth pieces. Key examples of stratified spaces are shape spaces (representing equivalence classes of point configurations under operations such as rotation, translation, scaling, projective transformations, or other non-linear transformations) and tree spaces (representing metric phylogenetic trees on fixed sets of taxa). Generalizing these two examples leads to algebraic varieties and polyhedral complexes, respectively. Applications require knowledge of the asymptotics of distributions on such spaces. Developments in this "stratified statistics" take their cue from more classical geometric statistics, where data points are sampled from smooth manifolds, or from neighborhoods of embedded manifolds. Now, however, interesting algebraic geometry and combinatorics join the mix as methods for controlling behavior near strata of lower dimension, where the sample space can be singular nearby. Asymptotics on such spaces are governed not only by their local structure, but also by global topology (of the space and the data). Thus there has been increasing interest in the recently emerged method of statistical persistent homology. First results from the systematic study of nonparametric statistics on data sampled from stratified spaces include central limit theorems (CLTs) that illustrate nonclassical behavior, particularly when the mean lies on a lower stratum. The related asymptotics in this surprisingly common circumstance can depend in a crucial way on global geometry. Other first results include concrete combinatorial constructions of sample spaces. 
Interpretations for these results, particularly nonclassical CLTs, are immediately useful in specific applied problems from phylogenetics, brain imaging, and human binocular vision, but they raise fundamental pure mathematical questions relating curvature to asymptotics of probability distributions in non-smooth settings. Many of these investigations were initiated by a Working Group at the Statistical and Applied Mathematical Sciences Institute (SAMSI) program on analysis of object data. This MBI workshop aims to stimulate progress and cross-fertilization in the rapidly moving areas of theoretical and applied stratified statistics by gathering a mix of researchers with interests in biology, geometry, combinatorics, topology, probability, and statistics. The hope is to develop stratified methods to solve problems arising from investigations on existing biological and medical data sets, particularly those involving trees and more general shapes.

### Accepted Speakers

Rudolf Beran
Statistics, University of California, Davis
Rabi Bhattacharya
Mathematics, University of Arizona
Peter Bubenik
Mathematics, Cleveland State University
Mathematics, Williams College
Aasa Feragen
Department of Computer Science,
Harrie Hendriks
Applied Stochastics,
Susan Holmes
Statistics, Stanford University
David Houle
Biological Science, Florida State University
Wilfrid Kendall
Statistics, University of Warwick
John Kent
Department of Statistics, University of Leeds
Peter Kim
Mathematics and Statistics, University of Guelph
Robert MacPherson
Dept. of Mathematics, Institute for Advanced Study
Steve Marron
Statistics and O. R., University of North Carolina, Chapel Hill
Axel Munk
Mathematics,
Megan Owen
Computer Science, University of Waterloo
Armin Schwartzman
Biostatistics, Harvard University
Monday, May 21, 2012
Time Session
09:00 AM
10:00 AM
Peter Kim - Some Recent Experience with Clostridium Difficile
C. difficile associated outbreaks have been reported worldwide, some with increased mortality and morbidity. Symptoms of this infectious disease range from mild diarrhea to severe colitis and even bowel perforation and death. The bacterium C. difficile is found with the normal bacteria comprising the intestinal flora. These can be killed by antibiotics but not the C. difficile spores, which are insensitive to the majority of antibiotics. The diagnosis of C. difficile infection is based on clinical signs and symptoms and a positive laboratory test for toxigenic C. difficile. Of particular concern is the NAP1/BI/027 strain, which has affected North American hospitals, of which Southern Ontario hospitals have been especially hard hit. In this talk we will discuss some recent experience with C. difficile along with the use of fecal biotherapy as an effective alternative to standard antibiotics. We will also go over some of the metagenomic sequencing results outlining the observed changes in intestinal flora pre and post fecal biotherapy.
10:30 AM
11:00 AM
Hongtu Zhu - Geometry and statistics of data
Not available
11:15 AM
12:15 PM
David Houle - Approaching the evolution of novelty: where biology needs math and statistics
The genetics and evolution of biological systems are extremely complex because of the large number of traits and the complex relationships among those traits. We use the form of fruit fly wings as a model to study the variational properties of complex biological structures. Variation is important because it controls evolutionary potential. Questions about evolutionary potential of high-dimensional entities raise a series of difficult mathematical and statistical problems.

Our data suggest that the dimensionality of the underlying system is very high. Could the data lie on a manifold embedded in the linear space of phenotypes? If so, phenomena that seem complex could have simple explanations. Manifold-finding based on genotypic data has not yet been attempted.
The pattern of variation in two different populations can be quite different. Can we identify the common phenotypic subspace, and, even more interesting, the subspaces where one has variation, and the other does not? Statistical approaches to those questions are not known (at least in biology).
How can we understand and predict the appearance of qualitatively novel phenotypes? Qualitative novelty is one of the largest unsolved problems in biology. Is it possible to construct metrics for the "novelty distance" between phenotypes that predict evolution? One possible kind of metric could combine geometry and topology as is done with persistent homology. Biology may offer different metrics based on the effects of mutation or common transitions during development.
Biologists need the expertise of mathematicians and statisticians to help us answer these important questions.
02:00 PM
03:00 PM
Marc Arnaudon - Medians, means and minimax centers in Riemannian geometry: existence, uniqueness, robustness and algorithms. Application to signal detection
We give detailed results on the existence and uniqueness for medians, means and minimax centers of probability measures on Riemannian manifolds, including the case when the probability measure is supported in a regular geodesic ball and the case of generic data points in a complete manifold. Some properties of Fréchet medians are also given, such as statistical consistency and quantitative explanation of robustness. In order to compute the Riemannian medians and means, we develop deterministic and stochastic gradient descent algorithms. We show the convergence of these algorithms in regular geodesic balls. The rate of convergence and error estimates of these algorithms are also obtained. For probability measures with support in compact manifolds, partial simulated annealing is used to obtain processes which converge to the means. Simulation examples of our algorithms are also shown, in the case of Toeplitz Hermitian positive definite matrices coming from covariance matrices of autoregressive processes. Applications to signal detection are given.
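
As a rough illustration of what a deterministic gradient descent for a Riemannian mean can look like, the following sketch computes an intrinsic (Fréchet) mean on the unit sphere. It is not the algorithm from the talk: the choice of the sphere, the function names (`log_map`, `exp_map`, `sphere_mean`), the fixed step size, and the stopping rule are all assumptions made for this example, and the data are assumed to lie in a small geodesic ball so that no two points are antipodal.

```python
import math

def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def _normalize(v):
    n = math.sqrt(_dot(v, v))
    return [a / n for a in v]

def log_map(x, p):
    """Tangent vector at x pointing toward p, with length equal to the geodesic distance."""
    c = max(-1.0, min(1.0, _dot(x, p)))
    theta = math.acos(c)
    # Component of p orthogonal to x; its norm is sin(theta).
    w = [pi - c * xi for pi, xi in zip(p, x)]
    n = math.sqrt(_dot(w, w))
    if n < 1e-12:
        # p coincides with x (or is antipodal, which is excluded by assumption).
        return [0.0] * len(x)
    return [theta * wi / n for wi in w]

def exp_map(x, v):
    """Point reached by following the geodesic from x with initial velocity v for unit time."""
    t = math.sqrt(_dot(v, v))
    if t < 1e-12:
        return list(x)
    return _normalize([math.cos(t) * xi + math.sin(t) * vi / t for xi, vi in zip(x, v)])

def sphere_mean(points, step=1.0, tol=1e-10, max_iter=1000):
    """Gradient descent on the sum of squared geodesic distances (hypothetical helper)."""
    x = _normalize(points[0])
    for _ in range(max_iter):
        logs = [log_map(x, p) for p in points]
        # The average log vector is the descent direction for the Fréchet function.
        g = [sum(col) / len(points) for col in zip(*logs)]
        if math.sqrt(_dot(g, g)) < tol:
            break
        x = exp_map(x, [step * gi for gi in g])
    return x
```

For three points placed symmetrically around the north pole at the same colatitude, the iteration should settle at the pole; a median variant would replace each log vector by its unit-length direction.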
Tuesday, May 22, 2012
Time Session
09:00 AM
10:00 AM
Peter Bubenik - Towards statistical topology: homology, persistent homology and persistence landscapes
One of the principal uses of topology is to patch together local quantitative data to obtain global qualitative information not readily accessible to other methods. While the early development of topology was largely driven by applications, many later advances were motivated by strictly mathematical concerns. Now the field of applied topology is returning topology to its roots, adapting some of the later advances in topological methods to current questions in applications. I will survey some of the central constructions in topological data analysis, introducing homology and persistent homology.

There is a clear need to combine these tools with statistical analysis. However there are difficulties in doing so, as the space of the usual topological descriptor is not a manifold. I define a new topological descriptor, the persistence landscape, whose definition allows for the calculation of means and standard deviations, laws of large numbers, central limit theorems and hypothesis testing.
10:30 AM
11:00 AM
Megan Owen - Statistics in Tree Space
The space of metric phylogenetic trees, as constructed by Billera, Holmes, and Vogtmann, is a polyhedral cone complex. This space is non-positively curved, which ensures there is a unique shortest path (geodesic) between any two trees, and that the mean and variance of a set or distribution of trees is well-defined. Furthermore, there is a polynomial time algorithm to compute geodesics, which leads to a practical algorithm for computing mean trees. I will present some applications of this mean and variance to some biological problems, such as constructing species trees from gene trees and understanding the effect of sequence length on tree reconstruction. This is joint work with Ezra Miller and Scott Provan.
11:15 AM
12:15 PM
Robert MacPherson - Survey of stratified spaces
Stratified spaces arise in many contexts within mathematics. They are the natural class of topological spaces of "finite complexity". In many cases, they come endowed with canonical probability distributions on them.

This will be a survey talk with examples, such as spaces of configuration of points.
02:00 PM
02:30 PM
John Kent - The geometry and topology of projective shape space
Projective geometry underlies the way in which information about a 3d scene can be deduced from (one or more) 2d camera views. A key concept in projective geometry is that of a projective invariant for a configuration of collinear or coplanar points. The collection of information in the projective invariants can be termed the "projective shape" of a configuration. In this talk we use a spherical camera and adapt ideas from the Procrustes approach to similarity shape analysis to give a standardized representation for projective shapes. The resulting geometry facilitates metric comparisons between different projective shapes. The resulting topology leads to a clear understanding of the singularities in projective shape space.

Finally, the details behind the standardization lead to a distinction between four variants of projective shape space depending on the "type" of camera: oriented vs. non-oriented and directional vs. axial.
02:45 PM
03:15 PM
Satyan Devadoss - Phylogenetic networks and the real moduli space of curves
Our story is motivated by the configuration space of particles on spheres. In the 1970s, Grothendieck, Deligne, and Mumford constructed a way to keep track of particle collisions in this space using Geometric Invariant Theory. In the 1990s, Gromov and Witten utilized them as invariants arising from string field theory and quantum cohomology. We consider the real points of these spaces, but now interpret them as spaces of rooted metric labeled trees. They have elegant geometric and combinatorial properties, being compact hyperbolic manifolds with a beautiful tessellation by convex polytopes. In recent years, they have gained importance in their own right, appearing in areas such as representation theory, geometric group theory, tropical geometry, and lately reinterpreted by Levy and Pachter as spaces of phylogenetic networks. In particular, these real moduli spaces resolve the singularities of the spaces of phylogenetic trees studied by Billera, Holmes, and Vogtmann.
Wednesday, May 23, 2012
Time Session
09:00 AM
10:00 AM
Steve Marron - Object Oriented Data Analysis
Object Oriented Data Analysis is the statistical analysis of populations of complex objects. In the special case of Functional Data Analysis, these data objects are curves, where standard Euclidean approaches, such as principal components analysis, have been very successful. Challenges in modern medical image analysis motivate the statistical analysis of populations of more complex data objects which are elements of mildly non-Euclidean spaces, such as Lie Groups and Symmetric Spaces, or of strongly non-Euclidean spaces, such as spaces of tree-structured data objects. These new contexts for Object Oriented Data Analysis create several potentially large new interfaces between mathematics and statistics. The notion of Object Oriented Data Analysis also impacts data analysis, through providing a language for discussion of the many choices needed in many modern complex data analyses. Even in situations where Euclidean analysis makes sense, there are statistical challenges because of the High Dimension Low Sample Size problem, which motivates a new type of asymptotics leading to non-standard mathematical statistics.
10:30 AM
11:00 AM
Ezra Miller - Sticky central limit theorems at singularities
Applications to areas such as biology, medicine, and image analysis require understanding the asymptotics of distributions on stratified spaces, such as tree spaces. In the surprisingly common circumstance when Fréchet (intrinsic) means of distributions on stratified spaces lie on strata of low dimension, central limit theorems can exhibit non-classical "sticky" behavior: positive mass can be supported on thin subsets of the ambient space. This talk reports on investigations initiated by a Working Group at the Statistical and Applied Mathematical Sciences Institute (SAMSI) program on Analysis of Object Data, and continued jointly with Stephan Huckemann, Jonathan Mattingly, and Jim Nolen.
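
A toy example may help convey what "sticky" means. The sketch below is not taken from the talk; it assumes the simplest stratified space, a "spider" made of several half-lines glued at a common origin, with distance measured along the legs. On such a space, minimizing the sum of squared distances along leg k gives a mean at distance m_k / n from the origin, where m_k is the sum of distances of points on leg k minus the sum of distances of points on all other legs; if no m_k is positive, the mean sits at the origin. The function name `spider_mean` and the `(leg, distance)` encoding are assumptions made for this illustration.

```python
def spider_mean(points, legs=3):
    """Fréchet mean of sample points on a spider with `legs` half-lines glued at the origin.

    points: list of (leg, distance) pairs with leg in range(legs) and distance >= 0.
    Returns (leg, distance) of the mean, or (None, 0.0) if the mean is the origin.
    """
    n = len(points)
    total = sum(d for _, d in points)
    for k in range(legs):
        on_leg = sum(d for leg, d in points if leg == k)
        # Points on other legs count negatively once the spider is folded onto leg k.
        moment = on_leg - (total - on_leg)
        if moment > 0:
            return (k, moment / n)
    # No leg pulls hard enough: the mean sticks to the origin.
    return (None, 0.0)

balanced = [(0, 1.0), (1, 1.0), (2, 1.0)]
perturbed = [(0, 1.2), (1, 1.0), (2, 1.0)]
dominated = [(0, 3.0), (1, 1.0), (2, 1.0)]
```

Here `balanced` and `perturbed` both have their mean at the origin, so a small change to the data does not move the mean at all, while `dominated` pulls the mean onto leg 0. That insensitivity to small perturbations is the behavior the word "sticky" refers to.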
11:15 AM
12:15 PM
Rabi Bhattacharya - Nonparametric Statistics on Manifolds- By Examples and Applications
The general theory of nonparametric statistics on manifolds M presented here is of relatively recent origin. It builds much of its framework on the notion of the Fréchet mean of a probability measure Q, namely, the point on the manifold which minimizes the expected squared distance from a random variable with distribution Q. The nonparametric methods are intrinsic or extrinsic, depending on the distance used on M. The extrinsic distance is the distance induced from a good embedding of M in a Euclidean space, while the intrinsic distance is the geodesic distance on the manifold when endowed with a Riemannian structure. In examples, it is often the case that the nonparametric methods yield sharper inference than their parametric counterparts provide. Although we consider an application to paleomagnetism where M is the sphere S^2, our main emphasis is on landmark-based shape spaces. The latter include (i) spaces of 2D and 3D images invariant under an appropriate group of transformations, which are useful in morphometrics and medical diagnostics, (ii) affine shape spaces invariant under affine transformations, useful in scene recognition based on satellite images, and (iii) projective shape spaces used in machine vision and robotics. We also consider 2D continuous images, and nonparametric estimation of shape densities.

This talk is based on joint work with Vic Patrangenaru and Abhishek Bhattacharya. It is supported in part by the NSF grant DMS 1107053.
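
For the sphere with its usual embedding in Euclidean space, the extrinsic sample mean has a particularly simple form: average the points as vectors and project the result back onto the sphere. The sketch below illustrates only that special case; the function name `extrinsic_mean_sphere` and the error handling are assumptions for this example rather than anything stated in the abstract.

```python
import math

def extrinsic_mean_sphere(points):
    """Extrinsic sample mean on the unit sphere: normalize the Euclidean centroid."""
    n = len(points)
    centroid = [sum(coords) / n for coords in zip(*points)]
    norm = math.sqrt(sum(c * c for c in centroid))
    if norm < 1e-12:
        # The projection onto the sphere is not unique when the centroid is the origin.
        raise ValueError("centroid is at the origin; extrinsic mean is not unique")
    return [c / norm for c in centroid]

mean_direction = extrinsic_mean_sphere([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
```

For the two orthogonal unit vectors in the usage line, the result is the unit vector halfway between them. The intrinsic mean would instead minimize the sum of squared geodesic distances and generally requires an iterative method.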
03:00 PM
05:00 PM
Susan Holmes, Steve Marron - Bacterial trees in the Human Microbiome
Many studies are underway to describe the human microbiome. I will describe some of the methods used that combine phylogenetic trees and abundance data from high throughput sequencing and new microarray techniques.

In particular, various metrics have proven useful in reaching conclusions about explanatory clinical or contingent variables in such studies.

This talk contains joint work with PJ McMurdie, as well as David Relman's lab and Alfred Sporman at Stanford.
Thursday, May 24, 2012
Time Session
09:00 AM
10:00 AM
Stephan Huckemann - On Omitting and Hitting Properties for Means on Circles and Shape Spaces
The classical central limit theorem states that suitably translated and root n rescaled independent sample means tend to a multivariate Gaussian. Under certain, still rather restrictive conditions, it has been shown by Bhattacharya and Patrangenaru (2005) that the analog holds true on manifolds. One condition, namely uniqueness has been pushed to "data contained in a geodesic half ball" by Afsari (2011), which in particular encompasses "omitting a neighborhood of the cut locus" if non-void.

Determining asymptotics when the cut locus is not omitted proves to be challenging. For circles we present an exhaustive treatment of uniqueness and, in view of asymptotics, of the role of mass around the antipodal point.

Another issue turning up in shape spaces -- which may be manifolds with singularities -- is whether means omit these singularities and are stably assumed on the manifold part. We show that while intrinsic and Ziezold means are manifold stable, Procrustes means may hit singularities.

In consequence, e.g. for 3D shape analysis, given uniqueness, discrimination and classification based on the two-sample test is possible for intrinsic or Ziezold means. Procrustes means, however, may disqualify.

This talk is based on joint work with Thomas Hotz.
10:30 AM
11:00 AM
Armin Schwartzman - Geometry and Statistics in the Eigen-structure of Symmetric (Positive Semi-Definite) Matrices
Symmetric positive semi-definite (PSD) matrices appear as data objects in the statistical analysis of Diffusion Tensor Imaging data, where there is interest in making inferences about the eigenvalues and eigenvectors of these objects. In this talk, I present a stratification of the set of symmetric PSD matrices of arbitrary dimension according to their eigenvalues, as well as maximum likelihood estimators (MLEs) and log-likelihood ratio (LLR) tests for the eigenvalues and eigenvectors of the mean matrix in a symmetric-matrix Gaussian model. The parameter sets involved are subsets of Euclidean space that are either affine subspaces, polyhedral convex cones, or orthogonally invariant embedded submanifolds. The asymptotic behavior of the MLEs and LLRs depends on the stratum where the true mean matrix lies.
11:15 AM
12:15 PM
Rudolf Beran - Manifold-valued Tuning Parameters in Regularized Estimation of Multivariate Means
A multivariate k-way layout consists of observations with error on an array of vector-valued means, each of which is an unknown function of k real-valued covariates. Any decomposition of these vector means into a sum of orthogonal projections induces least squares submodel fits that serve as candidate estimators of the mean vectors. MANOVA submodel fits, nested polynomial regression fits, or mixed combinations of both strategies illustrate classically. This talk describes penalized least squares estimators of the multivariate means in which the penalty terms are weighted through manifold-valued tuning parameters. Data-based selection of the tuning parameters yields estimators that dominate asymptotically those that arise from submodel fitting. In the special case of a complete balanced multivariate k-way layout, the proposed regularized estimators are linked to multiple Efron-Morris affine shrinkage. In unbalanced designs, the regularized estimators define a powerful generalization of affine shrinkage.
02:00 PM
02:30 PM
Aasa Feragen - The geometry and statistics of geometric trees
Anatomical tree-structures such as airway trees from lungs, blood vessels or dendrite trees in neurons, carry information about the organ that they are part of. Anatomical trees can be modeled as geometric trees, which are combinatorial trees whose edges are endowed with edge attributes describing their geometry. We consider edge attributes which take continuous scalar or vector values, leading to a continuum of trees rather than a discrete set of trees.

We shall discuss different ways of building spaces of such geometric trees, all with the goal of obtaining a geodesic space of trees where statistical parameters can be computed with the help of geodesics. For geometric trees of any size, we can define a geodesic space of trees, but geodesic computations are NP complete and the space has nowhere bounded curvature, which means that many statistical tools are not readily available. By adding restrictions on size, admissible topologies, branch order and/or branch labeling, we can regularize the space in order to obtain spaces which have nicer properties in terms of computational complexity and statistical applications. We shall discuss the positive effect of these assumptions on the solvability of statistical problems along with their negative effect on the ability to model real anatomical trees. Finally, we shall present some recent results from experiments on airway trees from lung CT scans.
02:45 PM
03:15 PM
Axel Munk - The Multiresolution Dantzig Selector: From Ion Channel recordings to Biomolecular Microscopy
In this talk we will introduce the multiscale Dantzig selector in the particular context of signal detection and imaging. This method allows one to combine variational regularization methods with statistical multiscale techniques in a statistically sound manner. We address computational issues as well as asymptotic stochastic process theory of the multiscale statistics. The modeling of ion channel recordings and reconstruction in nanoscale biophotonic cell microscopy will be discussed in detail.
Friday, May 25, 2012
Time Session
09:00 AM
10:00 AM
Harrie Hendriks - Mean location, the two sample problem
Harrie Hendriks, Mathematics, Radboud University Nijmegen
The context will be the estimation of a parameter of a probability distribution, where the parameter lies in a differentiable manifold, more specifically in a submanifold of Euclidean space. The parameter could be a Fréchet mean of a probability distribution on the submanifold itself, that is, the Fréchet mean with respect to the Euclidean distance. We will give an account of the two-sample problem.

This talk is based on joint work with Zinoviy Landsman. Examples from the literature will be indicated. Downs considered the QRS loop in vectorcardiograms, characterized by a pair of orthogonal unit vectors in 3-space. The space of such pairs is the Stiefel manifold V_{3,2}, and can be considered as a submanifold of 6-dimensional Euclidean space. A more involved example, considered by Rivest et al., is the human ankle joint that exhibits two independent rotation axes of the foot. The directions of these axes are of importance.
10:30 AM
11:00 AM
Giseon Heo - Topological Analysis of Variance and the Maxillary Complex
Persistent homology, a recent development in computational topology, has been shown to be useful for analyzing high dimensional non-linear data. In this talk, we connect computational topology with the traditional analysis of variance and demonstrate this synergy on a three-dimensional orthodontic landmark data set derived from the maxillary complex. (Joint work with Jennifer Gamble and Peter Kim)
11:15 AM
12:15 PM
Wilfrid Kendall - Riemannian barycentres: from harmonic maps and statistical shape to the classical central limit theorem
The subject of Riemannian barycentres has a strikingly long history, stretching back to work of Fréchet and Cartan. The first part of this talk will be a review of the fundamental ideas and a discussion of the work of various probabilists and statisticians on applications of the concept to probabilistic approaches to harmonic map theory and statistical shape theory. I will then present some recent joint work with Huiling Le concerning central limit theory for empirical barycentres, which to our considerable surprise has led us to a new perspective on the classical Lindeberg-Feller central limit theorem.

### Participants

| Name | Email | Affiliation |
| --- | --- | --- |
| Afsari, Bijan | bijan@cis.jhu.edu | Center for Imaging Science, Johns Hopkins University |
| Arnaudon, Marc | marc.arnaudon@math.univ-poitiers.fr | Mathematics, |
| Belkin, Mikhail | mbelkin@cse.ohio-state.edu | Department of Computer Science and Engineering, The Ohio State University |
| Bendich, Paul | bendich@math.duke.edu | Mathematics, Duke University |
| Beran, Rudolf | beran@wald.ucdavis.edu | Statistics, University of California, Davis |
| Bhattacharya, Rabi | rabi@math.arizona.edu | Mathematics, University of Arizona |
| Bubenik, Peter | p.bubenik@csuohio.edu | Mathematics, Cleveland State University |
| Buibas, Marius | mbuibas@ucsd.edu | Physics, University of California, San Diego |
| Dryden, Ian | dryden@mailbox.sc.edu | Statistics, University of South Carolina |
| Ellingson, Leif | leif.ellingson@ttu.edu | Mathematics and Statistics, Texas Tech University |
| Feragen, Aasa | aasa@diku.dk | Department of Computer Science, |
| Forcey, Stefan | sf34@uakron.edu | Mathematics, University of Akron |
| Groisser, David | groisser@ufl.edu | Mathematics, University of Florida |
| Hallgrimsson, Benedikt | bhallgri@ucalgary.ca | Cell Biology & Anatomy, University of Calgary |
| Hendriks, Harrie | Harrie.Hendriks@math.ru.nl | Applied Stochastics, |
| Heo, Giseon | gheo@ualberta.ca | Mathematical and Statistical Sciences, University of Alberta |
| Holmes, Susan | susan@stat.stanford.edu | Statistics, Stanford University |
| Hotz, Thomas | hotz@math.uni-goettingen.de | Institute for Mathematical Stochastics, University of Goettingen |
| Houle, David | dhoule@bio.fsu.edu | Biological Science, Florida State University |
| Huber, Gregory | huber@uchc.edu | Mathematics, University of Connecticut |
| Huckemann, Stephan | huckeman@math.uni-goettingen.de | Mathematical Stochastics, Georg-August University Göttingen |
| Kendall, Wilfrid | w.s.kendall@warwick.ac.uk | Statistics, University of Warwick |
| Kent, John | J.T.Kent@leeds.ac.uk | Department of Statistics, University of Leeds |
| Kim, Peter | pkim@uoguelph.ca | Mathematics and Statistics, University of Guelph |
| Kubatko, Laura | Kubatko.2@osu.edu | Statistics/EEOB, The Ohio State University |
| Le, Huiling | Huiling.Le@nottingham.ac.uk | School of Mathematical Sciences, University of Nottingham |
| MacPherson, Robert | rdm@math.ias.edu | Dept. of Mathematics, Institute for Advanced Study |
| Mao, Yi | maoyi0@gmail.com | Department of Microbiology, Boston University |
| Marron, J. S. | marron@email.unc.edu | Statistics and O. R., University of North Carolina, Chapel Hill |
| Miakonkana, Guy-vanie | gmm0006@auburn.edu | Mathematics and Statistics, Auburn University |
| Miller, Ezra | ezra@math.duke.edu | ..., SAMSI (Statistical and Applied Mathematical Sciences Institute) |
| Mio, Washington | mio@math.fsu.edu | Mathematics, Florida State University |
| Mudalige, Nishan | nishanm@mathstat.yorku.ca | Mathematics and Statistics, York University |
| Munk, Axel | munk@math.uni-goettingen.de | Mathematics, |
| Nye, Tom | Tom.Nye@ncl.ac.uk | School of Maths and Stats, |
| Owen, Megan | mowen@fields.utoronto.ca | Computer Science, University of Waterloo |
| Patrangenaru, Victor | vic@stat.fsu.edu | Statistics, Florida State University |
| Pinder, Shaun | spinder@uoguelph.ca | Mathematics and Statistics, University of Guelph |
| Provan, Scott | provan@email.unc.edu | Statistics and Operations Research, University of North Carolina, Chapel Hill |
| Rush, Stephen | srush01@uoguelph.ca | Mathematics and Statistics, University of Guelph |
| San Valentin, Gene Paul | gsanvale@math.fsu.edu | Mathematics, Florida State University |
| Schwartzman, Armin | armins@hsph.harvard.edu | Biostatistics, Harvard University |
| Shenfeld, Daniel | shenfeld@math.princeton.edu | Mathematics, Princeton University |
| Sitharam, Meera | sitharam@cise.ufl.edu | Computer and Information Science and Engineering, University of Florida |
| Skwerer, Sean | sskwerer@unc.edu | Statistics and Operations Research, University of North Carolina, Chapel Hill |
| St. John, Katherine | stjohn@lehman.cuny.edu | Math & Computer Science, City University of New York (CUNY) |
| Wang, Yusu | yusu@cse.ohio-state.edu | Computer Science and Engineering, The Ohio State University |
| Wood, Andrew | andrew.wood@nottingham.ac.uk | School of Mathematical Sciences, University of Nottingham |
| Zhu, Hongtu | htzhu@email.unc.edu | Biostatistics, University of North Carolina, Chapel Hill |
| Zhu, Hongtu | hzhu@bios.unc.edu | Biostatistics and Biomedical Research Imaging Center, University of North Carolina, Chapel Hill |
Medians, means and minimax centers in Riemannian geometry: existence, uniqueness, robustness and algorithms. Application to signal detection
We give detailed results on the existence and uniqueness for medians, means and minimax centers of probability measures on Riemannian manifolds, including the case when the probability measure is supported in a regular geodesic ball and the case of generic data points in a complete manifold. Some properties of Fr'echet medians are also given, such as statistical consistency and quantitative explanation of robustness. In order to compute the Riemannian medians and means, we develop deterministic and stochastic gradient descent algorithms. We show the convergence of these algorithms in regular geodesic balls. The rate of convergence and error estimates of these algorithms are also obtained. For probability measures with support in compact manifolds, partial simulated annealing is used to obtain processes which converge to the means. Simulation examples of our algorithms are also shown, in the case of Toeplitz Hermitian positive definite matrices coming from covariance matrices of autoregressive processes. Applications to signal detection are given.
Manifold-valued Tuning Parameters in Regularized Estimation of Multivariate Means
A multivariate k-way layout consists of observations with error on an array of vector-valued means, each of which is an unknown function of k real-valued covariates. Any decomposition of these vector means into a sum of orthogonal projections induces least squares submodel fits that serve as candidate estimators of the mean vectors. MANOVA submodel fits, nested polynomial regression fits, or mixed combinations of both strategies illustrate classically. This talk describes penalized least squares estimators of the multivariate means in which the penalty terms are weighted through manifold-valued tuning parameters. Data-based selection of the tuning parameters yields estimators that dominate asymptotically those that arise from submodel fitting. In the special case of a complete balanced multivariate k-way layout, the proposed regularized estimators are linked to multiple Efron-Morris affine shrinkage. In unbalanced designs, the regularized estimators define a powerful generalization of affine shrinkage.
Nonparametric Statistics on Manifolds- By Examples and Applications
The general theory of nonparametric statistics on manifolds M presented here is of relatively recent origin. It builds much of its framework on the notion of the Fre'chet mean of a probability measure Q, namely, the point on the manifold which minimizes the expected squared distance from a random variable with distribution Q. The nonparametric methods are intrinsic or extrinsic, depending on the distance used on M. The extrinsic distance is the distance induced from a good embedding of M in a Euclidean space, while the intrinsic distance is the geodesic distance on the manifold when endowed with a Riemannian structure. In examples, it is often the case that the nonparametric methods yield sharper inference than their parametric counterparts provide. Although we consider an application to paleomagnetism where M is the sphere S2, our main emphasis is on landmarks based shape spaces. The latter include (i) spaces of 2D and 3D images invariant under an appropriate group of transformations, which are useful in morphometrics and medical diagnostics, (ii) affine shape spaces invariant under affine transformations, useful in scene recognition based on satellite images, and (iii) projective shape spaces used in machine vision and robotics. We also consider 2D continuous images, and nonparametric estimation of shape densities.

This talk is based on joint work with Vic Patrangenaru and Abhishek Bhattacharya. It is supported in part by the NSF grant DMS 1107053.
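As a minimal sketch of the extrinsic approach described in this abstract, assume the sphere is embedded in Euclidean space by inclusion. The extrinsic sample mean is then the Euclidean average of the data projected back onto the sphere, provided that average is nonzero. The function name and the sample directions below are illustrative assumptions, not taken from the talk.

```python
import numpy as np

def extrinsic_mean_sphere(points):
    """Extrinsic sample mean on the unit sphere under the inclusion embedding.

    points: array of shape (n, d) whose rows are unit vectors.
    Returns the unit vector closest to the Euclidean average, or None when
    the average is numerically zero and the extrinsic mean is not unique.
    """
    points = np.asarray(points, dtype=float)
    euclidean_mean = points.mean(axis=0)
    norm = np.linalg.norm(euclidean_mean)
    if norm < 1e-12:
        return None
    return euclidean_mean / norm

# Hypothetical directions clustered around the north pole of S^2.
sample = np.array([
    [0.1, 0.0, 1.0],
    [-0.1, 0.0, 1.0],
    [0.0, 0.1, 1.0],
    [0.0, -0.1, 1.0],
])
sample /= np.linalg.norm(sample, axis=1, keepdims=True)  # project rows onto the sphere
print(extrinsic_mean_sphere(sample))
```

The intrinsic mean would instead minimize the average squared geodesic distance and generally requires an iterative procedure, which this sketch does not attempt.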
Towards statistical topology: homology, persistent homology and persistence landscapes
One of the principal uses of topology is to patch together local quantitative data to obtain global qualitative information not readily accessible to other methods. While the early development of topology was largely driven by applications, many later advances were motivated by strictly mathematical concerns. Now the field of applied topology is returning topology to its roots, adapting some of the later advances in topological methods to current questions in applications. I will survey some of the central constructions in topological data analysis, introducing homology and persistent homology.

There is a clear need to combine these tools with statistical analysis. However, there are difficulties in doing so, as the space of the usual topological descriptor is not a manifold. I define a new topological descriptor, the persistence landscape, whose definition allows for the calculation of means and standard deviations, laws of large numbers, central limit theorems and hypothesis testing.
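A minimal sketch of the persistence landscape, assuming the standard construction from tent functions: each birth-death pair (b, d) contributes the function max(0, min(t - b, d - t)), and the k-th landscape function at t is the k-th largest of these values. The function name, the evaluation on a fixed grid, and the two small diagrams are illustrative assumptions, not taken from the talk.

```python
import numpy as np

def persistence_landscape(diagram, grid, k_max):
    """Evaluate the first k_max landscape functions on a grid.

    diagram: iterable of (birth, death) pairs.
    grid:    1-D array of evaluation points t.
    Returns an array of shape (k_max, len(grid)); row k-1 holds lambda_k.
    """
    diagram = np.asarray(diagram, dtype=float).reshape(-1, 2)
    grid = np.asarray(grid, dtype=float)
    births = diagram[:, 0][:, None]
    deaths = diagram[:, 1][:, None]
    # One tent function per birth-death pair, evaluated on the grid.
    tents = np.maximum(0.0, np.minimum(grid[None, :] - births, deaths - grid[None, :]))
    # Sort the tent values at each grid point in decreasing order.
    tents = -np.sort(-tents, axis=0)
    landscape = np.zeros((k_max, grid.size))
    rows = min(k_max, tents.shape[0])
    landscape[:rows] = tents[:rows]
    return landscape

grid = np.linspace(0.0, 4.0, 5)
landscape_a = persistence_landscape([(0.0, 4.0), (1.0, 3.0)], grid, k_max=2)
landscape_b = persistence_landscape([(0.0, 2.0)], grid, k_max=2)

# Landscapes live in a vector space of functions, so a mean is a pointwise average.
mean_landscape = np.mean([landscape_a, landscape_b], axis=0)
print(mean_landscape)
```

Because the descriptors are ordinary functions, averaging them is well defined, which is the property the abstract points to when it mentions means, standard deviations, and limit theorems.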
Phylogenetic networks and the real moduli space of curves
Our story is motivated by the configuration space of particles on spheres. In the 1970s, Grothendieck, Deligne, and Mumford constructed a way to keep track of particle collisions in this space using Geometric Invariant Theory. In the 1990s, Gromov and Witten utilized them as invariants arising from string field theory and quantum cohomology. We consider the real points of these spaces, but now interpret them as spaces of rooted metric labeled trees. They have elegant geometric and combinatorial properties, being compact hyperbolic manifolds with a beautiful tessellation by convex polytopes. In recent years, they have gained importance in their own right, appearing in areas such as representation theory, geometric group theory, tropical geometry, and lately reinterpreted by Levy and Pachter as spaces of phylogenetic networks. In particular, these real moduli spaces resolve the singularities of the spaces of phylogenetic trees studied by Billera, Holmes, and Vogtmann.
The geometry and statistics of geometric trees
Anatomical tree structures, such as airway trees from lungs, blood vessels, or dendrite trees in neurons, carry information about the organ that they are part of. Anatomical trees can be modeled as geometric trees, which are combinatorial trees whose edges are endowed with edge attributes describing their geometry. We consider edge attributes which take continuous scalar or vector values, leading to a continuum of trees rather than a discrete set of trees.

We shall discuss different ways of building spaces of such geometric trees, all with the goal of obtaining a geodesic space of trees where statistical parameters can be computed with the help of geodesics. For geometric trees of any size, we can define a geodesic space of trees, but geodesic computations are NP-complete and the space has nowhere bounded curvature, which means that many statistical tools are not readily available. By adding restrictions on size, admissible topologies, branch order and/or branch labeling, we can regularize the space in order to obtain spaces which have nicer properties in terms of computational complexity and statistical applications. We shall discuss the positive effect of these assumptions on the solvability of statistical problems along with their negative effect on the ability to model real anatomical trees. Finally, we shall present some recent results from experiments on airway trees from lung CT scans.
Mean location, the two sample problem Harrie Hendriks, Mathematics, Radboud University Nijmegen
The context will be the estimation of a parameter of a probability distribution, where the parameter lies in a differentiable manifold, more specifically in a submanifold of Euclidean space. The parameter could be a Fréchet mean of a probability distribution on the submanifold itself, taken with respect to the Euclidean distance. We will give an account of the two-sample problem.

This talk is based on joint work with Zinoviy Landsman. Examples from the literature will be indicated. Downs considered the QRS loop in vectorcardiograms, characterized by a pair of orthogonal unit vectors in 3-space. The space of such pairs is the Stiefel manifold V_{3,2}, which can be considered as a submanifold of 6-dimensional Euclidean space. A more involved example, considered by Rivest et al., is the human ankle joint, which exhibits two independent rotation axes of the foot. The directions of these axes are of importance.
Topological Analysis of Variance and the Maxillary Complex
Persistent homology, a recent development in computational topology, has been shown to be useful for analyzing high-dimensional non-linear data. In this talk, we connect computational topology with the traditional analysis of variance and demonstrate this synergy on a three-dimensional orthodontic landmark data set derived from the maxillary complex. (Joint work with Jennifer Gamble and Peter Kim)
Bacterial trees in the Human Microbiome
Many studies are underway to describe the human microbiome. I will describe some of the methods used that combine phylogenetic trees and abundance data from high-throughput sequencing and new microarray techniques.

In particular, various "metrics" have proven useful for drawing conclusions about explanatory clinical or contingent variables in such studies.

This talk contains joint work with PJ McMurdie, as well as David Relman's lab and Alfred Sporman at Stanford.
Panel Discussion
Afternoon Panel Discussion for May 23, 2012
Approaching the evolution of novelty: where biology needs math and statistics
The genetics and evolution of biological systems are extremely complex because of the large number of traits and the complex relationships among those traits. We use the form of fruit fly wings as a model to study the variational properties of complex biological structures. Variation is important because it controls evolutionary potential. Questions about evolutionary potential of high-dimensional entities raise a series of difficult mathematical and statistical problems.

Our data suggests that the dimensionality of the underlying system is very high. Could the data lie on a manifold embedded in the linear space of phenotypes? If so, phenomena that seem complex could have simple explanations. Manifold-finding based on genotypic data has not yet been attempted.
The pattern of variation in two different populations can be quite different. Can we identify the common phenotypic subspace, and, even more interesting, the subspaces where one has variation and the other does not? Statistical approaches to those questions are not known (at least in biology).
How can we understand and predict the appearance of qualitatively novel phenotypes? Qualitative novelty is one of the largest unsolved problems in biology. Is it possible to construct metrics for the "novelty distance" between phenotypes that predict evolution? One possible kind of metric could combine geometry and topology as is done with persistent homology. Biology may offer different metrics based on the effects of mutation or common transitions during development.
Biologists need the expertise of mathematicians and statisticians to help us answer these important questions.
On Omitting and Hitting Properties for Means on Circles and Shape Spaces
The classical central limit theorem states that suitably translated and root-n rescaled independent sample means tend to a multivariate Gaussian. Under certain, still rather restrictive conditions, it has been shown by Bhattacharya and Patrangenaru (2005) that the analog holds true on manifolds. One condition, namely uniqueness, has been pushed to "data contained in a geodesic half ball" by Afsari (2011), which in particular encompasses "omitting a neighborhood of the cut locus" if the cut locus is non-void.

Determining asymptotics when the cut locus is not omitted proves to be challenging. For circles we present an exhaustive treatment of uniqueness and, in view of asymptotics, of the role of mass around the antipodal point.

Another issue turning up in shape spaces -- which may be manifolds with singularities -- is whether means omit these singularities and are stably assumed on the manifold part. We show that while intrinsic and Ziezold means are manifold stable, Procrustes means may hit singularities.

In consequence, e.g. for 3D shape analysis, given uniqueness, discrimination and classification based on the two-sample test is possible for intrinsic or Ziezold means. Procrustes means, however, may disqualify.

This talk is based on joint work with Thomas Hotz.
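The uniqueness question on the circle mentioned in this abstract can be explored numerically. The sketch below assumes the arc-length distance and relies on the observation that, for n sample angles, every local minimizer of the Fréchet function has the form (arithmetic mean of the angles) + 2πj/n for some integer j, since each minimizer is the average of suitably chosen representatives of the angles. The function names and the tolerance are illustrative assumptions, not taken from the talk.

```python
import numpy as np

def arc_distance(a, b):
    """Arc-length distance between angles on the unit circle."""
    d = np.abs(a - b) % (2 * np.pi)
    return np.minimum(d, 2 * np.pi - d)

def frechet_function(mu, angles):
    """Average squared arc-length distance from mu to the sample."""
    return np.mean(arc_distance(angles, mu) ** 2)

def intrinsic_circle_means(angles, tol=1e-9):
    """Return all intrinsic sample means (up to tol) and the minimal Fréchet value."""
    angles = np.mod(np.asarray(angles, dtype=float), 2 * np.pi)
    n = angles.size
    # Candidate minimizers: the arithmetic mean shifted by multiples of 2*pi/n.
    candidates = np.mod(angles.mean() + 2 * np.pi * np.arange(n) / n, 2 * np.pi)
    values = np.array([frechet_function(c, angles) for c in candidates])
    return candidates[values <= values.min() + tol], values.min()

# Concentrated data: a single mean near angle 0.
print(intrinsic_circle_means([0.1, 2 * np.pi - 0.1]))

# Two antipodal points: the mean is not unique.
print(intrinsic_circle_means([0.0, np.pi]))
```

The second call returns two minimizers, a small instance of the non-uniqueness that arises when mass sits near the antipode of a candidate mean.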
Riemannian barycentres: from harmonic maps and statistical shape to the classical central limit theorem
The subject of Riemannian barycentres has a strikingly long history, stretching back to work of Fréchet and Cartan. The first part of this talk will be a review of the fundamental ideas and a discussion of the work of various probabilists and statisticians on applications of the concept to probabilistic approaches to harmonic map theory and statistical shape theory. I will then present some recent joint work with Huiling Le concerning central limit theory for empirical barycentres, which to our considerable surprise has led us to a new perspective on the classical Lindeberg-Feller central limit theorem.
The geometry and topology of projective shape space
Projective geometry underlies the way in which information about a 3D scene can be deduced from (one or more) 2D camera views. A key concept in projective geometry is that of a projective invariant for a configuration of collinear or coplanar points. The collection of information in the projective invariants can be termed the "projective shape" of a configuration. In this talk we use a spherical camera and adapt ideas from the Procrustes approach to similarity shape analysis to give a standardized representation for projective shapes. The resulting geometry facilitates metric comparisons between different projective shapes. The resulting topology leads to a clear understanding of the singularities in projective shape space.

Finally, the details behind the standardization lead to a distinction between four variants of projective shape space depending on the "type" of camera: oriented vs. non-oriented and directional vs. axial.
Some Recent Experience with Clostridium Difficile
C. difficile-associated outbreaks have been reported worldwide, some with increased mortality and morbidity. Symptoms of this infectious disease range from mild diarrhea to severe colitis and even bowel perforation and death. The bacterium C. difficile is found with the normal bacteria comprising the intestinal flora. The normal bacteria can be killed by antibiotics, but the C. difficile spores, which are insensitive to the majority of antibiotics, cannot. The diagnosis of C. difficile infection is based on clinical signs and symptoms and a positive laboratory test for toxigenic C. difficile. Of particular concern is the NAP1/BI/027 strain, which has affected North American hospitals, with Southern Ontario hospitals especially hard hit. In this talk we will discuss some recent experience with C. difficile along with the use of fecal biotherapy as an effective alternative to standard antibiotics. We will also go over some of the metagenomic sequencing results outlining the observed changes in intestinal flora before and after fecal biotherapy.
Survey of stratified spaces
Stratified spaces arise in many contexts within mathematics. They are the natural class of topological spaces of "finite complexity". In many cases, they come endowed with canonical probability distributions on them.

This will be a survey talk with examples, such as spaces of configuration of points.
Object Oriented Data Analysis
Object Oriented Data Analysis is the statistical analysis of populations of complex objects. In the special case of Functional Data Analysis, these data objects are curves, where standard Euclidean approaches, such as principal components analysis, have been very successful. Challenges in modern medical image analysis motivate the statistical analysis of populations of more complex data objects which are elements of mildly non-Euclidean spaces, such as Lie Groups and Symmetric Spaces, or of strongly non-Euclidean spaces, such as spaces of tree-structured data objects. These new contexts for Object Oriented Data Analysis create several potentially large new interfaces between mathematics and statistics. The notion of Object Oriented Data Analysis also impacts data analysis, through providing a language for discussion of the many choices needed in many modern complex data analyses. Even in situations where Euclidean analysis makes sense, there are statistical challenges because of the High Dimension Low Sample Size problem, which motivates a new type of asymptotics leading to non-standard mathematical statistics.
Sticky central limit theorems at singularities
Applications to areas such as biology, medicine, and image analysis require understanding the asymptotics of distributions on stratified spaces, such as tree spaces. In the surprisingly common circumstance when Fréchet (intrinsic) means of distributions on stratified spaces lie on strata of low dimension, central limit theorems can exhibit non-classical "sticky" behavior: positive mass can be supported on thin subsets of the ambient space. This talk reports on investigations initiated by a Working Group at the Statistical and Applied Mathematical Sciences Institute (SAMSI) program on Analysis of Object Data, and continued jointly with Stephan Huckemann, Jonathan Mattingly, and Jim Nolen.
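Stickiness can be illustrated on a toy stratified space that is an assumption of this sketch rather than an example from the abstract: a "spider" made of several rays glued at a common origin. For a point at distance t on leg j, the Fréchet function is the average of (x_i - t)^2 over sample points on leg j and (x_i + t)^2 over points on other legs, so the minimizer on leg j sits at the "folded" first moment whenever that moment is positive. At most one leg can have a positive folded moment, and when none does, the mean is the origin. The function name and data are hypothetical.

```python
import numpy as np

def spider_mean(legs, radii):
    """Fréchet sample mean on a spider (rays glued at a common origin).

    legs:  integer leg label of each sample point.
    radii: distance of each sample point from the origin (non-negative).
    Returns (leg, distance); leg is None when the mean is the origin.
    """
    legs = np.asarray(legs)
    radii = np.asarray(radii, dtype=float)
    for leg in np.unique(legs):
        # Folded first moment: points on this leg count positively, all others negatively.
        folded = np.where(legs == leg, radii, -radii).mean()
        if folded > 0:
            return int(leg), float(folded)
    return None, 0.0

# One point on each of three legs: the mean is the origin.
print(spider_mean([0, 1, 2], [1.0, 1.0, 1.0]))

# Perturbing one point leaves the mean stuck at the origin.
print(spider_mean([0, 1, 2], [1.2, 1.0, 1.0]))

# Only a large enough shift pulls the mean onto a leg.
print(spider_mean([0, 1, 2], [3.0, 1.0, 1.0]))
```

In Euclidean space any perturbation of the data moves the mean; here the mean stays on the lower-dimensional stratum for a whole open set of samples, which is the qualitative behavior the term "sticky" describes.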
The Multiresolution Dantzig Selector: From Ion Channel recordings to Biomolecular Microscopy
In this talk we will introduce the multiscale Dantzig selector in the particular context of signal detection and imaging. This method allows one to combine variational regularization methods with statistical multiscale techniques in a statistically sound manner. We address computational issues as well as asymptotic stochastic process theory of the multiscale statistics. The modeling of ion channel recordings and reconstruction in nanoscale biophotonic cell microscopy will be discussed in detail.
Statistics in Tree Space
The space of metric phylogenetic trees, as constructed by Billera, Holmes, and Vogtmann, is a polyhedral cone complex. This space is non-positively curved, which ensures there is a unique shortest path (geodesic) between any two trees, and that the mean and variance of a set or distribution of trees is well-defined. Furthermore, there is a polynomial time algorithm to compute geodesics, which leads to a practical algorithm for computing mean trees. I will present some applications of this mean and variance to some biological problems, such as constructing species trees from gene trees and understanding the effect of sequence length on tree reconstruction. This is joint work with Ezra Miller and Scott Provan.
Geometry and Statistics in the Eigen-structure of Symmetric (Positive Semi-Definite) Matrices
Symmetric positive semi-definite (PSD) matrices appear as data objects in the statistical analysis of Diffusion Tensor Imaging data, where there is interest in making inferences about the eigenvalues and eigenvectors of these objects. In this talk, I present a stratification of the set of symmetric PSD matrices of arbitrary dimension according to their eigenvalues, as well as maximum likelihood estimators (MLEs) and log-likelihood ratio (LLR) tests for the eigenvalues and eigenvectors of the mean matrix in a symmetric-matrix Gaussian model. The parameter sets involved are subsets of Euclidean space that are either affine subspaces, polyhedral convex cones, or orthogonally invariant embedded submanifolds. The asymptotic behavior of the MLEs and LLRs depends on the stratum where the true mean matrix lies.
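One way to make a stratification by eigenvalues concrete, under the assumption that strata are indexed by the pattern of eigenvalue multiplicities, is to label each symmetric matrix by the multiplicities of its ordered eigenvalues. The sketch below does this numerically. The function name and the tolerance used to decide when two eigenvalues coincide are illustrative assumptions, not taken from the talk.

```python
import numpy as np

def eigenvalue_multiplicities(matrix, tol=1e-8):
    """Multiplicity pattern of the eigenvalues of a symmetric matrix.

    Eigenvalues are sorted in decreasing order, and consecutive eigenvalues
    closer than tol are treated as equal (an assumption of this sketch).
    """
    eigenvalues = np.linalg.eigvalsh(np.asarray(matrix, dtype=float))[::-1]
    multiplicities = [1]
    for previous, current in zip(eigenvalues[:-1], eigenvalues[1:]):
        if previous - current <= tol:
            multiplicities[-1] += 1
        else:
            multiplicities.append(1)
    return tuple(multiplicities)

print(eigenvalue_multiplicities(np.diag([3.0, 2.0, 1.0])))  # distinct eigenvalues
print(eigenvalue_multiplicities(np.diag([2.0, 2.0, 1.0])))  # one repeated eigenvalue
print(eigenvalue_multiplicities(np.eye(3)))                 # fully isotropic
```

Matrices with all eigenvalues distinct form the generic top stratum, while repeated eigenvalues place a matrix on a lower-dimensional stratum, where according to the abstract the asymptotics of the estimators and tests change.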
Geometry and statistics of data
Not available

Object Oriented Data Analysis
J. S. Marron

The geometry and statistics of geometric trees
Aasa Feragen

On Omitting and Hitting Properties for Means on Circles and Shape Spaces
Stephan Huckemann

Medians, means and minimax centers in Riemannian geometry: existence, uniqueness, robustness and algorithms. Application to signal detection
Marc Arnaudon

Nonparametric Statistics on Manifolds- By Examples and Applications
Rabi Bhattacharya

Sticky central limit theorems at singularities
Ezra Miller

Bacterial trees in the Human Microbiome
Susan Holmes

Phylogenetic networks and the real moduli space of curves
Satyan Devadoss

Towards statistical topology: homology, persistent homology and persistence landscapes
Peter Bubenik

Riemannian barycentres: from harmonic maps and statistical shape to the classical central limit theorem
Wilfrid Kendall

Some Recent Experience with Clostridium Difficile
Peter Kim

Approaching the evolution of novelty: where biology needs math and statistics
David Houle

Survey of stratified spaces
Robert MacPherson

Mean location, the two sample problem Harrie Hendriks, Mathematics, Radboud University Nijmegen
Harrie Hendriks

Manifold-valued Tuning Parameters in Regularized Estimation of Multivariate Means
Rudolf Beran

Geometry and statistics of data
Hongtu Zhu

Statistics in Tree Space
Megan Owen

Geometry and Statistics in the Eigen-structure of Symmetric (Positive Semi-Definite) Matrices
Armin Schwartzman

The geometry and topology of projective shape space
John Kent

Topological Analysis of Variance and the Maxillary Complex
Giseon Heo
