### Organizers

Modern statistics problems, from areas such as evolutionary biology, medical imaging, and shape analysis, increasingly deal with data sampled from spaces that are singular but naturally stratified; that is, the spaces behave nicely at most points, but at certain points the smooth structure becomes degenerate, such as when the space is composed of two or more intersecting smooth pieces. Key examples of stratified spaces are shape spaces (representing equivalence classes of point configurations under operations such as rotation, translation, scaling, projective transformations, or other non-linear transformations) and tree spaces (representing metric phylogenetic trees on fixed sets of taxa). Generalizing these two examples leads to algebraic varieties and polyhedral complexes, respectively.

Applications require knowledge of the asymptotics of distributions on such spaces. Developments in this "stratified statistics" take their cue from more classical geometric statistics, where data points are sampled from smooth manifolds or from neighborhoods of embedded manifolds. Now, however, interesting algebraic geometry and combinatorics join the mix as methods for controlling behavior near strata of lower dimension, where the sample space can be singular. Asymptotics on such spaces are governed not only by their local structure but also by global topology (of the space and the data). Thus there has been increasing interest in the recently emerged method of statistical persistent homology.

First results from the systematic study of nonparametric statistics on data sampled from stratified spaces include central limit theorems (CLTs) that illustrate nonclassical behavior, particularly when the mean lies on a lower stratum. The related asymptotics in this surprisingly common circumstance can depend in a crucial way on global geometry. Other first results include concrete combinatorial constructions of sample spaces.

Interpretations of these results, particularly nonclassical CLTs, are immediately useful in specific applied problems from phylogenetics, brain imaging, and human binocular vision, but they raise fundamental pure mathematical questions relating curvature to asymptotics of probability distributions in non-smooth settings. Many of these investigations were initiated by a Working Group at the Statistical and Applied Mathematical Sciences Institute (SAMSI) program on Analysis of Object Data. This MBI workshop aims to stimulate progress and cross-fertilization in the rapidly moving areas of theoretical and applied stratified statistics by gathering a mix of researchers with interests in biology, geometry, combinatorics, topology, probability, and statistics. The hope is to develop stratified methods to solve problems arising from investigations on existing biological and medical data sets, particularly those involving trees and more general shapes.

### Accepted Speakers

Monday, May 21, 2012 | |
---|---|

Time | Session |

09:00 AM 10:00 AM | Peter Kim - **Some Recent Experience with Clostridium difficile**. C. difficile-associated outbreaks have been reported worldwide, some with increased mortality and morbidity. Symptoms of this infectious disease range from mild diarrhea to severe colitis and even bowel perforation and death. The bacterium C. difficile is found among the normal bacteria that make up the intestinal flora. Antibiotics can kill these normal bacteria, but not C. difficile spores, which are insensitive to most antibiotics. The diagnosis of C. difficile infection is based on clinical signs and symptoms and a positive laboratory test for toxigenic C. difficile. Of particular concern is the NAP1/BI/027 strain, which has affected North American hospitals; those in Southern Ontario have been especially hard hit. In this talk we will discuss some recent experience with C. difficile, along with the use of fecal biotherapy as an effective alternative to standard antibiotics. We will also go over some of the metagenomic sequencing results outlining the observed changes in intestinal flora before and after fecal biotherapy. |

10:30 AM 11:00 AM | Hongtu Zhu - **Geometry and statistics of data**. Abstract not available. |

11:15 AM 12:15 PM | David Houle - **Approaching the evolution of novelty: where biology needs math and statistics**. The genetics and evolution of biological systems are extremely complex because of the large number of traits and the complex relationships among those traits. We use the form of fruit fly wings as a model to study the variational properties of complex biological structures. Variation is important because it controls evolutionary potential. Questions about the evolutionary potential of high-dimensional entities raise a series of difficult mathematical and statistical problems. Our data suggest that the dimensionality of the underlying system is very high. Could the data lie on a manifold embedded in the linear space of phenotypes? If so, phenomena that seem complex could have simple explanations. Manifold-finding based on genotypic data has not yet been attempted. The pattern of variation in two different populations can be quite different. Can we identify the common phenotypic subspace and, even more interesting, the subspaces where one population has variation and the other does not? Statistical approaches to these questions are not known (at least in biology). How can we understand and predict the appearance of qualitatively novel phenotypes? Qualitative novelty is one of the largest unsolved problems in biology. Is it possible to construct metrics for the "novelty distance" between phenotypes that predict evolution? One possible kind of metric could combine geometry and topology, as is done with persistent homology. Biology may offer different metrics based on the effects of mutation or common transitions during development. Biologists need the expertise of mathematicians and statisticians to help us answer these important questions. |

02:00 PM 03:00 PM | Marc Arnaudon - **Medians, means and minimax centers in Riemannian geometry: existence, uniqueness, robustness and algorithms. Application to signal detection**. We give detailed results on the existence and uniqueness of medians, means and minimax centers of probability measures on Riemannian manifolds, including the case when the probability measure is supported in a regular geodesic ball and the case of generic data points in a complete manifold. Some properties of Fréchet medians are also given, such as statistical consistency and a quantitative explanation of their robustness. In order to compute Riemannian medians and means, we develop deterministic and stochastic gradient descent algorithms. We show the convergence of these algorithms in regular geodesic balls, and we also obtain their rates of convergence and error estimates. For probability measures with support in compact manifolds, partial simulated annealing is used to obtain processes that converge to the means. Simulation examples of our algorithms are also shown, in the case of Toeplitz Hermitian positive definite matrices coming from covariance matrices of autoregressive processes. Applications to signal detection are given. |
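
The gradient descent idea in Arnaudon's abstract can be illustrated on the simplest curved example we could pick, the unit sphere S², rather than on the Toeplitz Hermitian positive definite matrices used in the talk. The sketch below is our own illustration, not the authors' algorithms: it assumes the data are unit vectors in R³ lying well inside an open hemisphere (so no data point is antipodal to an iterate), uses a fixed unit step for the mean, and uses a Weiszfeld-type reweighting for the median. Only deterministic variants are shown, and all function names are hypothetical.

```python
import math

# Points and tangent vectors on the unit sphere S^2 are plain 3-tuples.
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def add(u, v):
    return tuple(a + b for a, b in zip(u, v))

def scale(c, u):
    return tuple(c * a for a in u)

def norm(u):
    return math.sqrt(dot(u, u))

def log_map(x, p):
    """Tangent vector at x pointing toward p, with length equal to the geodesic distance d(x, p)."""
    c = max(-1.0, min(1.0, dot(x, p)))
    w = add(p, scale(-c, x))            # component of p orthogonal to x
    nw = norm(w)
    if nw < 1e-15:                      # p coincides with x (antipodal p is assumed away)
        return (0.0, 0.0, 0.0)
    return scale(math.acos(c) / nw, w)

def exp_map(x, v):
    """Move from x along the great circle in direction v for arc length |v|."""
    t = norm(v)
    if t < 1e-15:
        return x
    y = add(scale(math.cos(t), x), scale(math.sin(t) / t, v))
    return scale(1.0 / norm(y), y)      # renormalize to absorb rounding drift

def weighted_tangent_mean(weights, vectors):
    summed = tuple(map(sum, zip(*(scale(w, v) for w, v in zip(weights, vectors)))))
    return scale(1.0 / sum(weights), summed)

def frechet_mean(points, x0, tol=1e-12, max_iter=1000):
    """Gradient descent (unit step) on F(x) = (1/2n) * sum_i d(x, p_i)^2."""
    x = x0
    for _ in range(max_iter):
        logs = [log_map(x, p) for p in points]
        g = weighted_tangent_mean([1.0] * len(points), logs)  # equals -grad F(x)
        x = exp_map(x, g)
        if norm(g) < tol:
            break
    return x

def frechet_median(points, x0, tol=1e-12, max_iter=1000):
    """Weiszfeld-type iteration for the minimizer of sum_i d(x, p_i)."""
    x = x0
    for _ in range(max_iter):
        logs = [log_map(x, p) for p in points]
        dists = [norm(v) for v in logs]
        if min(dists) < 1e-12:          # iterate landed on a data point; stop (simplification)
            break
        step = weighted_tangent_mean([1.0 / d for d in dists], logs)
        x = exp_map(x, step)
        if norm(step) < tol:
            break
    return x
```

With four points placed symmetrically around the north pole, both iterations return the pole. Adding a single point on the equator pulls the mean noticeably further from the pole than the median, a small-scale picture of the robustness of Fréchet medians mentioned in the abstract.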

Tuesday, May 22, 2012 | |
---|---|

Time | Session |

09:00 AM 10:00 AM | Peter Bubenik - **Towards statistical topology: homology, persistent homology and persistence landscapes**. One of the principal uses of topology is to patch together local quantitative data to obtain global qualitative information not readily accessible to other methods. While the early development of topology was largely driven by applications, many later advances were motivated by strictly mathematical concerns. Now the field of applied topology is returning topology to its roots, adapting some of the later advances in topological methods to current questions in applications. I will survey some of the central constructions in topological data analysis, introducing homology and persistent homology. There is a clear need to combine these tools with statistical analysis. However, there are difficulties in doing so, as the space of the usual topological descriptor (the persistence diagram) is not a manifold. I define a new topological descriptor, the persistence landscape, whose definition allows for the calculation of means and standard deviations, laws of large numbers, central limit theorems and hypothesis testing. |

10:30 AM 11:00 AM | Megan Owen - **Statistics in Tree Space**. The space of metric phylogenetic trees, as constructed by Billera, Holmes, and Vogtmann, is a polyhedral cone complex. This space is non-positively curved, which ensures that there is a unique shortest path (geodesic) between any two trees and that the mean and variance of a set or distribution of trees are well-defined. Furthermore, there is a polynomial-time algorithm to compute geodesics, which leads to a practical algorithm for computing mean trees. I will present some applications of this mean and variance to biological problems, such as constructing species trees from gene trees and understanding the effect of sequence length on tree reconstruction. This is joint work with Ezra Miller and Scott Provan. |

11:15 AM 12:15 PM | Robert MacPherson - **Survey of stratified spaces**. Stratified spaces arise in many contexts within mathematics. They are the natural class of topological spaces of "finite complexity". In many cases, they come endowed with canonical probability distributions. This will be a survey talk with examples, such as configuration spaces of points. |

02:00 PM 02:30 PM | John Kent - **The geometry and topology of projective shape space**. Projective geometry underlies the way in which information about a 3D scene can be deduced from one or more 2D camera views. A key concept in projective geometry is that of a projective invariant for a configuration of collinear or coplanar points. The information in the projective invariants can be termed the "projective shape" of a configuration. In this talk we use a spherical camera and adapt ideas from the Procrustes approach to similarity shape analysis to give a standardized representation for projective shapes. The resulting geometry facilitates metric comparisons between different projective shapes, and the resulting topology leads to a clear understanding of the singularities in projective shape space. Finally, the details behind the standardization lead to a distinction between four variants of projective shape space depending on the "type" of camera: oriented vs. non-oriented and directional vs. axial. |

02:45 PM 03:15 PM | Satyan Devadoss - **Phylogenetic networks and the real moduli space of curves**. Our story is motivated by the configuration space of particles on spheres. In the 1970s, Grothendieck, Deligne, and Mumford constructed a way to keep track of particle collisions in this space using Geometric Invariant Theory. In the 1990s, Gromov and Witten utilized them as invariants arising from string field theory and quantum cohomology. We consider the real points of these spaces, but now interpret them as spaces of rooted metric labeled trees. They have elegant geometric and combinatorial properties, being compact hyperbolic manifolds with a beautiful tessellation by convex polytopes. In recent years, they have gained importance in their own right, appearing in areas such as representation theory, geometric group theory, tropical geometry, and lately reinterpreted by Levy and Pachter as spaces of phylogenetic networks. In particular, these real moduli spaces resolve the singularities of the spaces of phylogenetic trees studied by Billera, Holmes, and Vogtmann. |
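
Bubenik's abstract says the persistence landscape admits means. One common way to write the k-th landscape of a persistence diagram with finite points (b_i, d_i) is lambda_k(t) = the k-th largest value of max(0, min(t - b_i, d_i - t)), with lambda_k(t) = 0 when the diagram has fewer than k points. Because landscapes are real-valued functions, they can be averaged pointwise. The sketch below assumes diagrams are plain lists of finite (birth, death) pairs and evaluates landscapes on a fixed grid; the function names are hypothetical and this is not a reference implementation.

```python
def landscape(diagram, k, t):
    """k-th persistence landscape lambda_k(t), k = 1, 2, ..., of a list of finite (birth, death) pairs."""
    # Each pair contributes a "tent" max(0, min(t - b, d - t)) peaking at the midpoint of [b, d].
    tents = sorted((max(0.0, min(t - b, d - t)) for b, d in diagram), reverse=True)
    return tents[k - 1] if k <= len(tents) else 0.0

def landscape_on_grid(diagram, k, grid):
    """Sample lambda_k on a list of grid points."""
    return [landscape(diagram, k, t) for t in grid]

def mean_landscape(diagrams, k, grid):
    """Pointwise average of the k-th landscapes of several diagrams, sampled on a common grid."""
    curves = [landscape_on_grid(d, k, grid) for d in diagrams]
    return [sum(values) / len(curves) for values in zip(*curves)]
```

For example, averaging the first landscapes of the one-point diagrams [(0, 2)] and [(0, 4)] on the grid 0, 1, 2, 3, 4 gives [0.0, 1.0, 1.0, 0.5, 0.0]. In general the mean landscape need not be the landscape of any single diagram; it is simply a function in the same vector space.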

Wednesday, May 23, 2012 | |
---|---|

Time | Session |

09:00 AM 10:00 AM | Steve Marron - **Object Oriented Data Analysis**. Object Oriented Data Analysis is the statistical analysis of populations of complex objects. In the special case of Functional Data Analysis, these data objects are curves, where standard Euclidean approaches, such as principal components analysis, have been very successful. Challenges in modern medical image analysis motivate the statistical analysis of populations of more complex data objects which are elements of mildly non-Euclidean spaces, such as Lie groups and symmetric spaces, or of strongly non-Euclidean spaces, such as spaces of tree-structured data objects. These new contexts for Object Oriented Data Analysis create several potentially large new interfaces between mathematics and statistics. The notion of Object Oriented Data Analysis also affects data analysis itself, by providing a language for discussing the many choices required in modern complex data analyses. Even in situations where Euclidean analysis makes sense, there are statistical challenges because of the High Dimension Low Sample Size problem, which motivates a new type of asymptotics leading to non-standard mathematical statistics. |

10:30 AM 11:00 AM | Ezra Miller - **Sticky central limit theorems at singularities**. Applications to areas such as biology, medicine, and image analysis require understanding the asymptotics of distributions on stratified spaces, such as tree spaces. In the surprisingly common circumstance when Fréchet (intrinsic) means of distributions on stratified spaces lie on strata of low dimension, central limit theorems can exhibit non-classical "sticky" behavior: positive mass can be supported on thin subsets of the ambient space. This talk reports on investigations initiated by a Working Group at the Statistical and Applied Mathematical Sciences Institute (SAMSI) program on Analysis of Object Data, and continued jointly with Stephan Huckemann, Jonathan Mattingly, and Jim Nolen. |

11:15 AM 12:15 PM | Rabi Bhattacharya - **Nonparametric Statistics on Manifolds - By Examples and Applications**. The general theory of nonparametric statistics on manifolds M presented here is of relatively recent origin. It builds much of its framework on the notion of the Fréchet mean of a probability measure Q, namely, the point on the manifold which minimizes the expected squared distance from a random variable with distribution Q. The nonparametric methods are intrinsic or extrinsic, depending on the distance used on M. The extrinsic distance is the distance induced from a good embedding of M in a Euclidean space, while the intrinsic distance is the geodesic distance on the manifold when endowed with a Riemannian structure. In examples, the nonparametric methods often yield sharper inference than their parametric counterparts. Although we consider an application to paleomagnetism where M is the sphere S², our main emphasis is on landmark-based shape spaces. The latter include (i) spaces of 2D and 3D images invariant under an appropriate group of transformations, which are useful in morphometrics and medical diagnostics, (ii) affine shape spaces invariant under affine transformations, useful in scene recognition based on satellite images, and (iii) projective shape spaces used in machine vision and robotics. We also consider 2D continuous images, and nonparametric estimation of shape densities. This talk is based on joint work with Vic Patrangenaru and Abhishek Bhattacharya. It is supported in part by NSF grant DMS 1107053. |

03:00 PM 05:00 PM | Susan Holmes, Steve Marron - **Panel Discussion**. Afternoon panel discussion for May 23, 2012. |
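
Sticky behavior of the kind described in Miller's abstract already shows up on a very small stratified space, which we pick purely for illustration: a "spider" made of half-lines glued at a common origin, with the intrinsic (shortest-path) distance. For a candidate point at distance t on leg j, the Fréchet function F(t) = (1/2n) * sum_i d(t, x_i)^2 has derivative t - m_j, where m_j = (sum of distances of data on leg j - sum of distances of data on the other legs) / n. So the mean sits at distance m_j on the one leg with m_j > 0 (at most one m_j can be positive), and at the origin otherwise. The sketch below encodes this derivation of ours; the (leg, r) data format and the function name are assumptions.

```python
def spider_mean(data, legs):
    """Fréchet mean on a spider: `legs` copies of [0, inf) glued at a common origin.

    data: list of (leg, r) pairs, leg in range(legs), r >= 0 the distance to the origin.
    Returns (leg, r) for the mean, or (None, 0.0) if the mean is the origin.
    """
    n = len(data)
    for j in range(legs):
        on_leg = sum(r for leg, r in data if leg == j)
        off_leg = sum(r for leg, r in data if leg != j)
        # F restricted to leg j has derivative t - m_j, so its minimizer there is max(0, m_j).
        m_j = (on_leg - off_leg) / n
        if m_j > 0:
            return (j, m_j)   # at most one leg can have m_j > 0
    return (None, 0.0)        # every m_j <= 0: the mean sticks to the origin
```

Three points at distance 1 on three different legs have their mean at the origin, and moving one of them out to 1.2 leaves the mean exactly at the origin: small perturbations do not move it off the lower-dimensional stratum. With only two legs (the real line), the same kind of perturbation moves the mean to 0.1 along leg 0, as in the classical case.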

Thursday, May 24, 2012 | |
---|---|

Time | Session |

09:00 AM 10:00 AM | Stephan Huckemann - **On Omitting and Hitting Properties for Means on Circles and Shape Spaces**. The classical central limit theorem states that suitably translated and √n-rescaled independent sample means tend to a multivariate Gaussian. Under certain, still rather restrictive, conditions, it has been shown by Bhattacharya and Patrangenaru (2005) that the analog holds true on manifolds. One condition, namely uniqueness, has been extended to "data contained in a geodesic half ball" by Afsari (2011), which in particular encompasses "omitting a neighborhood of the cut locus" if the cut locus is non-empty. Determining asymptotics when the cut locus is not omitted proves to be challenging. For circles we present an exhaustive treatment of uniqueness and, in view of asymptotics, of the role of mass around the antipodal point. Another issue arising in shape spaces, which may be manifolds with singularities, is whether means omit these singularities and are stably assumed on the manifold part. We show that while intrinsic and Ziezold means are manifold stable, Procrustes means may hit singularities. Consequently, e.g. for 3D shape analysis, given uniqueness, discrimination and classification based on the two-sample test is possible for intrinsic or Ziezold means. Procrustes means, however, may not qualify. This talk is based on joint work with Thomas Hotz. |

10:30 AM 11:00 AM | Armin Schwartzman - **Geometry and Statistics in the Eigen-structure of Symmetric (Positive Semi-Definite) Matrices**. Symmetric positive semi-definite (PSD) matrices appear as data objects in the statistical analysis of Diffusion Tensor Imaging data, where there is interest in making inferences about their eigenvalues and eigenvectors. In this talk, I present a stratification of the set of symmetric PSD matrices of arbitrary dimension according to their eigenvalues, as well as maximum likelihood estimators (MLEs) and log-likelihood ratio (LLR) tests for the eigenvalues and eigenvectors of the mean matrix in a symmetric-matrix Gaussian model. The parameter sets involved are subsets of Euclidean space that are affine subspaces, polyhedral convex cones, or orthogonally invariant embedded submanifolds. The asymptotic behavior of the MLEs and LLRs depends on the stratum where the true mean matrix lies. |

11:15 AM 12:15 PM | Rudolf Beran - **Manifold-valued Tuning Parameters in Regularized Estimation of Multivariate Means**. A multivariate k-way layout consists of observations with error on an array of vector-valued means, each of which is an unknown function of k real-valued covariates. Any decomposition of these vector means into a sum of orthogonal projections induces least squares submodel fits that serve as candidate estimators of the mean vectors. Classical examples are MANOVA submodel fits, nested polynomial regression fits, or combinations of both strategies. This talk describes penalized least squares estimators of the multivariate means in which the penalty terms are weighted through manifold-valued tuning parameters. Data-based selection of the tuning parameters yields estimators that asymptotically dominate those arising from submodel fitting. In the special case of a complete balanced multivariate k-way layout, the proposed regularized estimators are linked to multiple Efron-Morris affine shrinkage. In unbalanced designs, the regularized estimators define a powerful generalization of affine shrinkage. |

02:00 PM 02:30 PM | Aasa Feragen - **The geometry and statistics of geometric trees**. Anatomical tree structures such as airway trees in lungs, blood vessels, or dendrite trees in neurons carry information about the organ that they are part of. Anatomical trees can be modeled as geometric trees, which are combinatorial trees whose edges are endowed with attributes describing their geometry. We consider edge attributes which take continuous scalar or vector values, leading to a continuum of trees rather than a discrete set of trees. We shall discuss different ways of building spaces of such geometric trees, all with the goal of obtaining a geodesic space of trees where statistical parameters can be computed with the help of geodesics. For geometric trees of any size, we can define a geodesic space of trees, but geodesic computations are NP-complete and the curvature of the space is nowhere bounded, which means that many statistical tools are not readily available. By adding restrictions on size, admissible topologies, branch order and/or branch labeling, we can regularize the space to obtain spaces with nicer properties in terms of computational complexity and statistical applications. We shall discuss the positive effect of these assumptions on the solvability of statistical problems, along with their negative effect on the ability to model real anatomical trees. Finally, we shall present some recent results from experiments on airway trees from lung CT scans. |

02:45 PM 03:15 PM | Axel Munk - **The Multiresolution Dantzig Selector: From Ion Channel Recordings to Biomolecular Microscopy**. In this talk we will introduce the multiscale Dantzig selector in the particular context of signal detection and imaging. This method allows one to combine variational regularization methods with statistical multiscale techniques in a statistically sound manner. We address computational issues as well as the asymptotic stochastic process theory of the multiscale statistics. The modeling of ion channel recordings and reconstruction in nanoscale biophotonic cell microscopy will be discussed in detail. |
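
Bhattacharya's abstract distinguishes intrinsic from extrinsic means, and Huckemann's abstract treats uniqueness of means on circles. Both points can be seen on the unit circle, with data given as angles in radians (an assumption for this sketch). The extrinsic mean projects the Euclidean mean of the points (cos θ, sin θ) back onto the circle. For the intrinsic mean we use the fact that, away from the antipodes of the data, the Fréchet function is piecewise quadratic, so every intrinsic mean has the form θ̄ + 2πj/n for j = 0 to n - 1, where θ̄ is the arithmetic mean of the given angles. We derived this for the sketch; it is not quoted from either talk, and the function names are hypothetical.

```python
import math

TWO_PI = 2 * math.pi

def arc_dist(a, b):
    """Intrinsic (geodesic) distance between angles a and b on the unit circle."""
    r = (a - b) % TWO_PI
    return min(r, TWO_PI - r)

def frechet_function(p, angles):
    """F(p) = (1/2n) * sum_i d(p, theta_i)^2 with the intrinsic distance."""
    return sum(arc_dist(p, t) ** 2 for t in angles) / (2 * len(angles))

def intrinsic_means(angles, tol=1e-12):
    """All intrinsic means, found among the n candidates mean(angles) + 2*pi*j/n."""
    n = len(angles)
    base = sum(angles) / n
    candidates = [(base + TWO_PI * j / n) % TWO_PI for j in range(n)]
    values = [frechet_function(c, angles) for c in candidates]
    best = min(values)
    return [c for c, v in zip(candidates, values) if v - best <= tol]

def extrinsic_mean(angles):
    """Project the Euclidean mean of the embedded points back onto the circle."""
    x = sum(math.cos(t) for t in angles)
    y = sum(math.sin(t) for t in angles)
    if math.hypot(x, y) < 1e-12:
        raise ValueError("Euclidean mean is at the origin: extrinsic mean undefined")
    return math.atan2(y, x) % TWO_PI
```

For the angles 0, 0 and π/2, the intrinsic mean is π/6 (about 0.524) while the extrinsic mean is atan2(1, 2) (about 0.464), so the two notions genuinely differ. For the antipodal pair 0 and π, `intrinsic_means` returns two minimizers (π/2 and 3π/2), a simple case of non-uniqueness, and `extrinsic_mean` raises an error because the Euclidean mean is the origin.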

Friday, May 25, 2012 | |
---|---|

Time | Session |

09:00 AM 10:00 AM | Harrie Hendriks - **Mean location, the two-sample problem**. Harrie Hendriks, Mathematics, Radboud University Nijmegen. The context will be the estimation of a parameter of a probability distribution, where the parameter lies in a differentiable manifold, more specifically in a submanifold of Euclidean space. The parameter could be the Fréchet mean, with respect to the Euclidean distance, of a probability distribution on the submanifold itself. We will give an account of the two-sample problem. This talk is based on joint work with Zinoviy Landsman. Examples from the literature will be indicated. Downs considered the QRS loop in vectorcardiograms, characterized by a pair of orthogonal unit vectors in 3-space. The space of such pairs is the Stiefel manifold V₃,₂, which can be considered as a submanifold of 6-dimensional Euclidean space. A more involved example, considered by Rivest et al., is the human ankle joint, which exhibits two independent rotation axes of the foot. The directions of these axes are of importance. |

10:30 AM 11:00 AM | Giseon Heo - **Topological Analysis of Variance and the Maxillary Complex**. Persistent homology, a recent development in computational topology, has been shown to be useful for analyzing high-dimensional non-linear data. In this talk, we connect computational topology with the traditional analysis of variance and demonstrate this synergy on a three-dimensional orthodontic landmark data set derived from the maxillary complex. (Joint work with Jennifer Gamble and Peter Kim) |

11:15 AM 12:15 PM | Wilfrid Kendall - **Riemannian barycentres: from harmonic maps and statistical shape to the classical central limit theorem**. The subject of Riemannian barycentres has a strikingly long history, stretching back to work of Fréchet and Cartan. The first part of this talk will be a review of the fundamental ideas and a discussion of the work of various probabilists and statisticians applying the concept to probabilistic approaches to harmonic map theory and to statistical shape theory. I will then present some recent joint work with Huiling Le concerning central limit theory for empirical barycentres, which, to our considerable surprise, has led us to a new perspective on the classical Lindeberg-Feller central limit theorem. |
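
Hendriks's abstract considers Fréchet means taken with respect to the Euclidean distance on a submanifold, with Downs' Stiefel manifold V₃,₂ as an example. Since E|X - p|² = E|X - μ|² + |μ - p|² for the Euclidean mean μ, such an extrinsic mean is the point of the submanifold nearest to μ (when that point is unique). For 3×2 matrices with orthonormal columns, the nearest such matrix in Frobenius norm to a full-rank M is the orthogonal polar factor M (MᵀM)^(-1/2). The sketch below computes it in pure Python with a closed-form 2×2 inverse square root; it illustrates the extrinsic mean only, not the two-sample procedure of Hendriks and Landsman, and all names are hypothetical.

```python
import math

def mat_mul(A, B):
    """Product of matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

def inv_sqrt_2x2(S):
    """Inverse square root of a 2x2 symmetric positive definite matrix S."""
    a, b, c = S[0][0], S[0][1], S[1][1]
    s = math.sqrt(a * c - b * b)               # sqrt(det S)
    t = math.sqrt(a + c + 2 * s)               # sqrt(trace S + 2 sqrt(det S))
    p, q, r = (a + s) / t, b / t, (c + s) / t  # sqrt(S) = (S + s I) / t
    det = p * r - q * q
    return [[r / det, -q / det], [-q / det, p / det]]

def extrinsic_mean_stiefel(frames):
    """Project the Euclidean mean of 3x2 orthonormal frames onto V(3,2) via M (M^T M)^(-1/2)."""
    n = len(frames)
    M = [[sum(F[i][j] for F in frames) / n for j in range(2)] for i in range(3)]
    return mat_mul(M, inv_sqrt_2x2(mat_mul(transpose(M), M)))
```

Averaging the frames obtained by rotating (e₁, e₂) by ±0.4 rad about the z-axis gives cos(0.4) times (e₁, e₂), and the projection recovers (e₁, e₂) up to rounding.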

Name | Email | Affiliation |
---|---|---|

Afsari, Bijan | bijan@cis.jhu.edu | Center for Imaging Science, Johns Hopkins University |

Arnaudon, Marc | marc.arnaudon@math.univ-poitiers.fr | Mathematics, |

Belkin, Mikhail | mbelkin@cse.ohio-state.edu | Department of Computer Science and Engineering, The Ohio State University |

Bendich, Paul | bendich@math.duke.edu | Mathematics, Duke University |

Beran, Rudolf | beran@wald.ucdavis.edu | Statistics, University of California, Davis |

Bhattacharya, Rabi | rabi@math.arizona.edu | Mathematics, University of Arizona |

Bubenik, Peter | p.bubenik@csuohio.edu | Mathematics, Cleveland State University |

Buibas, Marius | mbuibas@ucsd.edu | Physics, University of California, San Diego |

Devadoss, Satyan | satyan.devadoss@williams.edu | Mathematics, Williams College |

Dryden, Ian | dryden@mailbox.sc.edu | Statistics, University of South Carolina |

Ellingson, Leif | leif.ellingson@ttu.edu | Mathematics and Statistics, Texas Tech University |

Feragen, Aasa | aasa@diku.dk | Department of Computer Science, |

Forcey, Stefan | sf34@uakron.edu | Mathematics, University of Akron |

Groisser, David | groisser@ufl.edu | Mathematics, University of Florida |

Hallgrimsson, Benedikt | bhallgri@ucalgary.ca | Cell Biology & Anatomy, University of Calgary |

Hendriks, Harrie | Harrie.Hendriks@math.ru.nl | Applied Stochastics, |

Heo, Giseon | gheo@ualberta.ca | Mathematical and Statistical Sciences, University of Alberta |

Holmes, Susan | susan@stat.stanford.edu | Statistics, Stanford University |

Hotz, Thomas | hotz@math.uni-goettingen.de | Institute for Mathematical Stochastics, University of Goettingen |

Houle, David | dhoule@bio.fsu.edu | Biological Science, Florida State University |

Huber, Gregory | huber@uchc.edu | Mathematics, University of Connecticut |

Huckemann, Stephan | huckeman@math.uni-goettingen.de | Mathematical Stochastics, Georg-August University Göttingen |

Kendall, Wilfrid | w.s.kendall@warwick.ac.uk | Statistics, University of Warwick |

Kent, John | J.T.Kent@leeds.ac.uk | Department of Statistics, University of Leeds |

Kim, Peter | pkim@uoguelph.ca | Mathematics and Statistics, University of Guelph |

Kubatko, Laura | Kubatko.2@osu.edu | Statistics/EEOB, The Ohio State University |

Le, Huiling | Huiling.Le@nottingham.ac.uk | School of Mathematical Sciences, University of Nottingham |

MacPherson, Robert | rdm@math.ias.edu | Dept. of Mathematics, Institute for Advanced Study |

Mao, Yi | maoyi0@gmail.com | Department of Microbiology, Boston University |

Marron, J. S. | marron@email.unc.edu | Statistics and O. R., University of North Carolina, Chapel Hill |

Memoli, Facundo | facundo.memoli@adelaide.edu.au | School of Computer Science, University of Adelaide |

Miakonkana, Guy-vanie | gmm0006@auburn.edu | Mathematics and Statistics, Auburn University |

Miller, Ezra | ezra@math.duke.edu | ..., SAMSI (Statistical and Applied Mathematical Sciences Institute) |

Mio, Washington | mio@math.fsu.edu | Mathematics, Florida State University |

Mudalige, Nishan | nishanm@mathstat.yorku.ca | Mathematics and Statistics, York University |

Munk, Axel | munk@math.uni-goettingen.de | Mathematics, |

Nye, Tom | Tom.Nye@ncl.ac.uk | School of Maths and Stats, |

Owen, Megan | mowen@fields.utoronto.ca | Computer Science, University of Waterloo |

Patrangenaru, Victor | vic@stat.fsu.edu | Statistics, Florida State University |

Pinder, Shaun | spinder@uoguelph.ca | Mathematics and Statistics, University of Guelph |

Provan, Scott | provan@email.unc.edu | Statistics and Operations Research, University of North Carolina, Chapel Hill |

Rush, Stephen | srush01@uoguelph.ca | Mathematics and Statistics, University of Guelph |

San Valentin, Gene Paul | gsanvale@math.fsu.edu | Mathematics, Florida State University |

Schwartzman, Armin | armins@hsph.harvard.edu | Biostatistics, Harvard University |

Shenfeld, Daniel | shenfeld@math.princeton.edu | Mathematics, Princeton University |

Sitharam, Meera | sitharam@cise.ufl.edu | Computer and Information Science and Engineering, University of Florida |

Skwerer, Sean | sskwerer@unc.edu | Statistics and Operations Research, University of North Carolina, Chapel Hill |

St. John, Katherine | stjohn@lehman.cuny.edu | Math & Computer Science, City University of New York (CUNY) |

Wang, Yusu | yusu@cse.ohio-state.edu | Computer Science and Engineering, The Ohio State University |

Wood, Andrew | andrew.wood@nottingham.ac.uk | School of Mathematical Sciences, University of Nottingham |

Zhu, Hongtu | htzhu@email.unc.edu | Biostatistics, University of North Carolina, Chapel Hill |

Zhu, Hongtu | hzhu@bios.unc.edu | Biostatistics and Biomedical Research Imaging Center, University of North Carolina, Chapel Hill |


In particular, various "particular metrics" have proven useful in drawing conclusions about explanatory clinical or contingent variables in such studies.

This talk contains joint work with PJ McMurdie, as well as David Relman's lab and Alfred Spormann at Stanford.

Our data suggests that the dimensionality of the underlying system is very high. Could the data lie on a manifold embedded in the linear space of phenotypes? If so, phenomena that seem complex could have simple explanations. Manifold-finding based on genotypic data has not yet been attempted.

The pattern of variation in two different populations can be quite different. Can we identify the common phenotypic subspace, and, even more interesting, the subspaces where one has variation, and the other does not? Statistical approaches to those questions are not known (at least in biology)

How can we understand and predict the appearance of qualitatively novel phentoypes? Qualtitative novelty is one of the largest unsolved problems in biology. Is it possible to construct metrics for the ?novelty distance? between phenotypes that predict evolution? One possible kind of metric could combine geometry and topology as is done with persistent homology. Biology may offer different metrics based on the effects of mutation or common transitions during development.

Biologists need the expertise of mathematicians and statisticians to help us answer these important questions.

Determining asymptotics when the cut locus is not omitted proves to be challenging. For circles we present an exhaustive treatment of uniqueness and, in view of asymptotics, of the role of mass around the antipodal point.

Another issue turning up in shape spaces -- which may be manifolds with singularities -- is whether means omit these singularities and are stably assumed on the manifold part. We show that while intrinsic and Ziezold means are manifold stable, Procrustes means may hit singularities.

Consequently, e.g. for 3D shape analysis, given uniqueness, discrimination and classification based on the two-sample test are possible for intrinsic or Ziezold means. Procrustes means, however, may be unsuitable for this purpose.

This talk is based on joint work with Thomas Hotz.

Finally, the details behind the standardization lead to a distinction between four variants of projective shape space depending on the "type" of camera: oriented vs. non-oriented and directional vs. axial.

This will be a survey talk with examples, such as spaces of configurations of points.

**Object Oriented Data Analysis**

J. S. Marron: Object Oriented Data Analysis is the statistical analysis of populations of complex objects. In the special case of Functional Data Analysis, these data objects are curves, where standard Euclidean approaches, such as principal components analysis, …
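
Here is a minimal illustration of a standard Euclidean approach applied to curves. The sketch treats each curve as a vector of values on a common grid and runs ordinary principal components analysis through the singular value decomposition. It assumes NumPy and curves already sampled on the same grid. It is generic discretized functional PCA, not a method specific to the talk.

```python
import numpy as np

def curve_pca(curves: np.ndarray) -> tuple[np.ndarray, np.ndarray, np.ndarray]:
    """PCA of curves sampled on a shared grid (rows = curves, columns = grid points).

    Returns the mean curve, the principal component curves (as rows), and the
    fraction of total variance explained by each component.
    """
    mean_curve = curves.mean(axis=0)
    centered = curves - mean_curve
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    return mean_curve, vt, explained
```

Suppose the curves differ only by the amplitude of a sine wave. Then the first component is that sine shape (up to sign and scale), and it explains essentially all of the variance.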

**The geometry and statistics of geometric trees**

Aasa Feragen: Anatomical tree structures, such as airway trees from the lungs, blood vessels, or dendrite trees in neurons, carry information about the organ that they are part of. Anatomical trees can be modeled as geometric trees, which are combinatorial trees whose …

**On Omitting and Hitting Properties for Means on Circles and Shape Spaces**

Stephan Huckemann: The classical central limit theorem states that suitably translated and root-n-rescaled sample means of independent observations converge in distribution to a multivariate Gaussian. Under certain, still rather restrictive, conditions, it has been shown by Bhattacharya and Patrangenaru …

**Medians, means and minimax centers in Riemannian geometry: existence, uniqueness, robustness and algorithms. Application to signal detection**

Marc Arnaudon: We give detailed results on the existence and uniqueness of medians, means and minimax centers of probability measures on Riemannian manifolds, including the case when the probability measure is supported in a regular geodesic ball and the case of …
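
For intuition about the algorithmic side, here is a sketch of two well-known iterations on the unit sphere in R^3. Both map the data to the tangent space at the current estimate with the logarithm map, average there, and return to the sphere with the exponential map. For the mean (a Karcher/Fréchet mean) the average is unweighted. For the median (a Riemannian analogue of Weiszfeld's algorithm) each point is weighted by 1/distance. This is an illustration under stated assumptions, not the algorithms analysed in the talk:

- the data lie in a small cap;
- no point is antipodal to an iterate;
- in the median, points coinciding with the current iterate are simply skipped.

The function names are hypothetical.

```python
import math

Vec = tuple[float, float, float]

def dot(u: Vec, v: Vec) -> float:
    return sum(a * b for a, b in zip(u, v))

def norm(v: Vec) -> float:
    return math.sqrt(dot(v, v))

def log_map(p: Vec, x: Vec) -> Vec:
    """Tangent vector at p pointing towards x, with length = great-circle distance."""
    c = max(-1.0, min(1.0, dot(p, x)))
    w = tuple(xi - c * pk for xi, pk in zip(x, p))  # component of x orthogonal to p
    nw = norm(w)
    if nw < 1e-15:  # x coincides with p (x = -p is assumed never to occur)
        return (0.0, 0.0, 0.0)
    return tuple(math.acos(c) * wi / nw for wi in w)

def exp_map(p: Vec, v: Vec) -> Vec:
    """Follow the great circle from p with initial velocity v for unit time."""
    t = norm(v)
    if t < 1e-15:
        return p
    q = tuple(math.cos(t) * pk + math.sin(t) * vk / t for pk, vk in zip(p, v))
    nq = norm(q)
    return tuple(qk / nq for qk in q)  # renormalise against rounding drift

def frechet_mean(points: list[Vec], iters: int = 100) -> Vec:
    """Intrinsic mean via the fixed-point iteration p <- exp_p(average of log_p(x_i))."""
    p = points[0]
    for _ in range(iters):
        logs = [log_map(p, x) for x in points]
        step = tuple(sum(v[k] for v in logs) / len(points) for k in range(3))
        p = exp_map(p, step)
    return p

def geodesic_median(points: list[Vec], iters: int = 200, eps: float = 1e-12) -> Vec:
    """Weiszfeld-type iteration: tangent-space average with weights 1/distance."""
    p = points[0]
    for _ in range(iters):
        logs = [v for v in (log_map(p, x) for x in points) if norm(v) > eps]
        if not logs:
            break
        weights = [1.0 / norm(v) for v in logs]
        total = sum(weights)
        step = tuple(sum(w * v[k] for w, v in zip(weights, logs)) / total for k in range(3))
        p = exp_map(p, step)
    return p
```

At a fixed point, the unweighted tangent average vanishes for the mean, and the distance-weighted one vanishes for the median. These are the first-order conditions for minimizing the sum of squared distances and the sum of distances, respectively.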

**Nonparametric Statistics on Manifolds- By Examples and Applications**

Rabi Bhattacharya: The general theory of nonparametric statistics on manifolds M presented here is of relatively recent origin. It builds much of its framework on the notion of the Fréchet mean of a probability measure Q, namely, the point on the manifold which …

**Sticky central limit theorems at singularities**

Ezra Miller: Applications to areas such as biology, medicine, and image analysis require understanding the asymptotics of distributions on stratified spaces, such as tree spaces. In the surprisingly common circumstance when Frechet (intrinsic) means of …
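
Stickiness already shows up in the simplest tree space. Ignoring pendant-edge lengths, the space of unrooted metric trees on four taxa is a "spider": three half-lines, one per resolved topology, glued at the star tree. Write S_j for the sum of distances from the origin of the sample points on leg j, S for the total over all legs, and n for the sample size. The sketch below minimizes the Fréchet function leg by leg:

- if (2 S_j - S)/n is positive for some leg j, the sample Fréchet mean lies on leg j at that distance;
- otherwise it sits exactly at the origin.

The data layout (leg index, distance) is an assumption for illustration.

```python
from typing import Optional

def spider_mean(points: list[tuple[int, float]], legs: int) -> tuple[Optional[int], float]:
    """Frechet mean of points on a spider: `legs` half-lines glued at the origin.

    Each point is (leg index, distance from the origin). For a candidate mean at
    distance t >= 0 on leg j, the Frechet function is
        F(t) = sum_{x on leg j} (s - t)^2 + sum_{x off leg j} (s + t)^2,
    whose minimizer over t >= 0 is max(0, (2 * S_j - S) / n).
    Returns (None, 0.0) when the mean is at the origin.
    """
    n = len(points)
    total = sum(s for _, s in points)
    for j in range(legs):
        s_j = sum(s for leg, s in points if leg == j)
        t = (2 * s_j - total) / n
        if t > 0:  # at most one leg can carry more than half of the total distance
            return j, t
    return None, 0.0
```

With one point at distance 1 on each of three legs, the mean is the star tree. Pushing one point out to 1.2 leaves it there, which is the sticky behaviour. Only when one leg carries more than half of the total distance does the mean leave the origin.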

**Bacterial trees in the Human Microbiome**

Susan Holmes: Many studies are underway to describe the human microbiome; I will describe some of the methods used that combine phylogenetic trees and abundance data from high-throughput sequencing and new microarray techniques.
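
One widely used metric that combines a phylogenetic tree with presence/absence data is the unweighted UniFrac distance. It is the fraction of observed branch length leading to taxa seen in exactly one of the two communities. The abstract does not say whether the talk uses this particular metric. The sketch below only illustrates the idea on a toy tree, represented by assumption as a list of branches, each with a length and the set of leaves below it. The function name is hypothetical.

```python
def unweighted_unifrac(branches: list[tuple[float, frozenset[str]]],
                       sample_a: set[str], sample_b: set[str]) -> float:
    """Branch length unique to one sample divided by branch length observed in either."""
    unique = observed = 0.0
    for length, leaves_below in branches:
        in_a = bool(leaves_below & sample_a)
        in_b = bool(leaves_below & sample_b)
        if in_a or in_b:
            observed += length
            if in_a != in_b:  # branch leads only to taxa from one sample
                unique += length
    return unique / observed if observed > 0 else 0.0
```

Identical communities give 0, and communities sharing no branch give 1.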

**Phylogenetic networks and the real moduli space of curves**

Satyan Devadoss: Our story is motivated by the configuration space of particles on spheres. In the 1970s, Grothendieck, Deligne, and Mumford constructed a way to keep track of particle collisions in this space using Geometric Invariant Theory. In the 1990s, Gromov …

**Towards statistical topology: homology, persistent homology and persistence landscapes**

Peter Bubenik: One of the principal uses of topology is to patch together local quantitative data to obtain global qualitative information not readily accessible to other methods. While the early development of topology was largely driven by applications, many …
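
The title mentions persistence landscapes, so here is a minimal sketch of that construction as commonly defined. Each bar (b, d) of a barcode contributes a tent function max(0, min(t - b, d - t)). The k-th landscape function λ_k(t) is the k-th largest tent value at t. Landscapes live in a vector space, so they can be averaged across samples. The sketch evaluates them on a user-supplied grid; the function names and the grid are illustrative assumptions.

```python
def landscape(bars: list[tuple[float, float]], k: int, grid: list[float]) -> list[float]:
    """Values of the k-th persistence landscape function (k >= 1) at the grid points."""
    values = []
    for t in grid:
        tents = sorted((max(0.0, min(t - b, d - t)) for b, d in bars), reverse=True)
        values.append(tents[k - 1] if k <= len(tents) else 0.0)
    return values

def average_landscape(samples: list[list[tuple[float, float]]], k: int,
                      grid: list[float]) -> list[float]:
    """Pointwise mean of the k-th landscapes of several barcodes."""
    curves = [landscape(bars, k, grid) for bars in samples]
    return [sum(c[i] for c in curves) / len(curves) for i in range(len(grid))]
```

Take the nested bars (0, 4) and (1, 3). Then λ_1 is the tall tent of the first bar and λ_2 picks up the smaller tent, which peaks at t = 2.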

**Riemannian barycentres: from harmonic maps and statistical shape to the classical central limit theorem**

Wilfrid Kendall: The subject of Riemannian barycentres has a strikingly long history, stretching back to work of Frechet and Cartan. The first part of this talk will be a review of the fundamental ideas and a discussion of the work of various probabilists and …

**Some Recent Experience with Clostridium Difficile**

Peter Kim: C. difficile-associated outbreaks have been reported worldwide, some with increased mortality and morbidity. Symptoms of this infectious disease range from mild diarrhea to severe colitis and even bowel perforation and death. The bacterium …

**Approaching the evolution of novelty: where biology needs math and statistics**

David Houle: The genetics and evolution of biological systems are extremely complex because of the large number of traits and the complex relationships among those traits. We use the form of fruit fly wings as a model to study the variational properties of complex …

**Survey of stratified spaces**

Robert MacPherson: Stratified spaces arise in many contexts within mathematics. They are the natural class of topological spaces of "finite complexity". In many cases, they come endowed with canonical probability distributions on them.

**Mean location, the two sample problem** (Harrie Hendriks, Mathematics, Radboud University Nijmegen)

Harrie Hendriks: The context will be the estimation of a parameter of a probability distribution, where the parameter lies in a differentiable manifold, more specifically in a submanifold of Euclidean space. The parameter could be a Frechet mean of a probability …

**Manifold-valued Tuning Parameters in Regularized Estimation of Multivariate Means**

Rudolf Beran: A multivariate k-way layout consists of observations with error on an array of vector-valued means, each of which is an unknown function of k real-valued covariates. Any decomposition of these vector means into a sum of orthogonal projections induces …
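
For k = 2, the most familiar decomposition of this kind is the two-way ANOVA split of the array of means into four pieces: a grand mean, row effects, column effects and interaction. The sketch below computes these four mutually orthogonal components for a scalar-valued r x c table. In the vector-valued case the same projections apply coordinate by coordinate. The regularized estimator itself, which would shrink each component with its own tuning parameter, is not reproduced here. The function name is hypothetical.

```python
def two_way_decomposition(m: list[list[float]]):
    """Split an r x c table into grand mean + row effects + column effects + interaction.

    Each piece is the orthogonal projection of m onto one of four mutually
    orthogonal subspaces of R^(r*c); the four tables sum back to m.
    """
    r, c = len(m), len(m[0])
    grand = sum(map(sum, m)) / (r * c)
    row_means = [sum(row) / c for row in m]
    col_means = [sum(m[i][j] for i in range(r)) / r for j in range(c)]
    g = [[grand] * c for _ in range(r)]
    rows = [[row_means[i] - grand] * c for i in range(r)]
    cols = [[col_means[j] - grand for j in range(c)] for _ in range(r)]
    inter = [[m[i][j] - row_means[i] - col_means[j] + grand for j in range(c)]
             for i in range(r)]
    return g, rows, cols, inter
```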

**Geometry and statistics of data**

Hongtu Zhu: Abstract not available.

**Statistics in Tree Space**

Megan Owen: The space of metric phylogenetic trees, as constructed by Billera, Holmes, and Vogtmann, is a polyhedral cone complex. This space is non-positively curved, which ensures there is a unique shortest path (geodesic) between any two trees, and that the …

**Geometry and Statistics in the Eigen-structure of Symmetric (Positive Semi-Definite) Matrices**

Armin Schwartzman: Symmetric positive semi-definite (PSD) matrices appear as data objects in the statistical analysis of Diffusion Tensor Imaging data, where there is interest in making inferences about the eigenvalues and eigenvectors of these objects. In this talk, …
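
This small illustration of the objects involved reflects an assumption about a typical workflow, not the talk's method. A diffusion tensor is a 3 x 3 symmetric PSD matrix, and its eigenvalues are usually reported in decreasing order. Its eigenvectors are defined only up to sign: they are axes rather than directions, which is one reason their geometry is non-Euclidean. Fractional anisotropy, a standard scalar summary in DTI, is computed from the eigenvalues. The sketch assumes NumPy and uses hypothetical function names.

```python
import numpy as np

def tensor_eigen(d: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Eigenvalues (descending) and eigenvectors (as columns) of a symmetric 3x3 tensor."""
    vals, vecs = np.linalg.eigh(d)           # eigh returns eigenvalues in ascending order
    order = np.argsort(vals)[::-1]
    vals, vecs = vals[order], vecs[:, order]
    # Eigenvectors are axes: fix each sign so the largest-magnitude entry is positive.
    idx = np.argmax(np.abs(vecs), axis=0)
    signs = np.sign(vecs[idx, np.arange(3)])
    return vals, vecs * signs

def fractional_anisotropy(vals: np.ndarray) -> float:
    """FA = sqrt(3/2) * ||lambda - mean(lambda)|| / ||lambda||, between 0 and 1."""
    return float(np.sqrt(1.5) * np.linalg.norm(vals - vals.mean()) / np.linalg.norm(vals))
```

An isotropic tensor has FA 0. A tensor with a single nonzero eigenvalue has FA 1.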

**The geometry and topology of projective shape space**

John Kent: Projective geometry underlies the way in which information about a 3D scene can be deduced from (one or more) 2D camera views. A key concept in projective geometry is that of a projective invariant for a configuration of collinear or coplanar points.
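
The classical example of such an invariant is the cross-ratio of four collinear points. Write the points a, b, c, d by a coordinate along their common line. The cross-ratio ((c - a)(d - b)) / ((c - b)(d - a)) is then unchanged by any projective map x -> (px + q)/(rx + s) of the line with ps - qr != 0, provided no point is sent to infinity. Authors order the four points differently; the convention below is one choice among several, and the function names are illustrative.

```python
def cross_ratio(a: float, b: float, c: float, d: float) -> float:
    """Cross-ratio of four distinct collinear points given by their line coordinates."""
    return ((c - a) * (d - b)) / ((c - b) * (d - a))

def projective_map(p: float, q: float, r: float, s: float):
    """Return the map x -> (p x + q) / (r x + s) of the projective line."""
    if p * s - q * r == 0:
        raise ValueError("degenerate projective map")
    return lambda x: (p * x + q) / (r * x + s)
```

For the points 0, 1, 2, 5 the cross-ratio is 8/5. Their images under x -> (2x + 1)/(x + 3) are 1/3, 3/4, 1 and 11/8, which give the same value.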

**Topological Analysis of Variance and the Maxillary Complex**

Giseon Heo: Persistent homology, a recent development in computational topology, has been shown to be useful for analyzing high-dimensional non-linear data. In this talk, we connect computational topology with the traditional analysis of variance and demonstrate this …
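
To make the input to such an analysis concrete, here is a minimal sketch of the simplest case: 0-dimensional persistence of a point cloud under the Vietoris-Rips (distance-threshold) filtration. Every point is born at scale 0. A connected component dies when an edge first merges it into another, so the finite death times are exactly the edge lengths chosen by Kruskal's minimum-spanning-tree algorithm. Euclidean distance is an assumption. Higher-dimensional homology and the analysis-of-variance step of the talk are not covered.

```python
import math
from itertools import combinations

def h0_barcode(points: list[tuple[float, ...]]) -> list[tuple[float, float]]:
    """H0 bars (birth 0, death = merge scale), plus one bar that never dies."""
    parent = list(range(len(points)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    edges = sorted((math.dist(p, q), i, j)
                   for (i, p), (j, q) in combinations(enumerate(points), 2))
    bars = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # two components merge, so one of them dies at this scale
            parent[ri] = rj
            bars.append((0.0, length))
    bars.append((0.0, math.inf))  # the component that survives at every scale
    return bars
```

For points at 0, 1, 3 and 10 on a line, the bars die at 1, 2 and 7, and one bar persists forever.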