Workshop 4: Analysis and Visualization of Large Collections of Imaging Data

(April 21, 2014 - April 25, 2014)

Organizers


Chandrajit Bajaj
Computer Science, University of Texas
Philipp Keller
Janelia Farm Research Campus, Howard Hughes Medical Institute
Mauro Maggioni
Mathematics, Duke University
Allen Tannenbaum
Computer Science and Applied Mathematics, Stony Brook University

This workshop focuses on the challenges presented by the analysis and visualization of large data sets collected in biomedical imaging, genomics and proteomics. The sheer size of the data (easily in the range of terabytes, and growing) requires computationally efficient techniques for the sampling, representation, organization, and filtering of data; ideas and techniques from signal processing, geometric and topological analysis, stochastic dynamical systems, machine learning and statistical modeling are needed to extract patterns and characterize features of interest. Visualization enables interaction with data, algorithms, and outputs. Data sets from biomedical imaging, genomics and proteomics often have unique characteristics that differentiate them from other data sets, such as extremely high dimensionality, high heterogeneity due to different data modalities (across different spatial and temporal scales, but also across different biological layers) that need to be fused, large stochastic components and noise, low sample size, and possibly low reproducibility of per-patient data. These unique aspects, as well as the large size, pose challenges to many existing techniques aimed at solving the problems above. The workshop will bring together biologists, computer scientists, engineers, mathematicians and statisticians working in a wide range of areas of expertise, with the goal of pushing existing techniques, and developing novel ones, to tackle the unique challenges posed by large data sets in biomedical imaging.

Accepted Speakers

Manfred Auer
Life Sciences Division, Lawrence Berkeley National Lab
Zhirong Bao
Developmental Biology Program, Sloan-Kettering Institute
Andrea Bertozzi
Mathematics, UCLA
Rohit Bhargava
Bioengineering, University of Illinois at Urbana-Champaign
Kristin Branson
Ethomics, HHMI Janelia Farm Research Campus
Larry Carin
Dept. of Electrical and Computer Engineering, Duke University
Gunnar Carlsson
Department of Mathematics, Stanford University
Ronny Hadani
Mathematics, University of Texas at Austin
Chris Johnson
Scientific Computing and Imaging Institute, University of Utah
Yuehaw Khoo
Applied Math/Physics, Princeton University
Robert Marc
Ophthalmology, University of Utah School of Medicine
Dimitris Metaxas
Computer Science, Computational Biomedicine, Imaging, and Modeling Center
Remus Osan
Mathematics and Statistics, Georgia State University
Hanchuan Peng
Computational Neuroanatomy, Allen Institute for Brain Science
Angel Pineda
Mathematics, California State University, Fullerton
David Ress
Neuroscience, Baylor College of Medicine
Jason Swedlow
Centre for Gene Regulation and Expression, University of Dundee
Carl-Fredrik Westin
Harvard Medical School
Zeyun Yu
Computer Science, University of Wisconsin-Milwaukee
Jian-Zhou Zhang
College of Computer, Sichuan University
Monday, April 21, 2014
Time Session
08:00 AM

Shuttle to MBI

08:15 AM
08:45 AM

Breakfast

08:45 AM
09:00 AM

Welcome to MBI - Marty Golubitsky

Electron Microscopy and Associated Data Challenges
09:00 AM
09:45 AM
Manfred Auer - 2D and 3D Imaging of Entire Cells and Tissues at Macromolecular Resolution by Advanced Electron Microscopic Approaches

All processes defining and sustaining life take place within cells or between cells, often mediated by multiprotein supramolecular complexes also known as macromolecular machines. While Structural Genomics and, more recently, Cryo-EM have yielded ever-increasing insight into the overall shape of such macromolecular machines and the detailed mechanisms of, e.g., catalysis, we lack insight into how these machines are organized and how they function within cells. We must determine their 3D organization, subcellular location (e.g. with respect to ultrastructural landmarks) and their interactions with other proteins, the cytoskeleton and organelles, as well as any changes in these characteristics during embryonic development, as part of their physiological function or during pathogenesis.


Using various biological examples, including inner ear hair cells and related tissues central to hearing, mammary gland development and breast cancer, as well as microbial communities, I will illustrate the power of modern 2D and 3D electron microscopy imaging, including widefield montaging TEM, TEM tomography as well as Focused Ion Beam Scanning Electron Microscopy (FIB/SEM) and Serial Block Face (SBF) SEM. The latter two techniques can yield macromolecular insight into the 3D organization of entire cells and tissues, and thus have the potential to truly revolutionize cell biology.


However, while it is now possible to obtain 10k x 10k x 10k voxel data sets (soon 32k x 32k x 32k), we do not possess the computational capabilities to deal with such terabytes of data, in terms of visualization, feature extraction/segmentation, annotation, and quantitative analysis. I will discuss the challenges these novel imaging approaches pose, describe how we currently deal with such data sets, and discuss some emerging solutions that my lab has helped develop in collaboration with computer scientists at LBL and elsewhere.


We hope that this meeting can lead to further solutions to overcome this enormous bottleneck emerging in structural cell biology imaging.


09:45 AM
10:30 AM
Robert Marc - Extracting large networks from connectomes

Mapping neural networks in brain, retina and spinal cord requires (1) comprehensive parts lists (vertex types), (2) nanometer-scale connection detection (edge types), and (3) millimeter-scale network tracing. Together this requires high-resolution transmission electron microscope (TEM) imaging on a scale not routinely possible. Combining serial sectioning and TEM hardware control (SerialEM, Mastronarde, 2005, PMID 16182563), it is possible to create automated TEM (ATEM) imaging of mammalian retina composed of ≈0.4-1.4M high-resolution images and assemble them into coherent 3D volumes of 16-21 TB of raw image data (Anderson et al., 2009, PMID 19855814). How should we build even larger connectomes? At present, we estimate that 100-1000 TEM systems are underutilized globally, representing the most cost-effective scale-up path.



The next task is navigating, exploring, segmenting and annotating the data space to characterize network motifs. The key tool for this is the Viking system developed by Anderson et al. (2010 PMID 21118201, 2011 PMID 21311605). Viking allows web-compliant delivery of imagery and collection of markup. Our experience suggests that complex network analysis (rather than simple shape-building) is more effectively done by teams of trained annotators working under executive analysts than by crowd-sourcing. At present, retinal connectome RC1 contains >825,000 discrete annotations tracking ≈680 neural structures with 8,300 connections, and ≈300 glia. We estimate this is only ≈20% of the expected annotation set. However, efficiency accelerates 2-3 fold as the density of identified process collisions increases. Further, many connection sets are replicates.



Lacking robust tools for automated tracking, the best strategy has been to use multichannel molecular/activity markers and cell classification to segment and prioritize tracking targets, followed by intensive manual annotation (Lauritzen et al., 2012 PMID 23042441). A scientifically credible yet computationally manageable data volume in the vertebrate nervous system is currently in the 10-50 TB range. But a persistent challenge has been the vast scale differences among neurons in a volume. There are an estimated 60+ classes of cells in the mammalian retina. For example, retinal connectome RC1 is a disk of neural retina 243 μm wide and 30 μm tall. It contains 104 copies of the single class of rod bipolar cells with axonal fields spanning 15-25 μm; 39 copies of one of its target cells, the AII amacrine cell, whose dendrites span 50-60 μm; and a dozen copies of a larger target, the AI amacrine cell, with dendrites spanning up to 1000 μm. But one of the main targets is the ganglion cell superclass, which can be segmented into 15 classes. RC1 contains a few whole copies of a few of these, and fragments of many more, but a volume containing a complete set would need to be >1 mm in diameter and require over 250 TB. While this is computationally feasible, the acquisition time for one microscope is not practical, nor is routine small-team annotation. Clearly, scaling by (1) platform and (2) automated annotation is critical.



Finally, some key networks in the retina are massive structured hubs. The AII amacrine cell has partnerships with at least 30 cell classes, meaning that all retinal neurons are no more than 2 hops from an AII cell and most are no more than 2 hops from any other cell. This means that network graphs for even small volumes are extremely dense, and differential visualization is a major goal.


10:30 AM
11:15 AM

Break

11:15 AM
12:00 PM
Allen Tannenbaum
12:00 PM
02:00 PM

Lunch Break

Electron Microscopy and MRI Data Analysis
02:00 PM
02:45 PM
Ronny Hadani - Representation theoretic patterns in three dimensional cryo-electron microscopy

Three dimensional cryo-electron microscopy (3D cryo-EM, for short) is the problem of determining the three dimensional structure of a large molecule from the set of images, taken by an electron microscope, of randomly oriented and positioned identical molecular particles which are frozen in a thin layer of ice. A solution to this problem is of particular interest, since it promises to be an entirely general technique which does not require crystallization or other special preparation stages. Present approaches to the problem fail with particles that are too small, cryo-EM images that are too noisy or at resolutions where the signal-to-noise ratio becomes too small.

The focus of my talk is the intrinsic reconstitution algorithm, due to Singer and Shkolnisky, which constitutes a basic step for the solution of the 3D cryo-EM problem and whose main appealing property is its remarkable numerical stability to noise. My goal is to give an introductory explanation of the mathematical principles underlying this novel algorithmic approach, while hinting about how they apply to other fundamental problems in cryo-EM and beyond. Along the way, I will describe the mathematical model underlying the experimental set-up, specifying the main computational problems/technical difficulties that should be resolved as part of three dimensional structure determination from cryo-EM images.
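As background, the common-lines constraint at the heart of this reconstruction problem can be written compactly. The formulation below is the standard textbook setup, not a detail taken from this talk.

```latex
% Image i is a projection of the molecule at an unknown rotation R_i in SO(3).
% By the Fourier projection-slice theorem, the Fourier transforms of images
% i and j agree along a common line, detected as in-plane unit vectors
% c_{ij} and c_{ji}; lifting them to R^3 gives, for every pair i != j,
R_i \begin{pmatrix} c_{ij} \\ 0 \end{pmatrix}
  \;=\; R_j \begin{pmatrix} c_{ji} \\ 0 \end{pmatrix}.
% The intrinsic (spectral) approach assembles all such constraints into a
% single operator and reads the rotations off its leading eigenvectors.
```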

Finally, to put things in a broader mathematical perspective, I will briefly mention the general picture: explaining how the intrinsic reconstitution algorithm can be recast in the (yet to be fully developed) framework of categorical optimization, which is a novel paradigm for solving certain types of non-linear optimization problems by characterizing the solution as an object of a category instead of as an element of a set.

This work is part of a collaborative project conducted with Amit Singer (Princeton), Shamgar Gurevich (Wisconsin Madison), Yoel Shkolnisky (Tel-Aviv University) and Fred Sigworth (Yale).

02:45 PM
03:15 PM

Break

03:15 PM
03:45 PM
David Ress - Simple signed-distance approach to the measurement of depth in human cerebral cortex

High-resolution functional MRI methods have the potential to resolve depth variations in laminated brain tissues such as human cerebral cortex. However, it is challenging to create a geometrically logical definition of depth in this highly convoluted topology. Within the gray matter, which is bounded by the gray-white and pial surfaces, a nearest-neighbor Euclidean distance definition from the two surfaces is not satisfactory because of inconsistency between the distances defined from the two surfaces, and a failure to properly follow the dissimilar topologies of the surfaces. A method based on solution of Laplace's equation between the two surfaces has been used, but the usual finite-difference solution methods can suffer from artifacts when the surfaces are discretized onto a grid, particularly in the depths of narrow sulci. Here we propose an alternative approach based on an interpolated signed-distance function that makes direct use of the smooth surface representations. We compute a signed distance function, S(x,y,z), defined in two simple ways that are based on Euclidean distance metrics, with the sign determined by the brain volume tissue segmentation. Signed distance is calculated separately for the gray-white and pial surfaces, Sw and Sp, respectively. We then form a weighted-distance function, D(w; x,y,z) = w·Sw + (1 − w)·Sp. Variation of the weighting parameter, w ∈ [0,1], defines a smooth transition between the two surfaces. In our application, we solve D(w; x,y,z) = 0 to obtain w at every grid point within the gray matter domain. We treat w as a pseudo-potential that smoothly and logically interpolates between the two surfaces, thus defining a normalized depth coordinate within the gray matter. To obtain physical distances and gray-matter thickness, we calculate ∇w and trace it from every point on the gray-white surface to the pial surface. The method was applied to four MRI brain anatomies, and yielded visually and quantitatively reasonable measurements of gray matter depth and thickness.
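For concreteness, solving D(w; x,y,z) = 0 pointwise gives the closed form w = Sp/(Sp − Sw), which is well defined inside the gray-matter ribbon where Sw and Sp have opposite signs. Here is a minimal sketch in Python, assuming boolean white-matter and pial masks on a common voxel grid; the mask names, sign conventions, and use of SciPy's Euclidean distance transform are illustrative assumptions, not details from the abstract.

```python
# Sketch: normalized cortical depth from two signed distance functions.
import numpy as np
from scipy import ndimage

def signed_distance(mask, spacing):
    """Euclidean distance, positive outside `mask` and negative inside."""
    outside = ndimage.distance_transform_edt(~mask, sampling=spacing)
    inside = ndimage.distance_transform_edt(mask, sampling=spacing)
    return outside - inside

def cortical_depth(white, pial, spacing=(1.0, 1.0, 1.0)):
    """white, pial: boolean volumes; returns w in [0, 1] on the gray matter."""
    Sw = signed_distance(white, spacing)   # distance to gray-white surface
    Sp = signed_distance(pial, spacing)    # distance to pial surface
    gray = pial & ~white                   # gray-matter ribbon
    w = np.full(white.shape, np.nan)
    # Solve D(w) = w*Sw + (1 - w)*Sp = 0 at each gray-matter voxel.
    w[gray] = Sp[gray] / (Sp[gray] - Sw[gray])
    return np.clip(w, 0.0, 1.0)            # w = 1 at gray-white, 0 at pial
```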

04:00 PM
06:00 PM

Reception & Poster Session

06:00 PM

Shuttle pick-up from MBI

Tuesday, April 22, 2014
Time Session
08:00 AM

Shuttle to MBI

08:15 AM
09:00 AM

Breakfast

MRI and image-based modeling
09:00 AM
09:45 AM
Larry Carin - Imaging the Brain with Heterogeneous Data Sources

A new model is developed for joint analysis of ordered, categorical, real and count data, motivated by brain imaging and human behavior analysis. In the motivating application, the ordered and categorical data are answers to questionnaires, the (word) count data correspond to the text questions from the questionnaires, and the real data correspond to fMRI responses for each subject. We also combine the analysis of these data with single-nucleotide polymorphism (SNP) data from each individual. The questionnaires considered here correspond to standard psychological surveys, and the study is motivated by psychology and neuroscience. The proposed Bayesian model infers sparse graphical models (networks) jointly across people, questions, fMRI stimuli and brain activity, integrated within a new matrix factorization based on latent binary features. We demonstrate how the learned model may take fMRI and SNP data from a subject as inputs, and predict (impute) how the individual would answer a psychological questionnaire; going in the other direction, we also use an individual's SNP data and answers from questionnaires to impute unobserved fMRI data. Each of these two imputation settings has practical and theoretical applications for understanding human behavior and mental health, which are discussed.
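As a toy illustration of imputation through shared per-subject factors (a least-squares cartoon, not the Bayesian graphical model described above; all matrices below are synthetic and, unlike in the real problem, the loadings are assumed known):

```python
# Sketch: predict one modality from another via shared latent factors.
import numpy as np

rng = np.random.default_rng(0)
n_subj, k = 50, 5
U = rng.standard_normal((n_subj, k))    # shared per-subject factors
Vq = rng.standard_normal((k, 30))       # questionnaire loadings (assumed known)
Vf = rng.standard_normal((k, 200))      # fMRI-feature loadings (assumed known)
Q, F = U @ Vq, U @ Vf                   # synthetic "observed" data blocks

U_hat = F @ np.linalg.pinv(Vf)          # least-squares factors from fMRI alone
Q_hat = U_hat @ Vq                      # imputed questionnaire answers
print("max imputation error:", np.abs(Q - Q_hat).max())
```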

09:45 AM
10:30 AM
Carl-Fredrik Westin - Brain microstructure from next generation diffusion MRI

We are at the cusp of a completely new generation of diffusion MRI (dMRI) technologies. New methods are transforming what can be measured, and have the potential to vastly improve tissue characterization using diffusion MRI. In diffusion MRI, each millimeter-size voxel of the image contains information on the micrometer-scale translational displacements of water. The vast majority of applications today focus on the simplest form of the original MRI diffusion experiment, the Stejskal-Tanner pulse sequence. This sequence is based on a pair of short pulsed diffusion-encoding gradients, which we will refer to as the single pulsed field gradient (sPFG) experiment. sPFG is used in diffusion tensor imaging (DTI), enabling popular measures such as the mean diffusivity (apparent diffusion coefficient, ADC) and diffusion anisotropy (fractional anisotropy, FA). sPFG is also the technology underlying high angular resolution diffusion imaging (HARDI) and diffusion spectrum imaging (DSI). Although current popular diffusion measures are very sensitive to changes in the cellular architecture, they are not very specific regarding the type of change. In contrast, the new generation of dMRI technologies such as oscillating gradients, double/multi pulsed-field gradient sequences, and more general waveform sequences has the potential to provide significant new sensitivity for imaging of the brain's cellular microstructure. I will present recent work on general waveform dMRI.
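For reference, the sPFG-derived quantities named above have standard closed forms (textbook definitions, not specific to this talk). For a diffusion tensor D with eigenvalues λ1, λ2, λ3:

```latex
% Stejskal-Tanner attenuation along unit gradient direction g:
S(b, g) = S_0 \, e^{-b \, g^{T} D \, g}
% Mean diffusivity (ADC) and fractional anisotropy (FA):
\mathrm{ADC} = \bar{\lambda} = \tfrac{1}{3}(\lambda_1 + \lambda_2 + \lambda_3),
\qquad
\mathrm{FA} = \sqrt{\tfrac{3}{2}}\,
\sqrt{\frac{(\lambda_1 - \bar{\lambda})^2 + (\lambda_2 - \bar{\lambda})^2 + (\lambda_3 - \bar{\lambda})^2}
           {\lambda_1^2 + \lambda_2^2 + \lambda_3^2}}
```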

10:30 AM
11:15 AM

Break

11:15 AM
12:00 PM
Chris Johnson - Image-Based Biomedical Modeling, Simulation and Visualization

Increasingly, biomedical researchers need to build functional computer models from images (MRI, CT, EM, etc.). The "pipeline" for building such computer models includes image analysis (segmentation, registration, filtering), geometric modeling (surface and volume mesh generation), large-scale simulation (parallel computing, GPUs), large-scale visualization and evaluation (uncertainty, error). In my presentation, I will present research challenges and software tools for image-based biomedical modeling, simulation and visualization and discuss their application for solving important research and clinical problems in neuroscience, cardiology, and genetics.

12:00 PM
02:00 PM

Lunch Break

Segmentation, tracking and analysis of time-lapse data
02:00 PM
02:45 PM
Kristin Branson - Mapping behavior to neural anatomy using machine vision and thermogenetics

To understand the relationship between neural anatomy and behavior, the ultimate output of the nervous system, we performed a high-throughput, thermogenetic screen of 2,200 transgenic lines of Drosophila from the Janelia GAL4 collection. Each GAL4 line drives expression in a different, sparse subset of neurons in the fly nervous system. Using UAS-dTrpA1, we selectively activated these sparse subsets of neurons, and measured the behavioral effects. For this screen, we developed a complete, high-throughput, automated system for measuring the locomotion and social behavior of flies with both breadth and depth. We recorded 20,000 videos of groups of flies freely behaving in an open-field walking arena, totaling about 400 TB of raw data. From the video, we tracked the flies' body positions and wings using our tracking software, Ctrax. We used our machine learning-based behavior classification system, JAABA (the Janelia Automatic Animal Behavior Annotator), to create 15 behavior classifiers (e.g. walking, grooming, chasing) that input the trajectories created by Ctrax and output predictions, for each frame and each fly, of the flies' behaviors (totaling ~175 billion annotations of fly behavior). For each line of flies, we compute a set of ~800 behavior statistics, such as the fraction of time spent chasing, or the average speed while walking, summarizing the behavioral effects of activating the targeted neurons in a concise yet interpretable manner. Concurrent with our screen, the Janelia Fly Light project has imaged the expression pattern of each GAL4 line, producing image stacks indicating which neurons are likely being activated in each line. By mining our behavior data set in conjunction with the Fly Light imagery data, we have identified novel sets of neurons potentially involved in jumping, female chasing, and wing flicking behavior.
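The reduction from per-frame predictions to per-line summary statistics is straightforward; the sketch below uses synthetic stand-ins for the Ctrax/JAABA outputs, and the statistic names are illustrative.

```python
# Sketch: per-line behavior statistics from per-frame, per-fly labels.
import numpy as np

rng = np.random.default_rng(2)
n_flies, n_frames = 20, 3000
walking = rng.random((n_flies, n_frames)) < 0.4   # boolean classifier output
chasing = rng.random((n_flies, n_frames)) < 0.05
speed = rng.gamma(2.0, 1.5, (n_flies, n_frames))  # synthetic speeds, mm/s

stats = {
    "frac_time_chasing": chasing.mean(),          # fraction of fly-frames
    "mean_speed_walking": speed[walking].mean(),  # speed restricted to walking
}
print(stats)
```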

02:45 PM
03:30 PM
Mauro Maggioni - Geometric Multiscale methods and models in high dimensions

We will discuss the problem of dictionary learning for images and a novel approach to the construction of multi-resolution dictionaries for data and images that has several advantages over the current state of the art, including fast algorithms for the construction of the dictionary, the calculation of the coefficients onto the dictionary, and guarantees on the performance of such dictionary learning algorithms. We discuss applications to the analysis of large data sets, including collections of images, and representations/visualizations thereof.
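A cartoon of one multiscale construction in this spirit: recursively bipartition the data and keep a small local PCA basis at every node, so coarse nodes supply low-resolution atoms and deeper nodes refine them. This is a hedged sketch of the general idea, not the speakers' algorithm or its guarantees.

```python
# Sketch: a tree of local PCA bases as a crude multiscale dictionary.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

def multiscale_dictionary(X, depth=3, atoms_per_node=4, min_points=20):
    dictionary = []  # list of (scale, local mean, local PCA basis)
    def recurse(idx, scale):
        if len(idx) < min_points or scale > depth:
            return
        pts = X[idx]
        n_comp = min(atoms_per_node, len(idx) - 1, X.shape[1])
        p = PCA(n_components=n_comp).fit(pts)
        dictionary.append((scale, p.mean_, p.components_))
        split = KMeans(n_clusters=2, n_init=5).fit_predict(pts)  # bipartition
        for c in (0, 1):
            recurse(idx[split == c], scale + 1)
    recurse(np.arange(len(X)), 0)
    return dictionary
```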

03:30 PM
04:15 PM

Break

04:15 PM
05:00 PM
Philipp Keller - Reconstructing nervous system development and function with light-sheet microscopy

In the embryonic development of vertebrates and higher invertebrates, a single cell is transformed into a fully functional organism comprising on the order of tens of thousands of cells or more. In a complex process of self-organization, these cells rapidly divide, migrate, differentiate and form tissues and organs able to perform the most challenging tasks. The nervous system is a key component of the developmental building plan that stands out in terms of size, complexity and function. However, very little is known about the developmental dynamics of this complex system, since the technology to comprehensively record and computationally analyze in vivo cell behavior in neural tissues has been lacking. The overall objective of our research is to gain such quantitative experimental access, to determine the fundamental rules governing neural development, and to systematically link development to the functional activation of circuits in the nervous system.



I will present our experimental approach based on light-sheet fluorescence microscopy, an emerging imaging technology that achieves exceptionally high imaging speed and signal-to-noise ratio, while minimizing light exposure of the specimen. This unique combination of capabilities makes light-sheet microscopes indispensable for the long-term in vivo imaging of entire developing organisms. We are designing advanced implementations of scanned light-sheet fluorescence microscopy, such as our SiMView technology framework for simultaneous multiview imaging [1], to systematically image the early development of entire fruit fly, zebrafish and mouse embryos with cellular resolution. I will furthermore present strategies for automated large-scale image processing of these multi-terabyte light-sheet microscopy data sets. This combined experimental and computational approach allows us to perform whole-organism functional imaging [2] and to quantitatively analyze developmental lineages and their interrelationships in the entire animal [3]. Our goal is to take advantage of these high-resolution data to attain a system-level understanding of cell fate decisions and the establishment of structural connectivity, and how the underlying mechanisms give rise to the dynamic architecture of neural tissues. In the long-term perspective, we will use this information to establish and validate a computer model of the developing nervous system [4].



I envision that our quantitative approach to the reconstruction of large neuronal system dynamics will provide critical insights into the properties of complex circuits and complement ongoing large-scale electron microscopy analyses of static neuronal network architecture.



[1] Tomer et al., Nature Methods, 9:755-63 (2012)


[2] Ahrens et al., Nature Methods, 10:413-20 (2013)


[3] Amat and Keller, Development, Growth and Differentiation, 55:563-78 (2013)


[4] Keller, Science, 340:1234168 (2013)


05:00 PM
05:45 PM
Andrea Bertozzi - Geometric graph-based methods for segmentation of hyperspectral imagery

We present a new algorithm for segmentation of large datasets with graph based structure. The method combines ideas from classical PDE-based image segmentation with fast and accessible linear algebra methods for computing information about the spectrum of the graph Laplacian. I will present results for image processing applications such as image labeling and hyperspectral video segmentation.
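To make the graph-Laplacian-spectrum ingredient concrete, here is a plain spectral-clustering baseline (not the PDE-based scheme of the talk): build a sparse k-nearest-neighbor graph over pixels and cluster a few Laplacian eigenvectors. Parameter values are illustrative.

```python
# Sketch: spectral segmentation of per-pixel feature vectors.
import numpy as np
from scipy.sparse import csgraph
from scipy.sparse.linalg import eigsh
from sklearn.cluster import KMeans
from sklearn.neighbors import kneighbors_graph

def spectral_segment(X, n_segments=5, n_eigvecs=10, k=10):
    """X: (n_pixels, n_bands) array, e.g. hyperspectral signatures."""
    W = kneighbors_graph(X, n_neighbors=k, mode="connectivity",
                         include_self=False)
    W = 0.5 * (W + W.T)                          # symmetrize the kNN graph
    L = csgraph.laplacian(W, normed=True)        # normalized graph Laplacian
    _, vecs = eigsh(L, k=n_eigvecs, which="SM")  # smallest eigenvectors
    return KMeans(n_clusters=n_segments, n_init=10).fit_predict(vecs)
```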

05:45 PM

Shuttle pick-up from MBI

Wednesday, April 23, 2014
Time Session
08:00 AM

Shuttle to MBI

08:15 AM
09:00 AM

Breakfast

Light microscopy and associated data challenges
09:00 AM
09:45 AM
Zhirong Bao - The making of the worm: every cell, every minute

The nematode C. elegans is a major model organism for biomedical research. Its size and transparency make in toto imaging a powerful approach to tackle its biology and achieve synergy with genetics and systems biology. We study C. elegans embryogenesis by combining these approaches. I will discuss our efforts on the following fronts: (1) optical microscopy for long-term live imaging through embryogenesis at single-cell resolution; (2) image analysis for automated cell lineage tracing; (3) genome-wide analysis of the phenotypic landscape and mechanistic models of development; and (4) construction of an interactive 4D atlas of neural development for the entire nervous system.

09:45 AM
10:30 AM
Gunnar Carlsson - Topological Representation of Large and Complex Data Sets

We will discuss some methods for compressed representation of complex data sets using topological networks. The representations frequently preserve features of the data sets, and are extremely useful in locating subpopulations that are difficult to detect using other methods.

The methods act as a useful complement to existing machine learning methods.
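One widely used construction of such topological networks is Mapper-style: cover the range of a filter (lens) function with overlapping intervals, cluster each preimage, and connect clusters that share points. A minimal sketch with illustrative parameters; this is the generic construction, not necessarily the speaker's implementation.

```python
# Sketch: a toy Mapper graph from a 1-D lens function.
import numpy as np
from sklearn.cluster import DBSCAN

def mapper_graph(X, lens, n_intervals=10, overlap=0.3, eps=0.5):
    lo, hi = lens.min(), lens.max()
    width = (hi - lo) / n_intervals
    nodes = []                       # each node: set of data-point indices
    for i in range(n_intervals):
        a = lo + (i - overlap) * width
        b = lo + (i + 1 + overlap) * width
        idx = np.where((lens >= a) & (lens <= b))[0]
        if len(idx) == 0:
            continue
        labels = DBSCAN(eps=eps).fit_predict(X[idx])  # cluster the preimage
        for lab in set(labels) - {-1}:                # skip DBSCAN noise
            nodes.append(set(idx[labels == lab]))
    edges = {(i, j) for i in range(len(nodes))
             for j in range(i + 1, len(nodes))
             if nodes[i] & nodes[j]}                  # shared points -> edge
    return nodes, edges
```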

10:30 AM
11:15 AM

Break

11:15 AM
12:00 PM
Rohit Bhargava - Data-induced Challenges in Developing Chemical Imaging for Pathology

Chemical imaging is an emerging technology in which light is employed to record the molecular content of samples. Instead of using dyes or stains, the approach uses computation to visualize the molecular content. First, the technology is introduced and progress in instrumentation reviewed. The importance of simulation and of data analysis will next be emphasized and examples provided. Big data strategies and computational needs will be discussed next. Finally, a computational strategy and statistical considerations underlying decision-making are described. Our laboratory focuses, among other topics, on the analysis of biological materials for histopathology. In demonstrative applications, we describe attempts to diagnose and grade cancer in breast and prostate biopsies without human input. Results indicate that a rapid assessment of lesions is possible with high accuracy and that their lethality may be predicted using a systems approach to pathology, which is critically enabled by rapid data analysis strategies. The volume and mining of this unique data set involve several challenges, which are described together with potential solutions and the impact of novel strategies to address them.

12:00 PM
02:00 PM

Lunch Break

02:00 PM
02:45 PM
Hanchuan Peng - High-throughput Bioimage Informatics for Neuroscience and Cell Biology: Worm, Fly, Mouse, and Human

In recent years, high-throughput phenotype screening that involves systematic analysis of microscopic images (thus called "Bioimage Informatics") and other types of data has become increasingly prevalent and promising. Here I will discuss several examples of widely used model systems, including C. elegans, fruit fly, mouse, and human. I will also discuss our high-performance image visualization and computing platform, Vaa3D (http://vaa3d.org), which has been used in several challenging high-throughput bioimage informatics applications, and my recent work on a fast 3D microscopic smart-imaging system for neuroscience studies.

Data management and visualization
02:45 PM
03:30 PM
Jason Swedlow - The Open Microscopy Environment: Open Source Image Informatics for the Biological Sciences

Despite significant advances in cell and tissue imaging instrumentation and analysis algorithms, major informatics challenges remain unsolved: file formats are proprietary, facilities to store, analyze and query numerical data or analysis results are not routinely available, integration of new algorithms into proprietary packages is difficult at best, and standards for sharing image data and results are lacking. We have developed an open-source software framework, the Open Microscopy Environment (http://openmicroscopy.org), to address these limitations. OME has three components: an open data model for biological imaging, standardised file formats and software libraries for data file conversion, and software tools for image data management and analysis.

The OME Data Model (http://openmicroscopy.org/site/support/ome-model/) provides a common specification for scientific image data and has recently been updated to more fully support fluorescence filter sets, the requirement for unique identifiers, and screening experiments using multi-well plates.

The OME-TIFF file format (http://openmicroscopy.org/site/support/ome-model/ome-tiff) and the Bio-Formats file format library (http://openmicroscopy.org/site/products/bio-formats) provide an easy-to-use set of tools for converting data from proprietary file formats. These resources enable access to data by different processing and visualization applications, sharing of data between scientific collaborators and interoperability in third party tools like Fiji/ImageJ.

The Java-based OMERO platform (http://openmicroscopy.org/site/products/omero) includes server and client applications that combine an image metadata database, a binary image data repository, and visualization and analysis by remote access. The current stable release of OMERO (OMERO-4.4; http://openmicroscopy.org/site/support/omero4/downloads) includes a single mechanism for accessing image data of all types, regardless of original file format, via Java, C/C++ and Python and a variety of applications and environments (e.g., ImageJ, Matlab and CellProfiler). This version of OMERO includes a number of new functions, including SSL-based secure access, a distributed compute facility, filesystem access for OMERO clients, and a scripting facility for image processing. An open script repository allows users to share scripts with one another. A permissions system controls access to data within OMERO and enables sharing of data with users in a specific group or even publishing of image data to the worldwide community. Several applications that use OMERO are now released by the OME Consortium, including a FLIM analysis module, an object tracking module, two image-based search applications, an automatic image tagging application, and the first release of a biobanking application (http://www.openmicroscopy.org/site/products/partner). Our next version, OMERO-5 (http://openmicroscopy.org/site/products/ome5; currently available as a Release Candidate), includes updates and resources to specifically support the large datasets that appear in digital pathology and high content screening. Importing these large datasets is fast, and data are stored in their original file format, so they can be accessed by third-party software.

OMERO and Bio-Formats run the JCB DataViewer (http://jcb-dataviewer.rupress.org/), the world’s first on-line scientific image publishing system and are used to publish 3D EM tomograms in the EMDataBank (http://emdatabank.org/). They also power several large institutional image data repositories (e.g., http://odr.stowers.org and http://lincs.hms.harvard.edu/).

03:30 PM
04:15 PM

Break

04:15 PM
05:00 PM
Zeyun Yu - Adaptive Mesh Representation and Processing of Biomedical Images

Biomedical imaging technologies are now widely used in many areas of science and engineering. Images acquired or computed are commonly digitized into two- or three-dimensional (2D or 3D) regular arrays composed of pixels or voxels, respectively. Despite its ease of use and processing on computers, this type of image representation often contains a great deal of information redundancy, which poses challenges to data storage and transmission, especially with the rapidly increasing resolution of digital images and the growing availability of volumetric images. In addition, many pixel-based image processing and analysis algorithms require at least linear time complexity with respect to the number of pixels in an image. Increasing image sizes can become a bottleneck in many real-time applications such as patient diagnosis and remote healthcare systems. For these reasons, finding other image representations that allow less storage space and faster processing would be important.

In this talk, I shall present an approach to representing images with irregular meshes. Compared to the traditional pixel-based method, the new representation provides a significantly more compact way to describe an image, which is well suited to storing and transferring large imaging data. The mesh structure of an image is adaptively defined with finer elements near image features. A method will also be presented to restore the pixel-based image with an arbitrary resolution from the mesh representation.
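The storage argument can be demonstrated with a quadtree stand-in: subdivide only where the image varies, so flat regions cost a single element each. The talk's method uses irregular (e.g. triangular) meshes rather than a quadtree; this toy only illustrates the adaptivity trade-off.

```python
# Sketch: adaptive quadtree representation of a (power-of-two) square image.
import numpy as np

def quadtree(img, x, y, size, tol, leaves):
    block = img[y:y + size, x:x + size]
    if size == 1 or block.std() <= tol:
        leaves.append((x, y, size, float(block.mean())))  # one flat element
        return
    h = size // 2
    for dx, dy in ((0, 0), (h, 0), (0, h), (h, h)):       # recurse on quadrants
        quadtree(img, x + dx, y + dy, h, tol, leaves)

def reconstruct(leaves, shape):
    out = np.zeros(shape)
    for x, y, size, mean in leaves:
        out[y:y + size, x:x + size] = mean   # piecewise-constant restore
    return out

img = np.zeros((256, 256)); img[64:192, 64:192] = 1.0    # toy image
leaves = []
quadtree(img, 0, 0, 256, tol=0.05, leaves=leaves)
print(len(leaves), "elements instead of", img.size, "pixels")
```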

05:00 PM

Shuttle pick-up from MBI

Thursday, April 24, 2014
Time Session
08:00 AM

Shuttle to MBI

08:15 AM
09:00 AM

Breakfast

Analysis of Large Image Data Collections, Part I
09:00 AM
09:45 AM
Chandrajit Bajaj - Computational Topology, Geometry and Analysis for Quantitative Relationships between Biological Form and Function from 3D Electron Microscopy

My two-part talk shall focus on a set of computational topology, geometry and analysis techniques necessary to establish quantitative relationships between biological form and function, especially where structural models of molecules and cells are elucidated from 3D Electron Microscopy. First, we shall process cryo-electron micrograph image collections from single-particle and electron tomography, and construct topologically curated and spatially realistic three-dimensional (3D) ultra-structure models of large bio-molecular complexes (LBCs), towards deriving accurate biophysical properties of the multiple interfaces of LBCs. In the second part I shall again apply topology, geometry, and analysis techniques to establish meaningful relationships between neuronal form and function at the synaptic interaction level. Here our focus shall be to process serial section transmission electron micrograph image collections (ssTEM and STEM in-SEM), construct topologically curated models of local neuronal circuits, complete with the complex dendritic arbor of pyramidal neurons, to derive multi-scale electrophysical properties.

09:45 AM
12:00 PM

Discussion

12:00 PM
02:00 PM

Lunch Break

Analysis of Large Image Data Collections, Part II
02:00 PM
02:25 PM
Yuehaw Khoo - Solving the NMR Distance Geometry Problem by Global Registration

The distance geometry problem in NMR structural calculation consists of estimating the coordinates of atoms from imprecise measurements of a subset of their pair-wise distances. This talk will focus on recent divide-and-conquer approaches that solve the problem in two steps: In the first step, the atoms are partitioned into smaller subsets and the structure for each subset is found separately, whereas in the second step a global molecular structure is obtained by stitching together all the local predetermined substructures. Results of numerical simulations demonstrate the advantages of this approach in terms of accuracy and running time. This is a joint work with Kunal Chaudhury and Amit Singer.
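The two-step structure is easy to sketch: classical multidimensional scaling embeds each subset from its local pairwise distances (assumed complete and exact here), and a rigid Kabsch/Procrustes registration stitches overlapping pieces into a global frame. This illustrates the shape of the pipeline, not the authors' algorithm.

```python
# Sketch: local embedding plus rigid stitching for distance geometry.
import numpy as np

def classical_mds(D, dim=3):
    """Coordinates (up to rigid motion) from a full distance matrix D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J              # double centering
    vals, vecs = np.linalg.eigh(B)
    top = np.argsort(vals)[::-1][:dim]
    return vecs[:, top] * np.sqrt(np.maximum(vals[top], 0.0))

def rigid_align(P, Q):
    """Rotation R, translation t with R @ p + t ~ q (Kabsch algorithm)."""
    cp, cq = P.mean(0), Q.mean(0)
    H = (P - cp).T @ (Q - cq)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:                 # avoid a reflection
        Vt[-1] *= -1
        R = Vt.T @ U.T
    return R, cq - R @ cp

# Stitching: map subset B into subset A's frame via their shared atoms:
# R, t = rigid_align(XB[shared_B], XA[shared_A]); XB_global = XB @ R.T + t
```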

02:25 PM
02:50 PM
Jian-Zhou Zhang - Two-Stage Method for Salt-and-Pepper Noise Removal Using Statistical Jump Regression Analysis


02:50 PM
03:15 PM
Angel Pineda - Task-Based Information Content of Medical Images

Medical images are typically obtained with a clinical task in mind. These tasks can often be modeled as signal detection or parameter estimation. In this talk, we will define the information content of medical images based on the performance of mathematical models for clinical tasks using those images. To quantify this type of information content we need to define the task (the intended use of the images), the statistics (the sources of variability in the data), and the observer (how we intend to obtain the information from the images). This task-based optimization can be used in a wide variety of settings. This talk will present results regarding how higher resolution projections can be used to improve the noise properties of CT reconstructions. This measure of information content could also be used to evaluate and optimize methods to reduce the size of large data sets.
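For the detection task, a standard task-based figure of merit is the Hotelling (ideal linear) observer's detectability; a minimal sketch with synthetic inputs, where all sizes and the signal shape are placeholders.

```python
# Sketch: Hotelling observer SNR^2 = s^T K^{-1} s for a known signal s
# in Gaussian noise with covariance K.
import numpy as np

def hotelling_snr2(s, K):
    """Higher SNR^2 means an easier detection task on these images."""
    return float(s @ np.linalg.solve(K, s))

rng = np.random.default_rng(0)
n = 64                                      # a tiny "image" as a flat vector
s = np.exp(-((np.arange(n) - n / 2) ** 2) / 18.0)   # Gaussian bump signal
A = rng.standard_normal((n, n))
K = A @ A.T / n + np.eye(n)                 # well-conditioned noise covariance
print("Hotelling SNR^2:", hotelling_snr2(s, K))
```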

03:15 PM
04:00 PM

Break

04:00 PM
04:45 PM
Remus Osan - Classification and visualization of neural patterns using subspace analysis statistical methods

With new developments in experimental recording technologies, neuroscientists can now record large and complex neural data sets, shifting their studies from the single-unit level to the multi-unit level. As a result, methods for handling big data are needed to investigate temporal and spatial patterns and to give scientists an intuitive understanding of the complex data. In this study, we used eigenvalue/eigenvector methods, such as Principal Component Analysis (PCA) and Multiple Discriminant Analysis (MDA), for representing the large-scale data in lower dimensions and for pattern classification. We apply these methods to data from large-scale multi-electrode recordings in the hippocampus and to optical imaging data from olfactory receptor neurons (ORNs). Altogether, application of these subspace analysis techniques to experimental recordings from neurons of different animals and brain regions shows their usefulness and potential for uncovering neural codes hidden in complex and dynamic population patterns.
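A minimal sketch of the PCA-then-discriminant pipeline on synthetic trial-by-neuron data; scikit-learn's LDA stands in for MDA, and all sizes are illustrative.

```python
# Sketch: PCA for low-dimensional representation, then discriminant analysis.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(1)
n_trials, n_neurons, n_classes = 300, 100, 3
labels = rng.integers(n_classes, size=n_trials)      # stimulus class per trial
means = 2.0 * rng.standard_normal((n_classes, n_neurons))
X = means[labels] + rng.standard_normal((n_trials, n_neurons))  # firing rates

Z = PCA(n_components=10).fit_transform(X)            # low-dimensional subspace
lda = LinearDiscriminantAnalysis().fit(Z, labels)    # pattern classification
print("training accuracy:", lda.score(Z, labels))
```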

04:45 PM

Shuttle pick-up from MBI

05:00 PM
06:00 PM

Cash Bar

06:00 PM
08:00 PM

Banquet in the Fusion Room at Crowne Plaza

Friday, April 25, 2014
Time Session
Participants

Name Email Affiliation
Adeniyi, Aruna aruna.adeniyi@uniosun.edu.ng Department of Mathematical and Physical Sciences, Osun State University, Osogbo, Nigeria
Auer, Manfred MAuer@lbl.gov Life Sciences Division, Lawrence Berkeley National Lab
Bajaj, Chandrajit bajaj@cs.utexas.edu Computer Science, University of Texas
Bao, Zhirong baoz@mskcc.org Developmental Biology Program, Sloan-Kettering Institute
Bertozzi, Andrea bertozzi@math.ucla.edu Mathematics, UCLA
Bhargava, Rohit rxb@illinois.edu Bioengineering, University of Illinois at Urbana-Champaign
Branson, Kristin bransonk@janelia.hhmi.org Ethomics, HHMI Janelia Farm Research Campus
Carin, Larry lcarin@ece.duke.edu Dept. of Electrical and Computer Engineering, Duke University
Carlsson, Gunnar Gunnar@math.stanford.edu Department of Mathematics, Stanford University
Costanzo, Francesco costanzo@engr.psu.edu Engineering Science and Mechanics, Pennsylvania State University
Driscoll, Tobin driscoll@udel.edu Mathematical Sciences, University of Delaware
Fahroo, Fariba fariba.fahroo@gmail.com Control and Dynamical systems (RTA), Air Force office of Scientific Research
Guan, Bo guan@math.osu.edu Mathematics, Ohio State University
Hadani, Ronny hadani@math.utexas.edu Mathematics, University of Texas at Austin
Johnson, Chris crj@sci.utah.edu Scientific Computing and Imaging Institute, University of Utah
Keller, Philipp kellerp@janelia.hhmi.org Janelia Farm Research Campus, Howard Hughes Medical Institute
Khoo, Yuehaw ykhoo@princeton.edu Applied Math/Physics, Princeton University
Maggioni, Mauro mauro@math.duke.edu Mathematics, Duke University
Marc, Robert robert.marc@hsc.utah.edu Ophthalmology, University of Utah School of Medicine
Metaxas, Dimitris dnm@cs.rutgers.edu Computer Science, Computational Biomedicine, Imaging, and Modeling Center
Osan, Remus rosan@gsu.edu Mathematics and Statistics, Georgia State University
Peng, Hanchuan pengh@janelia.hhmi.org Computational Neuroanatomy, Allen Institute for Brain Science
Pineda, Angel apineda@fullerton.edu Mathematics, California State University, Fullerton
Ress, David ress@bcm.edu Neuroscience, Baylor College of Medicine
Swedlow, Jason jason@lifesci.dundee.ac.uk Centre for Gene Regulation and Expression, University of Dundee
Tannenbaum, Allen arobertan@gmail.com Computer Science and Applied Mathematics, Stony Brook University
Westin, Carl-Fredrik westin@bwh.harvard.edu Harvard Medical School
Xia, Jun jxia1@gsu.edu Mathematics and Statistics, Georgia State University
Yu, Zeyun yuz@uwm.edu Computer Science, University of Wisconsin-Milwaukee
Zhang, Jian-Zhou zhangjz@scu.edu.cn College of Computer, Sichuan University
2D and 3D Imaging of Entire Cells and Tissues at Macromolecular Resolution by Advanced Electron Microscopic Approaches

All processes defining and sustaining life take place within cells or between cells, often mediated by multiprotein supramolecular complexes also known as macromolecular machines . While Structural Genomics and recently Cryo-EM have yielded ever-increasing insight into the overall shape of such macromolecular machines and the detailed mechanism of e.g. catalysis, we lack insight how these machines are organized and function within cells. We must determine their 3D organization, subcellular location (e.g. with respect to ultrastructural landmarks) and their interaction with other proteins, the cytoskeleton and organelles and any changes of these characteristics during embryonic development, as part of their physiological function or during pathogenesis.


Using various biological examples, including inner ear hair cells and related tissues central to hearing, mammary gland development and breast cancer, as well microbial communities, I will illustrate the power of modern 2D and 3D electron microscopy imaging, including widefield montaging TEM, TEM tomography as well as Focused Ion Beam Scanning Electron Microscopy (FIB/SEM) and Serial Block Face (SBF/) SEM. The latter two techniques can yield macromolecular insight into the 3D organization of entire cells and tissues, and thus have the potential to truly revolutionize cell biology.


However, while it is now possible to obtain 10k by 10k by 10k voxel data sets (soon 32kx32kx32k), we do not possess the computational capabilities to deal with such terabytes of data, in terms of visualization, feature extraction/segmentation, annotation, and quantitative analysis. I will discuss the challenges these novel imaging approaches pose, describe how we currently deal with such data sets and discuss some emerging solutions that my lab has helped develop in combination with computer scientists at LBL and elsewhere.


We hope that this meeting can lead to further solutions in order to overcome this enormous bottle-neck that is emerging in structural cell biology imaging.


Computational Topology, Geometry and Analysis for Quantitative Relationships between Biological Form and Function from 3D Electron Microscopy

My two part talk shall focus on a set of computational topology, geometry and analysis techniques necessary to establish quantitative relationships between biological form and function, especially where structural models of molecules and cells are elucidated from 3D Electron Microscopy. First, we shall process cryo electron micrograph image collections of single particle and electron tomography, construct topologically curated and spatially realistic dimensional (3D) ultra-structure models of large bio-molecular complexes (LBCs) , towards deriving accurate biophysical properties of the multiple interfaces of LBCs. In the second part I shall again apply topology, geometry, and analysis techniques to establish meaningful relationships between neuronal form and function at the synaptic interaction level. Here our focus shall be to process serial section transmission electron micrograph image collections (ssTEM and STEM in-SEM), construct topologically curated models of local neuronal circuits, complete with the complex dendritic arbor of pyramidal neurons, to derive multi-scale electrophysical properties.

The making of the worm: every cell, every minute

The nematode C. elegans is a major model organism for biomedical research. Its size and transparency make in toto imaging a powerful approach to tackle its biology and achieve synergy with genetics and systems biology. We study C. elegans embryogenesis by combining these approaches. I will discuss our efforts on the following fronts: (1) optical microscopy for long-term live imaging through embryogenesis at single-cell resolution; (2) image analysis for automated cell lineage tracing; (3) genome-wide analysis of the phenotypic landscape and mechanistic models of development; and (4) construction of an interactive 4D atlas of neural development for the entire nervous system.

Geometric graph-based methods for segmentation of hyperspectral imagery

We present a new algorithm for segmentation of large datasets with graph based structure. The method combines ideas from classical PDE-based image segmentation with fast and accessible linear algebra methods for computing information about the spectrum of the graph Laplacian. I will present results for image processing applications such as image labeling and hyperspectral video segmentation.

Data-induced Challenges in Developing Chemical Imaging for Pathology

Chemical imaging is an emerging technology in which light is employed to record the molecular content of samples. Instead of using dyes or stains, the approach uses computation to visualize the molecular content. First, the technology is introduced and progress in instrumentation reviewed. The importance of simulation and of data analysis will next be emphasized and examples provided. Big data strategies and computational needs will be discussed next. Finally, a computational strategy and statistical considerations underlying decision-making are described. Our laboratory focuses, among other topics, on the analysis of biological materials for histopathology. In demonstrative applications, we describe attempts to diagnose and grade cancer in breast and prostate biopsies without human input. Results indicate that a rapid assessment of lesions is possible with high accuracy and their lethality may be predicted using a systems approach to pathology, which is critically enabled by rapid data analysis strategies. The volume and mining of this unique data set involve several challenges. These are described with potential solutions and impacts of novel strategies to address these challenges.

Mapping behavior to neural anatomy using machine vision and thermogenetics

To understand the relationship between neural anatomy and behavior, the ultimate output of the nervous system, we performed a high-throughput, thermogenetic screen of 2,200 transgenic lines of Drosophila from the Janelia GAL4 collection. Each GAL4 line drives expression in a different, sparse subset of neurons in the fly nervous system. Using UAS dTrpa1, we selectively activated these sparse subsets of neurons, and measured the behavioral effects. For this screen, we developed a complete, high-throughput, automated system for measuring the locomotion and social behavior of flies with both breadth and depth. We recorded 20,000 videos of groups of flies freely behaving in an open-field walking arena, totaling about 400 TB of raw data. From the video, we tracked the flies' body positions and wings using our tracking software, Ctrax. We used our machine learning-based behavior classification system, JAABA (The Janelia Automatic Animal Behavior Annotator), to create 15 behavior classifiers (e.g. walking, grooming, chasing) that input the trajectories created by Ctrax and output predictions, for each frame and each fly, of the flies' behaviors (totaling ~175 billion annotations of fly behavior). For each line of flies, we compute a set of ~800 behavior statistics, such as the fraction of time spent chasing, or the average speed while walking, summarizing the behavioral effects of activating the targeted neurons in a concise yet interpretable manner. Concurrent with our screen, the Janelia Fly Light project has imaged the expression pattern of each GAL4 line, producing image stacks indicating which neurons are likely being activated in each line. By mining our behavior data set in conjunction with the Fly Light imagery data, we have identified novel sets of neurons potentially involved in jumping, female chasing, and wing flicking behavior.

Imaging the Brain with Heterogeneous Data Sources

A new model is developed for joint analysis of ordered, categorical, real and count data, motivated by brain imaging and human behavior analysis. In the motivating application, the ordered and categorical data are answers to questionnaires, the (word) count data correspond to the text questions from the questionnaires, and the real data correspond to fMRI responses for each subject. We also combine the analysis of these data with single-nucleotide polymorphism (SNP) data from each individual. The questionnaires considered here correspond to standard psychological surveys, and the study is motivated by psychology and neuroscience. The proposed Bayesian model infers sparse graphical models (networks) jointly across people, questions, fMRI stimuli and brain activity, integrated within a new matrix factorization based on latent binary features. We demonstrate how the learned model may take fMRI and SNP data from a subject as inputs, and predict (impute) how the individual would answer a psychological questionnaire; going in the other direction, we also use an individual's SNP data and answers from questionnaires to impute unobserved fMRI data. Each of these two imputation settings have practical and theoretical applications for understanding human behavior and mental health, which are discussed.

Topological Representation of Large and Complex Data Sets

We will discuss some methods of compressed representations of complex data sets using topological networks. The representations frequently preserve features of the data sets, and are extremely useful in locating subpopulations which are difficult to detect using other methods.

The methods act as a useful complement to existing machine learning methods.

Representation theoretic patterns in three dimensional cryo-electron microscopy

Three dimensional cryo-electron microscopy (3D cryo-EM, for short) is the problem of determining the three dimensional structure of a large molecule from the set of images, taken by an electron microscope, of randomly oriented and positioned identical molecular particles which are frozen in a thin layer of ice. A solution to this problem is of particular interest, since it promises to be an entirely general technique which does not require crystallization or other special preparation stages. Present approaches to the problem fail with particles that are too small, cryo-EM images that are too noisy or at resolutions where the signal-to-noise ratio becomes too small.

The focus of my talk is the intrinsic reconstitution algorithm, due to Singer and Shkolnisky, which constitutes a basic step for the solution of the 3D cryo-EM problem and whose main appealing property is its remarkable numerical stability to noise. My goal is to give an introductory explanation of the mathematical principles underlying this novel algorithmic approach, while hinting about how they apply to other fundamental problems in cryo-EM and beyond. Along the way, I will describe the mathematical model underlying the experimental set-up, specifying the main computational problems/technical difficulties that should be resolved as part of three dimensional structure determination from cryo-EM images.

Finally, to put things in a broader mathematical perspective, I will briefly mention the general picture: explaining how the intrinsic reconstitution algorithm can be recasted in the (yet to be fully developed) framework of categorical optimization, which is a novel paradigm for solving certain types of non-linear optimization problems by characterizing the solution as an object of a category instead of as an element of a set.

This work is part of a collaborative project conducted with Amit Singer (Princeton), Shamgar Gurevich (Wisconsin Madison), Yoel Shkolnisky (Tel-Aviv University) and Fred Sigworth (Yale).

Image-Based Biomedical Modeling, Simulation and Visualization

Increasingly, biomedical researchers need to build functional computer models from images (MRI, CT, EM, etc.). The "pipeline" for building such computer models includes image analysis (segmentation, registration, filtering), geometric modeling (surface and volume mesh generation), large-scale simulation (parallel computing, GPUs), large-scale visualization and evaluation (uncertainty, error). In my presentation, I will present research challenges and software tools for image-based biomedical modeling, simulation and visualization and discuss their application for solving important research and clinical problems in neuroscience, cardiology, and genetics.

Reconstructing the nervous system Development and function with light sheet microscopy

In the embryonic development of vertebrates and higher invertebrates, a single cell is transformed into a fully functional organism comprising on the order of tens of thousands of cells or more. In a complex process of self-organization, these cells rapidly divide, migrate, differentiate and form tissues and organs able to perform the most challenging tasks. The nervous system is a key component of the developmental building plan that stands out in terms of size, complexity and function. However, very little is known about the developmental dynamics of this complex system, since the technology to comprehensively record and computationally analyze in vivo cell behavior in neural tissues has been lacking. The overall objective of our research is to gain such quantitative experimental access, to determine the fundamental rules governing neural development, and to systematically link development to the functional activation of circuits in the nervous system.



I will present our experimental approach based on light-sheet fluorescence microscopy, an emerging imaging technology that achieves exceptionally high imaging speed and signal-to-noise ratio, while minimizing light exposure of the specimen. This unique combination of capabilities makes light-sheet microscopes indispensable for the long-term in vivo imaging of entire developing organisms. We are designing advanced implementations of scanned light-sheet fluorescence microscopy, such as our SiMView technology framework for simultaneous multiview imaging [1], to systematically image the early development of entire fruit fly, zebrafish and mouse embryos with cellular resolution. I will furthermore present strategies for automated large-scale image processing of these multi-terabyte light-sheet microscopy data sets. This combined experimental and computational approach allows us to perform whole-organism functional imaging [2] and to quantitatively analyze developmental lineages and their interrelationships in the entire animal [3]. Our goal is to take advantage of these high-resolution data to attain a system-level understanding of cell fate decisions and the establishment of structural connectivity, and how the underlying mechanisms give rise to the dynamic architecture of neural tissues. In the long-term perspective, we will use this information to establish and validate a computer model of the developing nervous system [4].



I envision that our quantitative approach to the reconstruction of large neuronal system dynamics will provide critical insights into the properties of complex circuits and complement ongoing large-scale electron microscopy analyses of static neuronal network architecture.



[1] Tomer et al., Nature Methods, 9:755-63 (2012)


[2] Ahrens et al., Nature Methods, 10:413-20 (2013)


[3] Amat and Keller, Development, Growth and Differentiation, 55:563-78 (2013)


[4] Keller, Science, 340:1234168 (2013)


Solving NMR Distance Geometry Problem by Global Registration

The distance geometry problem in NMR structural calculation consists of estimating the coordinates of atoms from imprecise measurements of a subset of their pair-wise distances. This talk will focus on recent divide-and-conquer approaches that solve the problem in two steps: In the first step, the atoms are partitioned into smaller subsets and the structure for each subset is found separately, whereas in the second step a global molecular structure is obtained by stitching together all the local predetermined substructures. Results of numerical simulations demonstrate the advantages of this approach in terms of accuracy and running time. This is a joint work with Kunal Chaudhury and Amit Singer.

Geometric Multiscale methods and models in high dimensions

We will discuss the problem of dictionary learning for images and a novel approach to the construction of multi-resolution dictionaries for data and images, that has several advantages over the current state of art, including fast algorithms for the construction of the dictionary, the calculation of the coefficients onto the dictionary, and guarantees on the performance of such dictionary learning algorithm. We discuss applications to the analysis of large data sets, including collections of images, and representations/visualizations thereof.

Extracting large networks from connectomes

Mapping neural networks in brain, retina and spinal cord requires (1) comprehensive parts lists (vertex types), (2) nanometer-scale connection detection (edge types), and (3) millimeter-scale network tracing. Together these require high-resolution transmission electron microscope (TEM) imaging on a scale not routinely possible. By combining serial sectioning and TEM hardware control (SerialEM, Mastronarde, 2005, PMID 16182563), it is possible to create automated TEM (ATEM) image sets of ≈0.4-1.4M high-resolution images and assemble them into coherent 3D volumes of 16-21 TB of raw image data (Anderson et al., 2009, PMID 19855814). How should we build even larger connectomes? At present, we estimate that 100-1000 TEM systems are underutilized globally, representing the most cost-effective scale-up path.



The next task is navigating, exploring, segmenting and annotating the data space to characterize network motifs. The key tool for this is the Viking system developed by Anderson et al. (2010 PMID 21118201, 2011 PMID 21311605). Viking allows web-compliant delivery of imagery and collection of markup. Our experience suggests that complex network analysis (rather than simple shape-building) is more effectively done by teams of trained annotators working under executive analysts than by crowd-sourcing. At present, retinal connectome RC1 contains >825,000 discrete annotations tracking ≈680 neural structures with 8,300 connections, and ≈300 glia. We estimate this is only ≈20% of the expected annotation set. However, efficiency accelerates 2-3 fold as the density of identified process collisions increases, and many connection sets are replicates.



Lacking robust tools for automated tracking, the best strategy has been to use multichannel molecular/activity markers and cell classification to segment and prioritize tracking targets, followed by intensive manual annotation (Lauritzen et al., 2012 PMID 23042441). A scientifically credible yet computationally manageable data volume in the vertebrate nervous system is currently in the 10-50 TB range. But a persistent challenge has been the vast scale differences among different neurons in a volume. There are an estimated 60+ classes of cells in the mammalian retina. For example, retinal connectome RC1 is a disk of neural retina 243 μm wide and 30 μm tall. It contains 104 copies of the single class of rod bipolar cells, with axonal fields spanning 15-25 μm; 39 copies of one of its target cells, the AII amacrine cell, whose dendrites span 50-60 μm; and a dozen copies of a larger target, the AI amacrine cell, with dendrites spanning up to 1000 μm. But one of the main targets of this network is the ganglion cell superclass, which can be segmented into 15 classes. RC1 contains a few whole copies of a few of these, and fragments of many more, but a volume containing a complete set would need to be >1 mm in diameter and would require over 250 TB. While this is computationally feasible, the acquisition time for one microscope is not practical, nor is routine small-team annotation. Clearly, scaling by (1) platform and (2) automated annotation is critical.



Finally, some key networks in the retina are massive structured hubs. The AII amacrine cell has partnerships with at least 30 cell classes, meaning that all retinal neurons are no more than 2 hops from an AII cell and most are no more than 2 hops from any other cell. This means that network graphs for even small volumes are extremely dense, and differential visualization is a major goal.
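A toy graph makes the hop-count observation concrete (cell names here are schematic, not RC1 annotations):

```python
import networkx as nx

# A hub vertex connected to many cell classes keeps shortest-path hop
# counts small for the whole graph, as described above.
G = nx.Graph()
hub = "AII"
classes = [f"class_{i}" for i in range(30)]
G.add_edges_from((hub, c) for c in classes)                    # hub partnerships
G.add_edges_from(("class_0", f"peripheral_{j}") for j in range(5))

hops = nx.single_source_shortest_path_length(G, hub)
print(max(hops.values()))   # 2: every vertex is within two hops of the hub
```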


Classification and visualization of neural patterns using subspace analysis statistical methods

With new developments in experimental recording technologies, neuroscientists can now record large and complex neural data sets and shift their studies from the single-unit level to the multi-unit level. As a result, methods for handling big data are needed to investigate temporal and spatial patterns and to give scientists an intuitive understanding of the complex data. In this study, we used eigenvalue/eigenvector methods, such as Principal Component Analysis (PCA) and Multiple Discriminant Analysis (MDA), to represent the large-scale data in lower dimensions and to perform pattern classification. We apply these methods to data from large-scale multi-electrode recordings in the hippocampus and to optical imaging data from olfactory receptor neurons (ORNs). Altogether, the application of these subspace analysis techniques to recordings from neurons of different animals and brain regions shows their usefulness and potential for uncovering neural codes hidden in complex and dynamic population patterns.
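A minimal sketch of this kind of subspace pipeline on simulated population activity (shapes, labels and the class structure are illustrative assumptions, not the study's data):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Project high-dimensional population activity with PCA, then separate
# stimulus conditions with a discriminant analysis (MDA reduces to LDA
# for two classes).
rng = np.random.default_rng(1)
n_trials, n_neurons = 200, 500
labels = rng.integers(0, 2, n_trials)              # two stimulus classes
X = rng.normal(size=(n_trials, n_neurons))
X[labels == 1, :10] += 1.5                         # class-dependent firing shift

Z = PCA(n_components=20).fit_transform(X)          # low-dimensional representation
lda = LinearDiscriminantAnalysis().fit(Z, labels)
print(f"training accuracy: {lda.score(Z, labels):.2f}")
```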

High-throughput Bioimage Informatics for Neuroscience and Cell Biology: Worm, Fly, Mouse, and Human

In recent years, high-throughput phenotype screening that involves systematic analysis of microscopic images (hence the term "Bioimage Informatics") and other types of data has become increasingly prevalent and promising. Here I will discuss several examples from widely used model systems, including C. elegans, fruit fly, mouse, and human. I will also discuss our high-performance image visualization and computing platform, Vaa3D (http://vaa3d.org), which has been used in several challenging high-throughput bioimage informatics applications, and my recent work on a fast 3D microscopic smart-imaging system for neuroscience studies.

Task-Based Information Content of Medical Images

Medical images are typically obtained with a clinical task in mind. These tasks can often be modeled as signal detection or parameter estimation. In this talk, we will define the information content of medical images based on the performance of mathematical models for clinical tasks using those images. To quantify this type of information content we need to define the task (the intended use of the images), the statistics (the sources of variability in the data), and the observer (how we intend to obtain the information from the images). This task-based optimization can be used in a wide variety of settings. This talk will present results regarding how higher resolution projections can be used to improve the noise properties of CT reconstructions. This measure of information content could also be used to evaluate and optimize methods to reduce the size of large data sets.
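As one concrete example of such a task-based figure of merit, the sketch below computes the Hotelling observer's detectability for a signal-known-exactly detection task, d'^2 = s^T K^-1 s, where s is the mean signal and K the data covariance; the signal profile and covariance are synthetic placeholders for measured image statistics:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 64                                   # pixels in a small region of interest
s = np.zeros(n)
s[28:36] = 1.0                           # known signal profile
A = rng.normal(size=(n, n))
K = A @ A.T / n + 0.5 * np.eye(n)        # positive-definite noise covariance

d2 = s @ np.linalg.solve(K, s)           # Hotelling observer SNR^2
print(f"detectability d' = {np.sqrt(d2):.2f}")
```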

Simple signed-distance approach to the measurement of depth in human cerebral cortex

High-resolution functional MRI methods have the potential to resolve depth variations in laminated brain tissues such as human cerebral cortex. However, it is challenging to create a geometrically logical definition of depth in this highly convoluted topology. Within the gray matter, which is bounded by the gray-white and pial surfaces, a nearest-neighbor Euclidean distance defined from the two surfaces is not satisfactory because of inconsistency between the distances defined from the two surfaces and a failure to properly follow the dissimilar topologies of the surfaces. A method based on solving Laplace's equation between the two surfaces has been used, but the usual finite-difference solution methods can suffer from artifacts when the surfaces are discretized onto a grid, particularly in the depths of narrow sulci. Here we propose an alternative approach based on an interpolated signed-distance function that makes direct use of the smooth surface representations. We compute a signed distance function, S(x,y,z), defined in two simple ways based on Euclidean distance metrics, with the sign determined by the brain volume tissue segmentation. Signed distance is calculated separately for the gray-white and pial surfaces, Sw and Sp, respectively. We then form a weighted-distance function, D(w; x,y,z) = wSw + (1 − w)Sp. Variation of the weighting parameter, w ∈ [0,1], defines a smooth transition between the two surfaces. In our application, we solve D(w; x,y,z) = 0 to obtain w at every grid point within the gray matter domain. We treat w as a pseudo-potential that smoothly and logically interpolates between the two surfaces, thus defining a normalized depth coordinate within the gray matter. To obtain physical distances and gray-matter thickness, we calculate ∇w and trace it from every point on the gray-white surface to the pial surface. The method was applied to four MRI brain anatomies, and yielded visually and quantitatively reasonable measurements of gray-matter depth and thickness.
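Solving D(w; x,y,z) = 0 for w gives a closed form at each voxel, w = Sp/(Sp − Sw). A minimal sketch, with random arrays standing in for the interpolated signed-distance maps:

```python
import numpy as np

# Sw and Sp are voxelwise signed distances to the gray-white and pial
# surfaces; within gray matter they carry opposite signs, so the closed-form
# w lies in [0, 1] (w = 1 on the gray-white surface, w = 0 on the pial surface).
rng = np.random.default_rng(3)
Sw = rng.uniform(0.1, 2.0, size=(32, 32, 32))     # synthetic stand-in, positive
Sp = rng.uniform(-2.0, -0.1, size=(32, 32, 32))   # synthetic stand-in, negative

w = Sp / (Sp - Sw)                                # normalized depth coordinate
print(w.min() >= 0 and w.max() <= 1)              # True for opposite-signed Sw, Sp
```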

The Open Microscopy Environment: Open Source Image Informatics for the Biological Sciences

Despite significant advances in cell and tissue imaging instrumentation and analysis algorithms, major informatics challenges remain unsolved: file formats are proprietary; facilities to store, analyze and query numerical data or analysis results are not routinely available; integration of new algorithms into proprietary packages is difficult at best; and standards for sharing image data and results are lacking. We have developed an open-source software framework, the Open Microscopy Environment (http://openmicroscopy.org), to address these limitations. OME has three components: an open data model for biological imaging, standardized file formats and software libraries for data file conversion, and software tools for image data management and analysis.

The OME Data Model (http://openmicroscopy.org/site/support/ome-model/) provides a common specification for scientific image data and has recently been updated to more fully support fluorescence filter sets, the requirement for unique identifiers, and screening experiments using multi-well plates.

The OME-TIFF file format (http://openmicroscopy.org/site/support/ome-model/ome-tiff) and the Bio-Formats file format library (http://openmicroscopy.org/site/products/bio-formats) provide an easy-to-use set of tools for converting data from proprietary file formats. These resources enable access to data by different processing and visualization applications, sharing of data between scientific collaborators, and interoperability with third-party tools such as Fiji/ImageJ.

The Java-based OMERO platform (http://openmicroscopy.org/site/products/omero) includes server and client applications that combine an image metadata database, a binary image data repository, and visualization and analysis by remote access. The current stable release of OMERO (OMERO-4.4; http://openmicroscopy.org/site/support/omero4/downloads) includes a single mechanism for accessing image data of all types, regardless of original file format, via Java, C/C++ and Python, and a variety of applications and environments (e.g., ImageJ, Matlab and CellProfiler). This version of OMERO includes a number of new functions, including SSL-based secure access, a distributed compute facility, filesystem access for OMERO clients, and a scripting facility for image processing. An open script repository allows users to share scripts with one another. A permissions system controls access to data within OMERO and enables sharing of data with users in a specific group, or even publishing of image data to the worldwide community. Several applications that use OMERO are now released by the OME Consortium, including a FLIM analysis module, an object tracking module, two image-based search applications, an automatic image tagging application, and the first release of a biobanking application (http://www.openmicroscopy.org/site/products/partner). Our next version, OMERO-5 (http://openmicroscopy.org/site/products/ome5; currently available as a Release Candidate), includes updates and resources specifically supporting the large datasets that appear in digital pathology and high-content screening. Importing these large datasets is fast, and data are stored in their original file format, so they can still be accessed by third-party software.
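For example, remote access from Python might look like the following sketch using the omero-py bindings; host and credentials are placeholders, and the exact API should be checked against the documentation for your server version:

```python
from omero.gateway import BlitzGateway

# Connect to an OMERO server and browse its metadata database, then fetch
# one image plane as a numpy array via the remote-access API.
conn = BlitzGateway("username", "password", host="omero.example.org", port=4064)
conn.connect()
try:
    for project in conn.getObjects("Project"):
        print(project.getId(), project.getName())
    image = next(iter(conn.getObjects("Image")))
    plane = image.getPrimaryPixels().getPlane(0, 0, 0)   # first z, channel, timepoint
    print(plane.shape)
finally:
    conn.close()
```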

OMERO and Bio-Formats run the JCB DataViewer (http://jcb-dataviewer.rupress.org/), the world's first on-line scientific image publishing system, and are used to publish 3D EM tomograms in the EMDataBank (http://emdatabank.org/). They also power several large institutional image data repositories (e.g., http://odr.stowers.org and http://lincs.hms.harvard.edu/).

Brain microstructure from next generation diffusion MRI

We are on the cusp of a completely new generation of diffusion MRI (dMRI) technologies. New methods are transforming what can be measured and have the potential to vastly improve tissue characterization using diffusion MRI. In diffusion MRI, each millimeter-size voxel of the image contains information on the micrometer-scale translational displacements of water. The vast majority of applications today focus on the simplest form of the original MRI diffusion experiment, the Stejskal-Tanner pulse sequence. This sequence is based on a pair of short pulsed diffusion-encoding gradients, which we will refer to as the single pulsed field gradient (sPFG) experiment. sPFG is used in diffusion tensor imaging (DTI), enabling popular measures such as mean diffusivity (the apparent diffusion coefficient, ADC) and diffusion anisotropy (fractional anisotropy, FA). sPFG is also the technology underlying high angular resolution diffusion imaging (HARDI) and diffusion spectrum imaging (DSI). Although current popular diffusion measures are very sensitive to changes in cellular architecture, they are not very specific about the type of change. In contrast, the new generation of dMRI technologies, such as oscillating gradients, double/multi pulsed-field-gradient sequences, and more general waveform sequences, has the potential to provide significant new sensitivity for imaging the brain's cellular microstructure. I will present recent work on general waveform dMRI.
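For reference, the sPFG-derived scalar measures mentioned above can be computed from the eigenvalues of a diffusion tensor. A small sketch with a synthetic tensor (values chosen only to resemble coherent white matter):

```python
import numpy as np

# Mean diffusivity (ADC/MD) and fractional anisotropy (FA) from the
# eigenvalues of a diffusion tensor D (units of mm^2/s).
D = np.array([[1.7e-3, 0.0,    0.0],
              [0.0,    0.3e-3, 0.0],
              [0.0,    0.0,    0.3e-3]])   # prolate (cigar-shaped) tensor

lam = np.linalg.eigvalsh(D)
md = lam.mean()                                                  # mean diffusivity
fa = np.sqrt(1.5 * np.sum((lam - md) ** 2) / np.sum(lam ** 2))   # fractional anisotropy
print(f"MD = {md:.2e} mm^2/s, FA = {fa:.2f}")
```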

Adaptive Mesh Representation and Processing of Biomedical Images

Biomedical imaging technologies are now widely used in many areas of science and engineering. Acquired or computed images are commonly digitized into two- or three-dimensional (2D or 3D) regular arrays composed of pixels or voxels, respectively. Despite its ease of use and processing on computers, this type of image representation often contains substantial redundancy, which poses great challenges to data storage and transmission, especially with the rapidly increasing resolution of digital images and the growing availability of volumetric images. In addition, many pixel-based image processing and analysis algorithms require at least linear time in the number of pixels in an image. Increasing image sizes can therefore become a bottleneck in many real-time applications, such as patient diagnosis and remote healthcare systems. For these reasons, image representations that require less storage space and allow faster processing are important.

In this talk, I shall present an approach to representing images with irregular meshes. Compared to the traditional pixel-based method, the new representation describes an image in a significantly more compact way, which is well suited to storing and transferring large imaging data. The mesh structure of an image is adaptively defined, with finer elements near image features. A method will also be presented to restore a pixel-based image at arbitrary resolution from the mesh representation.
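A quadtree toy example (a simplification of the irregular meshes discussed in the talk, with an assumed tolerance and synthetic data) shows how feature-adaptive subdivision compresses uniform regions:

```python
import numpy as np

# Split cells where intensity variation is high, so the representation is
# finer near image features and coarser elsewhere; each leaf stores one
# mean value in place of many pixels.
def subdivide(img, x, y, size, tol, cells):
    block = img[y:y+size, x:x+size]
    if size <= 2 or block.std() < tol:
        cells.append((x, y, size, float(block.mean())))   # leaf cell
    else:
        h = size // 2
        for dx, dy in ((0, 0), (h, 0), (0, h), (h, h)):
            subdivide(img, x + dx, y + dy, h, tol, cells)

img = np.zeros((256, 256))
img[96:160, 96:160] = 1.0                  # a sharp-edged "feature"
cells = []
subdivide(img, 0, 0, 256, tol=0.05, cells=cells)
print(f"{len(cells)} cells vs {img.size} pixels")  # far fewer cells than pixels
```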

Two-Stage Method for Salt-and-Pepper Noise Removal Using Statistical Jump Regression Analysis


Imaging the Brain with Heterogeneous Data Sources

A new model is developed for the joint analysis of ordered, categorical, real and count data, motivated by brain imaging and human behavior analysis. In the motivating application, the ordered and categorical data are answers to questionnaires.


Videos

Imaging the Brain with Heterogeneous Data Sources - Larry Carin
Extracting large networks from connectomes - Robert Marc
2D and 3D Imaging of Entire Cells and Tissues at Macromolecular Resolution by Advanced Electron Microscopic Approaches - Manfred Auer