This workshop focuses on the challenges presented by the analysis and visualization of large data sets collected in biomedical imaging, genomics and proteomics. The sheer size of the data (easily in the range of terabytes, and growing) requires computationally efficient techniques for sampling, representation, organization, and filtering; ideas and techniques from signal processing, geometric and topological analysis, stochastic dynamical systems, machine learning and statistical modeling are needed to extract patterns and characterize features of interest. Visualization enables interaction with data, algorithms, and outputs. Data sets from biomedical imaging, genomics and proteomics often have unique characteristics that differentiate them from other data sets: extremely high dimensionality; high heterogeneity due to different data modalities (across different spatial and temporal scales, but also across different biological layers) that need to be fused; large stochastic components and noise; and low sample size and possibly low reproducibility of per-patient data. These unique aspects, as well as the sheer size, pose challenges to many existing techniques aimed at solving the problems above. The workshop will bring together biologists, computer scientists, engineers, mathematicians and statisticians working in a wide range of areas of expertise, with the goal of pushing existing techniques, and developing novel ones, for tackling the unique challenges posed by large data sets in biomedical imaging.
|Adeniyi, Aruna||Department of Mathematical and Physical Sciences, Osun State University, Osogbo, Nigeria|
|Auer, Manfred||MAuer@lbl.gov||Bioenergy/GTL & Structural Biology, Lawrence Berkeley National Laboratory|
|Bao, Zhirong||Developmental Biology Program, Sloan-Kettering Institute|
|Bertozzi, Andrea||Mathematics, UCLA|
|Bhargava, Rohit||Bioengineering, University of Illinois at Urbana-Champaign|
|Branson, Kristin||Ethomics, HHMI Janelia Farm Research Campus|
|Cardona, Albert||Janelia Farm, Howard Hughes Medical Institute|
|Carin, Larry||Dept. of Electrical and Computer Engineering, Duke University|
|Carlsson, Gunnar||Gunnar@math.stanford.edu||Department of Mathematics, Stanford University|
|Driscoll, Tobin||Mathematical Sciences, University of Delaware|
|Elizondo, Priscilla||Department of Mathematics, University of Utah|
|Hadani, Ronny||Mathematics, University of Texas at Austin|
|Johnson, Chris||Scientific Computing and Imaging Institute, University of Utah|
|Joshi, Sarang||Bioengineering, University of Utah|
|Keller, Philipp||Janelia Farm Research Campus, Howard Hughes Medical Institute|
|Maggioni, Mauro||Mathematics, Duke University|
|Marc, Robert||Ophthalmology, University of Utah School of Medicine|
|Metaxas, Dimitris||Computer Science, Computational Biomedicine, Imaging, and Modeling Center|
|Meyer, Francois||Electrical and Computer Engineering, University of Colorado at Boulder|
|Mubayi, Anuj||Mathematics, Northeastern Illinois University|
|Pascucci, Valerio||Center for Extreme Data Management, Analysis and Visualization, University of Utah|
|Peng, Hanchuan||Computational Neuroanatomy, Allen Institute for Brain Science|
|Pineda, Angel||Mathematics, California State University, Fullerton|
|Ravikumar, Pradeep||Computer Science, University of Texas at Austin|
|Roysam, Badri||Electrical & Computer Engineering, University of Houston|
|Swedlow, Jason||Centre for Gene Regulation and Expression, University of Dundee|
|Tannenbaum, Allen||Dept. of Biomedical Engineering, Georgia Institute of Technology|
|Westin, Carl-Fredrik||Harvard Medical School|
|Willett, Rebecca||Electrical and Computer Engineering, University of Wisconsin-Madison|
|You, Yuncheng||Mathematics and Statistics, University of South Florida|
|Yu, Zeyun||Computer Science, University of Wisconsin-Milwaukee|
|Zhang, Jian-Zhou||College of Computer, Sichuan University|
|Zhao, Jia||Department of Mathematics, University of South Carolina|
All processes defining and sustaining life take place within cells or between cells, often mediated by multiprotein supramolecular complexes also known as macromolecular machines. While Structural Genomics and, more recently, Cryo-EM have yielded ever-increasing insight into the overall shape of such macromolecular machines and into detailed mechanisms such as catalysis, we lack insight into how these machines are organized and function within cells. We must determine their 3D organization, their subcellular location (e.g., with respect to ultrastructural landmarks), and their interactions with other proteins, the cytoskeleton and organelles, as well as any changes in these characteristics during embryonic development, as part of their physiological function, or during pathogenesis.
Using various biological examples, including inner ear hair cells and related tissues central to hearing, mammary gland development and breast cancer, as well as microbial communities, I will illustrate the power of modern 2D and 3D electron microscopy imaging, including wide-field montaging TEM and TEM tomography as well as Focused Ion Beam Scanning Electron Microscopy (FIB/SEM) and Serial Block-Face SEM (SBF/SEM). The latter two techniques can yield macromolecular insight into the 3D organization of entire cells and tissues, and thus have the potential to truly revolutionize cell biology.
However, while it is now possible to obtain 10k x 10k x 10k voxel data sets (soon 32k x 32k x 32k), we do not possess the computational capabilities to deal with the resulting terabytes of data in terms of visualization, feature extraction/segmentation, annotation, and quantitative analysis. I will discuss the challenges these novel imaging approaches pose, describe how we currently deal with such data sets, and discuss some emerging solutions that my lab has helped develop together with computer scientists at LBL and elsewhere.
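To make the scale concrete, a quick back-of-the-envelope calculation shows why such volumes are measured in terabytes (the 1- and 2-byte voxel depths are assumptions for illustration; the actual bit depth depends on the instrument):

```python
# Rough data-volume estimates for the acquisition sizes mentioned above.
# Voxel depths of 1 and 2 bytes per voxel are assumed purely for illustration.
for side, bytes_per_voxel in [(10_000, 1), (10_000, 2), (32_000, 1), (32_000, 2)]:
    n_voxels = side ** 3
    size_tb = n_voxels * bytes_per_voxel / 1e12
    print(f"{side:>6}^3 voxels at {bytes_per_voxel} B/voxel: {size_tb:6.0f} TB")
# A 10k^3 volume is roughly 1-2 TB; a 32k^3 volume is roughly 33-66 TB.
```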
We hope that this meeting can lead to further solutions to overcome this enormous bottleneck that is emerging in structural cell biology imaging.
The nematode C. elegans is a major model organism for biomedical research. Its size and transparency make in toto imaging a powerful approach to tackle its biology and achieve synergy with genetics and systems biology. We study C. elegans embryogenesis by combining these approaches. I will discuss our efforts on the following fronts: (1) optical microscopy for long-term live imaging through embryogenesis at single-cell resolution; (2) image analysis for automated cell lineage tracing; (3) genome-wide analysis of the phenotypic landscape and mechanistic models of development; and (4) construction of an interactive 4D atlas of neural development for the entire nervous system.
We present a new algorithm for the segmentation of large data sets with graph-based structure. The method combines ideas from classical PDE-based image segmentation with fast and accessible linear-algebra methods for computing information about the spectrum of the graph Laplacian. I will present results for image processing applications such as image labeling and hyperspectral video segmentation.
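The abstract leaves implementation details to the talk; as a minimal sketch of the general spectral idea (not the speakers' specific method), one can segment a small grayscale image by thresholding a low eigenvector of a sparse graph Laplacian built on the pixel grid. The edge weights and similarity scale below are illustrative choices:

```python
# Minimal spectral bi-segmentation sketch using the graph Laplacian spectrum.
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh

def grid_laplacian(img, sigma=0.1):
    """Sparse graph Laplacian on the 4-connected pixel grid, with Gaussian
    edge weights based on intensity differences."""
    h, w = img.shape
    idx = np.arange(h * w).reshape(h, w)
    rows, cols, vals = [], [], []
    for di, dj in [(0, 1), (1, 0)]:            # right and down neighbors
        a = idx[: h - di, : w - dj].ravel()
        b = idx[di:, dj:].ravel()
        wgt = np.exp(-((img.ravel()[a] - img.ravel()[b]) ** 2) / sigma**2)
        rows += [a, b]; cols += [b, a]; vals += [wgt, wgt]
    W = sp.coo_matrix((np.concatenate(vals),
                       (np.concatenate(rows), np.concatenate(cols))),
                      shape=(h * w, h * w)).tocsr()
    D = sp.diags(np.asarray(W.sum(axis=1)).ravel())
    return D - W

img = np.random.rand(64, 64)                   # stand-in image
L = grid_laplacian(img)
vals, vecs = eigsh(L, k=3, which="SM")         # three smallest eigenpairs
fiedler = vecs[:, np.argsort(vals)[1]]         # second-smallest eigenvector
labels = (fiedler > 0).reshape(img.shape)      # two-class segmentation mask
```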
Chemical imaging is an emerging technology in which light is employed to record the molecular content of samples. Instead of using dyes or stains, the approach uses computation to visualize that content. First, the technology is introduced and progress in instrumentation reviewed. The importance of simulation and of data analysis is then emphasized with examples. Big-data strategies and computational needs are discussed next. Finally, a computational strategy and the statistical considerations underlying decision-making are described. Our laboratory focuses, among other topics, on the analysis of biological materials for histopathology. In demonstrative applications, we describe attempts to diagnose and grade cancer in breast and prostate biopsies without human input. Results indicate that a rapid assessment of lesions is possible with high accuracy and that their lethality may be predicted using a systems approach to pathology, which is critically enabled by rapid data analysis strategies. The volume and mining of this unique data set involve several challenges; these are described together with potential solutions and the impact of novel strategies to address them.
A new model is developed for the joint analysis of ordered, categorical, real and count data, motivated by brain imaging and human behavior analysis. In the motivating application, the ordered and categorical data are answers to questionnaires, the (word) count data correspond to the text of the questions, and the real data correspond to fMRI responses for each subject. We also combine the analysis of these data with single-nucleotide polymorphism (SNP) data from each individual. The questionnaires considered here correspond to standard psychological surveys, and the study is motivated by psychology and neuroscience. The proposed Bayesian model infers sparse graphical models (networks) jointly across people, questions, fMRI stimuli and brain activity, integrated within a new matrix factorization based on latent binary features. We demonstrate how the learned model may take fMRI and SNP data from a subject as inputs and predict (impute) how the individual would answer a psychological questionnaire; going in the other direction, we also use an individual's SNP data and questionnaire answers to impute unobserved fMRI data. Each of these two imputation settings has practical and theoretical applications for understanding human behavior and mental health, which are discussed.
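The full model is Bayesian and couples graphical-model structure with latent binary features; as a greatly simplified stand-in for the imputation idea only, the sketch below completes missing entries of a data matrix with a plain low-rank factorization fit on observed entries (the function, rank, and regularization are illustrative choices, not the paper's):

```python
# Simplified matrix-completion stand-in: alternating ridge regressions
# on observed entries only. The Bayesian and latent-binary-feature
# machinery of the actual model is omitted.
import numpy as np

def impute_low_rank(X, observed, rank=5, lam=0.1, iters=50, seed=0):
    """X: (n, m) data matrix; observed: boolean mask of the same shape."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    U = rng.normal(size=(n, rank))
    V = rng.normal(size=(m, rank))
    reg = lam * np.eye(rank)
    for _ in range(iters):
        for i in range(n):                       # update each row factor
            obs = observed[i]
            U[i] = np.linalg.solve(V[obs].T @ V[obs] + reg, V[obs].T @ X[i, obs])
        for j in range(m):                       # update each column factor
            obs = observed[:, j]
            V[j] = np.linalg.solve(U[obs].T @ U[obs] + reg, U[obs].T @ X[obs, j])
    return U @ V.T                               # missing entries read off the completion

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 4)) @ rng.normal(size=(4, 20))   # synthetic low-rank data
observed = rng.random(X.shape) > 0.3                      # ~30% missing at random
X_hat = impute_low_rank(np.where(observed, X, 0.0), observed)
```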
Three-dimensional cryo-electron microscopy (3D cryo-EM, for short) is the problem of determining the three-dimensional structure of a large molecule from a set of images, taken by an electron microscope, of randomly oriented and positioned identical molecular particles frozen in a thin layer of ice. A solution to this problem is of particular interest, since it promises to be an entirely general technique that does not require crystallization or other special preparation stages. Present approaches to the problem fail with particles that are too small, cryo-EM images that are too noisy, or at resolutions where the signal-to-noise ratio becomes too small.
The focus of my talk is the intrinsic reconstitution algorithm, due to Singer and Shkolnisky, which constitutes a basic step for the solution of the 3D cryo-EM problem and whose main appealing property is its remarkable numerical stability to noise. My goal is to give an introductory explanation of the mathematical principles underlying this novel algorithmic approach, while hinting at how they apply to other fundamental problems in cryo-EM and beyond. Along the way, I will describe the mathematical model underlying the experimental set-up, specifying the main computational problems and technical difficulties that must be resolved as part of three-dimensional structure determination from cryo-EM images.
Finally, to put things in a broader mathematical perspective, I will briefly mention the general picture: explaining how the intrinsic reconstitution algorithm can be recast in the (yet to be fully developed) framework of categorical optimization, a novel paradigm for solving certain types of non-linear optimization problems by characterizing the solution as an object of a category instead of as an element of a set.
This work is part of a collaborative project conducted with Amit Singer (Princeton), Shamgar Gurevich (Wisconsin Madison), Yoel Shkolnisky (Tel-Aviv University) and Fred Sigworth (Yale).
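Since the talk is introductory, a small numerical illustration may help. The sketch below implements the eigenvector (synchronization) idea in its simplest form: given the block matrix of relative rotations, the top three eigenvectors recover the individual rotations up to a global rotation. This is a deliberate simplification; the intrinsic reconstitution algorithm itself works from common lines detected between cryo-EM images rather than from given relative rotations.

```python
# Rotation synchronization via the top eigenvectors of a block matrix.
import numpy as np
from scipy.stats import special_ortho_group

def synchronize_rotations(H, n):
    """Recover rotations (up to a global rotation) from the (3n, 3n) block
    matrix H with blocks H_ij ~ R_i R_j^T, via its top three eigenvectors."""
    vals, vecs = np.linalg.eigh(H)
    V = vecs[:, -3:] * np.sqrt(n)            # blocks of V approximate R_i Q, Q in O(3)
    if np.linalg.det(V[:3]) < 0:             # resolve the global reflection ambiguity
        V[:, 2] *= -1
    R_est = []
    for i in range(n):
        u, _, vt = np.linalg.svd(V[3 * i:3 * i + 3])
        R_est.append(u @ vt)                 # project each block to the nearest rotation
    return R_est

# Noiseless sanity check: H built from true rotations has exact rank 3.
n = 20
R = [special_ortho_group.rvs(3, random_state=i) for i in range(n)]
H = np.block([[R[i] @ R[j].T for j in range(n)] for i in range(n)])
R_est = synchronize_rotations(H, n)
G = R_est[0].T @ R[0]                        # align the arbitrary global rotation
print(max(np.linalg.norm(Re @ G - Ri) for Re, Ri in zip(R_est, R)))  # ~1e-13
```

In the noisy setting, the same top eigenvectors remain a stable estimator, which is the numerical-robustness property highlighted in the abstract.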
Increasingly, biomedical researchers need to build functional computer models from images (MRI, CT, EM, etc.). The "pipeline" for building such computer models includes image analysis (segmentation, registration, filtering), geometric modeling (surface and volume mesh generation), large-scale simulation (parallel computing, GPUs), and large-scale visualization and evaluation (uncertainty, error). In my presentation, I will describe research challenges and software tools for image-based biomedical modeling, simulation and visualization, and discuss their application to important research and clinical problems in neuroscience, cardiology, and genetics.
In the embryonic development of vertebrates and higher invertebrates, a single cell is transformed into a fully functional organism comprising on the order of tens of thousands of cells or more. In a complex process of self-organization, these cells rapidly divide, migrate, differentiate and form tissues and organs able to perform the most challenging tasks. The nervous system is a key component of the developmental building plan that stands out in terms of size, complexity and function. However, very little is known about the developmental dynamics of this complex system, since the technology to comprehensively record and computationally analyze in vivo cell behavior in neural tissues has been lacking. The overall objective of our research is to gain such quantitative experimental access, to determine the fundamental rules governing neural development, and to systematically link development to the functional activation of circuits in the nervous system.
I will present our experimental approach based on light-sheet fluorescence microscopy, an emerging imaging technology that achieves exceptionally high imaging speed and signal-to-noise ratio while minimizing light exposure of the specimen. This unique combination of capabilities makes light-sheet microscopes indispensable for the long-term in vivo imaging of entire developing organisms. We are designing advanced implementations of scanned light-sheet fluorescence microscopy, such as our SiMView technology framework for simultaneous multiview imaging [1], to systematically image the early development of entire fruit fly, zebrafish and mouse embryos with cellular resolution. I will furthermore present strategies for automated large-scale image processing of these multi-terabyte light-sheet microscopy data sets. This combined experimental and computational approach allows us to perform whole-organism functional imaging [2] and to quantitatively analyze developmental lineages and their interrelationships in the entire animal [3]. Our goal is to take advantage of these high-resolution data to attain a system-level understanding of cell fate decisions and the establishment of structural connectivity, and of how the underlying mechanisms give rise to the dynamic architecture of neural tissues. In the long term, we will use this information to establish and validate a computer model of the developing nervous system [4].
I envision that our quantitative approach to the reconstruction of large neuronal system dynamics will provide critical insights into the properties of complex circuits and complement ongoing large-scale electron microscopy analyses of static neuronal network architecture.
[1] Tomer et al., Nature Methods 9:755-63 (2012)
[2] Ahrens et al., Nature Methods 10:413-20 (2013)
[3] Amat and Keller, Development, Growth & Differentiation 55:563-78 (2013)
[4] Keller, Science 340:1234168 (2013)
We will discuss the problem of dictionary learning for images, and a novel approach to the construction of multi-resolution dictionaries for data and images that has several advantages over the current state of the art, including fast algorithms for constructing the dictionary and for computing the coefficients in the dictionary, as well as guarantees on the performance of the dictionary-learning algorithm. We discuss applications to the analysis of large data sets, including collections of images, and to representations and visualizations thereof.
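For readers who want a concrete baseline to compare against, the sketch below (using scikit-learn, and deliberately single-scale rather than the multi-resolution construction of the talk) learns a patch dictionary from an image and sparse-codes the patches:

```python
# Single-scale dictionary learning on image patches with scikit-learn.
import numpy as np
from sklearn.datasets import load_sample_image
from sklearn.decomposition import MiniBatchDictionaryLearning
from sklearn.feature_extraction.image import extract_patches_2d

img = load_sample_image("china.jpg").mean(axis=2) / 255.0   # grayscale in [0, 1]
patches = extract_patches_2d(img, (8, 8), max_patches=5000, random_state=0)
X = patches.reshape(len(patches), -1)
X -= X.mean(axis=1, keepdims=True)                          # remove per-patch means

dico = MiniBatchDictionaryLearning(n_components=64, alpha=1.0,
                                   batch_size=200, random_state=0)
codes = dico.fit(X).transform(X)        # sparse coefficients per patch
recon = codes @ dico.components_        # patches reconstructed from the dictionary
```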
In recent years, high-throughput phenotype screening involving the systematic analysis of microscopic images (hence "Bioimage Informatics") and other types of data has become increasingly prevalent and promising. Here I will discuss several examples from widely used model systems, including C. elegans, fruit fly, mouse, and human. I will also discuss our high-performance image visualization and computing platform, Vaa3D (http://vaa3d.org), which has been used in several challenging high-throughput bioimage informatics applications, and my recent work on a fast 3D microscopic smart-imaging system for neuroscience studies.
We consider the problem of recovering a 2D image from a highly noisy version of it. We consider both the denoising setting, where noise is added directly to the image, and the tomographic setting, where we observe noisy projections of the image after X-ray/Radon transforms. While it is typically infeasible to obtain reliable reconstructions of the underlying image in extremely noisy settings, it is increasingly apparent in modern statistical modeling that reliable estimation is possible even with very limited data, provided suitable structural constraints are imposed on the model space. Accordingly, we consider imposing shape-based constraints on the underlying image. Building on recent work on diffeomorphism-based characterizations of the notion of the "shape" of an object, we propose novel robust variants of shape-encouraging regularization functions, and demonstrate their applicability on a varied set of simulated data.
Joint work with Chandra Bajaj, Onur Domanic, and Ozan Oktem.
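The talk's regularizers are diffeomorphism-based shape priors; as the simplest runnable illustration of the general principle (structural constraints enabling recovery from heavy noise), the sketch below substitutes off-the-shelf total-variation denoising:

```python
# Structural regularization in its simplest form: TV denoising of a
# heavily noise-corrupted image (a stand-in, not the talk's shape priors).
import numpy as np
from skimage import data
from skimage.restoration import denoise_tv_chambolle
from skimage.util import random_noise

clean = data.camera() / 255.0
noisy = random_noise(clean, var=0.3)              # heavy additive Gaussian noise
denoised = denoise_tv_chambolle(noisy, weight=0.2)
print(np.abs(denoised - clean).mean(), "<", np.abs(noisy - clean).mean())
```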
Despite significant advances in cell and tissue imaging instrumentation and analysis algorithms, major informatics challenges remain unsolved: file formats are proprietary, facilities to store, analyze and query numerical data or analysis results are not routinely available, integration of new algorithms into proprietary packages is difficult at best, and standards for sharing image data and results are lacking. We have developed an open-source software framework, the Open Microscopy Environment (OME; http://openmicroscopy.org), to address these limitations. OME has three components: an open data model for biological imaging, standardised file formats and software libraries for data file conversion, and software tools for image data management and analysis.
The OME Data Model (http://openmicroscopy.org/site/support/ome-model/) provides a common specification for scientific image data and has recently been updated to more fully support fluorescence filter sets, the requirement for unique identifiers, and screening experiments using multi-well plates.
The OME-TIFF file format (http://openmicroscopy.org/site/support/ome-model/ome-tiff) and the Bio-Formats file format library (http://openmicroscopy.org/site/products/bio-formats) provide an easy-to-use set of tools for converting data from proprietary file formats. These resources enable access to data by different processing and visualization applications, sharing of data between scientific collaborators and interoperability in third party tools like Fiji/ImageJ.
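As one example of this interoperability, the python-bioformats bindings expose Bio-Formats from Python; the sketch below is one of several possible routes, and the file name is a placeholder:

```python
# Reading a proprietary microscopy format through Bio-Formats from Python
# (python-bioformats + javabridge). "example.lif" is a placeholder path.
import javabridge
import bioformats

javabridge.start_vm(class_path=bioformats.JARS)
try:
    with bioformats.ImageReader("example.lif") as reader:
        plane = reader.read(z=0, t=0, series=0)            # one 2D plane as a numpy array
    xml = bioformats.get_omexml_metadata("example.lif")    # OME-XML metadata string
finally:
    javabridge.kill_vm()
```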
The Java-based OMERO platform (http://openmicroscopy.org/site/products/omero) includes server and client applications that combine an image metadata database, a binary image data repository, and visualization and analysis by remote access. The current stable release of OMERO (OMERO-4.4; http://openmicroscopy.org/site/support/omero4/downloads) includes a single mechanism for accessing image data of all types, regardless of original file format, via Java, C/C++ and Python and a variety of applications and environments (e.g., ImageJ, Matlab and CellProfiler). This version of OMERO includes a number of new functions, including SSL-based secure access, a distributed compute facility, filesystem access for OMERO clients, and a scripting facility for image processing. An open script repository allows users to share scripts with one another. A permissions system controls access to data within OMERO and enables sharing of data with users in a specific group, or even publishing of image data to the worldwide community. Several applications that use OMERO are now released by the OME Consortium, including a FLIM analysis module, an object tracking module, two image-based search applications, an automatic image tagging application, and the first release of a biobanking application (http://www.openmicroscopy.org/site/products/partner). Our next version, OMERO-5 (http://openmicroscopy.org/site/products/ome5; currently available as a Release Candidate), includes updates and resources to specifically support the large datasets that appear in digital pathology and high-content screening. Importing these large datasets is fast, and data are stored in their original file format, so they can be accessed by third-party software.
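A sketch of the Python route into OMERO, via omero-py's BlitzGateway; the host, credentials and image ID below are placeholders:

```python
# Remote image access with OMERO's Python bindings (omero-py).
from omero.gateway import BlitzGateway

conn = BlitzGateway("username", "password", host="omero.example.org", port=4064)
conn.connect()
try:
    image = conn.getObject("Image", 1)       # fetch an image wrapper by ID
    pixels = image.getPrimaryPixels()
    plane = pixels.getPlane(0, 0, 0)         # numpy array for (z, c, t) = (0, 0, 0)
    print(image.getName(), plane.shape)
finally:
    conn.close()
```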
OMERO and Bio-Formats run the JCB DataViewer (http://jcb-dataviewer.rupress.org/), the world's first online scientific image publishing system, and are used to publish 3D EM tomograms in the EMDataBank (http://emdatabank.org/). They also power several large institutional image data repositories (e.g., http://odr.stowers.org and http://lincs.hms.harvard.edu/).
Biomedical imaging technologies are now widely used in many areas of science and engineering. Acquired or computed images are commonly digitized into two- or three-dimensional (2D or 3D) regular arrays composed of pixels or voxels, respectively. Despite its ease of use and processing on computers, this type of image representation often contains a great deal of redundancy, which poses serious challenges to data storage and transmission, especially with the rapidly increasing resolution of digital images and the growing availability of volumetric images. In addition, many pixel-based image processing and analysis algorithms require at least linear time complexity in the number of pixels in an image, so increasing image sizes can become a bottleneck in many real-time applications such as patient diagnosis and remote healthcare systems. For these reasons, finding image representations that require less storage space and allow faster processing is important.
In this talk, I shall present an approach to representing images with irregular meshes. Compared to the traditional pixel-based method, the new representation provides a significantly more compact way to describe an image, well suited to storing and transferring large imaging data. The mesh structure of an image is adaptively defined, with finer elements near image features. A method will also be presented to restore a pixel-based image at an arbitrary resolution from the mesh representation.
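As a rough, self-contained illustration of the underlying idea (an adaptive quadtree rather than the talk's irregular meshes), the sketch below refines a 2D image only near features, so homogeneous regions collapse to single nodes, and then restores a pixel image from the leaves:

```python
# Adaptive quadtree as a stand-in for feature-adaptive mesh representation.
import numpy as np

def quadtree(img, x, y, size, tol, leaves):
    """Recursively split the square (x, y, size) until it is nearly uniform."""
    block = img[y:y + size, x:x + size]
    if size == 1 or block.std() <= tol:
        leaves.append((x, y, size, float(block.mean())))
        return
    h = size // 2
    for dx, dy in [(0, 0), (h, 0), (0, h), (h, h)]:
        quadtree(img, x + dx, y + dy, h, tol, leaves)

img = np.zeros((256, 256))
img[64:192, 64:192] = 1.0                       # a simple square "feature"
leaves = []
quadtree(img, 0, 0, 256, tol=0.05, leaves=leaves)
print(f"{len(leaves)} leaves instead of {img.size} pixels")

# Restore a pixel image at the original resolution from the leaf values.
recon = np.empty_like(img)
for x, y, size, mean in leaves:
    recon[y:y + size, x:x + size] = mean
```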