MBI Logo

Mini-workshop: Mathematical Challenges Arising in Cancer Models: Abstracts and Lecture Materials

Mathematical Models for DNA Sequencing by Hybridization
Jacek Blazewicz, Institute of Computing Science, Poznan University of Technology & Institute of Bioorganic Chemistry, Polish Academy of Sciences

The talk deals with the problem of de novo DNA sequencing by hybridization. First, the problem with isometric oligonucleotide libraries is considered. The computational phase of this approach, i.e. the construction of a DNA sequence from oligonucleotides, may be modeled as a path construction problem in the newly introduced DNA graphs. In the error-free case this problem is solvable in polynomial time. In the presence of errors, on the other hand, it turns out to be strongly NP-hard. Since this problem therefore does not admit a polynomial-time solution unless P = NP, efficient heuristics are needed to solve it. In the talk, such a heuristic algorithm based on tabu search is discussed. Computational tests have shown its low complexity and high accuracy for both types of errors: false negatives and false positives. The case of isothermic oligonucleotide libraries is also discussed.
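For the error-free case with an isometric library, a minimal sketch of the classical Eulerian-path formulation shows why the problem is polynomial: each k-mer of the spectrum becomes a directed edge from its (k-1)-prefix to its (k-1)-suffix, and a sequence using every k-mer once is an Eulerian path. This textbook de Bruijn-style construction is assumed here for illustration; it is related to, but not necessarily identical with, the DNA graphs introduced in the talk, and the function name `reconstruct_from_spectrum` is hypothetical.

```python
from collections import defaultdict


def reconstruct_from_spectrum(spectrum):
    """Rebuild a DNA sequence from an error-free spectrum of equal-length k-mers.

    Each k-mer is a directed edge from its (k-1)-prefix to its (k-1)-suffix;
    a sequence using every k-mer exactly once is an Eulerian path in this graph,
    found here with Hierholzer's algorithm in time linear in the spectrum size.
    """
    graph = defaultdict(list)
    out_deg, in_deg = defaultdict(int), defaultdict(int)
    for kmer in spectrum:
        u, v = kmer[:-1], kmer[1:]
        graph[u].append(v)
        out_deg[u] += 1
        in_deg[v] += 1

    # The path must start where out-degree exceeds in-degree by one (if anywhere).
    nodes = sorted(set(out_deg) | set(in_deg))
    starts = [n for n in nodes if out_deg[n] - in_deg[n] == 1]
    start = starts[0] if starts else nodes[0]

    # Hierholzer: follow unused edges, emitting a node once it has none left.
    stack, path = [start], []
    while stack:
        node = stack[-1]
        if graph[node]:
            stack.append(graph[node].pop())
        else:
            path.append(stack.pop())
    path.reverse()

    if len(path) != len(spectrum) + 1:
        raise ValueError("spectrum does not correspond to a single Eulerian path")
    # Consecutive nodes overlap in k-2 letters, so each one adds a single letter.
    return path[0] + "".join(node[-1] for node in path[1:])


spectrum = ["TTA", "ACG", "AGC", "GTT", "CGT", "TAG"]  # 3-mers of ACGTTAGC, shuffled
print(reconstruct_from_spectrum(spectrum))  # ACGTTAGC
```

If a (k-1)-mer occurs more than once in the target, several Eulerian paths may exist and the function returns only one of them. False negatives and false positives break the Eulerian structure altogether, which is the situation where heuristics such as tabu search come into play.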

A Model of Tumour Cords Treatment by Intracellularly Sequestered Drugs
Antonio Fasano, University of Florence; Alberto Gandolfi, Istituto di Analisi dei Sistemi ed Informatica - CNR; and Alessandro Bertuzzi, Istituto di Analisi dei Sistemi ed Informatica - CNR

In previous papers [1,2] we described a model for the evolution of a tumour cord (an axisymmetric arrangement of tumour cells growing around a blood vessel, generally surrounded by necrosis) under the action of cell-killing agents. The model takes into account cell motion and includes oxygen diffusion and consumption. Several drastic simplifications were introduced, with the main aim of avoiding the study of the motion of the extracellular fluid. The most restrictive assumption was that the volume fraction occupied by the cells (alive or dead) remains constant everywhere in the cord at all times. Nevertheless, the model highlights an important feature overlooked in the previous literature: the interface between the region containing living cells and the purely necrotic zone may or may not be a material surface (i.e. one moving with the cell velocity). The nature of the interface, and consequently the choice of free boundary conditions, is determined by constraints on the interface velocity and on the oxygen concentration there.
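As a rough illustration of how oxygen diffusion and consumption set the thickness of the viable rim, here is a minimal steady-state sketch under simplifying assumptions that are ours, not the authors': radial symmetry, a constant consumption rate `q` by living cells, diffusivity `D`, concentration `c0` at the vessel wall `r0`, and a viable region ending at the radius `rho` where both concentration and flux vanish. Solving D (1/r) d/dr(r dc/dr) = q with c(r0) = c0 and dc/dr(rho) = 0 gives the profile coded below; the function names are hypothetical.

```python
import math


def oxygen_profile(r, r0, c0, q, D, rho):
    """Steady oxygen concentration at radius r (r0 <= r <= rho).

    Solves D * (1/r) d/dr(r dc/dr) = q with c(r0) = c0 and dc/dr(rho) = 0
    (constant consumption q is an assumption of this sketch).
    """
    return (c0 + q / (4 * D) * (r**2 - r0**2)
            - q * rho**2 / (2 * D) * math.log(r / r0))


def viable_radius(r0, c0, q, D, tol=1e-12):
    """Radius rho at which oxygen and its flux both vanish, found by bisection."""

    def c_at_edge(rho):
        return oxygen_profile(rho, r0, c0, q, D, rho)

    lo, hi = r0, 2 * r0
    while c_at_edge(hi) > 0:   # c_at_edge decreases monotonically for rho > r0
        lo, hi = hi, 2 * hi
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if c_at_edge(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


rho = viable_radius(r0=1.0, c0=1.0, q=1.0, D=1.0)  # dimensionless units
print(rho)
```

Doubling `q` (faster consumption) shrinks the viable radius, which is the qualitative link between oxygen supply and the extent of the necrotic zone.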

In this presentation we illustrate a new model that partially removes the restrictive assumption on the volume fraction occupied by the cells. We still keep this fraction constant outside the necrotic region, while the necrotic region is treated as a "fluid reservoir", exchanging matter with the rest of the cord through the interface and with the exterior through the lateral sides. In this way we were able to bring the most relevant aspects of the dynamics of the extracellular fluid into the model, computing the longitudinal average of the radial velocity and of the pressure field.

To improve the modelling of the therapeutic treatment, we have also included: (i) a more detailed description of the action of cell-killing drugs, in which extracellular and intracellular drug concentrations are distinguished and the intracellularly bound (sequestered) drug is responsible for cytotoxicity; (ii) the formation of apoptotic bodies after the death of a cell, with some volume transfer to the liquid component, and the possible phagocytosis of apoptotic bodies by living cells.
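The distinction between extracellular, free intracellular and bound drug in point (i) can be sketched as a closed three-compartment system. The linear exchange kinetics, the rate constants `k_in`, `k_out`, `k_on`, `k_off` and the absence of any clearance are all hypothetical choices for illustration, not the authors' equations; the point is only that cytotoxicity would be driven by the bound pool `b`, which fills with a delay after exposure.

```python
def simulate_drug(e0, k_in, k_out, k_on, k_off, dt=0.01, steps=1000):
    """Explicit-Euler integration of a closed three-compartment drug model.

    e: extracellular drug, i: free intracellular drug, b: bound (sequestered)
    drug. Every exchange is linear with a hypothetical rate constant and no
    drug leaves the system, so e + i + b stays constant.
    """
    e, i, b = e0, 0.0, 0.0
    history = [(e, i, b)]
    for _ in range(steps):
        uptake = k_in * e - k_out * i      # net flux across the cell membrane
        binding = k_on * i - k_off * b     # net flux into the sequestered pool
        e -= dt * uptake
        i += dt * (uptake - binding)
        b += dt * binding
        history.append((e, i, b))
    return history


history = simulate_drug(e0=1.0, k_in=0.5, k_out=0.1, k_on=0.3, k_off=0.05)
print(history[-1])  # (extracellular, free intracellular, bound) at t = 10
```

A cell-kill term could then be added as a death rate proportional to `b`, again as an assumption rather than the form used in the talk.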

In addition to existence and uniqueness proofs for both the steady-state and the evolution models, some numerical results will be presented.

  1. Bertuzzi, A., Fasano, A., & Gandolfi, A. Submitted to SIAM J. Math. Anal.
  2. Bertuzzi, A., d'Onofrio, A., Fasano, A., & Gandolfi, A. (2003). Bull. Math. Biol. 65, 902-931.

Mathematical Challenges Arising from Simple Cancer Models
Avner Friedman, Mathematical Biosciences Institute, The Ohio State University

Simple cancer models can be formulated in terms of a system of PDEs for the densities of cells and the concentrations of nutrients and drugs. The tumor region changes continuously owing to the proliferation and death of cells, which leads to challenging free boundary problems. In this talk I shall focus on two models. The first model involves three populations of cells (proliferating, quiescent and necrotic) as well as a nutrient concentration, which affects the rates of transition among the different cell types. The second model involves cancer cells that are uninfected by the drug, cancer cells that are infected, and necrotic cells. The drug consists of genetically engineered virus particles that adsorb to cancer cells, cause their death, and then burst out to infect adjacent cancer cells. We shall state some recent mathematical results and describe open problems.
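To show what such a free boundary problem looks like in its simplest form, here is a sketch of a radially symmetric, single-population model in the spirit of Byrne and Chaplain's nonnecrotic tumour, not either of the two models of the talk. Assumed: the nutrient σ satisfies Δσ = σ in the tumour with σ = 1 on its boundary, cells proliferate at rate σ and die at a constant rate σ̃, and the radius R(t) moves with the resulting volume change. The names `sigma_tilde`, `mu` and the functions are illustrative.

```python
import math


def radius_rate(R, sigma_tilde, mu=1.0):
    """dR/dt = (mu / R**2) * integral_0^R (sigma(r) - sigma_tilde) r**2 dr.

    With sigma(r) = R sinh(r) / (r sinh(R)), the solution of
    Laplace(sigma) = sigma with sigma(R) = 1, the integral is explicit:
    dR/dt = mu * R * ((R coth R - 1) / R**2 - sigma_tilde / 3).
    """
    return mu * R * ((R / math.tanh(R) - 1.0) / R**2 - sigma_tilde / 3.0)


def evolve_radius(R0, sigma_tilde, dt=0.01, t_end=200.0):
    """Forward-Euler integration of the free boundary R(t)."""
    R = R0
    for _ in range(int(round(t_end / dt))):
        R += dt * radius_rate(R, sigma_tilde)
    return R


def steady_radius(sigma_tilde, lo=1e-3, hi=100.0):
    """Dormant radius solving (R coth R - 1)/R**2 = sigma_tilde/3 (0 < sigma_tilde < 1)."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if radius_rate(mid, sigma_tilde) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


print(steady_radius(0.5), evolve_radius(1.0, 0.5), evolve_radius(8.0, 0.5))
```

Small and large tumours both converge to the same dormant radius here, because (R coth R - 1)/R² decreases from 1/3 to 0; for σ̃ ≥ 1 the growth rate is negative for every R and the tumour shrinks away.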

Stochastic Models of Repeat DNA Motifs and Amplified Genes
Marek Kimmel, Department of Statistics, Rice University

Repeat structures in DNA, such as telomere endings or microsatellite repeats, play a role in carcinogenesis. Amplified genes are implicated in the evolution of drug resistance. The talk reviews the use of branching process theory and Markov chain theory in the analysis of stochastic models of these two types of elements. Among other things, we explore the connections between quasistationary distributions and selection. We also analyze branching processes with an infinite number of types and branching-within-branching processes as models for gene amplification. Finally, we study reducible multitype branching processes as models for telomere shortening.
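A basic computation underlying many branching-process analyses is the extinction probability of a Galton-Watson process: the smallest root in [0, 1] of s = f(s), where f is the offspring probability generating function. The sketch below finds it by fixed-point iteration from s = 0; it is a generic single-type illustration, and the offspring distributions used are hypothetical rather than taken from the gene-amplification or telomere models of the talk.

```python
def extinction_probability(offspring_pmf, tol=1e-12, max_iter=100_000):
    """Smallest fixed point in [0, 1] of f(s) = sum_k p_k * s**k.

    offspring_pmf[k] is the probability of k offspring. Iterating
    s_{n+1} = f(s_n) from s_0 = 0 gives P(extinct by generation n),
    which increases monotonically to the extinction probability.
    """
    def pgf(s):
        return sum(p * s**k for k, p in enumerate(offspring_pmf))

    s = 0.0
    for _ in range(max_iter):
        s_next = pgf(s)
        if abs(s_next - s) < tol:
            return s_next
        s = s_next
    return s


# Binary splitting: no offspring w.p. 0.25, two offspring w.p. 0.75.
print(extinction_probability([0.25, 0.0, 0.75]))  # ≈ 1/3
```

Near the critical case (mean offspring number equal to one) convergence becomes very slow, so a production version would solve for the root directly instead.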

Avascular Tumor Formation: A Mathematical Model for the Onset of Avascular Tumor Growth in Response to the Loss of p53 Function
Howard Levine, Iowa State University; Michael W. Smiley, Anna Tucker, and Marit Nilsen-Hamilton

We propose a mathematical model for the formation of an avascular tumor based on the loss of tumor suppressor function that ensues under p53 gene mutation. The p53 protein regulates apoptosis and the cell expression of growth factor and matrix metalloproteinase, regulatory functions that many mutant p53 proteins do not possess. The focus is on a description of cell movement as the transport of a probability density rather than as the movement of individual cells. In contrast to earlier works on solid tumor growth, in this paper a model is proposed for the initiation of tumor growth. The central idea, taken from the mathematical theory of dynamical systems, is to view the loss of p53 function in a few cells as a small instability in a rest state of an appropriate system of differential equations describing cell movement. This instability is shown (numerically) to lead to a second, spatially inhomogeneous solution, which can be thought of as a solid tumor whose growth is limited by nutrient diffusion. In this formulation, one is led to a system of nine partial differential equations, five of which are in fact ordinary differential equations in time.
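The notion of a rest state losing stability can be illustrated with a linear stability calculation for the classical Keller-Segel chemotaxis system, a much simpler density-transport model than the nine-equation system of the talk and used here only as an assumed stand-in. u is a cell density, v a chemical the cells move up, and D, chi, alpha, beta, u0 are illustrative parameters. A Fourier mode exp(ikx) of a perturbation of the homogeneous state (u0, alpha*u0/beta) grows if the 2x2 linearized matrix has an eigenvalue with positive real part, which here happens exactly when D(k^2 + beta) < chi*alpha*u0.

```python
import cmath


def mode_growth_rate(k, D, chi, alpha, beta, u0):
    """Largest real part of the eigenvalues for the Fourier mode with wavenumber k.

    Linearizes u_t = D u_xx - chi (u v_x)_x, v_t = v_xx + alpha u - beta v
    about the spatially homogeneous rest state (u0, alpha * u0 / beta).
    """
    # Entries of the linearized 2x2 matrix for the mode exp(i k x).
    a, b = -D * k**2, chi * u0 * k**2
    c, d = alpha, -(k**2 + beta)
    trace, det = a + d, a * d - b * c
    root = cmath.sqrt(trace**2 - 4 * det)
    return max(((trace + root) / 2).real, ((trace - root) / 2).real)


for k in (0.5, 1.0, 3.0):
    rate = mode_growth_rate(k, D=1.0, chi=1.0, alpha=1.0, beta=1.0, u0=5.0)
    print(k, "unstable" if rate > 0 else "stable")
# 0.5 unstable, 1.0 unstable, 3.0 stable
```

Because the trace is always negative, instability can only come through the determinant changing sign, so long-wavelength modes destabilize first as chi*alpha*u0 grows.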

Numerical Simulations of Tumor Growth
Qing Nie, Department of Mathematics, Center for Complex Biological Systems and Department of Biomedical Engineering, University of California, Irvine

In this talk, we study solid tumor growth using numerical simulations. The tumor evolution is described by a classical moving-boundary model, reduced to a set of two new dimensionless parameters. One parameter describes the rate of mitosis relative to the relaxation mechanisms (cell mobility and cell-to-cell adhesion). The other describes the balance between apoptosis (programmed cell death) and mitosis. Our analysis and simulations reveal that the new dimensionless parameters divide tumor growth into three regimes associated with increasing degrees of vascularization. We demonstrate that critical conditions exist for which the tumor evolves to non-trivial dormant states or grows self-similarly. Away from these critical conditions, the evolution may be unstable, leading to invasive fingering and to topological transitions such as the capture of healthy tissue by the tumor.

Modelling Growing Malignancy in Tumours and Applications to Diagnosis
Kevin Painter, Department of Mathematics, Heriot-Watt University

In this talk I will describe a mathematical model describing the growth and evolution of a tumour via a sequence of cellular mutations. Our model demonstrates how mutation leads to a highly heterogeneous and malignant tumour. We discuss the model in light of the development of astrocytic tumours of the brain and employ the model to understand the effectiveness of biopsy sampling.
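A simple baseline for thinking about biopsy sampling, which is our assumption and not the model of the talk, treats the tumour as well mixed: if a clone of M cells sits in a tumour of N cells and the biopsy takes n cells uniformly at random without replacement, the probability of missing the clone entirely is hypergeometric. Spatial heterogeneity of the kind a spatial mutation model produces would typically make real biopsies depart from this baseline. The function name is hypothetical.

```python
from math import comb


def miss_probability(N, M, n):
    """Probability that n cells drawn uniformly without replacement from a
    tumour of N cells include none of the M cells belonging to a given clone."""
    if not 0 <= M <= N or not 0 <= n <= N:
        raise ValueError("need 0 <= M <= N and 0 <= n <= N")
    return comb(N - M, n) / comb(N, n)


print(miss_probability(N=10, M=2, n=3))  # 56/120 ≈ 0.467
```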

Using Support Vector Machines for Analysis of Gene Expression Data from DNA Microarrays
Andrzej Swierniak, Silesian University of Technology

DNA microarrays (biochips) are a new tool that biologists can use to obtain information about the expression levels of thousands of genes simultaneously. Their main advantages are the reproducibility and scalability of the data obtained, the short duration of a single experiment and, of course, the large number of genes whose expression is measured. The technique of producing DNA microarrays is improving continuously.

In general, there are two different types of DNA microarrays: spotted microarrays and oligonucleotide microarrays. There are several important differences between these two types. One of them is the production technology: spotted microarrays are produced by special spotting robots, whereas oligonucleotide microarrays are synthesized, often using photolithographic technology, the same technology used in the production of computer chips.

There are many ways of exploiting data from microarrays. One of the most frequently used is the classification of samples belonging to different classes. Such classification can be applied, for example, in medical diagnosis and in choosing a proper medical therapy. One of the first papers dealing with classification was the article by Golub et al. (1999). In this paper, samples of two types, acute myeloid leukemia (AML) and acute lymphoblastic leukemia (ALL), were classified and clustered. For classification purposes the authors proposed the so-called weighted voting (WV) algorithm. The AML/ALL data set (available via the Internet) was used by other scientists for testing different methods of analysis. For example, the same data set was used for testing a more traditional perceptron algorithm in (Fujarewicz & Rzeszowska-Wolny, 2000, 2001); the results obtained were slightly better than those of the WV algorithm. In (Furey et al., 2000), a relatively new and promising method of classification and regression called support vector machines (SVM) (Boser et al., 1992; Vapnik, 1995; Cristianini & Shawe-Taylor, 2000) was applied to the same data set. In (Brown et al., 2000) the SVM technique was tested on another microarray data set. Moreover, in that work SVM were compared with other methods such as decision trees, Parzen windows and Fisher's linear discriminant, and the conclusion was that SVM significantly outperformed all the other methods investigated. Therefore, the SVM technique can be regarded as a very promising supervised learning tool for microarray gene expression data.
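A minimal sketch of the weighted-voting idea of Golub et al. (1999), as we understand it: each gene g gets a weight a_g = (μ1 − μ0)/(σ1 + σ0) from the class means and standard deviations in the training set and a threshold b_g = (μ1 + μ0)/2; the votes a_g(x_g − b_g) of a new sample are summed and the sign decides the class. Gene preselection (neighborhood analysis in the original paper) is omitted, the small constant guarding against zero spread is our addition, and the toy data are invented.

```python
from statistics import mean, pstdev


def train_weighted_voting(X, y):
    """Return per-gene (weight, threshold) pairs; y holds labels 1 and 0."""
    params = []
    for g in range(len(X[0])):
        c1 = [x[g] for x, label in zip(X, y) if label == 1]
        c0 = [x[g] for x, label in zip(X, y) if label == 0]
        m1, m0 = mean(c1), mean(c0)
        # Signal-to-noise weight; 1e-12 avoids division by zero (our addition).
        a = (m1 - m0) / (pstdev(c1) + pstdev(c0) + 1e-12)
        params.append((a, (m1 + m0) / 2))
    return params


def predict_weighted_voting(params, x):
    """Sum the gene votes; return (predicted label, prediction strength)."""
    votes = [a * (xg - b) for (a, b), xg in zip(params, x)]
    v_for = sum(v for v in votes if v > 0)        # votes for class 1
    v_against = -sum(v for v in votes if v < 0)   # votes for class 0
    total = v_for + v_against
    strength = abs(v_for - v_against) / total if total else 0.0
    return (1 if v_for > v_against else 0), strength


X = [[5.0, 1.0], [6.0, 1.2], [5.5, 0.9], [1.0, 4.0], [1.5, 4.2], [0.8, 3.9]]
y = [1, 1, 1, 0, 0, 0]
params = train_weighted_voting(X, y)
print(predict_weighted_voting(params, [5.2, 1.1])[0])  # 1
```

The prediction strength lets weak calls be withheld rather than forced into a class.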

Choosing a proper learning and classification method is the final and a very important element of the recognition process when dealing with gene expression data. However, the earlier stages of data processing are also very important because of their significant influence on classification quality. One of these stages is gene selection. In (Golub et al., 1999) a method called neighborhood analysis (NA) was used, while in (Fujarewicz & Rzeszowska-Wolny, 2000, 2001) the Sebestyen criterion (1962), as modified by Deuser (1971), was applied. In both methods a performance index evaluating discriminant ability is calculated separately for each gene, and the n genes with the highest index values are then chosen for learning and classification. Such an approach seems reasonable. However, it may not be the best way of choosing a working gene set: the expression levels of different genes are strongly correlated, so a univariate treatment of the problem can select redundant genes. On the other hand, for microarray gene expression data the naive approach of checking all subsets of thousands of genes is infeasible because of its high computational cost.
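The univariate procedure, and its weakness with correlated genes, can be made concrete with a small sketch. The per-gene index here is the signal-to-noise score |μ1 − μ0|/(σ1 + σ0), used as a stand-in (the Sebestyen-Deuser criterion would slot into the same place), and the data are invented so that gene 1 is an exact copy of gene 0: both copies are ranked first, so the two-gene working set carries no more information than a single gene.

```python
from statistics import mean, pstdev


def snr_score(values, labels):
    """Signal-to-noise |mu1 - mu0| / (s1 + s0) for one gene (labels 1 and 0)."""
    c1 = [v for v, lab in zip(values, labels) if lab == 1]
    c0 = [v for v, lab in zip(values, labels) if lab == 0]
    return abs(mean(c1) - mean(c0)) / (pstdev(c1) + pstdev(c0) + 1e-12)


def top_n_genes(X, y, n):
    """Score every gene independently and keep the n best (univariate selection)."""
    scores = [snr_score([x[g] for x in X], y) for g in range(len(X[0]))]
    return sorted(range(len(scores)), key=lambda g: scores[g], reverse=True)[:n]


X = [[5.0, 5.0, 1.0, 2.0], [6.0, 6.0, 3.0, 2.5], [5.5, 5.5, 2.0, 1.8],
     [1.0, 1.0, 2.5, 1.0], [1.5, 1.5, 1.5, 1.2], [0.8, 0.8, 2.2, 0.9]]
y = [1, 1, 1, 0, 0, 0]
print(top_n_genes(X, y, 2))  # [0, 1]: the duplicated gene is picked twice
```

Gene 3, which is less discriminative on its own but adds independent information, is only reached at n = 3.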

Recently, several new multivariate methods of choosing an optimal (or suboptimal) gene subset have been proposed. Szabo et al. (2002) proposed a method that uses so-called v-fold cross-validation combined with an arbitrarily chosen feature selection method. In the approach formulated in (Chilingaryan et al., 2002), the Mahalanobis distance between vectors of gene expression is used to iteratively improve the current gene subset. Another algorithm, combining genetic algorithms with the k-nearest neighbor method, was proposed by Li et al. (2001). In (Fujarewicz et al., 2003) a new gene selection method, called recursive feature replacement (RFR), was proposed. The RFR method uses the SVM technique and iteratively optimizes the leave-one-out cross-validation error. A comparison of RFR with other algorithms, namely NA and the methods proposed in (Szabo et al., 2002) and (Chilingaryan et al., 2002), showed the superiority of the RFR method.

Recently a new method for gene selection, also based on SVM, has been published in (Guyon et al., 2002). The method, called recursive feature elimination (RFE), also outperformed other investigated methods.
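The elimination loop of RFE can be sketched as: train a linear classifier on the surviving genes, drop the gene with the smallest squared weight, and repeat. Guyon et al. use a linear SVM for the training step; to keep the sketch dependency-free a plain perceptron (labels +1/-1) is swapped in, so this illustrates the elimination scheme only, not the published method. The toy data are invented.

```python
def train_perceptron(X, y, features, epochs=100):
    """Perceptron weights for the selected feature indices (labels +1/-1)."""
    w, bias = [0.0] * len(features), 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, label in zip(X, y):
            z = sum(wj * x[f] for wj, f in zip(w, features)) + bias
            if label * z <= 0:  # misclassified or on the boundary: update
                w = [wj + label * x[f] for wj, f in zip(w, features)]
                bias += label
                mistakes += 1
        if mistakes == 0:
            break
    return w


def recursive_feature_elimination(X, y, n_keep):
    """Repeatedly retrain and drop the feature with the smallest squared weight."""
    features = list(range(len(X[0])))
    while len(features) > n_keep:
        w = train_perceptron(X, y, features)
        weakest = min(range(len(features)), key=lambda j: w[j] ** 2)
        features.pop(weakest)
    return features


X = [[2.0, 0.1, 0.0], [2.5, -0.1, 0.0], [-2.0, 0.1, 0.0], [-2.5, -0.1, 0.0]]
y = [1, 1, -1, -1]
print(recursive_feature_elimination(X, y, 1))  # [0]
```

Retraining after every removal is what makes the procedure multivariate: each weight is judged in the context of the genes that remain.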

One of the benchmark data sets frequently used for testing different methods of gene expression data processing is the tumor/normal colon data set. This data set was presented and analyzed (clustered) in (Alon et al., 1999). Expression levels of about 6500 genes were measured for 62 samples: 40 tumor and 22 normal colon tissues. Of these genes, 2000 were selected by the authors for clustering/classification purposes. The main result of (Alon et al., 1999) was the clustering experiment: the data were grouped into two clusters with 8 wrong assignments, three normal tissues being assigned to the "tumor" cluster and five tumor tissues to the "normal" cluster. In (Furey et al., 2000) the SVM technique was used to classify the same data set. The classification was performed twice: for the whole data set (2000 genes) and for the top 1000 genes. In both cases leave-one-out cross-validation gave six misclassifications (3 tumor and 3 normal). Nguyen and Rocke (2002) tested on the colon data set two dimension-reduction methods, principal component analysis (PCA) and partial least squares (PLS), and two classification methods, logistic discrimination (LD) and quadratic discriminant analysis (QDA). The best results were obtained by applying LD classification to the first 50 and 100 components (linear combinations of gene expression vectors) given by the PLS method. Even so, leave-one-out cross-validation still produced four misclassifications. In this talk we apply the RFR, RFE, NA and pure Sebestyen methods to the tumor/normal colon and thyroid data sets. The comparison of the results shows that the RFR method finds the smallest gene subset that gives no misclassifications in leave-one-out cross-validation.
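Leave-one-out cross-validation, the error measure quoted throughout this comparison, can be sketched generically: each sample is held out in turn, the classifier is trained on the remaining samples, and misclassified held-out samples are counted. A nearest-centroid classifier is used here purely as a simple stand-in for SVM or the other classifiers discussed, and the data are invented.

```python
def nearest_centroid(train_X, train_y, x):
    """Assign x to the class whose mean expression vector is closest (Euclidean)."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        members = [xi for xi, yi in zip(train_X, train_y) if yi == label]
        centroid = [sum(col) / len(members) for col in zip(*members)]
        dist = sum((a - c) ** 2 for a, c in zip(x, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loo_errors(X, y, classify=nearest_centroid):
    """Number of misclassifications in leave-one-out cross-validation."""
    errors = 0
    for i in range(len(X)):
        train_X, train_y = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        if classify(train_X, train_y, X[i]) != y[i]:
            errors += 1
    return errors


X = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
y = [0, 0, 0, 1, 1, 1]
print(loo_errors(X, y))  # 0
```

Because any gene selection step must be repeated inside each fold to avoid an optimistic bias, the `classify` argument is where a selection-plus-classifier pipeline would be plugged in.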

The authors were supported by the Polish Scientific Committee (KBN) research project PBZ/KBN/040/P04/2001 in 2002-2003 and by NATO Collaborative Linkage Grant LST.CLG.977845.

References

  1. Alon, U., Barai, N., Notterman, D.A., Gish, K., Ybarra, S., Mack, D., et al. (1999). Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Proc. Natl. Acad. Sci., 96, 6745-6750.
  2. Boser, B. E., Guyon, I. M., & Vapnik, V. (1992). A training algorithm for optimal margin classifiers. Fifth Annual Workshop on Computational Learning Theory, Pittsburgh, PA.
  3. Brown, M. P. S., Grundy, W. N., Lin, D., Cristianini, N., Sugnet, C. W., Furey, T. S., et al. (2000). Knowledge-based analysis of microarray gene expression data by using support vector machines. Proc. Natl. Acad. Sci., 97(1), 262-267.
  4. Chilingaryan, A., Gevorgyan, N., Vardanyan, A., Jones, D., & Szabo, A. (2002). A multivariate approach for selecting sets of differentially expressed genes. Mathematical Biosciences, 176, 59-69.
  5. Cristianini, N., & Shawe-Taylor, J. (2000). An introduction to support vector machines and other kernel-based learning methods. Cambridge University Press.
  6. Deuser, L. M. (1971). A hybrid multispectral feature selection criterion. IEEE Trans. on Comp., 1116-1117.
  7. Fujarewicz, K. & Rzeszowska-Wolny, J. (2000). Cancer classification based on gene expression data. Journal of Medical Informatics and Technologies, 5, BI23-BI27.
  8. Fujarewicz, K. & Rzeszowska-Wolny, J. (2001). Neural network approach to cancer classification based on gene expression levels. Proceedings IASTED Int. Conf. Modelling Identification and Control, Innsbruck, 564-568.
  9. Fujarewicz, K., Kimmel, M., Rzeszowska-Wolny, J., & Swierniak, A. (2003). A note on classification of gene expression data using support vector machines. Journal of Biological Systems, 11(1), 43-56.
  10. Furey, T. S., Cristianini, N., Duffy, N., Bednarski, D. W., Schummer, M., & Haussler, D. (2000). Support vector machine classification and validation of cancer tissue samples using microarray expression data. Bioinformatics, 16(10), 906-914.
  11. Golub, T. R., Slonim, D. K., Tamayo, P., Huard, C., Gaasenbeek, M., Mesirov, J. P., et al. (1999). Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science, 286, 531-537.
  12. Guyon, I., Weston, J., Barnhill, S., & Vapnik, V. (2002). Gene selection for cancer classification using support vector machines. Machine Learning, 46, 389-422.
  13. Li, L., Weinberg, C.R., Darden, T.A., & Pedersen, L.G. (2001). Gene selection for sample classification based on gene expression data: study of sensitivity to choice of parameters of the GA/KNN method. Bioinformatics, 17, 1131-1142.
  14. Nguyen, D.V. & Rocke, D.M. (2002). Tumor classification by partial least squares using microarray gene expression data. Bioinformatics, 18(1), 39-50.
  15. Sebestyen, G. S. (1962). Decision making processes in pattern recognition. New York, NY: Macmillan.
  16. Szabo, A., Boucher, K., Carroll, W.L., Klebanov, L.B., Tsodikov, A.D., & Yakovlev, A.Y. (2002). Variable selection and pattern recognition with gene expression data generated by the microarray technology. Mathematical Biosciences, 176, 71-98.
  17. Vapnik, V. (1995). The Nature of Statistical Learning Theory. New York, NY: Springer-Verlag.

The Evolution of a Tumor Cord Cell Population
Glenn Webb, Vanderbilt University

Tumour cords are cylindrical structures of malignant cells in the micro-architecture of certain vascularized tumours. These structures extend along the axis of a blood vessel and radially outward from it; the vessel supplies oxygen and essential nutrients. A model of a tumour cord cell population is analyzed in which individual cells are distinguished by cell age and radial position. The model is a modification of a model proposed by A. Bertuzzi and A. Gandolfi (J. Theor. Biol., 204, 2000). The existence and asymptotic behaviour of solutions are investigated. It is proved that solutions are asymptotically eventually periodic.
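A toy discretization hints at why periodic behaviour can arise in age-structured cell populations. It keeps only cell age (no radial position), assumes a hypothetical fixed cycle length of T age bins and no cell death, and advances time by exactly one bin per step, so cells in the last bin divide into two newborns. It is not the model of the talk and proves nothing about it; it only shows that with a synchronised cycle the normalized age distribution repeats with period T instead of settling down. The function names are hypothetical.

```python
def advance_one_bin(ages):
    """One time step equal to the age-bin width (hypothetical toy model).

    ages[a] counts cells in age bin a; cells in the last bin complete the
    cycle and each produces two newborns in bin 0; all others age by one bin.
    """
    return [2 * ages[-1]] + ages[:-1]


def normalized(ages):
    """Age distribution as fractions of the total population."""
    total = sum(ages)
    return [a / total for a in ages]


ages = [3, 1, 2, 5]          # T = 4 age bins, arbitrary initial counts
history = [ages]
for _ in range(8):
    history.append(advance_one_bin(history[-1]))
print(sum(history[4]) / sum(history[0]))                 # 2.0: doubling per cycle
print(normalized(history[4]) == normalized(history[0]))  # True
```

The periodicity here comes entirely from the exact, synchronised cycle length assumed in the toy.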