Reading DNA Sequences Along Eulerian Paths

Michael Waterman
Biological Sciences and Mathematics, University of Southern California

(March 15, 2010 2:30 PM - 3:30 PM)

Reading DNA Sequences Along Eulerian Paths

Abstract

In 1975 when Fred Sanger was developing dideoxy termination DNA sequencing for which he was to receive his second Nobel Prize, he found Roger Staden who developed the first computer program to assemble longer DNA sequences from the reads. The reads were randomly located and oriented along the target DNA. Until recently all DNA sequence assembly programs were further sophisticated elaborations of Staden's original technique. They often consist of three major steps: compare all pairs of reads, find an approximate arrangement of the significant overlaps, and multiple alignment for this arrangement. Staden used a greedy version of this method. In 1995 an elegant and entirely new approach was proposed in which each read is broken down into shorter overlapping words, and then a certain graph is constructed so that Eulerian paths in this graph correspond to the target DNA sequence. This will show how this graph is constructed and give some examples of its operation. Today for new-generation sequencing, this Eulerian method is a method of choice.