In the fall of 2001, a series of biowarfare attacks was carried out through the U.S. mail, using letters containing a powdered form of the anthrax bacterium, Bacillus anthracis. Following these attacks, our group was asked to conduct a rapid sequencing project to decode the genome of the strain used. We compared this sequence to a second, reference anthrax genome, which we were sequencing simultaneously. I will discuss the challenges posed by comparing two incomplete genome sequences, for which the sequencing error rate is relatively high and the assembly of the genome data may contain mistakes. Using a combination of computational and statistical methods, our group identified 60 novel, high-quality genetic markers distinguishing the attack strain from other strains (1). We also concluded that, to facilitate future comparisons at this level of detail, genome sequencing centers need to release not only genome sequences but also detailed information about the accuracy of every nucleotide in those sequences and about the genome assembly itself.
If time permits, I will discuss a new project to sample and sequence the genomes of 10,000 or more isolates of the human influenza A virus, in order to anticipate and help prevent future flu pandemics.