ISMB99
Dataset Generator for Whole Genome Shotgun Sequencing

Gene Myers
Celera Genomics
Rockville, MD 20850
E-mail: MyersGW@celera.com

Abstract
Simulated data sets have been found to be useful in developing software systems because (1) they allow one to study the effect of a particular phenomenon in isolation, and (2) one has complete information about the true solution against which to measure the results of the software. In developing a software suite for assembling a whole human genome shotgun data set, we have developed a simulator, celsim, that permits one to describe and stochastically generate a target DNA sequence with a variety of repeat structures, to further generate polymorphic variants if desired, and to generate a shotgun data set that might be sampled from the target sequence(s). We have found the tool invaluable and quite powerful, yet the design is extremely simple, employing a special type of stochastic grammar.
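The pipeline described above (generate a target with repeats, optionally derive a polymorphic variant, then sample shotgun reads) can be illustrated with a minimal sketch. This is not celsim and does not use its stochastic grammar; every function name, parameter, and modelling choice below (uniform random bases, substitution-only mutation, fixed-length reads sampled uniformly) is an assumption made purely for illustration.

```python
import random

BASES = "ACGT"

def random_dna(length, rng):
    """Uniform random DNA string (an illustrative assumption, not celsim's model)."""
    return "".join(rng.choice(BASES) for _ in range(length))

def mutate(seq, rate, rng):
    """Substitute each base with probability `rate` by a different base."""
    out = []
    for base in seq:
        if rng.random() < rate:
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)

def build_target(length, repeat_len, copies, divergence, rng):
    """Random background with `copies` diverged copies of one repeat element inserted."""
    if repeat_len * copies >= length:
        raise ValueError("repeats must fit inside the target")
    repeat = random_dna(repeat_len, rng)
    background = random_dna(length - repeat_len * copies, rng)
    # Pick insertion points in the background, in left-to-right order.
    cuts = sorted(rng.randrange(len(background) + 1) for _ in range(copies))
    pieces, prev = [], 0
    for cut in cuts:
        pieces.append(background[prev:cut])
        pieces.append(mutate(repeat, divergence, rng))
        prev = cut
    pieces.append(background[prev:])
    return "".join(pieces)

def sample_reads(target, coverage, read_len, error_rate, rng):
    """Sample fixed-length reads uniformly; keep each true start as ground truth."""
    n_reads = int(coverage * len(target) / read_len)
    reads = []
    for _ in range(n_reads):
        start = rng.randrange(len(target) - read_len + 1)
        fragment = target[start:start + read_len]
        reads.append((start, mutate(fragment, error_rate, rng)))
    return reads

if __name__ == "__main__":
    rng = random.Random(0)
    target = build_target(2000, 100, 3, 0.02, rng)
    variant = mutate(target, 0.001, rng)  # a polymorphic variant of the target
    reads = sample_reads(target, 5, 200, 0.01, rng)
    print(len(target), len(variant), len(reads))
```

Because each read is stored with its true start position, an assembler's output can be scored against the known solution, which is the second benefit of simulated data that the abstract names.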

 
