String matching in DNA sequences : implications for short read sequencing and repeat visualisation
Several methods for ultra-high-throughput DNA sequencing are currently under investigation. Many of these methods yield very short blocks of sequence information (reads). This thesis defines the absolute upper limits of short read methods for de novo sequencing and resequencing applications. The analysis shows that short read methods fare well in resequencing applications. However, in de novo sequencing of large genomes, methods unable to produce reads longer than 50 nucleotides (nt) may encounter problems.
In addition, a number of realistic sequencing scenarios are examined using a methodology developed for benchmarking sequence assemblies. Many currently available sequence assemblers are found to perform poorly when given short read data. It is also found that increasing the read coverage can yield large improvements in many cases. A repeat visualisation technique, created as an extension of the resequencing feasibility analysis, is also described. This visualisation highlights the complex repeat structure present in genomic sequences. In particular, striking differences are readily visible in the repeat character of coding and noncoding regions, and in features associated with pathogenicity in bacterial genomes.
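Whether a read can be placed unambiguously during resequencing depends on whether its sequence occurs only once in the reference. As a rough illustration of why read length matters, the sketch below computes the fraction of positions in a sequence whose length-k substring is unique. It assumes a single strand, exact matching and error-free reads, and the function name `unique_kmer_fraction` is hypothetical. It is not presented as the thesis's own method.

```python
from collections import Counter


def unique_kmer_fraction(genome: str, k: int) -> float:
    """Fraction of start positions whose length-k substring occurs exactly once.

    Assumptions of this sketch: forward strand only (reverse complements are
    ignored), exact matching, and error-free reads of length k.
    """
    n_positions = len(genome) - k + 1
    if k <= 0 or n_positions <= 0:
        raise ValueError("k must be between 1 and len(genome)")
    # Count every length-k substring (k-mer) in the sequence.
    counts = Counter(genome[i:i + k] for i in range(n_positions))
    # A read starting at position i maps uniquely only if its k-mer is unique.
    unique = sum(1 for i in range(n_positions) if counts[genome[i:i + k]] == 1)
    return unique / n_positions


if __name__ == "__main__":
    # "ACGT" occurs twice, so 5 of the 7 4-mers are unique.
    print(unique_kmer_fraction("ACGTACGTTT", 4))  # 0.7142857142857143
```

Storing every k-mer in a dictionary is only practical for small inputs. For chromosome-scale sequences, an index such as a suffix array would normally be used instead.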
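One plausible way to extend the uniqueness question into a per-position repeat profile is to record, for each position, the length of the shortest substring starting there that occurs nowhere else. Plotting this profile along a sequence would make repeated regions stand out. Note that this profile, and the helper name `shortest_unique_lengths`, are assumptions made for illustration, not the visualisation described in the thesis. The sketch uses the standard suffix-array property that a suffix's longest shared prefix with any other suffix is found among its sorted neighbours. The suffix array is built naively, so it only suits toy inputs.

```python
from typing import List, Optional


def shortest_unique_lengths(genome: str) -> List[Optional[int]]:
    """For each position, the length of the shortest substring starting there
    that occurs only once in `genome` (None if no such substring exists).

    Assumptions of this sketch: a single strand and exact matching.
    """
    n = len(genome)
    # Naive suffix array, O(n^2 log n): adequate only for small examples.
    sa = sorted(range(n), key=lambda i: genome[i:])

    def lcp(a: int, b: int) -> int:
        """Length of the longest common prefix of the suffixes at a and b."""
        length = 0
        while a + length < n and b + length < n and genome[a + length] == genome[b + length]:
            length += 1
        return length

    result: List[Optional[int]] = [None] * n
    for rank, pos in enumerate(sa):
        # The longest prefix this suffix shares with any other suffix is
        # attained at one of its neighbours in sorted order.
        shared = 0
        if rank > 0:
            shared = max(shared, lcp(pos, sa[rank - 1]))
        if rank + 1 < n:
            shared = max(shared, lcp(pos, sa[rank + 1]))
        # One more base than the longest shared prefix makes it unique,
        # provided the sequence does not end first.
        if shared + 1 <= n - pos:
            result[pos] = shared + 1
    return result


if __name__ == "__main__":
    print(shortest_unique_lengths("ACGTACGTTT"))
    # [5, 4, 3, 2, 5, 4, 3, 3, None, None]
```

In this reading, long values mark positions inside repeats, where a read must be long before it can be placed unambiguously.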
University of Southampton
Whiteford, Nava
f084a58e-2c9d-496c-b960-77f4cd08a83b
2007
Whiteford, Nava (2007) String matching in DNA sequences : implications for short read sequencing and repeat visualisation. University of Southampton, Doctoral Thesis.
Record type: Thesis (Doctoral)
This record has no associated files available for download.
More information
Published date: 2007
Identifiers
Local EPrints ID: 466177
URI: http://eprints.soton.ac.uk/id/eprint/466177
PURE UUID: 52a4e934-4217-48cc-9499-0fe43c53c93d
Catalogue record
Date deposited: 05 Jul 2022 04:38
Last modified: 05 Jul 2022 04:38
Contributors
Author: Nava Whiteford