The University of Southampton
University of Southampton Institutional Repository

String matching in DNA sequences : implications for short read sequencing and repeat visualisation

Whiteford, Nava
f084a58e-2c9d-496c-b960-77f4cd08a83b

Whiteford, Nava (2007) String matching in DNA sequences : implications for short read sequencing and repeat visualisation. University of Southampton, Doctoral Thesis.

Record type: Thesis (Doctoral)

Abstract

Several methods for ultra-high-throughput DNA sequencing are currently under investigation. Many of these methods yield very short blocks of sequence information (reads). In this thesis, the absolute upper limits of short read methods for de novo sequencing and resequencing applications are defined. This analysis shows that short read methods fare well in resequencing applications, but that for de novo sequencing of large genomes, methods unable to produce reads longer than 50 nucleotides (nt) may encounter problems.

In addition, a number of realistic sequencing scenarios are examined through the development of a methodology for benchmarking sequence assemblies. Many currently available sequence assemblers are found to perform poorly when provided with short read data, although increasing the read coverage can provide huge improvements in many cases. A repeat visualisation technique, created as an extension of the resequencing feasibility analysis, is also described. This visualisation highlights the complex repeat structure present in genomic sequences. In particular, striking differences can easily be seen in the repeat character of coding and noncoding regions, as well as in features associated with pathogenicity in bacterial genomes.

This record has no associated files available for download.

More information

Published date: 2007

Identifiers

Local EPrints ID: 466177
URI: http://eprints.soton.ac.uk/id/eprint/466177
PURE UUID: 52a4e934-4217-48cc-9499-0fe43c53c93d

Catalogue record

Date deposited: 05 Jul 2022 04:38
Last modified: 05 Jul 2022 04:38

Contributors

Author: Nava Whiteford
