The University of Southampton
University of Southampton Institutional Repository

Optimal probe length varies for targets with high sequence variation: implications for probe library design for resequencing highly variable genes

Background

Sequencing by hybridisation (SBH) is an effective method for obtaining large amounts of DNA sequence information at low cost. The efficiency of SBH depends on designing the probe library to provide the maximum information for the minimum cost. Long probes provide a higher probability of non-repeated sequences but increase the number of probes required, whereas short probes may not provide unique sequence information because of repeated sequences. We have investigated the effect of probe length, use of reference sequences, and thermal filtering on the design of probe libraries for several highly variable target DNA sequences.
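The trade-off between probe length and repeated sequences can be illustrated with a minimal Python sketch. It is not the authors' method: the function names `kmers` and `repeated_fraction` and the toy sequence are assumptions made up for illustration. It counts how many overlapping words of length k occur more than once in a sequence, alongside the size of the full space of possible probes (4^k).

```python
from collections import Counter


def kmers(seq, k):
    """Return every overlapping substring of length k, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def repeated_fraction(seq, k):
    """Fraction of k-mer positions whose k-mer occurs more than once in seq."""
    words = kmers(seq, k)
    if not words:
        return 0.0
    counts = Counter(words)
    repeated = sum(1 for w in words if counts[w] > 1)
    return repeated / len(words)


# Hypothetical toy sequence, not taken from the paper.
toy = "ACGTACGTTTGACGTAGGCTA"

for k in (3, 4, 6, 8):
    # 4 ** k is the number of distinct probes of length k over A, C, G, T.
    print(k, 4 ** k, round(repeated_fraction(toy, k), 2))
```

Longer words are less likely to repeat, but the space of possible probes grows by a factor of four with each added nucleotide, which is the cost side of the trade-off described above.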

Results

We designed overlapping probe libraries for a range of highly variable drug target genes based on known sequence information and developed a formal terminology to describe probe library design. We find that for some targets these libraries can provide good coverage of a previously unseen target, whereas for others the coverage is less than 30%. The optimal probe length varies from as short as 12 nt to as long as 19 nt and depends on the sequence, its variability, and the stringency of thermal filtering. It cannot be determined from inspection of an example gene sequence.
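As a rough illustration of how coverage of an unseen target might be estimated, the following Python sketch builds an overlapping probe library from reference sequences and measures the fraction of a held-out target covered by exactly matching probes. Everything here is an assumption for illustration: the function names, the toy sequences, the exact-match criterion (real hybridisation involves complementary strands and tolerates some mismatches), and the use of the simple Wallace rule as a stand-in for thermal filtering. The abstract does not specify the thermodynamic model the authors used.

```python
def wallace_tm(probe):
    """Rough melting temperature in deg C by the Wallace rule: 2*(A+T) + 4*(G+C)."""
    at = probe.count("A") + probe.count("T")
    gc = probe.count("G") + probe.count("C")
    return 2 * at + 4 * gc


def build_library(references, k, tm_window=None):
    """Collect all distinct overlapping k-mers from the reference sequences.

    If tm_window=(low, high) is given, keep only probes whose Wallace Tm
    lies inside the window (a crude stand-in for thermal filtering).
    """
    library = set()
    for ref in references:
        for i in range(len(ref) - k + 1):
            probe = ref[i:i + k]
            if tm_window is None or tm_window[0] <= wallace_tm(probe) <= tm_window[1]:
                library.add(probe)
    return library


def coverage(target, library, k):
    """Fraction of target bases lying under at least one exactly matching probe."""
    if not target:
        return 0.0
    covered = [False] * len(target)
    for i in range(len(target) - k + 1):
        if target[i:i + k] in library:
            for j in range(i, i + k):
                covered[j] = True
    return sum(covered) / len(target)


# Hypothetical toy data: two known variants and one unseen variant.
references = ["ACGTACGTGGA", "ACGTTCGTGGA"]
unseen = "ACGTACGAGGA"

for k in (3, 4, 5, 6):
    lib = build_library(references, k)
    print(k, len(lib), round(coverage(unseen, lib, k), 2))
```

Scanning over k, the number of reference sequences, and the width of the temperature window in this way is one simple route to comparing library designs for a specific target, in the spirit of the target-by-target analysis the abstract describes.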

Conclusions

Optimal probe length and the optimal number of reference sequences used to design a probe library are highly target specific for highly variable sequencing targets. The optimum design cannot be determined simply by inspection of input sequences or of alignments but only by detailed analysis of each specific target. For highly variable sequences, shorter probes can in some cases provide better information than longer probes. Probe library design would benefit from a general-purpose tool for analysing these issues. The formal terminology developed here and the analysis approaches it is used to describe will contribute to the development of such tools.

Haslam, Niall J., Whiteford, Nava E., Weber, Gerald, Prügel-Bennett, Adam, Essex, Jonathan W. and Neylon, Cameron (2008) Optimal probe length varies for targets with high sequence variation: implications for probe library design for resequencing highly variable genes. PLoS ONE, 3 (6), e2500. (doi:10.1371/journal.pone.0002500).

Record type: Article

More information

Published date: 18 June 2008

Identifiers

Local EPrints ID: 149395
URI: http://eprints.soton.ac.uk/id/eprint/149395
ISSN: 1932-6203
PURE UUID: 52c6a89f-5f30-41ed-a476-78315acf4376
ORCID for Jonathan W. Essex: orcid.org/0000-0003-2639-2746

Catalogue record

Date deposited: 30 Apr 2010 08:40
Last modified: 20 Jul 2019 01:20

Contributors

Author: Niall J. Haslam
Author: Nava E. Whiteford
Author: Gerald Weber
Author: Adam Prügel-Bennett
Author: Jonathan W. Essex
Author: Cameron Neylon
