The University of Southampton
University of Southampton Institutional Repository

Rapid, raw-read reference and identification (R4IDs): a flexible platform for rapid generic species ID using long-read sequencing technology

bioRxiv
Parker, Joe
Helmstetter, Andrew
Crowe, James
Iacona, John
Devey, Dion
Papadopulos, Alexander S.T.

Parker, Joe, Helmstetter, Andrew, Crowe, James, Iacona, John, Devey, Dion and Papadopulos, Alexander S.T. (2018) Rapid, raw-read reference and identification (R4IDs): a flexible platform for rapid generic species ID using long-read sequencing technology. bioRxiv, 18pp. (doi:10.1101/281048). (Submitted)

Record type: Monograph (Working Paper)

Abstract

The versatility of current DNA sequencing platforms and the development of portable nanopore sequencers mean that it has never been easier to collect genetic data for unknown sample ID. DNA barcoding and meta-barcoding have become increasingly popular, and barcode databases continue to grow at an impressive rate. However, the number of canonical genome assemblies (reference or draft) that are publicly available is relatively tiny, hindering the more widespread use of genome-scale DNA sequencing technology for accurate species identification and discovery. Here, we show that rapid raw-read reference datasets, or R4IDs for short, generated in a matter of hours on the Oxford Nanopore MinION, can bridge this gap and accelerate the generation of usable reference sequence data. By exploiting the long read length of this technology, shotgun genomic sequencing of a small portion of an organism’s genome can act as a suitable reference database despite the low sequencing coverage. These R4IDs can then be used for accurate species identification with minimal re-sequencing effort (thousands of reads). We demonstrated the capabilities of this approach with six vascular plant species, for which we created R4IDs in the laboratory and then re-sequenced live at the Kew Science Festival 2016. We further validated our method using simulations to determine the broader applicability of the approach. Our data analysis pipeline has been made available as a Dockerised workflow for simple, scalable deployment across a range of uses.
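The general idea of matching a few re-sequenced reads against low-coverage raw-read references can be illustrated with a toy sketch. This is not the authors' pipeline: the abstract does not specify the matching method, so shared k-mer counting with a per-read majority vote is an assumption made here purely for illustration. The function names, the value of `k`, and the tiny sequences are all hypothetical.

```python
from collections import Counter


def kmers(seq, k):
    """Return the set of all k-length substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_reference(reads_by_species, k):
    """Pool the k-mers of each species' raw reads (no assembly step)."""
    return {
        species: set().union(*(kmers(read, k) for read in reads))
        for species, reads in reads_by_species.items()
    }


def assign_read(read, reference, k, min_shared=1):
    """Return the species sharing the most k-mers with the read.

    Returns None when there is no reference, when the best score is
    below min_shared, or when two species tie for the best score.
    """
    if not reference:
        return None
    query = kmers(read, k)
    scores = {sp: len(query & ref) for sp, ref in reference.items()}
    top = max(scores.values())
    if top < min_shared or list(scores.values()).count(top) > 1:
        return None
    return max(scores, key=scores.get)


def identify_sample(reads, reference, k):
    """Majority vote over per-read assignments; None if no read matched."""
    votes = Counter(assign_read(read, reference, k) for read in reads)
    votes.pop(None, None)  # discard unassigned reads
    return votes.most_common(1)[0][0] if votes else None


# Hypothetical toy data: one short "raw read" per reference species.
K = 4
reference = build_reference(
    {"species_a": ["ACGTACGTTAGC"], "species_b": ["GGCCTTAAGGCC"]}, K
)
sample_reads = ["CGTACGTT", "TTAAGG", "GTTAGC"]
sample_id = identify_sample(sample_reads, reference, K)
```

In this toy case two of the three sample reads match only `species_a`, so the vote selects it. Real nanopore reads carry sequencing errors, so a practical tool would rely on an error-tolerant aligner rather than exact k-mer matches.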

Text
281048v1.full - Author's Original

More information

Submitted date: 13 March 2018

Identifiers

Local EPrints ID: 480608
URI: http://eprints.soton.ac.uk/id/eprint/480608
PURE UUID: 2ddc60a6-cd11-406b-8894-77de48b41b77
ORCID for Joe Parker: orcid.org/0000-0003-3777-2269

Catalogue record

Date deposited: 07 Aug 2023 16:50
Last modified: 18 Mar 2024 03:50

Contributors

Author: Joe Parker
Author: Andrew Helmstetter
Author: James Crowe
Author: John Iacona
Author: Dion Devey
Author: Alexander S.T. Papadopulos
