Rapid, raw-read reference and identification (R4IDs): a flexible platform for rapid generic species ID using long-read sequencing technology
Parker, Joe
Helmstetter, Andrew
Crowe, James
Iacona, John
Devey, Dion
Papadopulos, Alexander S.T.
Parker, Joe, Helmstetter, Andrew, Crowe, James, Iacona, John, Devey, Dion and Papadopulos, Alexander S.T. (2018) Rapid, raw-read reference and identification (R4IDs): a flexible platform for rapid generic species ID using long-read sequencing technology. bioRxiv, 18pp. (doi:10.1101/281048). (Submitted)
Record type: Monograph (Working Paper)
Abstract
The versatility of current DNA sequencing platforms and the development of portable nanopore sequencers mean that it has never been easier to collect genetic data for identifying unknown samples. DNA barcoding and meta-barcoding have become increasingly popular, and barcode databases continue to grow at an impressive rate. However, the number of publicly available canonical genome assemblies (reference or draft) is relatively tiny, which hinders wider use of genome-scale DNA sequencing for accurate species identification and discovery. Here, we show that rapid raw-read reference datasets, or R4IDs for short, generated in a matter of hours on the Oxford Nanopore MinION, can bridge this gap and accelerate the generation of usable reference sequence data. By exploiting the long read length of this technology, shotgun genomic sequencing of a small portion of an organism’s genome can act as a suitable reference database despite the low sequencing coverage. These R4IDs can then be used for accurate species identification with minimal re-sequencing effort (thousands of reads). We demonstrated the capabilities of this approach with six vascular plant species, for which we created R4IDs in the laboratory and then re-sequenced live at the Kew Science Festival 2016. We further validated our method using simulations to determine the broader applicability of the approach. Our data analysis pipeline has been made available as a Dockerised workflow for simple, scalable deployment for a range of uses.
Text: 281048v1.full - Author's Original
More information
Submitted date: 13 March 2018
Identifiers
Local EPrints ID: 480608
URI: http://eprints.soton.ac.uk/id/eprint/480608
PURE UUID: 2ddc60a6-cd11-406b-8894-77de48b41b77
Catalogue record
Date deposited: 07 Aug 2023 16:50
Last modified: 18 Mar 2024 03:50
Contributors
Author: Joe Parker
Author: Andrew Helmstetter
Author: James Crowe
Author: John Iacona
Author: Dion Devey
Author: Alexander S.T. Papadopulos