Rapid, raw-read reference and identification (R4IDs): a flexible platform for rapid generic species ID using long-read sequencing technology

The versatility of current DNA sequencing platforms and the development of portable nanopore sequencers mean that it has never been easier to collect genetic data for identifying unknown samples. DNA barcoding and meta-barcoding have become increasingly popular, and barcode databases continue to grow at an impressive rate. However, the number of publicly available canonical genome assemblies (reference or draft) is relatively tiny, hindering the more widespread use of genome-scale DNA sequencing technology for accurate species identification and discovery. Here, we show that rapid raw-read reference datasets, or R4IDs for short, generated in a matter of hours on the Oxford Nanopore MinION, can bridge this gap and accelerate the generation of usable reference sequence data. By exploiting the long read length of this technology, shotgun genomic sequencing of a small portion of an organism’s genome can act as a suitable reference database despite the low sequencing coverage. These R4IDs can then be used for accurate species identification with minimal re-sequencing effort (thousands of reads). We demonstrated the capabilities of this approach with six vascular plant species, for which we created R4IDs in the laboratory and then re-sequenced live at the Kew Science Festival 2016. We further validated our method using simulations to determine the broader applicability of the approach. Our data analysis pipeline has been made available as a Dockerised workflow for simple, scalable deployment across a range of uses.

10.1101/281048

bioRxiv

Parker, Joe

979fbb42-5897-4fbe-a32e-06793f9f99ed

Helmstetter, Andrew

f7d85b05-2c08-4a12-9793-dcaf60f45c73

Crowe, James

85cc0b43-5ec0-420f-8fa7-e6a717579876

Iacona, John

47d989af-26d8-4699-ad79-08e5699874ab

Devey, Dion

2f1d5dcf-71bb-4fb2-8e07-4c8ad3ac6ea0

Papadopulos, Alexander S.T.

16661763-8017-490a-9f5c-a040e4e36124