The University of Southampton
University of Southampton Institutional Repository

The challenge of genome sequence assembly


Collins, Andrew (2018) The challenge of genome sequence assembly. Open Bioinformatics Journal, 11 (1), 231-239. (doi:10.2174/1875036201811010231).

Record type: Review

Abstract

Background: Although whole genome sequencing is enabling numerous advances in many fields, achieving complete chromosome-level sequence assemblies for diverse species presents difficulties. The problems in part reflect the limitations of current sequencing technologies. Chromosome assembly from ‘short read’ sequence data is confounded by the presence of repetitive genome regions with numerous similar sequence tracts which cannot be accurately positioned in the assembled sequence. Longer sequence reads often have higher error rates and may still be too short to span the larger gaps between contigs. Objective: Given the emergence of exciting new applications using sequencing technology, such as the Earth BioGenome Project, it is necessary to further develop and apply a range of strategies to achieve robust chromosome-level sequence assembly. Reviewed here is a range of methods to enhance assembly, which include the use of cross-species synteny to understand relationships between sequence contigs, the development of independent genetic and/or physical scaffold maps as frameworks for assembly (for example, radiation hybrid, optical motif and chromatin interaction maps) and the use of patterns of linkage disequilibrium to help position and orient contigs. Results and Conclusion: A range of methods exists which might be further developed to facilitate cost-effective large-scale sequence assembly for diverse species. A combination of strategies is required to best assemble sequence data into chromosome-level assemblies. There are a number of routes towards the development of maps which span chromosomes (including physical, genetic and linkage disequilibrium maps), and construction of these whole chromosome maps greatly facilitates the ordering and orientation of sequence contigs.
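
As a rough illustration of the abstract's closing point, the sketch below shows how marker positions on a whole-chromosome map could in principle be used to order and orient contigs. It is a minimal sketch under stated assumptions, not the method of any tool covered by the review: the contig names, marker offsets, map positions and the function name `order_and_orient` are all hypothetical.

```python
from statistics import mean


def order_and_orient(contig_markers):
    """Order and orient contigs using markers placed on a chromosome map.

    contig_markers maps a contig name to a list of
    (offset_within_contig, position_on_map) pairs. All values here are
    hypothetical; a real map (genetic, physical or LD) would supply them.
    Returns a list of (contig, orientation) tuples sorted along the map,
    where orientation is "+", "-" or "?" (not determinable).
    """
    placed = []
    for name, markers in contig_markers.items():
        # Sort markers by their offset inside the contig.
        markers = sorted(markers)
        positions = [pos for _, pos in markers]
        # Place the contig at the mean map position of its markers.
        centre = mean(positions)
        # Simplification: compare only the first and last markers.
        if len(positions) < 2 or positions[0] == positions[-1]:
            orient = "?"  # a single marker cannot orient a contig
        elif positions[-1] > positions[0]:
            orient = "+"  # map position rises with contig offset
        else:
            orient = "-"  # map position falls, so the contig is reversed
        placed.append((centre, name, orient))
    placed.sort()
    return [(name, orient) for _, name, orient in placed]


# Hypothetical example: three contigs with markers on one chromosome map.
example = {
    "contigA": [(100, 52.0), (9000, 58.5)],
    "contigB": [(200, 12.3), (7000, 3.1)],
    "contigC": [(500, 30.0)],
}
print(order_and_orient(example))
```

In this toy input, `contigC` carries only one marker, so it can be positioned but not oriented, which loosely mirrors why denser maps help assembly.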

Text
TOBIOIJ-11-231 - Version of Record
Available under License Creative Commons Attribution.

More information

Accepted/In Press date: 17 September 2018
e-pub ahead of print date: 17 October 2018
Published date: 2018
Keywords: Chromosome assembly, Cross-species synteny, Earth biogenome project, Linkage disequilibrium map, Sequence contigs, Whole genome sequencing

Identifiers

Local EPrints ID: 426069
URI: http://eprints.soton.ac.uk/id/eprint/426069
ISSN: 1875-0362
PURE UUID: 1c95c475-1fa3-475b-a541-fabb01fe5014
ORCID for Andrew Collins: orcid.org/0000-0001-7108-0771

Catalogue record

Date deposited: 13 Nov 2018 17:30
Last modified: 16 Mar 2024 02:42
