University of Southampton Institutional Repository

Linkage disequilibrium maps to guide contig ordering for genome assembly

Motivation: Efforts to establish reference genome sequences by de novo sequence assembly have to address the difficulty of linking relatively short sequence contigs to form much larger chromosome assemblies. Efficient strategies are required to span gaps and establish contig order and relative orientation. We consider here the use of linkage disequilibrium (LD) maps of sequenced contigs and the utility of LD for ordering, orienting and positioning linked sequences. LD maps are readily constructed from population data and have at least an order of magnitude higher resolution than linkage maps, providing the potential to resolve difficult areas in assemblies. We empirically evaluate a linkage disequilibrium map-based method using single nucleotide polymorphism genotype data in a ~216 kilobase region of human 6p21.3 from which three shorter contigs are formed.

Results: LD map length is the most informative measure of contig order and orientation: the correct arrangement is suggested by the shortest LD map for which the residual error variance is close to one. For regions in strong LD, this method may be less informative for correcting inverted contigs than for identifying correct contig orders. When two contigs are in linkage disequilibrium with each other, the distance between them may be roughly estimated by this method.
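
The selection rule in the Results can be sketched in a few lines of Python. This is only an illustrative sketch, not the LDMAP implementation: `ld_map_length` and `residual_variance` are hypothetical callables that would wrap an LD map fit for a given arrangement, `tolerance` is an assumed cut-off for "close to one", and the rule is read here as "shortest map among arrangements whose residual error variance is near one".

```python
from itertools import permutations, product


def _mirror(arrangement):
    """Return the same layout read from the opposite strand."""
    return tuple((name, "-" if o == "+" else "+")
                 for name, o in reversed(arrangement))


def candidate_arrangements(contigs):
    """Yield each distinct order/orientation of the contigs once.

    An arrangement is a tuple of (contig_name, orientation) pairs, with
    orientation '+' (as given) or '-' (inverted). An arrangement and its
    mirror image describe the same layout, so only one of each pair is kept.
    """
    seen = set()
    for order in permutations(contigs):
        for orients in product("+-", repeat=len(order)):
            arrangement = tuple(zip(order, orients))
            if _mirror(arrangement) in seen:
                continue
            seen.add(arrangement)
            yield arrangement


def best_arrangement(contigs, ld_map_length, residual_variance, tolerance=0.5):
    """Return (length, variance, arrangement) for the preferred layout.

    ld_map_length and residual_variance are hypothetical scoring functions
    supplied by the caller; tolerance is an assumed threshold.
    """
    scored = [(ld_map_length(a), residual_variance(a), a)
              for a in candidate_arrangements(contigs)]
    # Prefer arrangements whose residual error variance is close to one.
    acceptable = [s for s in scored if abs(s[1] - 1.0) <= tolerance]
    # Assumption: if none qualifies, fall back to all scored arrangements.
    pool = acceptable or scored
    return min(pool, key=lambda s: s[0])
```

For the three contigs considered here there are 3! × 2³ = 48 order/orientation combinations, which reduce to 24 distinct layouts once mirror images are merged, so an exhaustive comparison of map lengths is feasible.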

Availability: The LDMAP program is written in C for the Linux platform and is available at https://www.soton.ac.uk/genomicinformatics/research/ld.page

Pengelly, Reuben and Collins, Andrew (2019) Linkage disequilibrium maps to guide contig ordering for genome assembly. Bioinformatics, 35 (4), 541-545. (doi:10.1093/bioinformatics/bty687).

Record type: Article

More information

Accepted/In Press date: 3 August 2018
e-pub ahead of print date: 7 August 2018
Published date: 15 February 2019

Identifiers

Local EPrints ID: 422870
URI: http://eprints.soton.ac.uk/id/eprint/422870
ISSN: 1367-4803
PURE UUID: a598947d-0f3b-4ea4-84b9-6932cf761cff
ORCID for Reuben Pengelly: orcid.org/0000-0001-7022-645X
ORCID for Andrew Collins: orcid.org/0000-0001-7108-0771

Catalogue record

Date deposited: 07 Aug 2018 16:30
Last modified: 26 Nov 2021 05:29
