Linkage disequilibrium maps to guide contig ordering for genome assembly
Motivation: Efforts to establish reference genome sequences by de novo sequence assembly must address the difficulty of linking relatively short sequence contigs into much larger chromosome assemblies. Efficient strategies are required to span gaps and to establish contig order and relative orientation. Here we consider linkage disequilibrium (LD) maps of sequenced contigs and the utility of LD for ordering, orienting and positioning linked sequences. LD maps are readily constructed from population data and have at least an order of magnitude higher resolution than linkage maps, which gives them the potential to resolve difficult areas in assemblies. We empirically evaluate an LD map-based method using single nucleotide polymorphism (SNP) genotype data from a ~216 kb region of human chromosome 6p21.3, which is split into three shorter contigs.
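Establishing order and relative orientation means choosing among a finite set of candidate layouts. With three contigs there are 3! = 6 orders and 2^3 = 8 orientation combinations, giving 48 layouts. A layout and its mirror image (the whole assembly read from the other end, with every strand flipped) describe the same chromosome, which leaves 24 distinct configurations. The minimal Python sketch below enumerates them. The function name `candidate_layouts` and the '+'/'-' strand labels are illustrative assumptions, not part of LDMAP. Collapsing mirror images is also a simplifying choice made for this sketch.

```python
from itertools import permutations, product


def candidate_layouts(contigs):
    """Yield each distinct order/orientation layout of the given contigs.

    Hypothetical helper (not LDMAP code). A layout is a tuple of
    (contig_name, strand) pairs: '+' keeps the contig as assembled and
    '-' uses its reverse complement. A layout and its mirror image
    (reversed order, every strand flipped) are treated as equivalent,
    so only the first of each mirror pair is yielded.
    """
    seen = set()
    for order in permutations(contigs):
        for strands in product("+-", repeat=len(contigs)):
            layout = tuple(zip(order, strands))
            # Mirror image: read the whole assembly from the other end.
            mirror = tuple(
                (name, "-" if strand == "+" else "+")
                for name, strand in reversed(layout)
            )
            if mirror in seen:
                continue  # equivalent layout already yielded
            seen.add(layout)
            yield layout


layouts = list(candidate_layouts(["A", "B", "C"]))
print(len(layouts))  # 24
```

Each yielded layout would then be a candidate for which an LD map is constructed and compared.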
Results: LD map length is the most informative indicator of correct contig order and orientation. The correct configuration is suggested by the shortest LD map whose residual error variance is close to one. In regions of strong LD, the method may be less informative for correcting inverted contigs than for identifying the correct contig order. When two contigs are in LD with each other, the method can give a rough estimate of the distance between them.
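The selection rule above can be read as a filter followed by a minimisation. This hypothetical sketch is not LDMAP code. It assumes an LD map has already been built for each candidate layout and summarised as a (map length, residual error variance) pair. The tolerance of 0.1 used to decide whether a variance is "close to one" is an arbitrary placeholder, because no threshold is stated here. The example numbers are invented for illustration and are not results from the paper.

```python
def pick_layout(scores, tolerance=0.1):
    """Return the layout with the shortest LD map among those whose
    residual error variance lies within `tolerance` of 1.

    `scores` maps a layout label to (ld_map_length, residual_error_variance).
    The default tolerance is an assumed placeholder value.
    Returns None if no layout meets the variance criterion.
    """
    acceptable = {
        layout: length
        for layout, (length, variance) in scores.items()
        if abs(variance - 1.0) <= tolerance
    }
    if not acceptable:
        return None
    # Shortest LD map among the layouts with acceptable error variance.
    return min(acceptable, key=acceptable.get)


# Invented numbers for illustration only, not data from the study.
example = {
    "A+ B+ C+": (52.0, 1.02),
    "A+ C+ B+": (61.5, 1.05),
    "B+ A+ C+": (48.0, 1.40),  # shortest map, but variance far from 1
}
print(pick_layout(example))  # A+ B+ C+
```

The layout "B+ A+ C+" has the shortest map in this example. It is excluded because its residual error variance is not close to one. This reflects the point that map length alone is read together with model fit.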
Availability: The LDMAP program is written in C for the Linux platform and is available at https://www.soton.ac.uk/genomicinformatics/research/ld.page
Pengelly, Reuben and Collins, Andrew (2019) Linkage disequilibrium maps to guide contig ordering for genome assembly. Bioinformatics, 35 (4), 541-545. (doi:10.1093/bioinformatics/bty687).
More information
Accepted/In Press date: 3 August 2018
e-pub ahead of print date: 7 August 2018
Published date: 15 February 2019
Identifiers
Local EPrints ID: 422870
URI: http://eprints.soton.ac.uk/id/eprint/422870
ISSN: 1367-4803
PURE UUID: a598947d-0f3b-4ea4-84b9-6932cf761cff
Catalogue record
Date deposited: 07 Aug 2018 16:30
Last modified: 16 Mar 2024 06:57