Bioinformatic analysis of human Next Generation Sequencing data; extracting additional information, optimising mapping and variant calling, and application in a rare disease
University of Southampton
Sood, Roshan Kumar
May 2019
Gibson, Jane
Sood, Roshan Kumar (2019) Bioinformatic analysis of human Next Generation Sequencing data; extracting additional information, optimising mapping and variant calling, and application in a rare disease. University of Southampton, Doctoral Thesis, 455pp.
Record type: Thesis (Doctoral)
Abstract
With the increasing application of Next Generation Sequencing (NGS) in medicine, it is important to test and develop approaches that extract the most information from datasets. This thesis investigates five aspects of NGS, ranging from quality control to variant calling. First, a method was developed to estimate contamination from a VCF file, which is useful when no BAM file is available for use with existing tools. Second, unmapped reads were investigated to extract additional information from NGS samples. This approach detected an abundance of oral microbes in saliva-derived samples relative to blood-derived samples, but failed to identify differences between inflammatory bowel disease patients and controls. Third, a familial trio including a reported rare case of Sedaghatian-type spondylometaphyseal dysplasia (SSMD) was sequenced by both whole exome sequencing (WES) and whole genome sequencing (WGS). Nearly all coding variants called from WES were also called from WGS, despite differences in mean depth of coverage. This comparison highlighted that, as sequencing costs decrease, WGS will offer the greatest diagnostic value, with the potential for future re-analysis of cases that cannot currently be resolved. Fourth, the trio was used in an attempt to identify causal variant(s) in glutathione peroxidase 4 (GPX4), the gene currently implicated in causing SSMD. However, no variants, either small SNPs or large structural variants, were identified in GPX4, and no plausible candidates were identified from the trio. Finally, variant calling at the low-affinity FCGR locus was performed using targeted NGS. Because the FCGR genes have undergone extensive duplication, customised references were used to infer the combinations of alleles across homologous sites. Using this approach, it was possible to predict SNPs in the FCGR3B gene and to predict human neutrophil antigen haplotypes involved in the immune response to treatments such as monoclonal antibodies.
Text: Roshan_Sood_FINAL thesis_2019 (Version of Record)
More information
Published date: May 2019
Identifiers
Local EPrints ID: 433354
URI: http://eprints.soton.ac.uk/id/eprint/433354
PURE UUID: db435f82-7afa-4c61-ad34-e8a1a04c70c8
Catalogue record
Date deposited: 14 Aug 2019 16:31
Last modified: 16 Mar 2024 08:01
Contributors
Author:
Sood, Roshan Kumar