Big data driven oriented graph theory aided tagSNPs selection for genetic precision therapy
Pages: 3746-3754
Cong, Tianshuo, Wang, Jingjing, Guan, Sanghai, Mu, Yifei, Bai, Tong and Ren, Yong (2018) Big data driven oriented graph theory aided tagSNPs selection for genetic precision therapy. IEEE Access, 7, [8576526]. (doi:10.1109/ACCESS.2018.2886926).
Abstract
Human genome projects have recently been launched and carried out around the world. Gene-sequencing techniques play a critical role in disease diagnosis, prediction, and population stratification, all of which rely on efficiently mining genetic features from the gene pool. Exploring the association between genetic mutation sites and disease-based population classification has become an active research topic because it supports disease diagnosis and treatment at the molecular level. However, the human gene pool contains numerous variable sites even on a single chromosome, so traditional classifiers cannot mine all single nucleotide polymorphism (SNP) sites without first clearly identifying the characteristic SNP sites, termed tagSNPs, within SNP clusters. In this paper, we apply big data mining techniques and first propose a principal component analysis (PCA)-based algorithm that reduces the dimension of the gene data so that SNP sites can be clustered in a low-dimensional space. We then design an oriented graph theory-based tagSNP selection algorithm. Finally, on the real-world 1000 Genomes Project dataset, the complete process of our SNP classifier selects fewer tagSNPs than traditional methods.
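The abstract does not spell out the selection algorithm, so the following is only a minimal sketch of the general idea of graph-based tagSNP selection, not the authors' method. It assumes genotypes are coded as 0/1/2 allele counts, that squared Pearson correlation (r²) with a hypothetical threshold of 0.8 decides whether one SNP can represent another, and that a simple greedy cover picks the tags. Because r² is symmetric, every edge here appears in both directions; a truly oriented graph would need an asymmetric measure. The PCA and clustering steps are omitted. All function names and the toy data are illustrative.

```python
from itertools import combinations


def r_squared(x, y):
    """Squared Pearson correlation between two genotype vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a site with no variation cannot represent another site
    return cov * cov / (var_x * var_y)


def build_graph(genotypes, threshold=0.8):
    """Directed adjacency sets: an edge i -> j means SNP i can stand in for SNP j.

    Assumption: r^2 >= threshold defines representability, so each edge
    is added in both directions.
    """
    edges = {i: set() for i in range(len(genotypes))}
    for i, j in combinations(range(len(genotypes)), 2):
        if r_squared(genotypes[i], genotypes[j]) >= threshold:
            edges[i].add(j)
            edges[j].add(i)
    return edges


def select_tag_snps(edges):
    """Greedy cover: repeatedly pick the SNP that represents the most
    still-uncovered SNPs (itself plus its successors)."""
    uncovered = set(edges)
    tags = []
    while uncovered:
        # Ties are broken in favour of the smallest SNP index.
        best = max(sorted(edges), key=lambda v: len(({v} | edges[v]) & uncovered))
        tags.append(best)
        uncovered -= {best} | edges[best]
    return tags


# Toy data: 5 SNP sites (rows) genotyped in 6 individuals (columns).
genotypes = [
    [0, 1, 2, 0, 1, 2],  # SNP 0
    [0, 1, 2, 0, 1, 2],  # SNP 1, identical to SNP 0
    [2, 1, 0, 2, 1, 0],  # SNP 2, perfectly anti-correlated with SNP 0
    [0, 0, 1, 1, 2, 2],  # SNP 3
    [0, 0, 1, 1, 2, 2],  # SNP 4, identical to SNP 3
]

tags = select_tag_snps(build_graph(genotypes))
print(tags)  # [0, 3]
```

In this toy case two tags stand in for all five sites. A greedy cover is not guaranteed to find the smallest possible tag set; it is used here only because it is short and easy to follow.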
Text: 08576526 - Version of Record
More information
Accepted/In Press date: 10 December 2018
e-pub ahead of print date: 14 December 2018
Keywords: big data, data dimension reduction, genetic feature mining, SNP site clustering
Identifiers
Local EPrints ID: 427621
URI: http://eprints.soton.ac.uk/id/eprint/427621
ISSN: 2169-3536
PURE UUID: fe514d6c-e684-4f95-8ade-cb8cd96f6558
Catalogue record
Date deposited: 24 Jan 2019 17:30
Last modified: 05 Jun 2024 19:07
Contributors
Author: Tianshuo Cong
Author: Jingjing Wang
Author: Sanghai Guan
Author: Yifei Mu
Author: Tong Bai
Author: Yong Ren