University of Southampton Institutional Repository

Big data driven oriented graph theory aided tagSNPs selection for genetic precision therapy

Cong, Tianshuo, Wang, Jingjing, Guan, Sanghai, Mu, Yifei, Bai, Tong and Ren, Yong (2018) Big data driven oriented graph theory aided tagSNPs selection for genetic precision therapy. IEEE Access, 7, 3746-3754, [8576526]. (doi:10.1109/ACCESS.2018.2886926).

Record type: Article

Abstract

Recently, human genome projects have been vigorously launched and implemented worldwide. Gene-sequencing techniques play a critical role in disease diagnosis, prediction, and population stratification, all of which rely on efficiently mining genetic features in the gene pool. Exploring the association between genetic mutation sites and disease-based population classification has become a hot topic, since it supports disease diagnosis and treatment at the molecular level. However, the human gene pool contains numerous variable sites even on a single chromosome, so traditional classifiers cannot handle all single nucleotide polymorphism (SNP) sites unless the characteristic SNP sites of each SNP cluster, termed tagSNPs, are identified first. Applying big data mining techniques, in this paper we first propose a principal component analysis (PCA)-based algorithm that reduces the dimension of gene data so that SNP sites can be clustered in a low-dimensional space. We then design an oriented graph theory-based tagSNP selection algorithm. Finally, on the real-world 1000 Genomes Project dataset, the complete process of our SNP classifier selects fewer tagSNPs than traditional methods.
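The abstract does not spell out either algorithm. As a rough, hypothetical illustration of the two-stage pipeline it describes, the sketch below standardizes a made-up genotype matrix, projects SNP sites onto the top principal components using power iteration, and groups them with a simple leader clustering in that low-dimensional space. It then picks tagSNPs per cluster greedily on a directed "u tags v" graph whose edges require r² at or above a chosen threshold. The toy data, the clustering radius, the r² threshold, and the greedy set-cover rule are all our assumptions; they stand in for the authors' clustering step and oriented graph theory-based selection algorithm and are not their method. Because r² is symmetric, the edges in this sketch come in pairs; the orientation rule the paper uses is not given in the abstract.

```python
import math
import random

# Hypothetical toy data: rows are SNP sites, columns are individuals, and each
# entry is a minor-allele count (0, 1 or 2). Made up for illustration only;
# not taken from the 1000 Genomes Project.
GENOTYPES = [
    [0, 1, 2, 0, 1, 2, 0, 1],  # snp0
    [0, 1, 2, 0, 1, 2, 0, 1],  # snp1 (identical to snp0)
    [0, 1, 2, 0, 1, 2, 1, 1],  # snp2 (close to snp0)
    [1, 1, 1, 0, 0, 0, 2, 2],  # snp3
    [1, 1, 1, 0, 0, 0, 2, 1],  # snp4 (close to snp3)
    [2, 0, 1, 1, 2, 0, 0, 2],  # snp5
]


def standardize(row):
    """Center and scale one SNP so that dot products divided by n give Pearson r."""
    n = len(row)
    mean = sum(row) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in row) / n)
    if sd == 0:
        raise ValueError("monomorphic SNP cannot be standardized")
    return [(x - mean) / sd for x in row]


def r_squared(za, zb):
    """Squared correlation (an LD-style r^2) between two standardized SNPs."""
    r = sum(a * b for a, b in zip(za, zb)) / len(za)
    return r * r


def pca_scores(Z, k, iters=500, seed=0):
    """Project each SNP onto the top-k principal components via power iteration."""
    m, n = len(Z), len(Z[0])
    mu = [sum(Z[i][j] for i in range(m)) / m for j in range(n)]
    X = [[Z[i][j] - mu[j] for j in range(n)] for i in range(m)]
    # n x n covariance of the individual "features" across SNP observations
    C = [[sum(X[i][a] * X[i][b] for i in range(m)) / m for b in range(n)]
         for a in range(n)]
    rng = random.Random(seed)  # fixed seed keeps the start vector deterministic
    comps = []
    for _ in range(k):
        v = [rng.uniform(-1, 1) for _ in range(n)]
        for _ in range(iters):
            w = [sum(C[a][b] * v[b] for b in range(n)) for a in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0:
                break
            v = [x / norm for x in w]
        lam = sum(v[a] * sum(C[a][b] * v[b] for b in range(n)) for a in range(n))
        comps.append(v)
        # Deflate so the next pass converges to the next component
        C = [[C[a][b] - lam * v[a] * v[b] for b in range(n)] for a in range(n)]
    return [[sum(x * c for x, c in zip(row, comp)) for comp in comps] for row in X]


def leader_cluster(points, radius):
    """Assign each point to the first cluster whose leader lies within radius."""
    leaders, labels = [], []
    for p in points:
        for idx, q in enumerate(leaders):
            if math.dist(p, q) <= radius:
                labels.append(idx)
                break
        else:
            leaders.append(p)
            labels.append(len(leaders) - 1)
    return labels


def select_tags(members, Z, threshold):
    """Greedy set-cover tagSNP choice on a directed 'u tags v' graph (assumed heuristic)."""
    # Edge u -> v when r^2(u, v) >= threshold; every node also covers itself.
    out = {u: {v for v in members if v == u or r_squared(Z[u], Z[v]) >= threshold}
           for u in members}
    uncovered, tags = set(members), []
    while uncovered:
        # Pick the node covering the most uncovered SNPs (ties: lowest index)
        best = max(members, key=lambda u: (len(out[u] & uncovered), -u))
        tags.append(best)
        uncovered -= out[best]
    return tags


THRESHOLD = 0.8  # hypothetical r^2 cut-off
Z = [standardize(row) for row in GENOTYPES]
scores = pca_scores(Z, k=2)
labels = leader_cluster(scores, radius=1.5)  # hypothetical radius
tags = []
for c in sorted(set(labels)):
    members = [i for i, lab in enumerate(labels) if lab == c]
    tags.extend(select_tags(members, Z, THRESHOLD))
print("cluster labels:", labels)
print("tagSNPs:", sorted(tags))
```

The greedy rule guarantees that every SNP is either selected or has an r² of at least the threshold with a selected tag in its own cluster, which is the coverage property tagSNP selection aims for; it does not guarantee the minimum number of tags.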

Full text: 08576526 - Version of Record (12MB), available under a Creative Commons Attribution license.

More information

Accepted/In Press date: 10 December 2018
e-pub ahead of print date: 14 December 2018
Keywords: big data, data dimension reduction, Genetic feature mining, SNP site clustering

Identifiers

Local EPrints ID: 427621
URI: http://eprints.soton.ac.uk/id/eprint/427621
ISSN: 2169-3536
PURE UUID: fe514d6c-e684-4f95-8ade-cb8cd96f6558

Catalogue record

Date deposited: 24 Jan 2019 17:30
Last modified: 07 Oct 2020 00:36

Contributors

Author: Tianshuo Cong
Author: Jingjing Wang
Author: Sanghai Guan
Author: Yifei Mu
Author: Tong Bai
Author: Yong Ren
