The University of Southampton
University of Southampton Institutional Repository

Machine learning for liver disease classification


Jesty, Benjamin (2019) Machine learning for liver disease classification. University of Southampton, Masters Thesis, 66pp.

Record type: Thesis (Masters)

Abstract

In this work the use of machine learning in medicine, with a particular focus on liver disease, is investigated and summarised. A variety of machine learning techniques for feature selection and classification are then applied to a novel medical application. A dataset of full blood count tests from healthy (20,089) and unhealthy (714) patients is analysed to further medical understanding of how liver disease affects the blood and to enable a new diagnostic technique based on commonly available information. Methods for outlier identification and robust classification are also introduced and evaluated.

Logistic regression and soft-margin support vector machines are used to classify patients as healthy or unhealthy based on the blood tests. Feature selection is performed on the data. Three primary features (giving 90% area under the receiver operating characteristic curve) and four secondary features are found; together they form the 7-feature support vector machine classifier, which reaches the peak accuracy of 92 ± 0.5%. These features are verified by a liver specialist to be influenced by liver disease. The final classifier is further tested on a completely new dataset of 100,000 patients and achieves 90% accuracy, marginally outperforming the classifier designed by a liver specialist.
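The abstract reports performance partly as area under the receiver operating characteristic curve (AUC). As a minimal sketch of what that metric measures, and not the thesis's own code, the function below computes AUC from the rank-sum (Mann-Whitney U) identity using only the Python standard library. The function name `roc_auc`, the label convention (1 for unhealthy, 0 for healthy) and the toy scores are assumptions made for illustration; none of them come from the thesis.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) identity.

    labels: 1 for the positive class (assumed here to mean "unhealthy"), 0 otherwise.
    scores: classifier outputs, where a higher score means "more likely positive".
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")

    # Sort sample indices by score, then assign 1-based ranks.
    # Tied scores share the mean of the ranks they span.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1

    # U counts (positive, negative) pairs where the positive scores higher;
    # ties count as half a pair. AUC is U divided by the number of pairs.
    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    u_stat = pos_rank_sum - n_pos * (n_pos + 1) / 2
    return u_stat / (n_pos * n_neg)


if __name__ == "__main__":
    # Made-up toy data: two healthy (0) and two unhealthy (1) patients.
    toy_labels = [0, 0, 1, 1]
    toy_scores = [0.1, 0.4, 0.35, 0.8]
    print(roc_auc(toy_labels, toy_scores))
```

Read this way, AUC is the probability that a randomly chosen unhealthy patient receives a higher score than a randomly chosen healthy one. Unlike raw accuracy, it does not depend on a single decision threshold, which can matter when one class is much larger than the other, as with the 20,089 healthy and 714 unhealthy patients described in this record.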

Feature selection and classification tasks are performed on time cohorts to investigate temporal information in the data. Differences in features selected are found between blood tests taken near diagnosis and years prior. Classification accuracy is shown to decrease steadily as time prior to diagnosis increases. However, blood tests taken 6 years prior to diagnosis can still be dichotomised with greater than 75% accuracy.

An outlier-rejecting support vector machine is developed and tested on artificial datasets and the portal hypertension dataset. Outlier rejection during training shows major improvements for small, well-structured datasets but struggles to improve on soft-margin support vector machines for larger, more complex datasets.

Text
bj1g11_corrected_thesis - Version of Record
Available under License University of Southampton Thesis Licence.

More information

Published date: May 2019

Identifiers

Local EPrints ID: 433924
URI: http://eprints.soton.ac.uk/id/eprint/433924
PURE UUID: c9516348-1ff1-4578-98c9-26e25bcc88f1
ORCID for Mahesan Niranjan: orcid.org/0000-0001-7021-140X

Catalogue record

Date deposited: 06 Sep 2019 16:30
Last modified: 16 Mar 2024 03:56

Contributors

Author: Benjamin Jesty
Thesis advisor: Mahesan Niranjan
