Machine learning for liver disease classification
In this work the use of machine learning in medicine, with a particular focus on liver disease, is investigated and summarised. A variety of machine learning techniques for feature selection and classification are then applied to a novel medical application. A dataset of healthy (20,089) and unhealthy (714) patients’ full blood count blood tests is analysed to further medical understanding of how liver disease affects the blood and to enable a new diagnosis technique based on commonly available information. Methods for outlier identification and robust classification are also introduced and evaluated.
Logistic regression and soft-margin support vector machines are used to classify patients as healthy or unhealthy based on the blood tests. Feature selection is performed on the data. Three primary features (90% area under the receiver operating characteristic curve) and four secondary features are identified, and the resulting 7-feature support vector machine classifier reaches a peak accuracy of 92 ± 0.5%. These features are verified by a liver specialist to be influenced by liver disease. The final classifier is further tested on a completely new dataset of 100,000 patients’ data and achieves 90% accuracy, marginally outperforming the classifier designed by a liver specialist.
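As a rough illustration of the classification and scoring steps described above, the sketch below trains a plain logistic regression by gradient descent and computes the area under the ROC curve on held-out data. It is a minimal sketch only: the data are synthetic, the function names (`roc_auc`, `train_logistic`, `make_data`), learning rate and class sizes are assumptions made for the example, and none of it is taken from the thesis itself.

```python
import math
import random

def roc_auc(labels, scores):
    """Area under the ROC curve via the rank-sum identity.

    Equals the probability that a random positive scores higher than a
    random negative, counting ties as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def train_logistic(X, y, lr=0.5, epochs=300):
    """Batch gradient descent on the mean log-loss; returns (weights, bias)."""
    d, n = len(X[0]), len(X)
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            err = 1.0 / (1.0 + math.exp(-z)) - yi  # predicted prob minus label
            for j in range(d):
                gw[j] += err * xi[j]
            gb += err
        w = [wj - lr * gj / n for wj, gj in zip(w, gw)]
        b -= lr * gb / n
    return w, b

def make_data(rng, n_healthy, n_unhealthy):
    """Synthetic, imbalanced stand-in for blood-test features (assumed)."""
    X, y = [], []
    for _ in range(n_healthy):
        X.append([rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)])
        y.append(0)
    for _ in range(n_unhealthy):
        # Two informative features are shifted; the third is pure noise.
        X.append([rng.gauss(2, 1), rng.gauss(1.5, 1), rng.gauss(0, 1)])
        y.append(1)
    return X, y

rng = random.Random(0)
X_train, y_train = make_data(rng, 800, 40)
X_test, y_test = make_data(rng, 800, 40)
w, b = train_logistic(X_train, y_train)
scores = [sum(wj * xj for wj, xj in zip(w, xi)) + b for xi in X_test]
auc = roc_auc(y_test, scores)
print(f"held-out AUC: {auc:.3f}")
```

AUC is used here rather than raw accuracy because, with classes as imbalanced as in the dataset described above, a classifier that always predicts "healthy" would already score a high accuracy.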
Feature selection and classification tasks are performed on time cohorts to investigate temporal information in the data. Differences in features selected are found between blood tests taken near diagnosis and years prior. Classification accuracy is shown to decrease steadily as time prior to diagnosis increases. However, blood tests taken 6 years prior to diagnosis can still be dichotomised with greater than 75% accuracy.
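The cohort analysis above can be pictured as grouping predictions by how long before diagnosis each blood test was taken and scoring each group separately. The following is a minimal sketch under assumed conventions: cohorts are whole years before diagnosis, the helper `accuracy_by_cohort` is hypothetical, and the records are toy values rather than results from the thesis.

```python
from collections import defaultdict

def accuracy_by_cohort(records):
    """Return {cohort: accuracy} for (years_before_diagnosis, truth, prediction) tuples.

    A cohort is the whole number of years before diagnosis, so a test
    taken 3.4 years prior falls in cohort 3.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for years, truth, pred in records:
        cohort = int(years)
        totals[cohort] += 1
        hits[cohort] += int(truth == pred)
    return {c: hits[c] / totals[c] for c in sorted(totals)}

# Hypothetical toy predictions, not data from the thesis.
records = [
    (0.2, 1, 1), (0.8, 1, 1), (0.5, 0, 0), (0.9, 0, 0),
    (3.1, 1, 1), (3.4, 1, 0), (3.9, 0, 0), (3.6, 0, 0),
    (6.2, 1, 0), (6.7, 1, 1), (6.1, 0, 1), (6.5, 0, 0),
]
cohort_accuracy = accuracy_by_cohort(records)
print(cohort_accuracy)
```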
An outlier-rejecting support vector machine is developed and tested on artificial datasets and the portal hypertension dataset. Outlier rejection during training shows major improvements for small, well-structured datasets but struggles to improve on soft-margin support vector machines for larger, more complex datasets.
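The abstract does not give the formulation of the outlier-rejecting classifier, so the sketch below substitutes a simple trimming heuristic to convey the general idea: fit a linear soft-margin SVM, discard the training points with the largest hinge loss, and refit. Everything in it is an assumption made for illustration, including the function names (`train_linear_svm`, `train_with_rejection`), the subgradient-descent solver, the rejection fraction and the synthetic data.

```python
import random

def dot(w, x):
    return sum(wj * xj for wj, xj in zip(w, x))

def train_linear_svm(X, y, c=1.0, lr=0.05, epochs=200):
    """Subgradient descent on 0.5*||w||^2 + c * mean hinge loss (labels in {-1, +1})."""
    d, n = len(X[0]), len(X)
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = list(w), 0.0  # start from the regulariser's gradient
        for xi, yi in zip(X, y):
            if yi * (dot(w, xi) + b) < 1.0:  # point violates the margin
                for j in range(d):
                    gw[j] -= c * yi * xi[j] / n
                gb -= c * yi / n
        w = [wj - lr * gj for wj, gj in zip(w, gw)]
        b -= lr * gb
    return w, b

def train_with_rejection(X, y, reject_fraction=0.1, rounds=3):
    """Alternate between fitting and discarding the highest-hinge-loss points."""
    n_keep = len(X) - int(reject_fraction * len(X))
    keep = list(range(len(X)))
    for _ in range(rounds):
        w, b = train_linear_svm([X[i] for i in keep], [y[i] for i in keep])
        # Score every point, including previously rejected ones.
        loss = [max(0.0, 1.0 - y[i] * (dot(w, X[i]) + b)) for i in range(len(X))]
        keep = sorted(range(len(X)), key=lambda i: loss[i])[:n_keep]
    w, b = train_linear_svm([X[i] for i in keep], [y[i] for i in keep])
    return w, b, sorted(keep)

# Synthetic example: two clean clusters plus three mislabelled points.
rng = random.Random(0)
X, y = [], []
for _ in range(30):
    X.append([rng.gauss(2, 0.5), rng.gauss(2, 0.5)])
    y.append(1)
    X.append([rng.gauss(-2, 0.5), rng.gauss(-2, 0.5)])
    y.append(-1)
outlier_ids = []
for _ in range(3):
    outlier_ids.append(len(X))
    X.append([rng.gauss(-4, 0.3), rng.gauss(-4, 0.3)])  # deep in the -1 cluster
    y.append(1)                                          # but labelled +1

w, b, kept = train_with_rejection(X, y)
print("rejected:", [i for i in range(len(X)) if i not in kept])
```

On a small, cleanly structured set like this one the mislabelled points stand out by their hinge loss, which matches the kind of setting where the abstract reports the largest gains; it says nothing about how such a heuristic behaves on larger, more complex data.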
University of Southampton
Jesty, Benjamin
May 2019
Niranjan, Mahesan
Jesty, Benjamin (2019) Machine learning for liver disease classification. University of Southampton, Masters Thesis, 66pp.
Record type: Thesis (Masters)
Text: bj1g11_corrected_thesis - Version of Record
More information
Published date: May 2019
Identifiers
Local EPrints ID: 433924
URI: http://eprints.soton.ac.uk/id/eprint/433924
PURE UUID: c9516348-1ff1-4578-98c9-26e25bcc88f1
Catalogue record
Date deposited: 06 Sep 2019 16:30
Last modified: 16 Mar 2024 03:56
Contributors
Author: Benjamin Jesty
Thesis advisor: Mahesan Niranjan