Machine learning for liver disease classification

This work investigates and summarises the use of machine learning in medicine, with a particular focus on liver disease. A variety of machine learning techniques for feature selection and classification are then applied to a novel medical application. A dataset of full blood count tests from healthy (20,089) and unhealthy (714) patients is analysed to further medical understanding of how liver disease affects the blood and to enable a new diagnostic technique based on commonly available information. Methods for outlier identification and robust classification are also introduced and evaluated.

Logistic regression and soft-margin support vector machines are used to classify patients as healthy or unhealthy from their blood tests, and feature selection is performed on the data. Three primary features (giving 90% area under the receiver operating characteristic curve) and four secondary features are identified; the resulting 7-feature support vector machine classifier reaches the peak accuracy of 92 ± 0.5%. A liver specialist verified that these features are influenced by liver disease. The final classifier was further tested on a completely new dataset of 100,000 patients and achieved 90% accuracy, marginally outperforming the classifier designed by a liver specialist.
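As a rough illustration of the area under the receiver operating characteristic curve mentioned above, the sketch below computes it from its pairwise (Mann-Whitney) definition and uses it to rank individual features. This is not taken from the thesis: the single-feature ranking is a simple stand-in for whatever feature selection procedure was actually used, and the function names, feature names and toy values are all hypothetical.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve: the probability that a randomly chosen
    positive (label 1) scores higher than a randomly chosen negative
    (label 0), with ties counted as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one example of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def rank_features(rows, labels, names):
    """Rank features by how well each one separates the classes on its own.
    A feature that is lower in unhealthy patients is as useful as one that
    is higher, so the score is max(auc, 1 - auc)."""
    ranked = []
    for j, name in enumerate(names):
        column = [row[j] for row in rows]
        auc = roc_auc(labels, column)
        ranked.append((name, max(auc, 1.0 - auc)))
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked


# Made-up toy data: 3 healthy (0) and 2 unhealthy (1) patients, 2 features.
toy_rows = [[1.0, 3.0], [2.0, 1.0], [3.0, 4.0], [4.0, 2.0], [5.0, 5.0]]
toy_labels = [0, 0, 0, 1, 1]
toy_ranking = rank_features(toy_rows, toy_labels, ["feature_a", "feature_b"])
```

With classes as imbalanced as those in this dataset, a threshold-free measure such as this is often reported alongside plain accuracy, because a classifier that labels every patient healthy would already score a high accuracy.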

Feature selection and classification tasks are performed on time cohorts to investigate temporal information in the data. Differences in features selected are found between blood tests taken near diagnosis and years prior. Classification accuracy is shown to decrease steadily as time prior to diagnosis increases. However, blood tests taken 6 years prior to diagnosis can still be dichotomised with greater than 75% accuracy.

An outlier-rejecting support vector machine is developed and tested on artificial datasets and the portal hypertension dataset. Rejecting outliers during training yields major improvements on small, well-structured datasets but struggles to improve on soft-margin support vector machines for larger, more complex datasets.
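The abstract does not describe how outliers are rejected, so the following is only one plausible sketch of the general idea, not the method developed in the thesis. It assumes a one-dimensional linear soft-margin classifier trained by subgradient descent on the hinge loss, drops the training points with the largest hinge loss, and retrains. All names, hyperparameters and data are hypothetical.

```python
def hinge_losses(w, b, xs, ys):
    """Per-point hinge loss max(0, 1 - y * (w * x + b)) for labels in {-1, +1}."""
    return [max(0.0, 1.0 - y * (w * x + b)) for x, y in zip(xs, ys)]


def fit_linear_svm(xs, ys, lam=0.01, lr=0.01, epochs=2000):
    """Full-batch subgradient descent on (lam / 2) * w**2 + mean hinge loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w, grad_b = lam * w, 0.0
        for x, y in zip(xs, ys):
            if y * (w * x + b) < 1.0:  # point lies inside the margin
                grad_w -= y * x / n
                grad_b -= y / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def fit_with_outlier_rejection(xs, ys, n_reject=1):
    """Train once, discard the n_reject points with the largest hinge loss,
    then retrain on the remaining points."""
    w, b = fit_linear_svm(xs, ys)
    losses = hinge_losses(w, b, xs, ys)
    order = sorted(range(len(xs)), key=lambda i: losses[i], reverse=True)
    rejected = sorted(order[:n_reject])
    keep = [i for i in range(len(xs)) if i not in rejected]
    w, b = fit_linear_svm([xs[i] for i in keep], [ys[i] for i in keep])
    return w, b, rejected


# Made-up data: two separable groups plus one mislabelled point at x = 4.
toy_xs = [-3.0, -2.0, -1.0, 1.0, 2.0, 3.0, 4.0]
toy_ys = [-1, -1, -1, 1, 1, 1, -1]
w_fit, b_fit, rejected_idx = fit_with_outlier_rejection(toy_xs, toy_ys)
```

On a small, cleanly structured set like this one the mislabelled point dominates the loss and is easy to single out. On larger, noisier data the highest-loss points need not be true outliers, which is consistent with the mixed results reported above.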

University of Southampton

Jesty, Benjamin

May 2019

Niranjan, Mahesan

Jesty, Benjamin (2019) Machine learning for liver disease classification. University of Southampton, Masters Thesis, 66pp.

Record type: Thesis (Masters)