The University of Southampton
University of Southampton Institutional Repository

Machine learning algorithms based on proteomic data mining accurately predicting the recurrence of hepatitis B-related hepatocellular carcinoma

Background and Aim
Over 10% of hepatocellular carcinoma (HCC) cases recur each year, even after surgical resection. Currently, little is known about the causes of recurrence or how to prevent it effectively. Predicting HCC recurrence requires diagnostic markers with high sensitivity and specificity. This study aims to identify new key proteins associated with HCC recurrence and to build machine learning models for predicting HCC recurrence.

Methods
The proteomic data analyzed in this study were obtained from the Clinical Proteomic Tumor Analysis Consortium (CPTAC) database. We identified differentially expressed proteins between cases with and without HCC recurrence. Survival analysis, Cox regression analysis, and the area under the receiver operating characteristic curve (AUROC > 0.7) were used to screen for the more significant differential proteins. Predictive models for HCC recurrence were developed using four machine learning algorithms.
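The AUROC screening step can be illustrated with a minimal sketch. It assumes one abundance vector per protein and a binary recurrence label per patient, and it covers only the AUROC > 0.7 filter, not the survival or Cox regression analyses. The function names, the placeholder protein names, the toy values, and the direction-agnostic handling (treating an AUROC below 0.5 as its mirror image) are assumptions for illustration, not details taken from the study.

```python
from typing import Dict, List, Sequence


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Equivalent to the Mann-Whitney U statistic divided by n_pos * n_neg;
    tied pairs count as half a correct ranking.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def screen_by_auroc(
    abundance: Dict[str, List[float]],
    recurred: List[int],
    threshold: float = 0.7,
) -> Dict[str, float]:
    """Keep proteins whose single-marker AUROC exceeds the threshold."""
    kept = {}
    for name, values in abundance.items():
        a = auroc(values, recurred)
        # Assumption: a protein that is lower in recurrent cases is as
        # informative as one that is higher, so mirror values below 0.5.
        a = max(a, 1.0 - a)
        if a > threshold:
            kept[name] = a
    return kept


if __name__ == "__main__":
    # Toy data with hypothetical protein names; 1 = recurrence, 0 = none.
    recurred = [1, 1, 1, 0, 0, 0]
    abundance = {
        "PROT_A": [2.1, 1.9, 2.5, 1.0, 1.2, 0.8],
        "PROT_B": [1.0, 2.0, 1.5, 1.1, 1.9, 1.6],
    }
    print(screen_by_auroc(abundance, recurred))
```

In this toy input only `PROT_A` separates the two groups well enough to pass the 0.7 cut-off. The pairwise loop is quadratic in cohort size, which is acceptable for a cohort of around a hundred patients but would be replaced by a rank-based computation for larger data.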

Results
A total of 690 differentially expressed proteins between 50 relapsed and 77 non-relapsed hepatitis B-related HCC patients were identified. Seven of these proteins had an AUROC > 0.7 for 5-year survival in HCC: BAHCC1, ESF1, RAP1GAP, RUFY1, SCAMP3, STK3, and TMEM230. Among the machine learning algorithms, the random forest algorithm showed the highest AUROC (0.991, 95% CI 0.962–0.999) for identifying HCC recurrence, followed by the support vector machine (AUROC: 0.893, 95% CI 0.824–0.956), logistic regression (AUROC: 0.774, 95% CI 0.672–0.868), and the multi-layer perceptron (AUROC: 0.571, 95% CI 0.459–0.682).
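The abstract does not say how the 95% confidence intervals around each AUROC were obtained. One common approach is a percentile bootstrap over patients, sketched below under that assumption. The function names, the resample count, and the toy scores are hypothetical and are not taken from the study.

```python
import random
from typing import Sequence, Tuple


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auroc_ci(
    scores: Sequence[float],
    labels: Sequence[int],
    n_boot: int = 2000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float, float]:
    """Return (point estimate, lower bound, upper bound) for the AUROC.

    Patients are resampled with replacement; the bounds are the alpha/2 and
    1 - alpha/2 percentiles of the resampled AUROC values.
    """
    if len(set(labels)) < 2:
        raise ValueError("labels must contain both classes")
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # AUROC is undefined when a resample has a single class
        stats.append(auroc([scores[i] for i in idx], ys))
    stats.sort()
    lo_idx = max(0, min(n_boot - 1, int(round(alpha / 2 * n_boot))))
    hi_idx = max(0, min(n_boot - 1, int(round((1 - alpha / 2) * n_boot)) - 1))
    return auroc(scores, labels), stats[lo_idx], stats[hi_idx]


if __name__ == "__main__":
    # Toy predicted probabilities from a hypothetical classifier.
    labels = [1, 1, 1, 1, 0, 0, 0, 0]
    scores = [0.9, 0.4, 0.7, 0.3, 0.6, 0.2, 0.1, 0.8]
    point, lower, upper = bootstrap_auroc_ci(scores, labels, n_boot=500)
    print(f"AUROC {point:.3f}, 95% CI {lower:.3f} to {upper:.3f}")
```

Applying the same function to the held-out predictions of each of the four classifiers would give intervals that can be compared side by side, as in the results above. With a small cohort, such intervals can be wide and sensitive to how the data were split.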

Conclusions
Our study identifies seven novel proteins for predicting HCC recurrence and finds the random forest algorithm to be the most suitable predictive model for HCC recurrence.


Feng, Gong, He, Na, Xia, Harry Hua-Xiang, Mi, Man, Wang, Ke, Byrne, Christopher, Targher, Giovanni, Yuan, Hai-Yang, Zhang, Xin-Lei, Zheng, Ming-Hua and Ye, Feng (2022) Machine learning algorithms based on proteomic data mining accurately predicting the recurrence of hepatitis B-related hepatocellular carcinoma. Journal of Gastroenterology and Hepatology, 37 (11), 2145-2153. (doi:10.1111/jgh.15940).

Record type: Article

Text: R1-MLA&HCC_marked - Accepted Manuscript (restricted to Repository staff only until 4 July 2023)
Text: Supplementary Table_Clean - Accepted Manuscript (restricted to Repository staff only until 4 July 2023)
Image: Figure 1 - Accepted Manuscript (restricted to Repository staff only until 4 July 2023)
Image: Figure 2 - Accepted Manuscript (restricted to Repository staff only)
Image: Figure 3 - Accepted Manuscript (restricted to Repository staff only)
Image: Figure 4 - Accepted Manuscript (restricted to Repository staff only)
Image: Figure 5 - Accepted Manuscript (restricted to Repository staff only)
Image: Figure 6 - Accepted Manuscript (restricted to Repository staff only)
Text: J of Gastro and Hepatol - 2022 - Feng - Machine learning algorithms based on proteomic data mining accurately predicting (1) - Version of Record (restricted to Repository staff only)

More information

Accepted/In Press date: 4 July 2022
e-pub ahead of print date: 11 July 2022
Published date: 1 November 2022
Additional Information: Publisher Copyright: © 2022 Journal of Gastroenterology and Hepatology Foundation and John Wiley & Sons Australia, Ltd.

Identifiers

Local EPrints ID: 468323
URI: http://eprints.soton.ac.uk/id/eprint/468323
ISSN: 0815-9319
PURE UUID: 6a6b02dc-5966-46a3-b3ed-ca898a037102
ORCID for Christopher Byrne: orcid.org/0000-0001-6322-7753

Catalogue record

Date deposited: 10 Aug 2022 17:32
Last modified: 25 Nov 2022 02:36

Contributors

Author: Gong Feng
Author: Na He
Author: Harry Hua-Xiang Xia
Author: Man Mi
Author: Ke Wang
Author: Christopher Byrne
Author: Giovanni Targher
Author: Hai-Yang Yuan
Author: Xin-Lei Zhang
Author: Ming-Hua Zheng
Author: Feng Ye
