Group 1: Challenge: Task 1 - predict solubility given a large set of calculated features
Solubility is one of the most important factors to consider during drug discovery. The ability of three linear regression models to predict solubility was assessed using the data set provided. Two of the models used L2 and L1 regularization, respectively, and the third was Partial Least Squares (PLS) regression. PLS regression was found to be the best of the three models assessed. The results of applying PLS regression to the test data set were analysed further and outliers were identified. The outliers were characterised, and the PLS regression model was found to have difficulty predicting the solubility of highly aromatic compounds.
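The comparison described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the report: the data are synthetic stand-ins for the calculated features and solubility values, the hyperparameters (`alpha`, `n_components`) are arbitrary, the three fitting functions are small NumPy implementations rather than whatever library the authors used, and the outlier rule (absolute residual above two standard deviations) is only one plausible choice.

```python
import numpy as np

def ridge_fit(X, y, alpha):
    """L2-regularized least squares via the closed-form normal equations."""
    return np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)

def lasso_fit(X, y, alpha, n_iter=200):
    """L1-regularized least squares via cyclic coordinate descent.

    Minimizes 0.5 * ||y - Xw||^2 + alpha * ||w||_1.
    """
    w = np.zeros(X.shape[1])
    col_sq = (X ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(X.shape[1]):
            # Residual with feature j's current contribution added back.
            r = y - X @ w + X[:, j] * w[j]
            rho = X[:, j] @ r
            # Soft-thresholding update for coordinate j.
            w[j] = np.sign(rho) * max(abs(rho) - alpha, 0.0) / col_sq[j]
    return w

def pls_fit(X, y, n_components):
    """PLS1 regression (NIPALS with deflation) on centred X and y."""
    Xk, yk = X.copy(), y.copy()
    weights, loadings, y_loadings = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk
        w /= np.linalg.norm(w)
        t = Xk @ w                      # scores for this component
        tt = t @ t
        p = Xk.T @ t / tt               # X loadings
        c = yk @ t / tt                 # y loading
        Xk = Xk - np.outer(t, p)        # deflate X
        yk = yk - c * t                 # deflate y
        weights.append(w)
        loadings.append(p)
        y_loadings.append(c)
    W, P, q = np.array(weights).T, np.array(loadings).T, np.array(y_loadings)
    return W @ np.linalg.solve(P.T @ W, q)

def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Synthetic stand-in for the feature matrix and solubility values (assumed).
rng = np.random.default_rng(0)
n_samples, n_features = 200, 30
X = rng.normal(size=(n_samples, n_features))
true_w = np.zeros(n_features)
true_w[:5] = [2.0, -1.5, 1.0, 0.5, -0.5]
y = X @ true_w + 0.1 * rng.normal(size=n_samples)

# Train/test split; centre using training-set statistics only.
X_train, X_test = X[:150], X[150:]
y_train, y_test = y[:150], y[150:]
x_mean, y_mean = X_train.mean(axis=0), y_train.mean()
Xc_train, Xc_test = X_train - x_mean, X_test - x_mean
yc_train = y_train - y_mean

coefs = {
    "ridge (L2)": ridge_fit(Xc_train, yc_train, alpha=1.0),
    "lasso (L1)": lasso_fit(Xc_train, yc_train, alpha=1.0),
    "PLS": pls_fit(Xc_train, yc_train, n_components=5),
}
scores = {name: rmse(y_test, Xc_test @ w + y_mean) for name, w in coefs.items()}
best = min(scores, key=scores.get)

# Flag test-set outliers for the best model (two-sigma rule is an assumption).
residuals = y_test - (Xc_test @ coefs[best] + y_mean)
outliers = np.flatnonzero(np.abs(residuals) > 2 * residuals.std())
print(scores, best, outliers)
```

On real descriptor data the hyperparameters would normally be chosen by cross-validation, and the flagged compounds would then be inspected for shared structural features, such as the high aromaticity noted above.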
AI3SD, Machine Learning
University of Southampton
Swain, Jonathan
59c6096b-6673-4ec6-a874-2997636c2d18
Patrick, Bradley
2df277f6-147f-4921-bbfa-2d37e44a204f
Friso, Andrea
5a07c7a7-0b06-4441-b358-28c196bd2046
Criveanu, Dan
1bb17949-9ad5-42e2-87ed-efdca81f1972
Frey, Jeremy G.
ba60c559-c4af-44f1-87e6-ce69819bf23f
Niranjan, Mahesan
5cbaeea8-7288-4b55-a89c-c43d212ddd4f
Kanza, Samantha
b73bcf34-3ff8-4691-bd09-aa657dcff420
28 June 2022
Swain, Jonathan, Patrick, Bradley, Friso, Andrea and Criveanu, Dan, Frey, Jeremy G., Niranjan, Mahesan and Kanza, Samantha (eds.) (2022) Group 1: Challenge: Task 1 - predict solubility given a large set of calculated features (AI4SD-Machine-Learning-Summer-School, 1) University of Southampton 11pp. (doi:10.5258/SOTON/AI3SD0244).
Record type: Monograph (Project Report)
Text: MLSummerSchoolReport_Group1 - Version of Record
More information
Published date: 28 June 2022
Venue - Dates:
AI4SD Machine Learning Summer School, University of Southampton, Southampton, United Kingdom, 2022-06-20 - 2022-06-24
Keywords:
AI3SD, Machine Learning
Identifiers
Local EPrints ID: 470704
URI: http://eprints.soton.ac.uk/id/eprint/470704
PURE UUID: 1e0eb20a-9362-4551-9977-b98add41cfb4
Catalogue record
Date deposited: 18 Oct 2022 16:43
Last modified: 17 Mar 2024 03:52
Contributors
Author: Jonathan Swain
Author: Bradley Patrick
Author: Andrea Friso
Author: Dan Criveanu
Editor: Mahesan Niranjan