The University of Southampton
University of Southampton Institutional Repository

Group 1: Challenge: Task 1 - predict solubility given a large set of calculated features


Swain, Jonathan, Patrick, Bradley, Friso, Andrea and Criveanu, Dan; Frey, Jeremy G., Niranjan, Mahesan and Kanza, Samantha (eds.) (2022) Group 1: Challenge: Task 1 - predict solubility given a large set of calculated features (AI4SD-Machine-Learning-Summer-School, 1). University of Southampton, 11pp. (doi:10.5258/SOTON/AI3SD0244).

Record type: Monograph (Project Report)

Abstract

Solubility is one of the most important factors to consider during drug discovery. The ability of three linear regression models to predict solubility was assessed using the data set provided. Two of the models used L2 and L1 regularization respectively, and the third was Partial Least Squares (PLS) regression. PLS regression was found to be the best of the three. The results from applying PLS regression to the test data set were analysed further and outliers were identified. The outliers were characterised, and the PLS regression model was found to have difficulty predicting the solubility of highly aromatic compounds.
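
The abstract does not say how outliers were identified in the test-set predictions. As an illustration only, the sketch below flags predictions whose residual lies more than k sample standard deviations from the mean residual. The function name `flag_outliers`, the threshold `k=2.0`, and the numbers are all hypothetical and are not taken from the report.

```python
from statistics import mean, stdev


def flag_outliers(y_true, y_pred, k=2.0):
    """Return indices whose residual is more than k sample standard
    deviations away from the mean residual (an assumed criterion)."""
    residuals = [p - t for t, p in zip(y_true, y_pred)]
    mu = mean(residuals)
    sigma = stdev(residuals)
    return [i for i, r in enumerate(residuals) if abs(r - mu) > k * sigma]


# Made-up measured and predicted log-solubility values, for illustration only.
measured = [-1.0, -2.0, -3.0, -4.0, -5.0, -6.0, -2.5, -3.5]
predicted = [-1.1, -1.9, -3.2, -3.9, -5.1, -2.0, -2.4, -3.6]

outliers = flag_outliers(measured, predicted)
print(outliers)
```

In this made-up example only the sixth compound (index 5) is flagged, because its predicted value is far from the measured one. A real analysis would then inspect the flagged structures for shared features, such as the high aromaticity noted in the abstract.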

Text
MLSummerSchoolReport_Group1 - Version of Record
Available under License Creative Commons Attribution.

More information

Published date: 28 June 2022
Venue - Dates: AI4SD Machine Learning Summer School, University of Southampton, Southampton, United Kingdom, 2022-06-20 - 2022-06-24
Keywords: AI3SD, Machine Learning

Identifiers

Local EPrints ID: 470704
URI: http://eprints.soton.ac.uk/id/eprint/470704
PURE UUID: 1e0eb20a-9362-4551-9977-b98add41cfb4
ORCID for Jeremy G. Frey: orcid.org/0000-0003-0842-4302
ORCID for Mahesan Niranjan: orcid.org/0000-0001-7021-140X
ORCID for Samantha Kanza: orcid.org/0000-0002-4831-9489

Catalogue record

Date deposited: 18 Oct 2022 16:43
Last modified: 17 Mar 2024 03:52

Contributors

Author: Jonathan Swain
Author: Bradley Patrick
Author: Andrea Friso
Author: Dan Criveanu
Editor: Jeremy G. Frey
Editor: Mahesan Niranjan
Editor: Samantha Kanza