University of Southampton Institutional Repository

Active learning for discovery in the laboratory: characterising biomolecular systems

Resource costs are a limiting factor in many real-world problems. For example, a chemist or biologist may only be able to afford a limited number of chemicals with which to perform experiments that examine a hypothesis. A domain expert may only have a limited amount of time to spend analysing a dataset. An inaccessible remote sensor or robotic exploratory system may only have limited communications bandwidth through which data can be transmitted. In each case there is a strong requirement to make the best use of the resources available, to obtain as much information as possible. As such, active learning appears well suited to addressing the balance required between resource usage and information gain. However, real-world discovery problems often carry factors or constraints that may not currently be well addressed by active learning.

As an example of applying active learning to real-world problems, we consider laboratory-based experimental response characterisation of biomolecular systems. In the process of scientific discovery, such characterisation is often the initial investigation performed to build a loose understanding of the behaviours that exist within a particular experiment parameter space. The aim is normally to navigate a large and potentially high-dimensional parameter space as efficiently as possible, to determine whether it contains any interesting behaviours that may warrant further in-depth investigation, or whether a new search should start elsewhere. As a consequence, the resources available to such a problem are often very small, particularly in relation to the size and scale of the parameter spaces explored. Additionally, biological experimentation is error-prone, meaning that there is no guarantee that an observation obtained in an experiment is representative of the true underlying behaviour. Such erroneous observations deviate from the true value by more than standard experimental noise, and they pose a significant problem for active learning: combined with extremely limited resources, and therefore few previous observations, they introduce a large amount of uncertainty into the problem. If an observation does not fit the current belief about the behaviour of the system being investigated, the question is whether the observation is erroneous or the current belief is incorrect. Whilst repeat experiments can be performed, performing too many will deplete already limited resources, restricting exploration and potentially causing features of the behaviour to be missed. In machine learning terms, the problem is that of learning from an extremely small amount of data that may contain errors, where the learner controls which data to obtain.
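The trade-off between repeating a surprising experiment and exploring a new part of the parameter space can be illustrated with a minimal sketch. This is not the method presented in the talk; it is an assumed toy learner for a one-dimensional parameter space. The names `predict`, `pick_unexplored` and `characterise`, the linear-interpolation belief model, the single-repeat rule and the `surprise_threshold` parameter are all hypothetical choices made for illustration.

```python
def predict(observations, x):
    """Current belief at x: linear interpolation between the mean responses
    of the tested points (an assumed, deliberately simple belief model)."""
    points = sorted((px, sum(ys) / len(ys)) for px, ys in observations.items())
    if x <= points[0][0]:
        return points[0][1]
    if x >= points[-1][0]:
        return points[-1][1]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def pick_unexplored(observations, candidates):
    """Explore: choose the untested candidate farthest from any tested point.
    Returns None when every candidate has been tested."""
    untested = [c for c in candidates if c not in observations]
    if not untested:
        return None
    return max(untested, key=lambda c: min(abs(c - x) for x in observations))


def characterise(experiment, candidates, budget, surprise_threshold):
    """Spend at most `budget` experiments on a 1-D parameter space.

    Hypothetical policy: if a new observation disagrees with the belief held
    before it was taken by more than `surprise_threshold`, spend one further
    experiment repeating it; otherwise explore somewhere new.
    """
    if budget < 2 or len(candidates) < 2:
        raise ValueError("need at least two candidates and a budget of two")
    low, high = min(candidates), max(candidates)
    # Start from the two extremes of the parameter space.
    observations = {low: [experiment(low)], high: [experiment(high)]}
    log = []
    spent = 2
    pending_repeat = None
    while spent < budget:
        if pending_repeat is not None:
            x, action = pending_repeat, "repeat"
            pending_repeat = None
        else:
            x, action = pick_unexplored(observations, candidates), "explore"
            if x is None:
                break  # nothing left to explore
        expected = predict(observations, x)  # belief before the experiment
        observed = experiment(x)
        observations.setdefault(x, []).append(observed)
        spent += 1
        log.append((x, action))
        # A surprising result may be an error or a real feature: repeat once.
        if action == "explore" and abs(observed - expected) > surprise_threshold:
            pending_repeat = x
    return observations, log
```

With a budget of five experiments and a response that peaks at a single point, this sketch spends one experiment confirming the peak before it resumes exploring. A larger `surprise_threshold` saves resources for exploration but risks accepting an erroneous observation, while a smaller one spends more of the budget on repeats.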

To address these problems, we took insight from how successful scientists face these issues whilst making discoveries in the laboratory. Ideas within the philosophy of science in particular provide different ways of viewing and managing the problems, compared with the more mathematically focussed views of the active learning literature. This led to the development of initial generalised methods for addressing these problems, producing a set of algorithms that combine ideas from philosophy of science and active learning and that are capable of effectively characterising biomolecular systems within the laboratory.
Lovell, Chris
1ac8eed7-512f-4082-a7ab-75b5e4950518

Lovell, Chris (2012) Active learning for discovery in the laboratory: characterising biomolecular systems. ALRA: Active Learning in Real-world Applications, Bristol, United Kingdom. 28 Sep 2012.

Record type: Conference or Workshop Item (Other)

More information

Published date: 28 September 2012
Venue - Dates: ALRA: Active Learning in Real-world Applications, Bristol, United Kingdom, 2012-09-28 - 2012-09-28
Organisations: Electronic & Software Systems

Identifiers

Local EPrints ID: 343587
URI: http://eprints.soton.ac.uk/id/eprint/343587
PURE UUID: 8c56e185-cb87-4e22-bcd6-f828b0dab890

Catalogue record

Date deposited: 05 Oct 2012 09:56
Last modified: 14 Mar 2024 12:05

Contributors

Author: Chris Lovell
