Active learning for discovery in the laboratory: characterising biomolecular systems.
ALRA: Active Learning in Real-world Applications, Bristol, GB,
28 Sep 2012.
Resource costs are a limiting factor in many real-world problems. For example, a chemist or biologist may be able to afford only a limited quantity of the chemicals needed to perform experiments that test a hypothesis. A domain expert may have only a limited amount of time to spend analysing a dataset. An inaccessible remote sensor or robotic exploration system may have only limited communications bandwidth through which to transmit data. In each case there is a strong requirement to make the best use of the available resources and to obtain as much information as possible. Active learning therefore appears well suited to managing the balance between resource usage and information gain. However, real-world discovery problems are often subject to factors or constraints that current active learning methods may not address well.
As an example of applying active learning to real-world problems, we consider the laboratory-based characterisation of the experimental responses of biomolecular systems. In the process of scientific discovery, such characterisation is often the initial investigation, performed to build a loose understanding of the behaviours that exist within a particular experimental parameter space. The aim is normally to navigate a large and potentially high-dimensional parameter space as efficiently as possible, to determine whether it contains any interesting behaviours that warrant further in-depth investigation, or whether a new search should begin elsewhere. As a consequence, the resources available to such a problem are often very small, particularly in relation to the size and scale of the parameter spaces explored. Additionally, biological experimentation is error-prone, so there is no guarantee that an observation obtained in an experiment is representative of the true underlying behaviour. Because such erroneous observations can deviate from the true value by more than standard experimental noise, they pose a significant problem for active learning: combined with extremely limited resources, and therefore few previous observations, they introduce a large amount of uncertainty into the problem. If an observation does not fit the current belief about the behaviour of the system under investigation, one must ask whether the observation is erroneous or the current belief is incorrect. Repeat experiments can be performed, but too many of them will deplete already limited resources, restricting exploration and potentially causing features of the behaviour to be missed. In machine learning terms, the problem is one of learning from an extremely small amount of data that may contain errors, where the learner chooses which data to obtain.
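One way to make the "erroneous observation or incorrect belief" dilemma concrete is a budgeted exploration loop in which a surprising reading triggers a single repeat experiment paid for from the same budget. This is a minimal sketch under stated assumptions, not the algorithm developed in this work: the one-dimensional candidate grid, the linear-interpolation belief, the farthest-from-observed selection rule, the fixed surprise `threshold`, and all function names (`predict`, `next_point`, `characterise`, `make_flaky`) are hypothetical.

```python
from typing import Callable, Dict, List, Set, Tuple

Observations = Dict[float, float]


def predict(obs: Observations, x: float) -> float:
    """Hypothetical belief model: linear interpolation between the nearest
    observed points, held constant beyond the observed range."""
    xs = sorted(obs)
    if x <= xs[0]:
        return obs[xs[0]]
    if x >= xs[-1]:
        return obs[xs[-1]]
    for lo, hi in zip(xs, xs[1:]):
        if lo <= x <= hi:
            t = (x - lo) / (hi - lo)
            return obs[lo] + t * (obs[hi] - obs[lo])
    raise ValueError("x not bracketed")  # unreachable for sorted keys


def next_point(obs: Observations, candidates: List[float]) -> float:
    """Pick the unobserved candidate farthest from any observation
    (a simple stand-in for an uncertainty-based selection rule)."""
    unobserved = [c for c in candidates if c not in obs]
    return max(unobserved, key=lambda c: min(abs(c - x) for x in obs))


def characterise(
    experiment: Callable[[float], float],
    candidates: List[float],
    budget: int,
    threshold: float,
) -> Tuple[Observations, int, int]:
    """Budgeted exploration in which a surprising reading triggers one repeat.
    Returns (observations, experiments spent, repeats performed)."""
    if budget < 2 or len(candidates) < 2:
        raise ValueError("need at least two candidates and a budget of two")
    # Seed the belief with the two ends of the parameter range.
    obs: Observations = {x: experiment(x) for x in (candidates[0], candidates[-1])}
    spent, repeats = 2, 0
    while spent < budget and len(obs) < len(candidates):
        x = next_point(obs, candidates)
        expected = predict(obs, x)
        y = experiment(x)
        spent += 1
        if abs(y - expected) > threshold and spent < budget:
            # Surprise: is the reading erroneous, or is the belief wrong?
            # Spend one more experiment from the same budget to find out.
            y2 = experiment(x)
            spent += 1
            repeats += 1
            if abs(y2 - y) <= threshold:
                obs[x] = (y + y2) / 2  # repeat confirms it: revise the belief
            else:
                obs[x] = y2  # repeat disagrees: discard the first reading
        else:
            obs[x] = y
    return obs, spent, repeats


def make_flaky(
    true_response: Callable[[float], float], bad_calls: Set[int], offset: float
) -> Callable[[float], float]:
    """Hypothetical simulated experiment: returns true_response(x), but adds
    `offset` on the 1-based call numbers in `bad_calls` to mimic a gross error."""
    count = 0

    def experiment(x: float) -> float:
        nonlocal count
        count += 1
        y = true_response(x)
        return y + offset if count in bad_calls else y

    return experiment


if __name__ == "__main__":
    grid = [float(i) for i in range(9)]
    # The third experiment returns a reading 10 units too high.
    print(characterise(make_flaky(lambda x: x, {3}, 10.0), grid, budget=6, threshold=1.0))
    # → ({0.0: 0.0, 8.0: 8.0, 4.0: 4.0, 2.0: 2.0, 6.0: 6.0}, 6, 1)
```

In this toy run the erroneous reading at x = 4 is caught by the repeat, but the repeat costs one of only six experiments, so fewer points are explored than in an error-free run. Raising `threshold` saves experiments but lets more erroneous readings into the belief; lowering it protects the belief at the cost of coverage of the parameter space.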
To address these problems, we drew on how successful scientists deal with these issues when making discoveries in the laboratory. Ideas from the philosophy of science in particular offer ways of viewing and managing the problems that differ from the more mathematically focussed views of the active learning literature. This led to the development of initial generalised methods for addressing these problems, in the form of a set of algorithms that combine ideas from the philosophy of science and active learning and that are capable of effectively characterising biomolecular systems in the laboratory.