Ahmed, Waseem, Vincent Veluthandath, Aneesh and Senthil Murugan, Ganapathy (2025) Data fusion for improved prediction interval performance of ratiometric binary liposome measurement. PhotonIcs & Electromagnetics Research Symposium, ADNEC, Abu Dhabi, United Arab Emirates. 04 - 08 May 2025. 13 pp.
Abstract
Reducing measurement uncertainty is important in clinical contexts so that effective treatment decisions can be made quickly. Neonatal respiratory distress syndrome affects pre-term infants with deficient lung surfactant, and an effective diagnosis can be made by establishing that the ratio of two lipid biomarkers, lecithin and sphingomyelin, is less than 2.2. No current test can establish this biomarker ratio within a clinically effective time period while fulfilling the requirements of a point of care paradigm. Vibrational spectroscopy, in the form of Raman and infrared (IR) spectroscopies, coupled with machine learning can quantify these biomarkers and offers a potential solution to this measurement problem. Because these spectral modalities are based on different selection rules, the information present in their spectra is complementary but distinct. Spectral measurements were made of aqueous liposomes formed from varying ratios of lecithin and sphingomyelin, and the spectra were split 80:20 into training and test datasets. We show that data fusion between Raman and IR spectra leverages this complementarity to improve partial least squares regression (PLSR) model performance from the single-modality models (IR R²: 0.902; Raman R²: 0.951) to an R² of 0.973 for a low-level fusion model, and also reduces the mean uncertainty, resulting in a more accurate and precise measurement. These findings highlight the potential of fused Raman-IR data to overcome current biomarker measurement challenges, offering a pathway to rapid and accurate biomarker assessments at the point of care.
Point of care detection and quantification of disease biomarkers allows clinicians to make rapid, evidence-based diagnoses. Neonatal respiratory distress syndrome (nRDS) is a condition affecting pre-term neonates in which deficiencies in lung surfactant result in high lung surface tension. No point of care test is currently available to diagnose nRDS, but a ratio of less than 2.2 between two lipid biomarkers in lung surfactant, lecithin and sphingomyelin (the L/S ratio), is diagnostic [1]. A clinically useful measurement would therefore be the L/S ratio of vesicles present in samples of lung surfactant. A vibrational spectroscopy-based solution, augmented by machine learning to interpret the sample spectra, could meet the clinical requirements for rapid, bedside diagnosis. Infrared and Raman spectroscopies are vibrational spectroscopic techniques that measure the molecular vibrations of biomarkers without requiring additional labelling, and can uniquely identify and quantify them. Although the spectra they provide are sometimes similar, they are based on distinct selection rules (a change in molecular dipole moment for infrared and a change in polarizability for Raman), so each modality yields a unique spectrum. The particular weaknesses of each can be overcome by considering both sets of data together. When both modalities are combined and the spectra of unknown samples are interpreted with suitably trained multivariate regression models, it is possible to obtain better performance than with either modality alone. Such models can also be configured to generate prediction intervals (PIs) for regressed outputs. In this study we show that a data-fusion approach improves measurement performance and reduces the uncertainty in regressed outputs for measurements of synthetic vesicles.
We prepared aqueous suspensions of liposomes with differing L/S ratios as per our previously published method [2]. Raman spectra were collected on a Renishaw inVia micro-Raman spectrometer at a laser excitation wavelength of 532 nm, with the microscope in an epi-fluorescence configuration. Each recorded spectrum was the average of 5 spectra acquired with a 10 s exposure at an average laser power of 5 mW. Fourier transform infrared (FTIR) measurements were performed on an Agilent Cary 670 FTIR instrument with a potassium bromide (KBr) beam splitter and a deuterated triglycine sulphate (DTGS) detector, using a 10-bounce zinc selenide horizontal attenuated total reflectance accessory. FTIR spectra were recorded at 4 cm⁻¹ resolution with 32 co-added scans against a deionized water background. The Raman and FTIR spectra were indexed so that they matched sample for sample, and then split into 80% training and 20% test sets. The training sets were further split into cross-validation sets to establish optimal model parameters, after which the whole training set was used to train the final models. These were partial least squares regression models that quantify the L/S ratio of a sample from its Raman and IR spectra with a specified uncertainty (95% PI). We ensured that similar spectra were present in both the training and test sets so that the modelling approaches could be compared effectively.
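The index matching and 80:20 split described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function name `paired_split`, the seed and the toy spectra are all hypothetical, and the step that ensures similar spectra appear in both sets is not reproduced here.

```python
import random

def paired_split(raman, ir, ratios, test_fraction=0.2, seed=0):
    """Split index-matched Raman and IR spectra with one shared shuffle.

    Row i of `raman`, `ir` and `ratios` must describe the same sample, so a
    single permutation of the indices keeps the two modalities aligned.
    """
    if not (len(raman) == len(ir) == len(ratios)):
        raise ValueError("Raman, IR and ratio lists must have the same length")
    indices = list(range(len(ratios)))
    random.Random(seed).shuffle(indices)
    n_test = max(1, round(test_fraction * len(indices)))
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    def take(rows, idx):
        return [rows[i] for i in idx]

    train = (take(raman, train_idx), take(ir, train_idx), take(ratios, train_idx))
    test = (take(raman, test_idx), take(ir, test_idx), take(ratios, test_idx))
    return train, test, train_idx, test_idx

# Toy data (made up): 10 samples with 3-point Raman and 2-point IR "spectra".
ratios = [1.0 + 0.3 * i for i in range(10)]
raman = [[r, 2 * r, 3 * r] for r in ratios]
ir = [[10 * r, 20 * r] for r in ratios]
train, test, train_idx, test_idx = paired_split(raman, ir, ratios)
```

Because one permutation drives both modalities, any later fusion step can rely on row i of the Raman block and row i of the IR block belonging to the same liposome sample.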
The results show that, on the test set, the Raman model (R²: 0.951) performed better than the IR model (R²: 0.902). We also show that using both techniques together, through either a low-level (R²: 0.973) or a high-level (R²: 0.967) data fusion approach, improves the predictive performance of the models on comparable datasets.
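The difference between the two fusion levels can be sketched as below. The abstract does not state how either fusion was implemented, so everything here is an assumption for illustration: low-level fusion is shown as block scaling followed by concatenation of the two spectra, and high-level fusion as a weighted average of the predictions from separate Raman and IR models. The function names and toy numbers are hypothetical.

```python
import math

def block_norm(block):
    """Frobenius norm of a mean-centred block (a list of spectra)."""
    n_rows, n_cols = len(block), len(block[0])
    means = [sum(row[j] for row in block) / n_rows for j in range(n_cols)]
    return math.sqrt(sum((row[j] - means[j]) ** 2
                         for row in block for j in range(n_cols)))

def low_level_fusion(raman, ir, raman_scale, ir_scale):
    """Concatenate block-scaled Raman and IR spectra sample by sample.

    The scales should come from the training set only and be reused
    unchanged for the test set.
    """
    if raman_scale == 0 or ir_scale == 0:
        raise ValueError("block scale must be non-zero")
    return [[v / raman_scale for v in r] + [v / ir_scale for v in i]
            for r, i in zip(raman, ir)]

def high_level_fusion(raman_pred, ir_pred, raman_weight=0.5):
    """Combine per-modality predictions with a weighted average."""
    return [raman_weight * a + (1.0 - raman_weight) * b
            for a, b in zip(raman_pred, ir_pred)]

# Toy data (made up): two samples, 3 Raman channels and 2 IR channels.
raman_train = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0]]
ir_train = [[100.0, 200.0], [300.0, 400.0]]
fused = low_level_fusion(raman_train, ir_train,
                         block_norm(raman_train), block_norm(ir_train))
combined = high_level_fusion([2.0], [3.0])
```

In the low-level case a single regression model would then be trained on `fused`; in the high-level case each modality keeps its own model and only the outputs are merged. The block scaling is one common way to stop the modality with the larger intensity range from dominating the concatenated matrix.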
PI performance was assessed from the mean upper and lower PIs, where a smaller PI corresponds to less uncertainty in the measurement. Table 1 shows that the single-modality FTIR and Raman models had larger PIs, and therefore greater uncertainty in their outputs, while both fusion models had smaller mean upper and lower PIs. This finding supports the use of fusion as a method to reduce uncertainty in this measurement. Of the two fusion models, the low-level fusion model outperformed the high-level fusion model, indicating that low-level fusion is the better route to further reducing uncertainty in this measurement. This reduction matters for the development of point of care devices that measure biomarker concentrations in clinical settings, where less uncertainty in measurements will reduce the time required for patients to access treatment.
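One way to compute the kind of summary described above is sketched here. The exact definition of the mean upper and lower PIs used in Table 1 is not given in this abstract, so the code assumes they are the mean distances from each prediction to its upper and lower 95% bounds. The coverage figure and the comparison against the 2.2 L/S threshold are added for illustration only, and all numbers are made up rather than taken from the study.

```python
def summarise_intervals(y_true, y_pred, lower, upper):
    """Mean lower/upper PI half-widths and empirical coverage."""
    n = len(y_true)
    mean_lower = sum(p - lo for p, lo in zip(y_pred, lower)) / n
    mean_upper = sum(hi - p for p, hi in zip(y_pred, upper)) / n
    covered = sum(lo <= y <= hi for y, lo, hi in zip(y_true, lower, upper))
    return {"mean_lower": mean_lower,
            "mean_upper": mean_upper,
            "coverage": covered / n}

def classify_against_threshold(lower, upper, threshold=2.2):
    """Label an interval relative to the diagnostic L/S threshold."""
    if upper < threshold:
        return "below"          # whole interval under the threshold
    if lower >= threshold:
        return "above"          # whole interval at or over the threshold
    return "indeterminate"      # interval straddles the threshold

# Toy values (made up) for three test samples.
y_true = [1.5, 2.0, 3.0]
y_pred = [1.6, 2.1, 2.9]
lower = [1.4, 1.8, 2.5]
upper = [1.8, 2.4, 3.3]
summary = summarise_intervals(y_true, y_pred, lower, upper)
labels = [classify_against_threshold(lo, hi) for lo, hi in zip(lower, upper)]
```

Narrower intervals leave fewer samples in the "indeterminate" band around 2.2, which is one concrete sense in which reduced uncertainty could shorten the path to a treatment decision.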
This work was supported by the UK Engineering and Physical Sciences Research Council (EPSRC Grant EP/S03109X/1).
References
1. W. Ahmed, A. V. Veluthandath, D. J. Rowe, J. Madsen, H. W. Clark, A. D. Postle, J. S. Wilkinson, and G. S. Murugan, "Prediction of Neonatal Respiratory Distress Biomarker Concentration by Application of Machine Learning to Mid-Infrared Spectra," Sensors 22, 1744 (2022).
2. A. V. Veluthandath, W. Ahmed, J. Madsen, H. W. Clark, A. D. Postle, J. S. Wilkinson, and G. S. Murugan, "Quantification of lung surfactant lipid (dipalmitoylphosphatidylcholine/sphingomyelin) ratio in binary liposomes using Raman spectroscopy," J. Raman Spectrosc. 1–8 (2023).