READ ME File for 'Data for "The Future of Bone Regeneration: Integrating AI into Tissue Engineering".'

Dataset DOI: 10.5258/SOTON/D1695

ReadMe Author: BENITA S. MACKAY, University of Southampton, orcid.org/0000-0003-2050-8912

This document supports the publication:
The Future of Bone Regeneration: Integrating AI into Tissue Engineering
Authors: Benita S. Mackay, Karen Marshall, James A. Grant-Jacob, Janos Kanczler, Robert W. Eason, Richard O. C. Oreffo and Ben Mills
Published: [TBC] [CC information when known]
Paper DOI:

Funding: Ben Mills is funded by EPSRC (EP/N03368X/1) and EPSRC (EP/T026197/1). Richard O. C. Oreffo is funded by the BBSRC (BB/P017711/1) and the UK Regenerative Medicine Platform (MR/R01565/1). These research councils are gratefully acknowledged.

## List of files and descriptions ##

Fig0.png: Concept image for "The Future of Bone Regeneration: Integrating AI into Tissue Engineering".

Fig1.png: Bone repair using the tissue engineering and biomaterial paradigm. Cells are isolated from a tissue biopsy and then cultured. Biomaterials with multiple properties, including biochemical and biophysical cell-instructive properties, are used for tissue generation and growth, to aid both proliferation and the required differentiation. The generated tissue is implanted into the trauma site to aid regeneration.

Fig2.png: Acquiring stem cells for tissue engineering through donor-site sampling, stem cell isolation and expansion. A sample of bone marrow is extracted from the patient. The desired cells are isolated using select stem cell markers and then cultured into stem cell colonies. These colonies can be expanded to increase the total number of stem cells available for seeding onto scaffolds, differentiation into desired cell lineages, or transplanting directly into the patient to aid tissue regeneration.

Fig3.png: Ex vivo and in situ tissue engineering both use the body's own regenerative ability, boosted by tissue-engineered materials. For ex vivo, cells are extracted and then cultured on biomaterial scaffolds and in bioreactor environments; the modified tissue is then implanted into the body. For in situ, no extraction is necessary.

Fig4.png: Cell-instructive biomaterials with a variety of characteristics at different scales of interest. At the macroscale, types and production methods range from hydrogels to solid scaffolds, and from regimented 3D-printed to more randomised microporous structures, all of which need to be degradable and capable of withstanding large forces. Features relevant at the microscale include the alignment of cells, at both the cellular and multi-cellular level. At the nanoscale, chemical (a-d) and physical (e-h) characteristics overlap to promote a large range of responses in HBMSCs, from proliferation (a-d) (green are live cells and red are dead or dying cells, with a noticeable difference in the level of dead cells between (b) and (d)) to differentiation (e-h) (green is expression of osteopontin or osteocalcin, seen in (e-f) but not (g-h), while red and blue show the cell body and nucleus respectively). (e-h) reproduced from ref [3] under the Creative Commons Attribution License open access (© 2010 Laura E. McNamara et al.).

Fig5.png: The number of parameters rises exponentially as the number of features is increased. With 4 characteristics, in this case shape, material, size and surface type, the parameter count increases from 1 to 108 with only 13 features. Assuming one parameter takes an hour to investigate, 2 characteristics could take a whole day to investigate, while 4 characteristics could take two weeks. There are hundreds of different biomaterial characteristics, with thousands of different features, leading to millions of experimental research hours.
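As a worked illustration of the arithmetic behind Fig5, the following minimal Python sketch counts parameter combinations; the per-characteristic feature counts are assumptions chosen to reproduce the caption's 13-feature, 108-parameter example, not values taken from the dataset:

```python
from math import prod

# Hypothetical feature counts for the 4 characteristics in Fig5,
# chosen so that 13 features in total give 108 parameter combinations
# (3 shapes x 3 materials x 3 sizes x 4 surface types).
features = {"shape": 3, "material": 3, "size": 3, "surface type": 4}

total_features = sum(features.values())   # 3 + 3 + 3 + 4 = 13
combinations = prod(features.values())    # 3 * 3 * 3 * 4 = 108

# At one hour per parameter: the 9 combinations of 2 characteristics
# (e.g. shape and material) fill roughly one working day, while all
# 108 combinations take about two working weeks.
print(f"{total_features} features -> {combinations} experiment-hours")
```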
Fig6.png: AI is capable of accurate predictions for unseen scenarios with a limited training dataset, as long as the data is sufficiently varied. For an AI to predict whether an unseen animal is a dog or not, the dataset needs to include a variety of dogs and of animals that are not dogs, without becoming biased towards, or limited to, particular breeds of dog or species of other animals. The smaller dataset (blue) is of higher quality than the larger dataset (yellow), even though there is less data for the AI to work with. (Both datasets would need to be expanded by several orders of magnitude to train a neural network adequately.)

Fig7.png: Biological neurons, which make up the visual cortex, and artificial nodes are conceptually analogous. Dendrites carry impulses towards the cell body, where the nucleus (the brain of the cell) processes this information. A new impulse is generated and carried away from the cell body through the axon to the axon terminals, where the signal is released and becomes an input signal to another neuron. In an artificial node, weighted inputs from several nodes are summed and the result is processed through an activation function. The new output is then weighted and input to another node in a later layer. Both process multiple inputs and transmit a new output for further processing.

Fig8.png: A concept image of a deep neural network. The input layer, consisting of several different nodes (circles), is connected to the first hidden layer, where the data from the input layer is weighted, summed and transformed without human interaction before progressing to the next layer. The second hidden layer receives input only from the first hidden layer, allowing for more abstract feature extraction. Data processed by this hidden layer is then propagated to the output layer. The number of hidden layers, the number of nodes and the type of data transformation between each layer are completely customisable between different networks.

Fig9.png: Feature extraction is simple for ConvNets, as the relatively large data in an image is easily reduced (convolved) from large pixel arrangements to smaller pixel distributions without losing spatial information, which is often necessary (such as determining whether there is a dog ear, which cannot be done from pixel values alone). Generating data is harder, as it is a one-to-many problem. While one DNN can determine which features are essential, placement and detail require a realistic, rather than simply statistically averaged, pixel arrangement.

Fig10.png: AI is behind many advances in clinical-related fields, including discoveries in microbiology, health interfacing through smart wearable technology, rapid enhancement of medical imaging analysis, and the discovery of potential novel drugs.

Fig11.png: Data augmentation is the process of increasing the training data through techniques such as randomised cropping, rotation and alterations to brightness and contrast. Each technique can increase the size of the dataset by orders of magnitude, so a single raw image can be augmented to produce thousands of unique images. Variation is important for training DNNs, so augmenting too heavily can lead to reduced performance, while light augmentation can increase variety and therefore final performance.
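A minimal sketch of such an augmentation pipeline, written with torchvision; the crop size, rotation range, jitter strengths and file name are illustrative assumptions, not the settings used for any figure in this dataset:

```python
from PIL import Image
from torchvision import transforms

# Illustrative augmentation pipeline: each transform draws new random
# parameters on every call, so repeated application to one raw image
# yields many distinct training images.
augment = transforms.Compose([
    transforms.RandomCrop(224),                            # randomised cropping
    transforms.RandomRotation(degrees=15),                 # small rotation
    transforms.ColorJitter(brightness=0.2, contrast=0.2),  # brightness/contrast
])

raw = Image.open("raw_image.png")  # hypothetical file, larger than 224x224
augmented = [augment(raw) for _ in range(1000)]
```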
Fig12.png: Machine intelligence can identify cell colony quality with up to 83.8% accuracy for "good" quality colonies. While accuracy for "semi-good" and "bad" colonies requires further improvement before the system is capable of real-world application, there is potential for real-time, non-invasive stem cell quality control through the use of AI.

Fig13.png: A flow chart illustrating how collecting data for training AI can lead to improved clinical practice. Practice was improved through the use of new risk-evaluation decision tree models, which were designed from AI-highlighted biomarkers: the biomarkers that influenced AI-predicted patient outcome. This approach can be applied to multiple medical conditions.

Fig14.png: When a simple AI system was given the task of not reaching "game over", or losing, in the game Tetris, it learned the most efficient way of accomplishing this: it paused the game. While unexpected solutions can be a benefit of applying AI, this also shows the importance of specifying detailed and relevant task functions.

Fig15.png: Images used to train a network subsequently published in Nature, which was discovered to be unusable in clinical application due to training bias. Randomly sampling 4 images each from 160 malignant and 160 benign melanoma images in the ISIC Archive, 3 of the 4 malignant images had rulers visible while only 1 of the benign images did. While not statistically valid for determining the percentage of bias across the dataset used for the network, this illustrates the bias towards visible rulers within malignant melanoma images in the ISIC database [120].

Fig16.png: A step-by-step example of applying AI to improve a time-consuming and resource-expensive task. Step 1 is determining a task which balances improvement with ease of data acquisition, such as reducing the time needed for cell culture through AI prediction of Stro-1-selected HBMSCs after an additional 24 hours. Step 2 is selecting appropriate data (a brightfield image at time-points before and after 24 hours), collecting the data (a continual time-lapse experiment at multiple positions with multiple cell cultures), and preparing it (augmentation to provide maximum relevant data per input image and maximum images without overfitting). Images were processed here by combining 3 time-points and then performing randomised cropping. Step 3 is to decide on AI architecture(s), depending on both the task and the data acquired. For image data, a cGAN is the starting architecture used here, as it has proven successful for similar tasks and the code is readily available and open source. Step 4 is to train and then test the architecture, making multiple changes to both hyperparameters and the architecture until the results can be trusted. The final step is to apply the AI, as the inclusion of new input and output data will reveal current limitations and drive improvement of the AI, while simultaneously saving 24 hours of cell culture time.
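The Step 2 preparation described for Fig16 could be sketched as below, assuming three greyscale time-points are stacked as the channels of a single image before randomised cropping; the file names, crop size and channel arrangement are illustrative assumptions, not the exact processing used here:

```python
import numpy as np
from PIL import Image

def combine_timepoints(paths):
    """Stack three greyscale time-points as the channels of one array."""
    channels = [np.array(Image.open(p).convert("L")) for p in paths]
    return np.stack(channels, axis=-1)  # shape (H, W, 3)

def random_crop(image, size, rng):
    """Take one randomised square crop from the combined array."""
    h, w = image.shape[:2]
    top = int(rng.integers(0, h - size + 1))
    left = int(rng.integers(0, w - size + 1))
    return image[top:top + size, left:left + size]

rng = np.random.default_rng()
combined = combine_timepoints(["t0.png", "t1.png", "t2.png"])  # hypothetical names
crops = [random_crop(combined, 256, rng) for _ in range(100)]
```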
Fig17.png: Comprehensive testing includes both positive and negative testing to determine both reliability and applicability. For cGANs, a successful positive test would show large areas of similarity (green) between the network-generated prediction (red) and the experimentally obtained result (blue) for a detailed input. A negative test (blank input) should produce differing results due to random network output; otherwise, the network has overfit to the training data and is not applicable to real-world usage.

Fig18.png: The Future of AI: Putting the Clinician at the Centre. In a continuous cycle, clinicians generate data used to train a system of connected DNNs. Splitting tasks into smaller sub-tasks creates multiple manual checkpoints at which the DNN process can be scrutinised, allowing for greater error correction and the potential to generate further scientific understanding. As the clinician is heavily involved in training and checking the DNNs, the clinician knows the limitations and trusts the accuracy of the DNN system. Consequently, application is smoother and clinical practice is improved through DNN improvements and breakthroughs. Better data follows, leading to better DNN training and, in turn, even better clinical practice, in a positive feedback mechanism.

Fig19.png: A collaboration of clinicians, researchers and deep learning offers exciting possibilities for medical innovation, including in the field of tissue engineering, where there are many unknowns and parameter problems which hinder current manual experimentation.

Date of data collection: 05.01.2021

Licence:

Date that the file was created: January 2021