READ ME File For 'Large-scale and multi-omics data analysis for supporting precision medicine of human disease'

Dataset DOI: 10.5258/SOTON/D2586

ReadMe Author: Yilu Zhou, University of Southampton, ORCID ID: 0000-0002-4090-099X

This dataset supports the thesis entitled: "Large-scale data analysis and integration to advance precision prognosis, therapy stratification and understanding of human disease"
AWARDED BY: University of Southampton
DATE OF AWARD: 2023

DESCRIPTION OF THE DATA

All the data were generated with Microsoft software and the R programming language; no specialist software is needed to view the data. All code for reproducing the results is provided in the links.

This dataset contains:

Chapter 1 Figure:
Figure 1.1. Timeline of precision medicine.
Figure 1.2. Popularity of the ‘personalized medicine’ and ‘precision medicine’ in the field of science and politics.
Figure 1.3. Comprehensive measurement of the UK Biobank.
Figure 1.4. Current main barriers in precision medicine.
Figure 1.5. Diagnostic pipeline for idiopathic pulmonary fibrosis (IPF).
Figure 1.6. A schematic representation of mechanisms contributing to the development of idiopathic pulmonary fibrosis.
Figure 1.7. Regulation of HIF-1 by hypoxia.
Figure 1.8. Sympathoadrenal development and neuroblastoma tumorigenesis.
Figure 1.9. The International Neuroblastoma Staging System Committee system.

Chapter 1 Table:
Table 1.1. Projects of precision medicine in the world.
Table 1.2. IPF diagnosis according to HRCT and histopathology.
Table 1.3. Summary of genetic risk factors in IPF.

Chapter 2 Figure:
Figure 2.1. Workflow of two integration methods.
Figure 2.2. A schematic illustration of multi-layer data.
Figure 2.3. A schematic illustration of network projection.
Figure 2.4. Differences between KB model and Non-Negative matrix.
Figure 2.5. ssGSEA methods pipeline.

Chapter 2 Table:
Table 2.1. Systematic comparison of meta-analysis methods.
Table 2.2. Example of three matrices G, C and M.

Chapter 3 Figure:
Figure 3.1. Principal Component Analysis (PCA) of normalized proteome data from each sample.
Figure 3.2. (a) Volcano plot comparing significantly (limma package; p value <0.05) changed proteins in ATIIER:KRASV12 cells treated with control or 250 nM 4-OHT (RAS-activated) for 24 hrs.
Figure 3.3. (a) Volcano plot comparing significantly (limma package; p value <0.05) changed proteins between control and 5 ng/ml TGF-β treatment for 24 hrs in ATIIER:KRASV12 cells.
Figure 3.4. RAS-activation in ATII cells induces Hallmark EMT proteomic signature.
Figure 3.5. TGF-β in ATII cells does not induce a classical EMT program at a given time point.
Figure 3.6. RAS-activation together with TGF-β are capable of inducing EMT and RAS signalling appears to drive EMT.
Figure 3.7. The mRNA levels of TGFBR1, TGFBR2, and TGFBR3 in ATIIER:KRASV12, normalised to ACTB (β-actin).
Figure 3.8. Unlike RAS-activation, TGF-β is insufficient to induce EMT-like changes at 24 h in ATIIER:KRASV12 cells.
Figure 3.9. (a) Protein levels of ZO1 in ATIIER:KRASV12 with indicated treatment from proteomics data.

Chapter 3 Table:
Table 3.1. Expression data for ten proteins after limma processing between control and 4-OHT treatment (RAS-activated) in ATII ER:KRAS V12 cells at 24 hrs, after being run through limma package (v. 3_34.2) in RStudio (v. 1.1.456).
Table 3.2. Expression data for ten proteins after limma processing between control and TGF-β treatment in ATII ER:KRAS V12 cells at 24 hrs, after being run through limma package in RStudio.
Table 3.3. Expression data for ten proteins after limma processing between control and TGF-β with 4-OHT (RAS-activated) treatment in ATII ER:KRAS V12 cells at 24 hrs, after being run through limma package in RStudio.
Table 3.4. Hallmark pathways identified using Metascape (Metascape.com) for differentially expressed proteins between control and RAS-activated ATIIER:KRASV12 cells.
Table 3.5. Hallmark EMT proteins significantly differentially expressed between control and 4-OHT treatment (RAS-activated) in ATII ER:KRAS V12 cells at 24 hrs.
Table 3.6. Hallmark pathways identified using Metascape (Metascape.com) for differentially expressed proteins between control and TGF-β-treated ATIIER:KRASV12 cells.
Table 3.7. Hallmark EMT proteins significantly differentially expressed between control and TGF-β treatment in ATII ER:KRAS V12 cells at 24 hrs.
Table S3.1: Expression data for all proteins after limma processing between control and 4-OHT treatment (RAS-activated) in ATII ER:KRAS V12 cells at 24 hrs, after being run through limma package (v. 3_34.2) in RStudio (v. 1.1.456).
Table S3.2: Hallmark EMT proteins significantly differentially expressed between control and 4-OHT treatment (RAS-activated) in ATII ER:KRAS V12 cells at 24 hrs.
Table S3.3: Expression data for all proteins after limma processing between control and TGF-β treatment in ATII ER:KRAS V12 cells at 24 hrs, after being run through limma package in RStudio.
Table S3.4: Hallmark EMT proteins significantly differentially expressed between control and TGF-β treatment in ATII ER:KRAS V12 cells at 24 hrs.
Table S3.5: Expression data for all proteins after limma processing between control and TGF-β with 4-OHT (RAS-activated) treatment in ATII ER:KRAS V12 cells at 24 hrs, after being run through limma package in RStudio.
Table S3.6: Hallmark EMT proteins significantly differentially expressed between control and TGF-β treatment with 4-OHT treatment (RAS-activated) in ATII ER:KRAS V12 cells at 24 hrs.

Chapter 4 Figure:
Figure 4.1. A validated HIF score is increased across tissue compartments in patients with IPF.
Figure 4.2. The scatter plot for the correlation between HIF score and oxidative stress score.
Figure 4.3. Hierarchical clustering, survival analysis and cell fraction of BAL sample from patients with IPF.
Figure 4.4. The HIF score in BAL predicts mortality in patients with IPF.
Figure 4.5. The HIF score is increased in PBMC from IPF patients.
Figure 4.6. Hierarchical clustering and cell fraction of PBMC sample from patients with IPF.
Figure 4.7. The HIF score in PBMC from IPF patients predicts mortality.
Figure 4.8. Survival analysis of PBMC sample from IPF patients.

Chapter 5 Figure:
Figure 5.1. Characterisation of molecular subtypes in MYCN non-amplified neuroblastomas.
Figure 5.2. Consensus clustering analysis in MYCN non-amplified neuroblastomas.
Figure 5.3. Clinical characterisation of subtypes within MYCN non-amplified neuroblastomas identifies key distinguishing features.
Figure 5.4. Clinical characterisation of subtypes within train and test datasets identifies key distinguishing features.
Figure 5.5. Defining molecular features of 3 subtypes in MYCN non-amplified neuroblastomas.
Figure 5.6. WGCNA analysis of MYCN non-amplified neuroblastomas.
Figure 5.7. Subgroup 2 shows a "MYCN" signature, potentially induced by Aurora Kinase A overexpression.
Figure 5.8. Multivariate analysis of AURKA expression level and risk status in MYCN non-amplified neuroblastomas.
Figure 5.9. N-MYC expression correlates with Aurora kinase A status in MYCN non-amplified neuroblastomas, and is indicative of patient survival.
Figure 5.10. Subgroup 3 is accompanied by an "inflamed" gene signature.
Figure 5.11. Subgroup 3 signature from train and test datasets.
Figure 5.12. Integrative analysis of subgroup molecular features on drug response.
Figure 5.13. Identification of independent predictors to subgroup patients within MYCN non-amplified neuroblastomas and evaluation of different patient stratification strategies.
Figure 5.14. Identification and evaluation of independent predictors to subgroup patients within MYCN non-amplified neuroblastomas.

Chapter 5 Table:
Table 5.1. List of datasets collected for meta-analysis.
Table 5.2. List of top 50% variable genes (ten genes) for consensus clustering.
Table 5.3. Univariate and multivariate logistic regression analysis in MYCN non-amplified neuroblastomas (n = 1,120).
Table 5.4. DEGs (ten differentially expressed genes) in subgroups.
Table 5.5. GSEA (gene set enrichment analysis) in subgroups.
Table 5.6. Enriched pathways in WGCNA (weighted gene co-expression network analysis) modules in subgroups.
Table 5.7. List of ten genes in PPI (protein–protein interaction) network analysis.
Table 5.8. List of five immune-related gene sets.
Table 5.9. Analysis of clinically actionable genes and drug response.
Table 5.10. Independent predictors to subgroup patients within MYCN non-amplified neuroblastomas.
Table S5.1: List of datasets collected for meta-analysis.
Table S5.2: List of top 50% variable genes for consensus clustering.
Table S5.3: Univariate and multivariate regression analysis in MYCN non-amplified neuroblastomas (n = 1,120).
Table S5.4: DEGs (differentially expressed genes) in subgroups.
Table S5.5: GSEA (gene set enrichment analysis) in subgroups.
Table S5.6: WGCNA (weighted gene co-expression network analysis) in subgroups.
Table S5.7: List of genes in PPI (protein–protein interaction) network analysis.
Table S5.8: List of 46 immune-related gene sets.
Table S5.9: Analysis of clinically actionable genes and drug response.

Chapter 6 Figure:
Figure 6.1. Three factors affecting analysis results of precision medicine.

Date of data collection: 2018-2023
Information about geographic location of data collection: University of Southampton, UK
Licence: CC-BY
Date that the file was created: April, 2023