READ ME File For 'Structure-Based Modelling of LK Peptides in Complex with AntagomiR-138 RNA'

Dataset DOI: 10.5258/SOTON/D3070

Date that the file was created: May, 2024

-------------------
GENERAL INFORMATION
-------------------

ReadMe Author: DIMITRIOS STAMATIS, University of Southampton, ORCID: 0000-0003-2355-3452

Date of data collection: 12/2019-12/2023

Information about geographic location of data collection: University of Southampton, U.K.

Related projects: Structure-Based Modelling of LK Peptides in Complex with AntagomiR-138 RNA

--------------------------
SHARING/ACCESS INFORMATION
--------------------------

Licenses/restrictions placed on the data, or limitations of reuse: CC BY-NC-ND

Recommended citation for the data:

This dataset supports the publication:
AUTHORS: Dimitrios Stamatis
TITLE: Structure-Based Modelling of LK Peptides in Complex with AntagomiR-138 RNA
JOURNAL: University of Southampton

Links to other publicly accessible locations of the data: Zenodo: https://zenodo.org/doi/10.5281/zenodo.10715632

Links/relationships to ancillary or related data sets:

--------------------
DATA & FILE OVERVIEW
--------------------

This dataset contains:

dataset_part1a.zip:
-additional_code_FFs
-RNA_simulations/README.md
-RNA_simulations/Analysis
-RNA_simulations/REST2_dynamics-UUCG.zip

dataset_part1b.zip:
-RNA_simulations/REST2_dynamics-AUCG.zip

dataset_part1c.zip:
-RNA_simulations/REST2_dynamics-CUUGU.zip

dataset_part1d.zip:
-RNA_simulations/REST2_dynamics-GUAAU-283K.zip
-RNA_simulations/REST2_dynamics-GUAAU-300K.zip

dataset_part2a.zip:
-Peptide_simulations/README.md
-Peptide_simulations/script_examples
-Peptide_simulations/data/Collective_analysis-peptides
-Peptide_simulations/data/stEK-capped

dataset_part2b.zip:
-Peptide_simulations/data/LKacr-stEK-capped

dataset_part2c.zip:
-Peptide_simulations/data/LKpyr-stEK-capped

dataset_part2d.zip:
-Peptide_simulations/data/LKH-stEK

dataset_part2e.zip:
-Peptide_simulations/data/LKH-stEK-capped

dataset_part2f.zip:
-Peptide_simulations/data/LKHW1-stEK-capped

dataset_part2g.zip:
-Peptide_simulations/data/LKHW16-stEK-capped

dataset_part2h.zip:
-Peptide_simulations/data/LKHA16-stEK-capped

dataset_part3a.zip:
-README.md
-RNA_peptide_simulations/script_examples
-RNA_peptide_simulations/data/antagomiR-138_folded

dataset_part3b.zip:
-RNA_peptide_simulations/data/Comparative_analysis-complex_vs_multipeptide
-RNA_peptide_simulations/data/Complex_systems/Diluted_components/miRNA_stEK
-RNA_peptide_simulations/data/Complex_systems/Diluted_components/miRNA_LKacr-stEK
-RNA_peptide_simulations/data/Complex_systems/Diluted_components/miRNA_LKpyr-stEK

dataset_part3c.zip:
-RNA_peptide_simulations/data/Complex_systems/Diluted_components/miRNA_LKH-stEK

dataset_part3d.zip:
-RNA_peptide_simulations/data/Complex_systems/Diluted_components/miRNA_LKHW1-stEK
-RNA_peptide_simulations/data/Complex_systems/Diluted_components/miRNA_LKHW16-stEK
-RNA_peptide_simulations/data/Complex_systems/Diluted_components/miRNA_LKHA16-stEK
-RNA_peptide_simulations/data/Complex_systems/Diluted_components/Collective_analysis-peptides

Short description:

-additional_code_FFs: contains the Gromacs-compatible force field libraries used for the simulations, as well as the in-house developed "DiBaClD" code used for the analysis of multi-molecular simulations. Further information can be found at the end of this file.

-RNA_simulations: contains simulation input/output, analysis tests and results for the work presented in Chapter 3 of the Thesis. Further information on directory structure and content can be found under RNA_simulations/README.md

-Peptide_simulations: contains simulation input/output, analysis tests and results for the work presented in Chapter 4 and Chapter 5 (new peptide designs) of the Thesis.
Further information on directory structure and content can be found under Peptide_simulations/README.md

-RNA_peptide_simulations: contains simulation input/output, analysis tests and results for the work presented in Chapter 5 of the Thesis. Further information on directory structure and content can be found under RNA_peptide_simulations/README.md

Relationship between files, if important for context:

Additional related data collected that was not included in the current data package:

If data was derived from another source, list source:

If there are multiple versions of the dataset, list the file updated, when and why update was made:

--------------------------
METHODOLOGICAL INFORMATION
--------------------------

Description of methods used for collection/generation of data:

-Experimental structures sourced from the PDB database (https://www.rcsb.org/)

-Molecular Dynamics simulations performed with Gromacs software (H. Bekker, H.J.C. Berendsen, E.J. Dijkstra, S. Achterop, R. van Drunen, D. van der Spoel, A. Sijbers, and H. Keegstra et al., Gromacs: A parallel computer for molecular dynamics simulations; pp. 252-256 in Physics computing 92. Edited by R.A. de Groot and J. Nadrchal. World Scientific, Singapore, 1993)

-Ab-initio prediction of RNA structures using SimRNA software (SimRNA: a coarse-grained method for RNA folding simulations and 3D structure prediction; Boniecki MJ, Lach G, Dawson WK, Tomala K, Lukasz P, Soltysinski T, Rother KM, Bujnicki JM; Nucleic Acids Res 2015 [doi: 10.1093/nar/gkv1479])

-Prediction of non-standard residue parameters with R.E.D. server (E. Vanquelef, S. Simon, G. Marquant, E. Garcia, G. Klimerak, J. C. Delepine, P. Cieplak & F.-Y. Dupradeau, R.E.D. Server: a web service for deriving RESP and ESP charges and building force field libraries for new molecules and molecular fragments, Nucl. Acids Res. 2011, 39, W511-W517)

-Editing of topology files for MD simulations with ParmEd (Shirts, M.R., Klein, C., Swails, J.M. et al.
Lessons learned from comparing molecular dynamics engines on the SAMPL5 dataset. J Comput Aided Mol Des 31, 147-161 (2017). https://doi.org/10.1007/s10822-016-9977-1)

Methods for processing the data:

-Post-processing of MD trajectories and calculation of conformational descriptors with Gromacs software.

-Post-processing of MD trajectories and calculation of conformational descriptors with MDTraj library (Robert T. McGibbon, Kyle A. Beauchamp, Matthew P. Harrigan, Christoph Klein, Jason M. Swails, Carlos X. Hernández, Christian R. Schwantes, Lee-Ping Wang, Thomas J. Lane, Vijay S. Pande, MDTraj: A Modern Open Library for the Analysis of Molecular Dynamics Trajectories, Biophysical Journal, Volume 109, Issue 8, 2015, Pages 1528-1532, ISSN 0006-3495, https://doi.org/10.1016/j.bpj.2015.08.015)

-Calculation of conformational descriptors for RNA molecules with Barnaba library (Barnaba: Software for Analysis of Nucleic Acids Structures and Trajectories; Sandro Bottaro, Giovanni Bussi, Giovanni Pinamonti, Sabine Reisser, Wouter Boomsma, and Kresten Lindorff-Larsen; RNA rna.067678.118; 2018, doi:10.1261/rna.067678.118)

-Calculation of population-based Free Energy landscapes of conformational maps with PyEMMA software (PyEMMA 2: A Software Package for Estimation, Validation, and Analysis of Markov Models; Martin K. Scherer, Benjamin Trendelkamp-Schroer, Fabian Paul, Guillermo Pérez-Hernández, Moritz Hoffmann, Nuria Plattner, Christoph Wehmeyer, Jan-Hendrik Prinz, and Frank Noé; Journal of Chemical Theory and Computation 2015 11 (11), 5525-5542; DOI: 10.1021/acs.jctc.5b00743)

-Clustering of MD-sampled conformations with MDASH library (E. Haensele, N. Mele, M. Miljak, C. M. Read, D. C. Whitley, L. Banting, C. Delépée, J. Sopkova-de Oliveira Santos, A. Lepailleur, R. Bureau, J. W. Essex and T. Clark. Conformation and Dynamics of Human Urotensin II and Urotensin Related Peptide in Aqueous Solution. J. Chem. Inf. Model. 2017, 57, 298-310.
doi:10.1021/acs.jcim.6b00706)

-Calculation of intermolecular interactions in multi-molecular MD simulation trajectories with GetContacts software (https://getcontacts.github.io)

-Analysis of intermolecular interactions with in-house scripts.

-Statistical data analysis and image generation with NumPy, pandas, scikit-learn, Matplotlib, and seaborn Python libraries (https://www.python.org)

-Development of analysis workflows with Jupyter library (Brian Granger, Fernando Pérez; Jupyter: Thinking and Storytelling with Code and Data; Authorea; February 11, 2021; DOI: 10.22541/au.161298309.98344404/v2)

Software- or Instrument-specific information needed to interpret the data, including software and hardware version numbers: Data were produced with Gromacs version 2018.1 or later, and analysed with software based on Python version 3.

Environmental/experimental conditions: All simulations and analysis routines were performed with standardized, peer-reviewed, and publicly available code on the Southampton (Iridis5) and UK (ARCHER2) HPCs.

Describe any quality-assurance procedures performed on the data:

People involved with sample collection, processing, analysis and/or submission: Dimitrios Stamatis, Jonathan W. Essex.
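
Illustrative note on population-based free energy landscapes: the landscapes listed under "Methods for processing the data" were computed with PyEMMA. The sketch below only illustrates the underlying Boltzmann inversion, F_i = RT ln(p_max / p_i), on a list of discrete bin labels. It is not the routine used for this dataset. The function name, the 300 K default temperature, and the example labels are assumptions made for illustration.

```python
import math
from collections import Counter

# Molar gas constant in kJ/(mol K), matching the energy units of this dataset.
R_KJ_PER_MOL_K = 0.0083144626


def free_energy_from_populations(bin_labels, temperature_k=300.0):
    """Return {bin: free energy in kJ/mol} relative to the most populated bin.

    bin_labels is any iterable of hashable bin identifiers, one per frame
    (hypothetical input; real workflows bin continuous descriptors first).
    """
    counts = Counter(bin_labels)
    max_count = max(counts.values())
    rt = R_KJ_PER_MOL_K * temperature_k
    # Boltzmann inversion: F_i = RT * ln(p_max / p_i); empty bins are simply absent.
    return {b: rt * math.log(max_count / c) for b, c in counts.items()}


# Hypothetical example: 80 frames in state "A", 20 frames in state "B".
labels = ["A"] * 80 + ["B"] * 20
landscape = free_energy_from_populations(labels)
```

The most populated bin is the zero of the scale, so only free energy differences between bins carry meaning.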
-------------------------
DATA-SPECIFIC INFORMATION
-------------------------

Variable list, defining any abbreviations, units of measure, codes or symbols used:
-Standard units of measure: Length in nm, Time in ps, Temperature in K, Pressure in bar, Energy in kJ/mol

Specialized formats or other abbreviations used:
-Simulation force field folders: .ff
-Structure file types: .pdb, .gro
-Molecular topology files: .top
-Simulation parameter files: .mdp
-Simulation input files: .tpr
-Simulation log files: .log
-Simulation structure trajectory files: .trr, .xtc, .pdb, .trafl
-Simulation energy trajectory files: .edr
-Simulation state checkpoint files: .cpt
-Simulation setup/job submission script files: .sh, .py
-Analysis script files: .sh, .py
-Analysis Jupyter notebook files: .ipynb
-Analysis output text files: .dat, .txt, .csv
-Analysis output image files: .xvg, .jpg, .jpeg

Directory contents:

-additional_code_FFs/gmx_FFs: contains the Gromacs-compatible force field libraries used in this study.

-additional_code_FFs/DiBaClD: contains the in-house developed source code for the detection of supramolecular clusters in multi-peptide and RNA-peptide simulation trajectories.

-RNA_simulations: contains simulation input/output, analysis tests and results for the work presented in Chapter 3 of the Thesis.

-Peptide_simulations: contains simulation input/output, analysis tests and results for the work presented in Chapter 4 and Chapter 5 (new peptide designs) of the Thesis.

-RNA_peptide_simulations: contains simulation input/output, analysis tests and results for the work presented in Chapter 5 of the Thesis.
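
Illustrative note on reading .xvg files: the .xvg files written by Gromacs tools are plain-text tables in which lines starting with "#" are comments and lines starting with "@" are plotting directives. The sketch below shows one way to extract the numeric columns using only the Python standard library. The function name and the sample content are hypothetical and are not taken from the dataset.

```python
def parse_xvg(text):
    """Return the numeric rows of an .xvg file as a list of lists of floats.

    Comment lines (#), plotting directives (@) and set separators (&)
    are skipped; every other non-empty line is treated as data.
    """
    rows = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped[0] in "#@&":
            continue
        rows.append([float(value) for value in stripped.split()])
    return rows


# Hypothetical file content, for illustration only.
sample = """# hypothetical header
@    title "RMSD"
@    xaxis  label "Time (ps)"
0.0   0.000
10.0  0.123
"""
data = parse_xvg(sample)
```

For a file on disk, the same function can be applied to the result of open(path).read(). Column meanings and units are given by the "@" legend and axis directives of each file.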