A computational pipeline for generating SAXS-consistent atomistic protein ensembles - dataset

Author: Cameron Brown
Supervisors: Prof Jonathan Essex, Dr Robert Rambo
Dataset DOI: 10.5258/SOTON/D3902

ReadMe Author: CAMERON BROWN, University of Southampton
http://orcid.org/0000-0001-8991-245X

This dataset supports the thesis entitled
AWARDED BY: University of Southampton
DATE OF AWARD: 2026

Date of data collection: September 2021 - September 2025
Information about geographic location of data collection: Southampton
Licence: CC-BY
Related projects/Funders: Diamond Light Source, EPSRC

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Description:

This repository contains sample input and output files, analysis scripts, and relevant code associated with the thesis 'A computational pipeline for generating SAXS-consistent protein ensembles'.

Note: MD trajectories are not included because of their file size. Instead, the PDB files used to seed the simulations are provided, together with selected output (e.g. tpr/gro/crysol_summary.txt files).

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Directories:

Chapter3_Pipeline
- AutoMD-SAXS: The finished computational pipeline used to generate and analyse the majority of the simulation data in this thesis.
  Git: https://github.com/camlbrown/AutoMD-SAXS
- GROMACS-calculated_MD_Metrics
- CLoNe clustering
- Carbonara: The version of Carbonara used to generate all Carbonara-related data in the thesis, apart from the '20-seed' models. The '20-seed' models were generated with the latest version of Carbonara, which can be found here: https://github.com/Prior-Lab-Durham-University/carbonara

Chapter4_Rift_valley
- MD: AMBER and CHARMM simulation data with analysis scripts
- SAXS data
- Dimer: AF3, pydocksaxs, pydocksaxs MD data, Carbonara data

Chapter5_IgG2
- Carbonara models and their modeller variations
- Analysis scripts
- Original crystal structures
- Experimental SAXS data
- MD data for each modeller variant

Chapter6_IgG3
- Analysis scripts
- Models
- SAXS data
- MD data: glycosims, saxs-guided, metadynamics, hinge simulations, amber, charmm

---------------------------------------------------------------------------------------------------------------------------------------------
References:

Carbonara:
- Original: https://pubmed.ncbi.nlm.nih.gov/32023061/
- Latest pre-print: https://www.researchgate.net/publication/391422815_Carbonara_A_Rapid_Method_for_SAXS-Based_Refinement_of_Protein_Structures
- GitHub: https://github.com/Prior-Lab-Durham-University/carbonara

If you use Carbonara, please cite:

@article{carbonara2025,
  title={Carbonara: A Rapid Method for SAXS-Based Refinement of Protein Structures},
  author={McKeown, J. and Bale, A. and Brown, C. and Fisher, H. and Rambo, R. and Essex, J. and Degiacomi, M. and Prior, C.},
  journal={ResearchSquare},
  year={2025},
  doi={10.21203/rs.3.rs-6447099/v1},
  url={https://doi.org/10.21203/rs.3.rs-6447099/v1}
}

This dataset contains modified code from external sources, detailed below:

- Bayesian/Maximum Entropy (BME): BME, used in Chapters 5 and 6, was run using modified scripts.
  The original scripts can be found here: https://github.com/KULL-Centre/BME
  LaTeX (BibTeX) reference:

  @incollection{bottaro2020integrating,
    title={Integrating molecular simulation and experimental data: a Bayesian/maximum entropy reweighting approach},
    author={Bottaro, Sandro and Bengtsen, Tone and Lindorff-Larsen, Kresten},
    booktitle={Structural Bioinformatics},
    pages={219--240},
    year={2020},
    publisher={Springer}
  }

- CLoNe clustering: CLoNe clustering, used in the AutoMD-SAXS pipeline in Chapter 3, was run using modified scripts.
  The original code can be found here: https://github.com/LBM-EPFL/CLoNe
  LaTeX (BibTeX) reference:

  @article{10.1093/bioinformatics/btaa742,
    author = {Träger, Sylvain and Tamò, Giorgio and Aydin, Deniz and Fonti, Giulia and Audagnotto, Martina and Dal Peraro, Matteo},
    title = {CLoNe: automated clustering based on local density neighborhoods for application to biomolecular structural ensembles},
    journal = {Bioinformatics},
    volume = {37},
    number = {7},
    pages = {921-928},
    year = {2020},
    month = {08},
    issn = {1367-4803},
    doi = {10.1093/bioinformatics/btaa742},
    url = {https://doi.org/10.1093/bioinformatics/btaa742},
    eprint = {https://academic.oup.com/bioinformatics/article-pdf/37/7/921/50341061/btaa742.pdf},
  }

- Ramachandran Plotter: The Ramachandran plots in Chapters 5 and 6 were generated using modified scripts from an external source.
  The original code can be found here: https://github.com/Joseph-Ellaway/Ramachandran_Plotter

-----------------------------------------------------------------------------------------------------
THE LICENSE FOR THIS DATASET DOES NOT COVER CARBONARA, BME, CLONE CLUSTERING, OR RAMACHANDRAN PLOTTER
-----------------------------------------------------------------------------------------------------