README File for 'Dataset for: An in silico reverse vaccinology study of Brachyspira pilosicoli, the causative organism of intestinal spirochaetosis, to identify putative vaccine candidates'

Dataset DOI: 10.5258/SOTON/D1834

Date that the file was created: July, 2022

-------------------
GENERAL INFORMATION
-------------------

ReadMe Author: Myron Christodoulides, University of Southampton

Date of data collection: 2016-2021

Information about geographic location of data collection: University of Southampton and University of Surrey, UK

Related projects: Study funded by Houghton Trust, UK

--------------------------
SHARING/ACCESS INFORMATION
--------------------------

Licenses/restrictions placed on the data, or limitations of reuse: CC-BY

This dataset supports the publication:
AUTHORS:
TITLE: An in silico reverse vaccinology study of Brachyspira pilosicoli, the causative organism of intestinal spirochaetosis, to identify putative vaccine candidates
JOURNAL: Process Biochemistry
PAPER DOI IF KNOWN: https://doi.org/10.1016/j.procbio.2022.08.014

--------------------
DATA & FILE OVERVIEW
--------------------

This dataset contains raw data from in silico analyses for reverse vaccinology.

Supplemental Material

Supplementary Figure 1. Clustal alignment and dendrogram of the OppA family proteins identified by RV and the ORFs 1-6 oligopeptide-binding proteins from the study of Movahedi and Hampson, 2010.

Supplementary Dataset 1. Brachyspira pilosicoli B2904, complete genome, nucleotide sequence, downloaded from GenBank, NCBI.

Supplementary Dataset 2. RAST annotation server output for the 2679 protein-coding sequences of the B. pilosicoli B2904 complete genome.

Supplementary Dataset 3. Amino acid sequences for all 2679 proteins in the B. pilosicoli B2904 complete genome (FASTA file).

Supplementary Dataset 4. Output from PSORTb3, providing initial localisation and localisation score values for each protein.

Supplementary Dataset 5. Output from CELLO, providing additional localisation and localisation score values for each protein.

Supplementary Dataset 6. Output from SOSUIGramN, providing additional localisation and localisation score values for each protein.

Supplementary Dataset 7. Output from the LipoP server, providing information on lipopeptides.

Supplementary Dataset 8. Output from the SignalP-5.0 server, providing information on the presence of signal peptides.

Supplementary Dataset 9. Output from the TMHMM 2.0 server, providing details of transmembrane regions for each protein.

Supplementary Dataset 10. Output from eggNOG-mapper, providing additional functional data and preferred names of B. pilosicoli B2904 proteins.

Supplementary Dataset 11. Comparison of the proteome of B. pilosicoli B2904 with those of the non-pathogenic E. coli K12 and the commensal L. reuteri using SEED viewer.

Supplementary Dataset 12. Examination of cysteine residues in B. pilosicoli B2904 proteins. Proteins were scored as having 0, 1, 2, 3, 4, or 5 or more residues.

Supplementary Dataset 13. Final reverse vaccinology list for B. pilosicoli B2904. The sheets show the SeqID, Feature ID, Length (bp), Putative Name and Function, PSORTb3 Score, Protein characteristic/function, renamed lipoproteins, putative PSORTb3 localisation, CELLO, SOSUIGramN, the Consensus Localization, the number of Amino Acids, protein Molecular Weight (kDa), TMHMM number of transmembrane helices, LipoP server prediction, SignalP, C-term aromatic residue, commonality with proteins from E. coli K12 and Lactobacillus reuteri 275, and results from the VaxiJen and Vaxign programs. This is the list before individual proteins were assessed against the literature and further exclusions were applied (see text for details).

Supplementary Dataset 14. Allergenic proteins and allergenic regions within proteins, predicted in silico using the AlgPred 2.0 web server with a cut-off value of 0.5.

Supplementary Dataset 15. Presence of RV proteins (B2904) in other B. pilosicoli strains. Sequences producing significant alignments were used to confirm the presence or absence of the proteins.

Supplementary Dataset 16. Comparison of genome annotations of B. pilosicoli B2904 by RAST and PGAP. The table shows the presence or absence of protein-coding genes in strain B2904, whose genome was annotated with both the RAST and PGAP (NCBI) programs.

Supplementary Dataset 17. B-cell epitope predictions. Complete data are shown for the prediction of linear B-cell epitopes within 18 B. pilosicoli RV protein candidates.

Supplementary Dataset 18. Final BLASTp analysis of the chimera antigen. The data output shows that, of the 29 sequences identified with similarity, 22 principally belonged to B. pilosicoli, three to B. hampsonii and one to B. intermedia. The remaining three showed very low and irrelevant similarity: a canarypox virus protein (27%), a hypothetical protein from the brownbanded bambooshark Chiloscyllium punctatum (24%) and a trichohyalin-like protein from the American lobster Homarus americanus.

Supplementary Dataset 19. Full sequence alignments for the OppA family proteins, and for the TmpB and BatC proteins, identified by RV. Alignments were generated by Clustal for the 9 proteins in the RV list identified as OppA, and for the two each identified as TmpB and BatC. In the alignments, "." and ":" denote amino acid differences and "*" denotes amino acid similarity. The linear B-cell epitope sequences are highlighted in yellow.

Supplementary Dataset 20. Raw ABCPred data for proteins from the OppA family. The linear B-cell epitopes predicted for each protein are highlighted in green.

Supplementary Dataset 21. Emini and ABCPred data for the TmpB and BatC proteins. Linear B-cell epitopes are predicted for each protein, with raw data shown.

--------------------------