READ ME File For 'Benthic abyssal microbial dataset from the Atlantic and Pacific'

Dataset DOI: 10.5258/SOTON/D2325

ReadMe Author: ANITA LOUISE HOLLINGSWORTH, University of Southampton (ORCID: 0000-0001-5669-3895)

This dataset supports the thesis entitled "Spatial variability and distribution of benthic microbial diversity in the Atlantic and Pacific"
AWARDED BY: University of Southampton
DATE OF AWARD: 2022

DESCRIPTION OF THE DATA

These data support the PhD thesis titled "Spatial variability and distribution of benthic microbial diversity in the Atlantic and Pacific". Data were collected on a number of research cruises between April 2015 and July 2018. These cruises took place at the Porcupine Abyssal Plain (PAP) in the NE Atlantic, Station M in the NE Pacific and the Clarion-Clipperton Fracture Zone (CCFZ) in the equatorial Pacific.

This dataset comprises amplicon sequences from the microbial communities associated with abyssal sediment, water, holothurian guts and polymetallic nodule samples, together with sequences from holothurian host tissue. Two gene fragments were sequenced: the V4 region of the 16S rRNA gene from the microbial communities within sediment, holothurian gut content, nodule and water filter samples, and COI (cytochrome c oxidase subunit I, also written CO1) from the holothurian host tissue.

The DNA extractions were concentrated and pooled in triplicate per sample, with three technical replicates sequenced per pool. The V4 region of the 16S rRNA gene was amplified by the polymerase chain reaction (PCR) using the oligonucleotide primers Pro515f/Pro806r, which target both Bacteria and Archaea and avoid the primer bias associated with targeting multiple domains (Herlemann et al., 2011). The amplified 16S rRNA gene products and extraction blanks were then prepared with the Nextera XT v2 Kit (Illumina, San Diego, CA) and sequenced on an Illumina MiSeq platform at the Environmental Genomics Sequencing Facility (University of Southampton, National Oceanography Centre, Southampton).

The COI gene (partial, 690 bp) was amplified by PCR using the oligonucleotide primers described in Miller et al. (2017). Each PCR used the following mix: 10 μL GoTaq Green Master Mix (Promega, UK), 1.0 μL of each primer, 7 μL water and 1.5 μL of template DNA. The amplification programme was: initial denaturation at 95 °C for 3 min, followed by 30 cycles of denaturation at 95 °C for 40 s, annealing at 50 °C for 40 s and extension at 72 °C for 50 s, followed by a final extension at 72 °C for 5 min. The size and purity of the PCR products were checked by 1% agarose gel electrophoresis, and the products were purified with the QIAquick 96 PCR Purification Kit (Qiagen, USA). Cleaned PCR products were sequenced by cycle sequencing (dideoxy chain termination) on ABI 3730XL sequencing machines (Eurofins Genomics, Germany).

The demultiplexed Illumina 16S rRNA reads were analysed with the microbiome analysis package QIIME 2 (Quantitative Insights Into Microbial Ecology) version 2019.1 (Bolyen et al., 2019). Sequence quality control for the Illumina amplicon data was implemented with the DADA2 pipeline within QIIME 2 (Callahan et al., 2016). Amplicon Sequence Variants (ASVs), or features, were resolved using the DADA2 denoise-single method. A naive Bayes classifier pre-trained on the V4 region of reference sequences from the Silva database (version 132, clustered at 99% identity; Quast et al., 2013) was used to classify the representative sequences of the ASVs in our dataset.

The feature table generated by QIIME 2 was first normalised using the ‘core-metrics-phylogenetic’ method within the q2-diversity plugin of QIIME 2, before being used for abundance and diversity analysis.
This method randomly subsamples the counts in each sample of a feature table, without replacement, to a user-specified depth, so that each sample has an equal number of counts. The resulting normalised feature table was used to calculate alpha diversity indices for each sample in QIIME 2. The remainder of the community composition and statistical analyses were performed using the ‘vegan’ package in R v3.3.2 (Oksanen, 2007; Oksanen et al., 2017).

References:

Bolyen, E. et al. (2019) Reproducible, interactive, scalable and extensible microbiome data science using QIIME 2. Nat. Biotechnol. 37: 852–857. https://doi.org/10.1038/s41587-019-0209-9

Callahan, B.J. et al. (2016) DADA2: High-resolution sample inference from Illumina amplicon data. Nat. Methods 13(7): 581–583. https://pubmed.ncbi.nlm.nih.gov/27214047/

Herlemann, D.P. et al. (2011) Transitions in bacterial communities along the 2000 km salinity gradient of the Baltic Sea. ISME J. 5: 1571–1579. https://www.nature.com/articles/ismej201141

Miller, A.K. et al. (2017) Molecular phylogeny of extant Holothuroidea (Echinodermata). Mol. Phylogenet. Evol. 111: 110–131. https://doi.org/10.1016/j.ympev.2017.02.014

Oksanen, J. (2007) Multivariate analyses of ecological communities in R: vegan tutorial. 39 pp.

Oksanen, J. et al. (2017) Vegan: Community ecology package. R package version 2.4-2. https://github.com/vegandevs/vegan

Quast, C. et al. (2013) The SILVA ribosomal RNA gene database project: improved data processing and web-based tools. Nucleic Acids Res. 41: D590–D596. https://doi.org/10.1093/nar/gks1219

This dataset contains:

Each uploaded zipped file contains the files for a set of samples, which are detailed in the metadata file. Each sample has 2 files associated with it, one containing the forward reads and one containing the reverse reads of the amplicon data. The samples listed in the metadata file correspond to the samples that were taken for each chapter of the thesis (Chapters 2-4).
Because of the large file sizes, the data for each chapter are split into 2 parts, "a" and "b". Holothurian COI (CO1) sequences are in a separate zipped file. Note that the data are in FASTQ format and that the amplicon data are unprocessed raw reads, to allow flexibility in analysis with bioinformatic software.

Date of data collection: April 2015 - July 2018

Information about geographic location of data collection: Abyssal samples were taken from the Porcupine Abyssal Plain (PAP) in the NE Atlantic, Station M in the NE Pacific and the Clarion-Clipperton Fracture Zone (CCFZ) in the equatorial Pacific.

Licence: Creative Commons Attribution (CC-BY)

Related projects/Funders: This work was supported by a studentship from the UK Natural Environment Research Council (Grant NE/L002531/1).

Related publication: Hollingsworth, A.L., Jones, D.O.B., Young, C.R. (2021) Spatial variability of abyssal nitrifying microbes in the North-Eastern Clarion-Clipperton Zone. Front. Mar. Sci. 8: 663420. https://doi.org/10.3389/fmars.2021.663420

Date that the file was created: August, 2022
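
Illustrative note on the normalisation step: the subsampling described under DESCRIPTION OF THE DATA (drawing counts at random, without replacement, to a fixed depth per sample) can be sketched in a few lines of Python. This is a minimal sketch for readers who want to see the idea, not the QIIME 2 implementation that was used for the thesis. The function name rarefy, the toy feature table, the sample and ASV identifiers and the depth of 100 are all hypothetical. The sketch assumes that samples with fewer reads than the chosen depth are dropped.

```python
import random
from collections import Counter


def rarefy(counts, depth, seed=0):
    """Subsample one sample's feature counts to `depth` reads without replacement.

    `counts` maps a feature (ASV) ID to its read count. Returns None when the
    sample holds fewer than `depth` reads (assumption: such samples are dropped).
    """
    if sum(counts.values()) < depth:
        return None
    # Expand the counts into one list entry per read, then draw `depth`
    # entries without replacement and tally them per feature again.
    reads = [feature for feature, n in counts.items() for _ in range(n)]
    drawn = random.Random(seed).sample(reads, depth)
    return dict(Counter(drawn))


# Hypothetical feature table: sample ID -> {ASV ID -> read count}
table = {
    "sample_A": {"asv1": 50, "asv2": 30, "asv3": 20},
    "sample_B": {"asv1": 5, "asv2": 200, "asv3": 95},
    "sample_C": {"asv1": 10, "asv2": 5},
}
depth = 100  # hypothetical user-specified sampling depth

rarefied = {}
for sample_id, counts in table.items():
    result = rarefy(counts, depth)
    if result is not None:
        rarefied[sample_id] = result

for sample_id, counts in rarefied.items():
    print(sample_id, sum(counts.values()), counts)
```

After this step every retained sample holds exactly the same number of reads, which is what makes alpha diversity indices comparable between samples. In this toy table sample_C has only 15 reads and is therefore excluded.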