READ ME File For 'Chapter 2 – Influence of peatland plant functional types on prokaryotic communities across microhabitat'

Dataset DOI: 10.5258/SOTON/D3718

ReadMe Author: Najam e Sahar, University of Southampton  
ORCID ID: 0000-0002-1338-730X

This dataset supports the thesis entitled:  
"Role of Plant-Microbe Interaction for C Dynamic in Northern Peat Bogs"  

AWARDED BY: University of Southampton  
DATE OF AWARD: 2025 

Date of data collection: 2019–2022  

Information about geographic location of data collection:  
Storr, Northern Sweden (peatland study site)

Licence:  
Creative Commons Attribution (CC BY 4.0)

Related projects/Funders:  
Natural Environment Research Council (NERC) Doctoral Training Partnership (NE/S007407/1)

--------------------
DATA & FILE OVERVIEW
--------------------

This dataset contains:

- chapter2.zip — compressed folder containing microbial community sequencing data from peatland soils.  
  Includes:
    - raw_16S_sequences.fastq.gz — raw bacterial 16S rRNA gene sequencing files  
    - sample_metadata.csv — metadata including microhabitat and plant functional type  
    - ASV_table.tsv — processed ASV abundance table  
    - taxonomy_table.tsv — taxonomic assignment of sequences  
    - README.txt — this file

Relationship between files, if important for context:  
Sequence files and metadata records are linked by sample ID. The ASV table is the processed output derived from the raw FASTQ files.
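
As an illustration of how the files link together, the following minimal Python sketch joins metadata rows to ASV counts by sample ID. It assumes (this README does not confirm it) that ASV_table.tsv has one row per ASV, a first column named ASV_ID, and one column per sample, and that sample_metadata.csv has a Sample_ID column. The inline demo values, including the PFT labels, are hypothetical and are not taken from the dataset.

```python
import csv
import io


def read_metadata(handle):
    """Return {Sample_ID: row dict} from a comma-separated file handle."""
    return {row["Sample_ID"]: row for row in csv.DictReader(handle)}


def read_asv_table(handle):
    """Return {sample: {ASV_ID: count}} from a tab-separated file handle.

    Assumed layout: first column ASV_ID, every other column is a sample.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    samples = [name for name in reader.fieldnames if name != "ASV_ID"]
    counts = {sample: {} for sample in samples}
    for row in reader:
        for sample in samples:
            counts[sample][row["ASV_ID"]] = int(row[sample])
    return counts


def link_by_sample(metadata, counts):
    """Pair metadata with counts per sample and list IDs found in only one file."""
    shared = sorted(set(metadata) & set(counts))
    unmatched = sorted(set(metadata) ^ set(counts))
    linked = {s: {"meta": metadata[s], "counts": counts[s]} for s in shared}
    return linked, unmatched


# Hypothetical demo data standing in for the real files.
DEMO_META = "Sample_ID,Microhabitat,PFT\nS1,hummock,shrub\nS2,hollow,sedge\n"
DEMO_ASV = "ASV_ID\tS1\tS2\nASV1\t10\t0\nASV2\t5\t7\n"

linked, unmatched = link_by_sample(
    read_metadata(io.StringIO(DEMO_META)),
    read_asv_table(io.StringIO(DEMO_ASV)),
)
print(linked["S1"]["meta"]["Microhabitat"], linked["S1"]["counts"])
print("unmatched:", unmatched)
```

To use it on the real files, replace the io.StringIO objects with open("sample_metadata.csv", newline="") and open("ASV_table.tsv", newline=""), and adjust the column names if they differ. A non-empty unmatched list would indicate sample IDs present in only one of the two files.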

Additional related data collected that was not included in the current data package:  
Decomposition and environmental datasets are available separately in the Chapter 3 dataset.  

If data was derived from another source, list source:  
Not applicable.  

If there are multiple versions of the dataset, list the file updated, when and why update was made:  
Version 1.0 — Initial upload for DOI registration, October 2025.

--------------------------
METHODOLOGICAL INFORMATION
--------------------------

Description of methods used for collection/generation of data:  
Peat soil samples were collected from distinct microhabitats (hummocks, lawns, hollows) representing different plant functional types.  
DNA was extracted using the DNeasy PowerSoil Kit (Qiagen), and bacterial 16S rRNA gene sequencing was conducted using the Illumina MiSeq platform (2×250 bp paired-end reads).

Methods for processing the data:  
Raw reads were processed using DADA2 for denoising, chimera removal, and ASV generation.  
Taxonomic classification was performed against the SILVA reference database (version 138).  

Software- or Instrument-specific information needed to interpret the data:  
Data can be processed and viewed in R (packages: phyloseq, vegan), or Python.  
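
For readers working in Python, the sketch below shows one possible way to view the processed tables using only the standard library: it sums the reads of one sample per taxonomic label. The column names ASV_ID and Taxonomy, the single-string taxonomy format, and the demo values are assumptions made for illustration; the actual taxonomy_table.tsv may split ranks into separate columns.

```python
import csv
import io
from collections import Counter


def reads_per_taxon(asv_handle, tax_handle, sample):
    """Sum the read counts of one sample for each taxonomic label.

    Assumed layout: both tables are tab-separated and share an ASV_ID
    column; the taxonomy table holds its label in a Taxonomy column.
    """
    taxonomy = {
        row["ASV_ID"]: row["Taxonomy"]
        for row in csv.DictReader(tax_handle, delimiter="\t")
    }
    totals = Counter()
    for row in csv.DictReader(asv_handle, delimiter="\t"):
        # ASVs missing from the taxonomy table are grouped as "unassigned".
        label = taxonomy.get(row["ASV_ID"], "unassigned")
        totals[label] += int(row[sample])
    return totals


# Hypothetical demo data standing in for the real files.
DEMO_ASV = "ASV_ID\tS1\tS2\nASV1\t10\t0\nASV2\t5\t7\nASV3\t2\t1\n"
DEMO_TAX = (
    "ASV_ID\tTaxonomy\n"
    "ASV1\tBacteria;Acidobacteriota\n"
    "ASV2\tBacteria;Acidobacteriota\n"
    "ASV3\tBacteria;Proteobacteria\n"
)

totals = reads_per_taxon(io.StringIO(DEMO_ASV), io.StringIO(DEMO_TAX), "S1")
for label, reads in totals.most_common():
    print(label, reads)
```

In R, the phyloseq package named above offers equivalent and more complete functionality; this sketch is only meant as a quick check that the tables open and join as expected.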

Standards and calibration information, if appropriate:  
Standard DADA2 pipelines were followed using default parameters.  

Environmental/experimental conditions:  
Samples were collected during the growing season (June–August, 2019–2022) under natural peatland field conditions.  

Describe any quality-assurance procedures performed on the data:  
Sequencing blanks and extraction controls were included. Reads were filtered by quality and length, and chimeric sequences were removed.  

People involved with sample collection, processing, analysis and/or submission:  
Najam e Sahar (sample collection, DNA extraction, sequencing coordination, data analysis, and submission).

--------------------------
DATA-SPECIFIC INFORMATION
--------------------------

Number of variables: Hundreds to thousands (depending on ASV level)  

Number of cases/rows: Approximately 20–40 samples  

Variable list, defining any abbreviations, units of measure, codes or symbols used:  
- Sample_ID: unique code per sample  
- Microhabitat: hummock, lawn, or hollow  
- PFT: plant functional type  
- ASV_ID: unique identifier for each amplicon sequence variant (ASV)  
- Abundance: read count of each ASV per sample  
- Taxonomy: taxonomic classification  


Missing data codes:  
NA = not available  
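
A minimal sketch of handling this code when reading the tables in Python is given below. It converts the literal string NA to None so that missing values are not mistaken for real category labels. The column names follow the variable list above; the demo row is hypothetical.

```python
import csv
import io

MISSING_CODE = "NA"  # missing-data code documented in this README


def read_rows_with_missing(handle, delimiter=","):
    """Read delimited rows, replacing the missing-data code with None."""
    rows = []
    for row in csv.DictReader(handle, delimiter=delimiter):
        rows.append(
            {key: (None if value == MISSING_CODE else value)
             for key, value in row.items()}
        )
    return rows


# Hypothetical demo row with a missing PFT value.
DEMO_META = "Sample_ID,Microhabitat,PFT\nS1,lawn,NA\n"
rows = read_rows_with_missing(io.StringIO(DEMO_META))
print(rows[0])
```

Libraries such as pandas treat NA as missing by default when reading CSV files, so this explicit step is mainly relevant when using the standard csv module.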

Specialized formats or other abbreviations used:  
FASTQ (gzip-compressed, .fastq.gz), CSV, and TSV. These are standard formats for sequencing data and associated tables.  
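
The gzip-compressed FASTQ files can be read without decompressing them to disk. The following Python sketch counts reads and reports the mean read length. It assumes the conventional four-line FASTQ record layout (header, sequence, separator, quality), which is typical of Illumina output but is not stated explicitly in this README; the demo reads are hypothetical.

```python
import gzip
import io


def iter_fastq(handle):
    """Yield (header, sequence, quality) from a text handle.

    Assumes four lines per record: header, sequence, '+', quality.
    """
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            break
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        quality = handle.readline().rstrip("\n")
        yield header, sequence, quality


def summarize(handle):
    """Return (number of reads, mean read length) for a FASTQ text handle."""
    n_reads = 0
    total_bases = 0
    for _, sequence, _ in iter_fastq(handle):
        n_reads += 1
        total_bases += len(sequence)
    mean_length = total_bases / n_reads if n_reads else 0.0
    return n_reads, mean_length


# Hypothetical two-read FASTQ, compressed in memory to mimic a .fastq.gz file.
demo = b"@read1\nACGTACGT\n+\nIIIIIIII\n@read2\nACGT\n+\nIIII\n"
buffer = io.BytesIO(gzip.compress(demo))
with gzip.open(buffer, "rt") as fh:
    n_reads, mean_length = summarize(fh)
print(n_reads, mean_length)
```

For the real data, gzip.open("raw_16S_sequences.fastq.gz", "rt") can replace the in-memory buffer.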

Date that the file was created: October 2025