# README: Generated vIoT DDoS Dataset — Raw Data

## Overview

This dataset was generated as part of a PhD research project investigating DDoS attack detection in virtualised Internet of Things (vIoT) environments. It contains synthetic network flow records that simulate both benign traffic and a range of DDoS attack scenarios. The records come from a controlled vIoT testbed built on Software-Defined Networking (SDN) and Network Function Virtualisation (NFV) architectures.

The dataset supports the development and evaluation of AI-based intrusion detection systems. Its particular focus is Fully Connected Neural Network (FCNN) models that classify network traffic as benign or malicious (binary classification).

---

## Dataset Summary

| Property                 | Value                                        |
|--------------------------|----------------------------------------------|
| Total Flow Records       | 5,326                                        |
| Total Features           | 61                                           |
| Attack Types             | 9 (8 DDoS attacks plus `BENIGN`)             |
| Scenarios                | 20 (S1–S20)                                  |
| Device Types             | 2 (Physical RPi, Virtual IoT)                |
| Collection Period        | October 2024                                 |
| Expected Accuracy (FCNN) | 99.92%                                       |
| File Format              | Microsoft Excel (.xlsx)                      |
| Sheets                   | `Dataset_Summary`, `Generated_vIoT_Raw_Data` |

---

## File Structure

```
PhD_Generated_vIoT_RAW_DATA5k.xlsx
├── Dataset_Summary          — High-level metadata about the dataset
└── Generated_vIoT_Raw_Data  — Main dataset with 5,326 flow records × 61 features
```

---

## Feature Descriptions

### Identifiers & Timestamps

| Feature              | Description                             |
|----------------------|-----------------------------------------|
| `Flow_ID`            | Unique identifier for each network flow |
| `Timestamp_Unix`     | Flow capture time as a Unix timestamp   |
| `Timestamp_DateTime` | Human-readable timestamp (ISO format)   |

### Network Addressing

| Feature            | Description                                 |
|--------------------|---------------------------------------------|
| `Source_IP`        | Source IP address                           |
| `Source_Port`      | Source port number                          |
| `Destination_IP`   | Destination IP address                      |
| `Destination_Port` | Destination port number                     |
| `Protocol`         | Protocol name (TCP, UDP, ICMP, DNS, HTTP)   |
| `Protocol_Number`  | Numeric protocol identifier                 |
| `Protocol_Flags`   | Protocol-specific flag values               |

### Traffic Volume & Rate

| Feature                 | Description                      |
|-------------------------|----------------------------------|
| `Total_Packets`         | Total packets in the flow        |
| `Total_Bytes`           | Total bytes transferred          |
| `Flow_Duration_Seconds` | Duration of the flow in seconds  |
| `Packets_Per_Second`    | Packet rate (packets per second) |
| `Bytes_Per_Second`      | Byte rate (bytes per second)     |

### Inter-Arrival Time (IAT)

| Feature       | Description                                             |
|---------------|---------------------------------------------------------|
| `IAT_Mean_ms` | Mean inter-arrival time (milliseconds)                  |
| `IAT_Std_ms`  | Standard deviation of inter-arrival time (milliseconds) |
| `IAT_Min_ms`  | Minimum inter-arrival time (milliseconds)               |
| `IAT_Max_ms`  | Maximum inter-arrival time (milliseconds)               |

### Directional Features

| Feature                     | Description                             |
|-----------------------------|-----------------------------------------|
| `Forward_Packets`           | Packets sent from source to destination |
| `Backward_Packets`          | Packets sent from destination to source |
| `Forward_Bytes`             | Bytes in the forward direction          |
| `Backward_Bytes`            | Bytes in the backward direction         |
| `Fwd_Bwd_Ratio`             | Ratio of forward to backward packets    |
| `Avg_Forward_Segment_Size`  | Average forward segment size            |
| `Avg_Backward_Segment_Size` | Average backward segment size           |
| `Down_Up_Ratio`             | Download-to-upload ratio                |

### Packet Length Statistics

| Feature               | Description                          |
|-----------------------|--------------------------------------|
| `Packet_Length_Mean`  | Mean packet length                   |
| `Packet_Length_Std`   | Standard deviation of packet length  |
| `Packet_Length_Min`   | Minimum packet length                |
| `Packet_Length_Max`   | Maximum packet length                |
| `Average_Packet_Size` | Overall average packet size          |

### TCP Flags

| Feature          | Description                  |
|------------------|------------------------------|
| `FIN_Flag_Count` | Number of FIN flags observed |
| `SYN_Flag_Count` | Number of SYN flags observed |
| `RST_Flag_Count` | Number of RST flags observed |
| `PSH_Flag_Count` | Number of PSH flags observed |
| `ACK_Flag_Count` | Number of ACK flags observed |
| `URG_Flag_Count` | Number of URG flags observed |

### Active/Idle Time

| Feature       | Description                            |
|---------------|----------------------------------------|
| `Active_Mean` | Mean active duration                   |
| `Active_Std`  | Standard deviation of active duration  |
| `Active_Max`  | Maximum active duration                |
| `Active_Min`  | Minimum active duration                |
| `Idle_Mean`   | Mean idle duration                     |
| `Idle_Std`    | Standard deviation of idle duration    |
| `Idle_Max`    | Maximum idle duration                  |
| `Idle_Min`    | Minimum idle duration                  |

### Network Configuration

| Feature                | Description                                   |
|------------------------|-----------------------------------------------|
| `TTL`                  | Time-to-Live value of packets                 |
| `Window_Size`          | TCP window size                               |
| `Port_Diversity_Score` | Score reflecting the diversity of ports used  |

### Device Information

| Feature       | Description                                         |
|---------------|-----------------------------------------------------|
| `Device_Type` | Type of IoT device (`Physical_RPi`, `Virtual_IoT`)  |
| `Device_ID`   | Unique device identifier                            |
| `MAC_Address` | Device MAC address                                  |

### Attack Labels & Scenarios

| Feature            | Description                                                                  |
|--------------------|------------------------------------------------------------------------------|
| `Attack_Type`      | Specific attack type (see Attack Types below)                                |
| `Attack_Category`  | Binary category: `Normal` or `DDoS_Attack`                                   |
| `Attack_Intensity` | Severity level: `Low`, `Medium`, `High`, `Very_High` (NaN for benign flows)  |
| `Scenario_ID`      | Scenario identifier (S1–S20)                                                 |
| `Scenario_Name`    | Descriptive scenario name                                                    |

### Ground Truth & Model Output

| Feature                  | Description                                                  |
|--------------------------|--------------------------------------------------------------|
| `Ground_Truth_Label`     | True label: `0` = Benign, `1` = DDoS Attack                  |
| `Model_Prediction`       | FCNN model prediction: `0` = Benign, `1` = DDoS Attack       |
| `Prediction_Confidence`  | Model confidence score (0.0 to 1.0)                          |
| `Correct_Classification` | Whether the prediction matched the ground truth: `YES` / `NO` |

---

## Attack Types

| Attack Type  | Description                                                     |
|--------------|-----------------------------------------------------------------|
| `BENIGN`     | Normal, non-malicious network traffic                           |
| `SYN_FLOOD`  | TCP SYN flood attack exhausting connection state                |
| `UDP_FLOOD`  | High-volume UDP packet flood                                    |
| `ICMP_FLOOD` | ICMP (ping) flood overwhelming the target                       |
| `DNS_AMP`    | DNS amplification attack using open resolvers                   |
| `HTTP_FLOOD` | Application-layer flood targeting HTTP services                 |
| `ACK_FLOOD`  | TCP ACK flood bypassing SYN-based filters                       |
| `RST_FLOOD`  | TCP RST flood disrupting established connections                |
| `SLOWLORIS`  | Slow HTTP attack holding connections open to exhaust resources  |

---

## Experimental Scenarios (S1–S20)

| Scenario ID | Scenario Name          | Description                                           |
|-------------|------------------------|-------------------------------------------------------|
| S1          | Pure_Benign            | Baseline normal traffic only                          |
| S2          | Single_SYN             | Single-type SYN flood attack                          |
| S3          | Sequential_Multi       | Multiple attack types launched sequentially           |
| S4          | Simultaneous_Multi     | Multiple attack types launched simultaneously         |
| S5          | Very_Low_Intensity     | Very low-intensity attack traffic                     |
| S6          | Gradual_Ramp           | Attack intensity gradually increasing over time       |
| S7          | Pulsing_Attack         | Periodic bursts of attack traffic                     |
| S8          | Sustained_High         | Sustained high-intensity attack                       |
| S9          | Legitimate_Surge       | Legitimate traffic surge (no attack)                  |
| S10         | Mixed_High_Load        | Mix of legitimate and attack traffic under high load  |
| S11         | Bandwidth_Saturation   | Attack aimed at exhausting the available bandwidth    |
| S12         | Partial_Outage         | Partial service disruption scenario                   |
| S13         | CPU_Constraint         | Attack under CPU resource constraints                 |
| S14         | Memory_Limit           | Attack under memory resource limitations              |
| S15         | VM_Interference        | Attack with virtual machine interference              |
| S16         | Combined_Stress        | Combined resource stress with attack                  |
| S17         | Cold_Start             | Attack targeting the system during cold start         |
| S18         | Intermittent_Burst     | Intermittent bursts of attack traffic                 |
| S19         | Zero_Day               | Novel, previously unseen (zero-day-style) attack pattern |
| S20         | Adversarial_Evasion    | Attack crafted to evade detection systems             |

---

## Class Distribution

| Class       | Label | Count     | Percentage |
|-------------|-------|-----------|------------|
| Benign      | 0     | 3,416     | 64.1%      |
| DDoS Attack | 1     | 1,910     | 35.9%      |
| **Total**   |       | **5,326** | **100%**   |

---

## Protocols

The dataset covers the following network protocols:

- **TCP** — Transmission Control Protocol
- **UDP** — User Datagram Protocol
- **ICMP** — Internet Control Message Protocol
- **DNS** — Domain Name System
- **HTTP** — Hypertext Transfer Protocol

---

## Intended Use

This dataset is intended for:

- Training and evaluating machine learning models for DDoS intrusion detection
- Benchmarking AI-based detection frameworks in vIoT environments
- Research into network security in SDN/NFV-based IoT infrastructures
- Validating the reproducibility of the FCNN-based detection framework

---

## Related Publication & Code

- **DOI:** [https://doi.org/10.5258/SOTON/D3805](https://doi.org/10.5258/SOTON/D3805)
- **GitHub Repository:** [https://github.com/b-asad/FCNN-FrameWork](https://github.com/b-asad/FCNN-FrameWork)
- **Institution:** School of Electronics and Computer Science, University of Southampton

---

## Citation

If you use this dataset in your research, please cite the associated PhD thesis:

```
[Author Name]. (2024). AI-Based DDoS Detection in Virtualised IoT Environments.
PhD Thesis, University of Southampton.
DOI: https://doi.org/10.5258/SOTON/D3805
```

---

## Contact

For questions about this dataset, please contact the author through the GitHub repository listed above or through the University of Southampton.

---

## Licence

This dataset was generated for academic research purposes. Please refer to the associated thesis and repository for licensing terms.
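As a quick sanity check, the class counts and the accuracy figure in the Dataset Summary can be recomputed from the `Ground_Truth_Label`, `Model_Prediction` and `Correct_Classification` columns. The minimal sketch below works on plain row dictionaries. It assumes the rows have already been loaded from the `Generated_vIoT_Raw_Data` sheet, for example with `pandas.read_excel(path, sheet_name="Generated_vIoT_Raw_Data").to_dict("records")`. The helper names `class_distribution`, `prediction_accuracy` and `correct_flags_consistent` are hypothetical and are not part of the dataset or the FCNN-FrameWork repository.

```python
from collections import Counter


def class_distribution(rows):
    """Return {label: (count, percentage rounded to 1 dp)} for Ground_Truth_Label."""
    counts = Counter(int(row["Ground_Truth_Label"]) for row in rows)
    total = sum(counts.values())
    return {
        label: (n, round(100.0 * n / total, 1))
        for label, n in sorted(counts.items())
    }


def prediction_accuracy(rows):
    """Fraction of flows where Model_Prediction equals Ground_Truth_Label."""
    hits = sum(
        int(row["Model_Prediction"]) == int(row["Ground_Truth_Label"])
        for row in rows
    )
    return hits / len(rows)


def correct_flags_consistent(rows):
    """True if every Correct_Classification value ('YES'/'NO') agrees with the labels."""
    return all(
        (row["Correct_Classification"] == "YES")
        == (int(row["Model_Prediction"]) == int(row["Ground_Truth_Label"]))
        for row in rows
    )


if __name__ == "__main__":
    # Toy rows for illustration only; these are NOT records from the dataset.
    toy_rows = [
        {"Ground_Truth_Label": 0, "Model_Prediction": 0, "Correct_Classification": "YES"},
        {"Ground_Truth_Label": 0, "Model_Prediction": 0, "Correct_Classification": "YES"},
        {"Ground_Truth_Label": 0, "Model_Prediction": 1, "Correct_Classification": "NO"},
        {"Ground_Truth_Label": 1, "Model_Prediction": 1, "Correct_Classification": "YES"},
    ]
    print(class_distribution(toy_rows))        # {0: (3, 75.0), 1: (1, 25.0)}
    print(prediction_accuracy(toy_rows))       # 0.75
    print(correct_flags_consistent(toy_rows))  # True
```

If the sheet matches the Class Distribution table, `class_distribution` should return `{0: (3416, 64.1), 1: (1910, 35.9)}` on the full data. `prediction_accuracy` should then land close to the reported 99.92%.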
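When training a new detector on this file, some columns should probably not be used as model inputs. `Model_Prediction`, `Prediction_Confidence` and `Correct_Classification` are outputs of the existing FCNN. `Attack_Type`, `Attack_Category`, `Attack_Intensity`, `Scenario_ID` and `Scenario_Name` encode, or strongly correlate with, the class itself. The sketch below separates features from the `Ground_Truth_Label` target under that assumption. The exclusion lists are illustrative and do not describe the procedure used in the thesis. This includes the choice to also drop per-flow identifiers such as `Flow_ID`, the IP addresses and `MAC_Address`. The function name `split_features_and_target` is hypothetical.

```python
# Columns that carry the label or the existing FCNN's output (assumed list; adjust as needed).
LABEL_AND_OUTPUT_COLUMNS = {
    "Ground_Truth_Label", "Model_Prediction", "Prediction_Confidence",
    "Correct_Classification", "Attack_Type", "Attack_Category",
    "Attack_Intensity", "Scenario_ID", "Scenario_Name",
}

# Per-flow identifiers that are unlikely to generalise beyond the testbed (assumption).
IDENTIFIER_COLUMNS = {
    "Flow_ID", "Timestamp_Unix", "Timestamp_DateTime", "Source_IP",
    "Destination_IP", "Device_ID", "MAC_Address",
}


def split_features_and_target(rows, target="Ground_Truth_Label"):
    """Return (feature_names, X, y), dropping excluded columns and keeping column order."""
    excluded = LABEL_AND_OUTPUT_COLUMNS | IDENTIFIER_COLUMNS
    # Column order follows the first row's key order (dicts preserve insertion order).
    feature_names = [name for name in rows[0] if name not in excluded]
    X = [[row[name] for name in feature_names] for row in rows]
    y = [int(row[target]) for row in rows]
    return feature_names, X, y
```

Categorical columns such as `Protocol` and `Device_Type` stay in the feature list. They would still need numeric encoding before an FCNN could consume them.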