# README: Generated vIoT DDoS Dataset — Raw Data

## Overview

This dataset was generated as part of a PhD research project investigating DDoS attack detection in virtualised Internet of Things (vIoT) environments. It contains synthetic network flow records simulating both benign traffic and a range of DDoS attack scenarios across a controlled vIoT testbed built on Software-Defined Networking (SDN) and Network Function Virtualisation (NFV) architectures.

The dataset supports the development and evaluation of AI-based intrusion detection systems, with a particular focus on Fully Connected Neural Network (FCNN) models for binary classification of network traffic as benign or malicious.

---

## Dataset Summary

| Property             | Value                              |
|----------------------|------------------------------------|
| Total Flow Records   | 5,326                              |
| Total Features       | 61                                 |
| Attack Types         | 9 (`BENIGN` plus 8 DDoS types)     |
| Scenarios            | 20 (S1–S20)                        |
| Device Types         | 2 (Physical RPi, Virtual IoT)      |
| Collection Period    | October 2024                       |
| Expected Accuracy    | 99.92%                             |
| File Format          | Microsoft Excel (.xlsx)            |
| Sheets               | `Dataset_Summary`, `Generated_vIoT_Raw_Data` |

---

## File Structure

```
PhD_Generated_vIoT_RAW_DATA5k.xlsx
├── Dataset_Summary          — High-level metadata about the dataset
└── Generated_vIoT_Raw_Data  — Main dataset with 5,326 flow records × 61 features
```
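
A minimal sketch of loading the main sheet, assuming the workbook sits in the working directory and that `pandas` and `openpyxl` are installed. The helper names `load_flows` and `shape_matches` are illustrative and are not taken from the repository.

```python
# File and sheet names come from the tree above; the helper names are hypothetical.
WORKBOOK = "PhD_Generated_vIoT_RAW_DATA5k.xlsx"
MAIN_SHEET = "Generated_vIoT_Raw_Data"
EXPECTED_SHAPE = (5326, 61)  # flow records x features, per the summary table


def load_flows(path=WORKBOOK, sheet=MAIN_SHEET):
    """Read the main sheet into a DataFrame (needs pandas and openpyxl)."""
    import pandas as pd  # imported lazily so the constants work without pandas

    return pd.read_excel(path, sheet_name=sheet)


def shape_matches(shape, expected=EXPECTED_SHAPE):
    """Return True when a (rows, columns) pair matches the documented size."""
    return tuple(shape) == expected


# Usage (needs the workbook on disk, so it is left as a comment):
#   flows = load_flows()
#   assert shape_matches(flows.shape)
```

Checking the shape right after loading is a cheap way to catch a wrong sheet or a truncated download.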

---

## Feature Descriptions

### Identifiers & Timestamps
| Feature             | Description                                      |
|---------------------|--------------------------------------------------|
| `Flow_ID`           | Unique identifier for each network flow          |
| `Timestamp_Unix`    | Flow capture time as Unix timestamp              |
| `Timestamp_DateTime`| Human-readable timestamp (ISO 8601 format)       |

### Network Addressing
| Feature              | Description                        |
|----------------------|------------------------------------|
| `Source_IP`          | Source IP address                  |
| `Source_Port`        | Source port number                 |
| `Destination_IP`     | Destination IP address             |
| `Destination_Port`   | Destination port number            |
| `Protocol`           | Protocol name (TCP, UDP, ICMP, DNS, HTTP) |
| `Protocol_Number`    | Numeric protocol identifier        |
| `Protocol_Flags`     | Protocol-specific flag values      |

### Traffic Volume & Rate
| Feature                | Description                                     |
|------------------------|-------------------------------------------------|
| `Total_Packets`        | Total packets in the flow                       |
| `Total_Bytes`          | Total bytes transferred                         |
| `Flow_Duration_Seconds`| Duration of the flow in seconds                 |
| `Packets_Per_Second`   | Packet rate                                     |
| `Bytes_Per_Second`     | Byte rate                                       |
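
The rate columns can be sanity-checked against the totals. The sketch below assumes each rate is simply the total divided by `Flow_Duration_Seconds`; this README does not confirm that derivation, and the function names and the 1% tolerance are illustrative choices.

```python
import math


def expected_rate(total, duration_seconds):
    """Return total / duration, or None when the duration is not positive."""
    if duration_seconds <= 0:
        return None
    return total / duration_seconds


def rate_is_consistent(total, duration_seconds, reported_rate, rel_tol=0.01):
    """Check a reported rate against total / duration within a relative tolerance."""
    expected = expected_rate(total, duration_seconds)
    if expected is None:
        return False
    return math.isclose(expected, reported_rate, rel_tol=rel_tol)
```

Rows that fail the check are worth inspecting before training, since zero-duration flows in particular can produce undefined or extreme rates.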

### Inter-Arrival Time (IAT)
| Feature       | Description                            |
|---------------|----------------------------------------|
| `IAT_Mean_ms` | Mean inter-arrival time (milliseconds) |
| `IAT_Std_ms`  | Standard deviation of IAT (ms)         |
| `IAT_Min_ms`  | Minimum IAT (ms)                       |
| `IAT_Max_ms`  | Maximum IAT (ms)                       |
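
As an illustration of how these four statistics relate to raw packet arrival times, the sketch below computes them from a list of timestamps in seconds. It uses the population standard deviation; whether the dataset uses the population or sample form is not stated, and `iat_stats_ms` is a hypothetical helper.

```python
from statistics import mean, pstdev


def iat_stats_ms(arrival_times_s):
    """Compute IAT statistics in milliseconds from sorted arrival times in seconds.

    Returns None for flows with fewer than two packets, which have no gaps.
    """
    gaps_ms = [
        (later - earlier) * 1000.0
        for earlier, later in zip(arrival_times_s, arrival_times_s[1:])
    ]
    if not gaps_ms:
        return None
    return {
        "IAT_Mean_ms": mean(gaps_ms),
        "IAT_Std_ms": pstdev(gaps_ms),  # population form, an assumption
        "IAT_Min_ms": min(gaps_ms),
        "IAT_Max_ms": max(gaps_ms),
    }
```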

### Directional Features
| Feature                    | Description                              |
|----------------------------|------------------------------------------|
| `Forward_Packets`          | Packets sent from source to destination  |
| `Backward_Packets`         | Packets sent from destination to source  |
| `Forward_Bytes`            | Bytes in forward direction               |
| `Backward_Bytes`           | Bytes in backward direction              |
| `Fwd_Bwd_Ratio`            | Ratio of forward to backward packets     |
| `Avg_Forward_Segment_Size` | Average forward segment size             |
| `Avg_Backward_Segment_Size`| Average backward segment size            |
| `Down_Up_Ratio`            | Download to upload ratio                 |
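
Flood traffic is often almost entirely one-directional, so `Backward_Packets` can be zero and a naive ratio would divide by zero. The sketch below shows one way to guard against that; how the dataset itself handles this case is not documented here, and the infinity convention is an assumption.

```python
def fwd_bwd_ratio(forward_packets, backward_packets):
    """Ratio of forward to backward packets with a guard for zero replies.

    Assumed convention: infinity when only forward packets exist,
    0.0 when the flow has no packets in either direction.
    """
    if backward_packets == 0:
        return float("inf") if forward_packets > 0 else 0.0
    return forward_packets / backward_packets
```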

### Packet Length Statistics
| Feature               | Description                        |
|-----------------------|------------------------------------|
| `Packet_Length_Mean`  | Mean packet length                 |
| `Packet_Length_Std`   | Standard deviation of packet length|
| `Packet_Length_Min`   | Minimum packet length              |
| `Packet_Length_Max`   | Maximum packet length              |
| `Average_Packet_Size` | Overall average packet size        |

### TCP Flags
| Feature          | Description                    |
|------------------|--------------------------------|
| `FIN_Flag_Count` | Number of FIN flags observed   |
| `SYN_Flag_Count` | Number of SYN flags observed   |
| `RST_Flag_Count` | Number of RST flags observed   |
| `PSH_Flag_Count` | Number of PSH flags observed   |
| `ACK_Flag_Count` | Number of ACK flags observed   |
| `URG_Flag_Count` | Number of URG flags observed   |
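
The flag counts correspond to bits in the TCP header flags field. As a sketch of how such counts can be produced, the code below tallies flags from a sequence of per-packet flag bytes using the standard bit positions; the function name and the input format are assumptions, not the generator used for this dataset.

```python
# Standard TCP header flag bit masks.
TCP_FLAG_BITS = {
    "FIN": 0x01,
    "SYN": 0x02,
    "RST": 0x04,
    "PSH": 0x08,
    "ACK": 0x10,
    "URG": 0x20,
}


def count_tcp_flags(flag_bytes):
    """Count how many packets in a flow have each TCP flag set."""
    counts = {f"{name}_Flag_Count": 0 for name in TCP_FLAG_BITS}
    for value in flag_bytes:
        for name, bit in TCP_FLAG_BITS.items():
            if value & bit:
                counts[f"{name}_Flag_Count"] += 1
    return counts


# A three-way handshake: SYN, SYN+ACK, ACK.
handshake = count_tcp_flags([0x02, 0x12, 0x10])
```

For the handshake above, `SYN_Flag_Count` and `ACK_Flag_Count` are both 2 and every other count is 0.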

### Active/Idle Time
| Feature      | Description                            |
|--------------|----------------------------------------|
| `Active_Mean`| Mean active duration                   |
| `Active_Std` | Standard deviation of active duration  |
| `Active_Max` | Maximum active duration                |
| `Active_Min` | Minimum active duration                |
| `Idle_Mean`  | Mean idle duration                     |
| `Idle_Std`   | Standard deviation of idle duration    |
| `Idle_Max`   | Maximum idle duration                  |
| `Idle_Min`   | Minimum idle duration                  |

### Network Configuration
| Feature               | Description                                |
|-----------------------|--------------------------------------------|
| `TTL`                 | Time-to-Live value of packets              |
| `Window_Size`         | TCP window size                            |
| `Port_Diversity_Score`| Score reflecting diversity of ports used   |

### Device Information
| Feature       | Description                                          |
|---------------|------------------------------------------------------|
| `Device_Type` | Type of IoT device (`Physical_RPi`, `Virtual_IoT`)   |
| `Device_ID`   | Unique device identifier                             |
| `MAC_Address` | Device MAC address                                   |

### Attack Labels & Scenarios
| Feature                  | Description                                                                 |
|--------------------------|-----------------------------------------------------------------------------|
| `Attack_Type`            | Specific attack type (see below)                                            |
| `Attack_Category`        | Binary category: `Normal` or `DDoS_Attack`                                 |
| `Attack_Intensity`       | Severity level: `Low`, `Medium`, `High`, `Very_High` (NaN for benign)      |
| `Scenario_ID`            | Numeric scenario identifier (S1–S20)                                        |
| `Scenario_Name`          | Descriptive scenario name                                                   |

### Ground Truth & Model Output
| Feature                  | Description                                                    |
|--------------------------|----------------------------------------------------------------|
| `Ground_Truth_Label`     | True label: `0` = Benign, `1` = DDoS Attack                   |
| `Model_Prediction`       | FCNN model prediction: `0` = Benign, `1` = DDoS Attack        |
| `Prediction_Confidence`  | Model confidence score (0.0 – 1.0)                            |
| `Correct_Classification` | Whether prediction matched ground truth: `YES` / `NO`         |

---

## Attack Types

| Attack Type   | Description                                                  |
|---------------|--------------------------------------------------------------|
| `BENIGN`      | Normal, non-malicious network traffic                        |
| `SYN_FLOOD`   | TCP SYN flood attack exhausting connection state             |
| `UDP_FLOOD`   | High-volume UDP packet flood                                 |
| `ICMP_FLOOD`  | ICMP (ping) flood overwhelming target                        |
| `DNS_AMP`     | DNS amplification attack using open resolvers                |
| `HTTP_FLOOD`  | Application-layer flood targeting HTTP services              |
| `ACK_FLOOD`   | TCP ACK flood bypassing SYN-based filters                    |
| `RST_FLOOD`   | TCP RST flood disrupting established connections             |
| `SLOWLORIS`   | Slow HTTP attack holding connections open to exhaust resources|

---

## Experimental Scenarios (S1–S20)

| Scenario ID | Scenario Name          | Description                                              |
|-------------|------------------------|----------------------------------------------------------|
| S1          | Pure_Benign            | Baseline normal traffic only                             |
| S2          | Single_SYN             | Single-type SYN flood attack                             |
| S3          | Sequential_Multi       | Multiple attack types launched sequentially              |
| S4          | Simultaneous_Multi     | Multiple attack types launched simultaneously            |
| S5          | Very_Low_Intensity     | Very low-intensity attack traffic                        |
| S6          | Gradual_Ramp           | Attack intensity gradually increasing over time          |
| S7          | Pulsing_Attack         | Periodic bursts of attack traffic                        |
| S8          | Sustained_High         | Sustained high-intensity attack                          |
| S9          | Legitimate_Surge       | Legitimate traffic surge (no attack)                     |
| S10         | Mixed_High_Load        | Mix of legitimate and attack traffic under high load     |
| S11         | Bandwidth_Saturation   | Attack targeting full bandwidth exhaustion               |
| S12         | Partial_Outage         | Partial service disruption scenario                      |
| S13         | CPU_Constraint         | Attack under CPU resource constraints                    |
| S14         | Memory_Limit           | Attack under memory resource limitations                 |
| S15         | VM_Interference        | Attack with virtual machine interference                 |
| S16         | Combined_Stress        | Combined resource stress with attack                     |
| S17         | Cold_Start             | Attack targeting system during cold start                |
| S18         | Intermittent_Burst     | Intermittent bursts of attack traffic                    |
| S19         | Zero_Day               | Zero-day-style novel/unseen attack pattern               |
| S20         | Adversarial_Evasion    | Attack crafted to evade detection systems                |
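
One possible use of the scenario labels is a scenario-wise hold-out, for example keeping S19 and S20 out of training to probe behaviour on unseen patterns. This is a sketch of that idea, not the evaluation protocol of the thesis. Because the stored form of `Scenario_ID` is not certain, the helper accepts `"S19"`, `19`, or `19.0`; rows are assumed to be dictionaries keyed by column name.

```python
def scenario_number(scenario_id):
    """Normalise a scenario identifier such as 'S19', 's19', 19, or 19.0 to an int."""
    text = str(scenario_id).strip().upper().lstrip("S")
    return int(float(text))


def split_by_scenario(rows, held_out=(19, 20)):
    """Split rows into (train, test), sending held-out scenarios to test."""
    train, test = [], []
    for row in rows:
        target = test if scenario_number(row["Scenario_ID"]) in held_out else train
        target.append(row)
    return train, test
```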

---

## Class Distribution

| Class          | Label | Count | Percentage |
|----------------|-------|-------|------------|
| Benign         | 0     | 3,416 | 64.1%      |
| DDoS Attack    | 1     | 1,910 | 35.9%      |
| **Total**      |       | **5,326** | **100%** |
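
The percentages follow directly from the counts, and the roughly 64/36 imbalance may matter when training. The sketch below reproduces the percentages and computes inverse-frequency class weights using the common `n_samples / (n_classes * count)` heuristic; whether the thesis applied any class weighting is not stated here.

```python
CLASS_COUNTS = {0: 3416, 1: 1910}  # 0 = Benign, 1 = DDoS Attack, from the table above


def class_percentages(counts):
    """Percentage share of each class, rounded to one decimal place."""
    total = sum(counts.values())
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


def balanced_class_weights(counts):
    """Inverse-frequency weights: n_samples / (n_classes * class_count)."""
    total = sum(counts.values())
    return {label: total / (len(counts) * n) for label, n in counts.items()}


percentages = class_percentages(CLASS_COUNTS)
weights = balanced_class_weights(CLASS_COUNTS)
```

Here `percentages` comes out as 64.1 and 35.9, matching the table, and the attack class receives the larger weight.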

---

## Protocols

The dataset covers the following network protocols:

- **TCP** — Transmission Control Protocol
- **UDP** — User Datagram Protocol
- **ICMP** — Internet Control Message Protocol
- **DNS** — Domain Name System
- **HTTP** — HyperText Transfer Protocol

---

## Intended Use

This dataset is intended for:

- Training and evaluating machine learning models for DDoS intrusion detection
- Benchmarking AI-based detection frameworks in vIoT environments
- Research into network security in SDN/NFV-based IoT infrastructures
- Reproducibility validation of the FCNN-based detection framework
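
When training a model on this data, several columns either restate the label or are model outputs, so using them as inputs would leak the answer. The sketch below separates candidate input features from those columns. The grouping is a suggestion based on the feature descriptions in this README, not the feature set used in the thesis, and all names other than the column names are hypothetical.

```python
LABEL_COLUMN = "Ground_Truth_Label"

# Columns that restate the label or come from the FCNN model itself.
LEAKY_COLUMNS = {
    "Attack_Type", "Attack_Category", "Attack_Intensity",
    "Scenario_ID", "Scenario_Name",
    "Model_Prediction", "Prediction_Confidence", "Correct_Classification",
}

# Identifiers that describe a specific capture rather than traffic behaviour.
IDENTIFIER_COLUMNS = {
    "Flow_ID", "Timestamp_Unix", "Timestamp_DateTime",
    "Source_IP", "Destination_IP", "Device_ID", "MAC_Address",
}


def select_feature_columns(columns):
    """Return the columns that remain after dropping label, leaky, and ID columns."""
    excluded = LEAKY_COLUMNS | IDENTIFIER_COLUMNS | {LABEL_COLUMN}
    return [name for name in columns if name not in excluded]
```

Whether to keep ports or device type as inputs depends on the experiment; they are left in here only because they are not obviously label-derived.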

---

## Related Publication & Code

- **DOI:** [https://doi.org/10.5258/SOTON/D3805](https://doi.org/10.5258/SOTON/D3805)
- **GitHub Repository:** [https://github.com/b-asad/FCNN-FrameWork](https://github.com/b-asad/FCNN-FrameWork)
- **Institution:** School of Electronics and Computer Science, University of Southampton

---

## Citation

If you use this dataset in your research, please cite the associated PhD thesis:

```
[Author Name]. (2024). AI-Based DDoS Detection in Virtualised IoT Environments.
PhD Thesis, University of Southampton.
DOI: https://doi.org/10.5258/SOTON/D3805
```

---

## Contact

For questions regarding this dataset, please contact the author via the GitHub repository listed above or through the University of Southampton.

---

## Licence

This dataset was generated for academic research purposes. Please refer to the associated thesis and repository for licensing terms.