READ ME File For 'MITRE Open CTI Contribution to Cyber Situational Awareness - Data'

Dataset DOI: 10.5258/SOTON/D2912

Date that the file was created: February, 2024

-------------------
GENERAL INFORMATION
-------------------

ReadMe Author: Christopher Maidens, University of Southampton [ORCID ID - 0000-0002-4385-7202]

Date of data collection:
Data has been derived from analysis of multiple open data sources
A snapshot of MITRE ATT&CK was taken on 29/05/2022 (see also SHARING/ACCESS INFORMATION below)
Various openly available cyber attack reports referenced in MITRE ATT&CK were examined
(all dates and links for reports referenced are noted in ATT&CKAttackFragments_1_0d_SHRINK.docx outlined below)

Information about geographic location of data collection: Not relevant for this data

Related projects: No related projects

--------------------------
SHARING/ACCESS INFORMATION
--------------------------

Licenses/restrictions placed on the data, or limitations of reuse:

Recommended citation for the data:
MITRE Open CTI Contribution to Cyber Situational Awareness - Data, DOI: 10.5258/SOTON/D2912, Author: Christopher Maidens

This dataset supports the thesis entitled: MITRE Open CTI Contribution to Cyber Situational Awareness - PhD Thesis
AWARDED BY: University of Southampton
DATE OF AWARD: 2024

Links to other publicly accessible locations of the data: This is the only location for this data

Links/relationships to ancillary or related data sets: No related datasets

--------------------
DATA & FILE OVERVIEW
--------------------

This dataset contains:
[File list (filenames, directory structure (for zipped files) and brief description of all data files)]

=====================
For DataFilePart1.zip
=====================
<< Files derived from snapshot of MITRE ATT&CK downloaded on 29/05/22 >>
<< Created and used by code described in code.zip described below >>
<< https://attack.mitre.org/resources/legal-and-branding/terms-of-use/ >>
<< The MITRE Corporation (MITRE) hereby grants you a
non-exclusive, royalty-free license to use ATT&CK® for research, development, and commercial purposes.
Any copy you make for such purposes is authorized provided that you reproduce MITRE's copyright designation and this license in any such copy.
"© 2024 The MITRE Corporation. This work is reproduced and distributed with the permission of The MITRE Corporation." >>

-DataFileList.txt
-MAFpt_ATTACK_ALIASES_Index.csv
-MAFpt_ATTACK_CAR_COVERAGE.csv
-MAFpt_ATTACK_CVE_REF_INDEX.csv
-MAFpt_ATTACK_DATA_SOURCE_Index.csv
-MAFpt_ATTACK_DATA_SOURCE_REF_Index.csv
-MAFpt_ATTACK_DATA_SOURCE_TO_COLL_LAYER_Index.csv
-MAFpt_ATTACK_DATA_SOURCE_TO_PLATFORM_Index.csv
-MAFpt_ATTACK_ENT_TACTIC_BIN_INDEX.csv
-MAFpt_ATTACK_INDEX.csv
-MAFpt_ATTACK_META_MODEL_INDEX.csv
-MAFpt_ATTACK_MIT_REF_Index.csv
-MAFpt_ATTACK_MITIGATION_INDEX.csv
-MAFpt_ATTACK_REL_INDEX.csv
-MAFpt_ATTACK_REL_REF_Index.csv
-MAFpt_ATTACK_TACTIC_BIN_INDEX.csv
-MAFpt_ATTACK_TACTIC_INDEX.csv
-MAFpt_ATTACK_TCERT_GROUP.json
-MAFpt_ATTACK_TCERT_TOOL.json
-MAFpt_ATTACK_TECH_TO_TACTIC_INDEX.csv
-MAFpt_ATTACK_TTP_BIN_INDEX.csv
-MAFpt_ATTACK_TTP_INDEX.csv
-MAFpt_ATTACK_TTP_REF_BIN_INDEX.csv
-MAFpt_ATTACK_TTP_REF_Index.csv
-MAFpt_ATTACK_TTP_TO_DATA_COMP_Index.csv
-MAFpt_ATTACK_TTP_TO_DATA_SOURCE_Index.csv
-MAFpt_ATTACK_TTP_TO_PERMS_REQ_Index.csv
-MAFpt_ATTACK_TTP_TO_PLATFORM_Index.csv
-MAFpt_ATTACK_TTP_TO_PROCEDURE_Index.csv
-MAFpt_ATTACKS_GRAPH_v5 - PreHMM.gexf
-MAFpt_ATTACKS_GRAPH_v5.gexf

=====================
For DataFilePart2.zip
=====================
<< Files derived from snapshot of MITRE ATT&CK downloaded on 29/05/22 >>
<< Created and used by code described in code.zip described below >>
<< https://attack.mitre.org/resources/legal-and-branding/terms-of-use/ >>
<< The MITRE Corporation (MITRE) hereby grants you a non-exclusive, royalty-free license to use ATT&CK® for research, development, and commercial purposes.
Any copy you make for such purposes is authorized provided that you reproduce MITRE's copyright designation and this license in any such copy.
"© 2024 The MITRE Corporation. This work is reproduced and distributed with the permission of The MITRE Corporation." >>

-DataFileList.txt
-MAFpt_ATTACK_REL_INDEX.csv
-MAFpt_ATTACK_REL_REF_Index.csv
-MAFpt_ATTACK_TTP_REF_BIN_INDEX.csv

=====================
For References.zip
=====================
<< Data analysis of attack reports referenced in MITRE ATT&CK downloaded on 29/05/22 >>
<< All reports analysed are referenced and dated >>
<< This analysis is used to record attack sequences in the format developed in the PhD Thesis "MITRE Open CTI Contribution to Cyber Situational Awareness" - Chris Maidens >>
<< FOR COMPLETENESS AND AS ABOVE >>
<< https://attack.mitre.org/resources/legal-and-branding/terms-of-use/ >>
<< The MITRE Corporation (MITRE) hereby grants you a non-exclusive, royalty-free license to use ATT&CK® for research, development, and commercial purposes.
Any copy you make for such purposes is authorized provided that you reproduce MITRE's copyright designation and this license in any such copy.
"© 2024 The MITRE Corporation. This work is reproduced and distributed with the permission of The MITRE Corporation." >>

-DataFileList.txt
-ATT&CKAttackFragments_1_0d_SHRINK.docx

=====================
For Code.zip
=====================
# This zip contains key code and data elements created and used in
# the creation of PhD Thesis "MITRE Open CTI Contribution to Cyber Situational Awareness" - Christopher Maidens
# For additional information you may contact Chris Maidens via the UoS email Pure system
<< FOR COMPLETENESS AND AS ABOVE >>
<< https://attack.mitre.org/resources/legal-and-branding/terms-of-use/ >>
<< The MITRE Corporation (MITRE) hereby grants you a non-exclusive, royalty-free license to use ATT&CK® for research, development, and commercial purposes.
Any copy you make for such purposes is authorized provided that you reproduce MITRE's copyright designation and this license in any such copy.
"© 2024 The MITRE Corporation. This work is reproduced and distributed with the permission of The MITRE Corporation." >>
#
# A brief summary of the key items is provided below
#
###########################################################
# Used in Preparation

# Code to download ATT&CK data (from MITRE TAXII service)
# The data used for this work was downloaded on 29/05/22
# The code converts the MITRE data into a relational database model
# implemented as .csv files and loaded into dataframes within the code.
# These .csv files can be recreated by rerunning this code against the MITRE data (TAXII service)
# as at above date
MAFpt_ATTACK_DB_v2.py

# A simple confidence test rig
MAFpt_ATTACK_DB_TEST.py

# Parameter reading code (yaml file)
MAFpt_r_params.py

# Parameter file
# Key params
#   RUN_DOWNLOAD_ATTACK: N ---> Both set to N to use .csv files
#   RUN_REINDEX_ATTACK: N ---> Both set to Y to download from TAXII service (see param below)
#                              BEST NOT USED AS THE ATT&CK VERSION HAS MOVED ON
#   RUN_ATTACK_TAXII_SERVER: "https://cti-taxii.mitre.org/taxii/"
#   RUN_ATTACK_LOCAL_FILE_ROOT: "C:/Users/...../ATTACK_DB_TEST/" ---> Where to find/write .csv files
MAFpt_ATTACK_DB_TEST_Runparams.yaml

###########################################################
# Used in Chapter 5

# Various tools to action cluster analysis
AttackClusterReview_v2.R

# Create Basic Counts (Chapter 5)
MAFpt_ATTACK_DB_BASIC_COUNTS_v2.py

# Hopkins test analysis
MAFpt_TACTIC_CLUSTERS.py

###########################################################
# Used in Chapter 6

# Data for attack sequences (provided in References.zip described above)
ATT&CKAttackFragments_1_0d_SHRINK.docx
    # Analysis of reports used to define sequences
    # All source reports are publicly available and referenced
    # This report can be found in References.zip
AttackModels
    # Directory with reports converted into .csv files
    # The IDs of the attacks match those in the Attack Fragments document above
MAFpt_ATTACK_DB_ATTACK_GRAPH_BUILDER.py
    # Convert .csv files to .py file (used to import as base data)
MAFpt_ATTACK_DB_ATTACK_GRAPHS_DATA_AUTO.py
    # Built by above, this is then 'included' in python scripts
    # wishing to process the attacks
MAFpt_ATTACK_DB_ATTACK_GRAPHS_DATA_AUTO - Pre HMM.py
    # Just used to checkpoint content prior
    # to adding new attacks when exploring Markov Models in Ch 7.
MAFpt_ATTACK_DB_ATTACK_GRAPHS_v6.py
    # Convert data into networkx graph form
MAFpt_ATTACKS_GRAPH_v5.gexf
    # Networkx data (XML form) used in this work
MAFpt_ATTACKS_GRAPH_v5 - PreHMM.gexf
    # Just used to checkpoint content prior
    # to adding new attacks when exploring Markov Models in Ch 7.

###########################################################
# Used in Chapter 7

# Using the Attack Model – LCSS Fragment Matching
MAFpt_ATTACK_DB_GRAPH_WALKER_v2_LCSS.py

# Using the Attack Model – Hidden Markov Model
# PREFER USE OF LATER DB_DICTS version
MAFpt_ATTACK_DB_GRAPH_WALKER_HMM_TEST.py
MAFpt_ATTACK_DB_Dicts_HMM.py

# Using the Attack Model – Markov Model
# PREFER USE OF LATER DB_DICTS version
MAFpt_ATTACK_DB_GRAPH_WALKER_v2_MC.py
MAFpt_ATTACK_DB_Dicts_MC_v2.py

# Using the Attack Model – Unified Kill Chain
MAFpt_ATTACK_DB_GRAPH_UKC.py

# Draft utilities, used to investigate internal consistency of data for Markov analysis
ThrowMeAway.py
ThrowMeAway_2.py

Relationship between files, if important for context:
The relationships are described in the text above
Broadly, DataFileN.zip files include data downloaded from MITRE ATT&CK (29/05/22) in relational form.
This is used by code (code.zip) to perform analysis of cyber attacks
(these cyber attacks are described in References.zip and are based on the format presented in
"MITRE Open CTI Contribution to Cyber Situational Awareness" - Christopher Maidens)

Additional related data collected that was not included in the current data package: Not applicable

If data was derived from another source, list source:
Prime data source is MITRE ATT&CK
This is openly available data but please note licensing details described above
© 2024 The MITRE Corporation. This work is reproduced and distributed with the permission of The MITRE Corporation.
https://attack.mitre.org/resources/legal-and-branding/terms-of-use/
All related attack reports are openly accessible and referenced in References.zip

If there are multiple versions of the dataset, list the file updated, when and why update was made:
This is a single initial version of this dataset

--------------------------
METHODOLOGICAL INFORMATION
--------------------------

Description of methods used for collection/generation of data:
Knowledge base data was downloaded from MITRE ATT&CK on 29/05/22
Cyber attack reports referenced in ATT&CK were used to describe sequences of techniques used by attackers
Analysis and approach is described in "MITRE Open CTI Contribution to Cyber Situational Awareness" - Christopher Maidens

Methods for processing the data:
Attack sequences were converted into csv files (see AttackModels in code.zip)
The csv files were then converted into a python file containing a list of lists.
Each list contains a set of dicts, each dict representing an attack sequence step.
These sequences are then processed using python code to understand the behaviours within the attacks analysed

Software- or Instrument-specific information needed to interpret the data, including software and hardware version numbers:
Software has been created using standard python and libraries
These are identified through the import statements at the top of each source file
One R code file is also included (again required libraries can be identified through imports at the top of each code file)

Standards and calibration information, if appropriate: Not relevant

Environmental/experimental conditions:
This has been run on Windows 10 with standard Python and R installations
Additional standard packages/libraries (within Python and R) as described above

Describe any quality-assurance procedures performed on the data:
Quality Assurance is provided through the use of the MITRE ATT&CK data source

People involved with sample collection, processing, analysis and/or submission:
All actions executed by Christopher Maidens

--------------------------
DATA-SPECIFIC INFORMATION
--------------------------

Number of variables:
Each step in a recorded attack sequence (see code.zip - AttackModels directory) contains the fields listed here
For detailed description see "MITRE Open CTI Contribution to Cyber Situational Awareness" - Christopher Maidens
ID:        Step number
Tactic:    MITRE ATT&CK Tactic ID
Technique: MITRE ATT&CK Technique ID
Pred:      The ID of the step preceding this step
TInc:      Time increment from start of the attack (currently unused)
S/G:       S or G indicating type of step
KC Step:   Kill Chain step type (currently Unified Kill Chain UKC)
Notes:     Additional notes (documentary only)
MITRE ATT&CK field descriptions can be found at the website (currently https://attack.mitre.org/)

Number of cases/rows:
The Attack Models described include data for 35 attacks
These 35 attacks include 537 attack step observations

Variable list, defining any abbreviations, units of measure, codes or symbols used:
Not applicable for this data (see above Number of Variables)

Missing data codes: Not applicable for this data (see above)

Specialized formats or other abbreviations used:
Data is stored in CSV format
It is also found converted into a python list of attacks (code.zip - MAFpt_ATTACK_DB_ATTACK_GRAPHS_DATA_AUTO.py).
Each attack is made up of a list of dicts describing each step (as above)
This has also been converted into a lightweight graph format for use with the Python NetworkX library (code.zip - MAFpt_ATTACKS_GRAPH_v5.gexf)
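
Illustrative example (not part of the dataset):
The CSV and list-of-dicts formats described above can be sketched as follows. This is a minimal sketch only. It assumes that the CSV header row uses the variable names listed under "Number of variables"; the actual headers in the AttackModels files may differ. The helper names load_attack_steps and predecessor_pairs are hypothetical and do not appear in code.zip, and the two sample rows are illustrative rather than taken from the dataset.

```python
import csv
import io

# Hypothetical sample in the column layout listed under "Number of variables".
# The real files in the AttackModels directory may use different header text.
SAMPLE_CSV = """ID,Tactic,Technique,Pred,TInc,S/G,KC Step,Notes
1,TA0001,T1566,,,S,Delivery,illustrative row only
2,TA0002,T1059,1,,S,Execution,illustrative row only
"""


def load_attack_steps(handle):
    """Return one attack as a list of dicts, one dict per sequence step."""
    return [dict(row) for row in csv.DictReader(handle)]


def predecessor_pairs(steps):
    """Return (Pred, ID) pairs for the steps that name a preceding step."""
    return [(step["Pred"], step["ID"]) for step in steps if step["Pred"]]


# One attack is a list of step dicts; the full data set is a list of attacks.
attack = load_attack_steps(io.StringIO(SAMPLE_CSV))
attacks = [attack]

print(len(attacks[0]), predecessor_pairs(attack))
```

To read a real file, the same helper could be given an open file handle for one of the .csv files in the AttackModels directory, provided the header assumption above holds.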