READ ME File For 'Essays on the Role of Memory in Financial Markets – Supporting Dataset'

Dataset DOI: 10.5258/SOTON/D3837

ReadMe Author: Dmitri Mustanen, University of Southampton
ORCID ID: https://orcid.org/0009-0008-7918-7790

This dataset supports the thesis entitled 'Essays on the Role of Memory in Financial Markets'
AWARDED BY: University of Southampton
DATE OF AWARD: 2026

Date of data collection:
Chapter 2: 14-01-2023
Chapter 3: 10-10-2023
Chapter 4: 02-07-2025

Information about geographic location of data collection:
Not applicable in the conventional fieldwork sense. The datasets comprise secondary financial market time series obtained electronically from publicly available sources and commercial data providers, and processed for the thesis analyses.

Licence: Creative Commons Attribution (CC BY).

Related projects/Funders:
University of Southampton (Southampton Business School) – Doctoral College research scholarship. The research was not supported by any external grant funding.

--------------------
DATA & FILE OVERVIEW
--------------------

This dataset contains:

CH2_Data.xlsx
Chapter 2 dataset for “Improving System Generalisation & Forecastability via Spillover-Based Variable Selection Approach.” Includes Tabs: (i) main analysis raw panel, (ii) z-score normalised panel used for modelling, and (iii) a robustness-check dataset used for comparison.

CH3_Data.xlsx
Chapter 3 dataset for “Forward and Backward Memory in Fractionally Cointegrated Systems.” Includes Tabs: (i) oil-market system (Brent/WTI spot + PCA factors) and (ii) cryptocurrency system (crypto prices + macro/uncertainty/liquidity proxies) used for bidirectional memory analysis.

CH4_Data.xlsx
Chapter 4 dataset for “A Memory-driven Multi-period Capital Asset Pricing Model.” Includes US and Japan equity universes with (i) price panels, (ii) return panels, and (iii) memory (fractional integration parameter d) panels used in the empirical tests.
Relationship between files, if important for context:
Each Excel file corresponds to a self-contained thesis chapter and contains the data required to reproduce the chapter’s empirical analysis (and its core summary tables/figures), including any derived panels (normalised variables, PCA factors, returns, and memory estimates). The files are independent but share the thesis-level methodological theme of modelling system dynamics and memory in financial time series.

Additional related data collected that was not included in the current data package:
Large intermediate data pulls (e.g., full Bloomberg futures-chain extractions) and intermediate transformation objects used during exploratory processing are not included. The deposited files contain the final analysis panels required for replication.

If data was derived from another source, list source:
Chapter 2: Commercial data (e.g., Bloomberg) for oil spot and futures-curve series; futures-curve derived measures (Expansion, Liquidity, Regime) and PCA factors are constructed as described in the thesis.
Chapter 3: Commercial data (e.g., Bloomberg) for oil and related financial variables; publicly available sources for cryptocurrency prices and uncertainty indices/proxies (as listed in the thesis chapter data sources).
Chapter 4: Publicly available and/or commercial index/asset series (as described in Chapter 4) used to build US and Japan equity universes; returns and memory estimates are computed from those series following the methods documented in the thesis.

If there are multiple versions of the dataset, list the file updated, when and why update was made:
Not applicable. The deposited files correspond to the final thesis supporting dataset for deposit and DOI registration.

--------------------------
METHODOLOGICAL INFORMATION
--------------------------

Description of methods used for collection/generation of data:
The dataset is generated as part of the thesis empirical workflow.
Data are assembled from market data providers and publicly available sources and transformed into analysis panels aligned to each chapter’s identification strategy:
Chapter 2 constructs a system of oil-market variables combining observed spot prices with futures-curve derived measures and global PCA factors; the system is used for spillover estimation, variable selection, and medium-horizon forecasting frameworks.
Chapter 3 constructs oil and crypto systems to study forward- and backward-looking memory in price formation using the chapter’s time-directional estimation framework and FCVAR-based system modelling documented in Chapter 3.
Chapter 4 constructs US and Japan equity universes, computes returns, and estimates memory (fractional integration parameter d) used in the chapter’s multi-period asset pricing tests documented in Chapter 4.

Methods for processing the data:
Chapter 2: Data were aligned to a common daily trading calendar. Futures-curve information was transformed into term-structure measures capturing Expansion, Liquidity, and Regime effects, as described in the thesis. Global factors were extracted via principal component analysis (PCA). Variables were z-score normalised to ensure comparability prior to spillover estimation, variable selection, and system-based forecasting.
Chapter 3: Market data were aligned across oil and cryptocurrency systems. Futures-curve information was summarised using PCA-based factors. The resulting systems were used to study forward- and backward-looking memory in price formation using the bidirectional framework documented in the thesis.
Chapter 4: Equity price series were aligned by trading calendar and transformed into return series. Memory parameters (fractional integration parameter d) were estimated for each asset using the frequency-domain approach described in the thesis. These processed panels form the basis for the empirical evaluation of the memory-driven multi-period asset pricing framework.
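Illustration of the Chapter 2 z-score normalisation step mentioned above. This is a minimal sketch and not the thesis code: the function name zscore is hypothetical, blank cells are represented as None, and the use of the sample standard deviation over the full series is an assumption. The thesis documents the exact convention applied.

```python
import statistics


def zscore(values):
    """Standardise one column: subtract the mean, divide by the standard deviation.

    Missing observations (None) are skipped when computing the moments
    and are returned as None, so no imputation takes place.
    """
    observed = [v for v in values if v is not None]
    mean = statistics.fmean(observed)
    # Assumption: sample standard deviation (n - 1 denominator).
    sd = statistics.stdev(observed)
    return [None if v is None else (v - mean) / sd for v in values]


# Hypothetical toy column, not taken from the deposited workbook.
example = zscore([1.0, 2.0, 3.0])
```

Applied column by column to the raw panel, a transformation of this kind puts prices and term-structure factors on a common scale, which is the stated purpose of the normalised tab.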
Software- or Instrument-specific information needed to interpret the data, including software and hardware version numbers:
The deposited files are standard Microsoft Excel workbooks (.xlsx) and can be opened with Excel or compatible software (e.g., LibreOffice). No specialised instrument output formats are used.

Standards and calibration information, if appropriate:
Not applicable.

Environmental/experimental conditions:
Not applicable (secondary financial market datasets).

Describe any quality-assurance procedures performed on the data:
Quality assurance follows the thesis diagnostics and includes checks appropriate for time-series modelling (e.g., stationarity/unit-root diagnostics prior to cointegration-based modelling in Chapter 2; robustness and diagnostic checks referenced in Chapter 3; and summary-statistic validation and panel integrity checks for the equity universes in Chapter 4). Missingness arises primarily from market holidays, series availability, and estimation-window requirements.

People involved with sample collection, processing, analysis and/or submission:
Dmitri Mustanen (data assembly, processing, analysis, submission). Supervisory oversight and guidance are acknowledged in the thesis (Professor Taufiq Choudhry; Dr Ahmad Maaitah).
--------------------------
DATA-SPECIFIC INFORMATION
--------------------------

CH2_Data.xlsx (Chapter 2)

Number of variables:
Main Analysis - Raw Data: 12 columns (including DATE)
Main Analysis - Normalized Data: 12 columns (including DATE)
Robustness Check - Data: 12 columns (including DATE)

Number of cases/rows:
Main Analysis - Raw Data: 5,014
Main Analysis - Normalized Data: 5,014
Robustness Check - Data: 1,121

Variable list, defining any abbreviations, units of measure, codes or symbols used:

Main Analysis - Raw Data / Normalized Data
DATE     Trading date
BSP      Brent spot price
WSP      WTI spot price
BE       Brent term-structure “Expansion” factor
BL       Brent term-structure “Liquidity” factor
BR       Brent term-structure “Regime” factor
WE       WTI term-structure “Expansion” factor
WL       WTI term-structure “Liquidity” factor
WR       WTI term-structure “Regime” factor
E        Global PCA “Expansion” factor E
L        Global PCA “Liquidity” factor L
R        Global PCA “Regime” factor R

Robustness Check - Data
DATE     Trading date
INE      Alternative indicator
TR       Alternative indicator
IR       Alternative indicator
ER       Alternative indicator
CF       Alternative indicator
WTI      WTI spot price
Brent    Brent spot price
SHCI     Shipping cost index
EI       Energy index
EPU      Economic Policy Uncertainty index

Missing data codes:
Blank cells indicate missing values. No imputation is applied in the deposited files.

Specialized formats or other abbreviations used:
Chapter-2 abbreviations follow the thesis notation. Constructed term-structure factors (Expansion/Liquidity/Regime) and PCA factors are documented in Chapter 2.
Date that the file was created: January, 2023

--------------------------

CH3_Data.xlsx (Chapter 3)

Number of variables:
OIL Market Data: 6 columns (including DATE)
Cryptocurrency Market Data: 8 columns (including Date)

Number of cases/rows:
OIL Market Data: 1,324
Cryptocurrency Market Data: 1,786

Variable list, defining any abbreviations, units of measure, codes or symbols used:

OIL Market Data
DATE     Trading date
BSP      Brent spot price
WSP      WTI spot price
E        PCA-based global “Expansion” factor E
L        PCA-based global “Liquidity” factor L
R        PCA-based global “Regime” factor R

Cryptocurrency Market Data
Date     Trading date
BITC     Bitcoin price
ETH      Ethereum price
EGY      Energy proxy
INFL     Inflation proxy
LQT      Liquidity proxy
HDG      Hedging-demand proxy
UCY      Uncertainty proxy

Missing data codes:
Blank cells indicate missing values.

Specialized formats or other abbreviations used:
Abbreviations correspond to conceptual channels discussed in Chapter 3 (fundamental vs anticipatory drivers) and are described in the chapter’s data section.

Date that the file was created: October, 2023

--------------------------

CH4_Data.xlsx (Chapter 4)

Number of variables:
US - Asset Price Universe: 61 columns (Date + 60 assets)
US - Returns Universe: 61 columns (Date + 60 assets)
US - Memory Universe: 61 columns (Date + 60 assets)
JP - Asset Price Universe: 61 columns (Date + 60 assets)
JP - Returns Universe: 61 columns (Date + 60 assets)
JP - Memory Universe: 61 columns (Date + 60 assets)

Number of cases/rows:
US - Asset Price Universe: 5,307
US - Returns Universe: 5,306
US - Memory Universe: 5,247
JP - Asset Price Universe: 3,789
JP - Returns Universe: 3,788
JP - Memory Universe: 3,669

Variable list, defining any abbreviations, units of measure, codes or symbols used:
Each sheet includes:
Date     Trading date
Asset columns labelled by ticker symbols (e.g., US tickers such as AAPL; Japanese tickers commonly carry a .T suffix).
Returns sheets contain daily returns computed from price series (standard log-return/return transformation as used in the thesis).
Memory sheets contain estimated fractional integration parameters (memory, d) for each asset, computed following the Chapter 4 methodology. Memory parameters are unit-free.

Missing data codes:
Blank cells indicate missing values, typically due to market holidays, series availability, and estimation-window requirements for memory estimation.

Specialized formats or other abbreviations used:
Tickers follow standard exchange identifiers. Memory is reported as parameter d (fractional integration).

Date that the file was created: July, 2025
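Illustration of the price-to-return relationship in CH4_Data.xlsx. Each returns sheet has one row fewer than its price sheet (e.g., 5,306 against 5,307 for the US universe), which is what a one-period transformation of prices produces. The sketch below assumes log returns and represents blank cells as None; the function name log_returns is hypothetical, and the thesis defines the exact transformation used.

```python
import math


def log_returns(prices):
    """Daily log returns r_t = ln(P_t / P_{t-1}) for one asset column.

    The output has len(prices) - 1 entries. A return is left missing (None)
    whenever either of the two prices it depends on is missing.
    """
    returns = []
    for previous, current in zip(prices, prices[1:]):
        if previous is None or current is None:
            returns.append(None)
        else:
            returns.append(math.log(current / previous))
    return returns


# Hypothetical toy prices with one blank cell, not taken from the workbook.
toy_returns = log_returns([100.0, 110.0, None, 121.0])
```

In this toy case a single blank price leaves two returns missing, which is one way missing values can propagate from a price sheet into the corresponding returns sheet.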