README: Thesis Dataset for “Essays in Macro Asset Pricing Models with News Sentiment”

DOI: https://doi.org/10.5258/SOTON/D3756
Author: Xuefeng Yang (University of Southampton)
ORCID: 0009-0005-7902-9854
Date of Final Dataset Submission: 2025

⸻

1. Overview of the Dataset

This dataset accompanies the PhD thesis “Essays in Macro Asset Pricing Models with News Sentiment”, submitted to the University of Southampton as part of the requirements for the award of Doctor of Philosophy.

The dataset contains both raw and processed data used in the empirical analyses across the thesis chapters, including the construction of sentiment indices, PCA factors, macro-financial variables, and option-implied sentiment measures. It supports all tables, figures, and empirical results reported in the thesis.

⸻

2. Contents of the Dataset

The dataset includes the following components (the file list may differ slightly from the contents of the deposited ZIP archive):

2.1 Data Files

This research uses six main datasets across the three empirical chapters:

1. dataset_chapter1.pkl
   Monthly returns of the 25 Fama–French Size/Book-to-Market portfolios and the associated Fama–French risk factors, used in the Chapter 1 analysis.
2. dataset_chapter2.pkl
   Daily data for the 100 most liquid S&P 500 stocks, including Garman–Klass volatility, the Carhart four factors, and interest rates, used in the Chapter 2 cross-sectional quantile regressions.
3. options_chapter3.pkl
   SPX options data from OptionMetrics, used in Chapter 3 for risk-neutral density estimation and behavioural pricing-kernel modelling.
4. spx_index.csv
   S&P 500 index prices and returns downloaded from Yahoo Finance (via yfinance), used in Chapter 3.
5. welch_goyal_controls.csv
   Predictive financial variables from Welch and Goyal (2008), used as control variables in Chapter 3.
6. sentiment_raw.csv
   Raw sentiment indices (SSW, BW, MCSI, CCI, AAII, TRMI, etc.) used to construct the PCA-based sentiment factors for Chapter 2.
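The files listed in Section 2.1 can be read with pandas. The following is a minimal sketch, not part of the deposited dataset: it assumes that the .pkl files contain pickled pandas objects and that the CSV files have a header row, neither of which is stated in this README. The helper names `load_data_file` and `load_available` are hypothetical.

```python
from pathlib import Path

import pandas as pd

# File names as listed in Section 2.1 of this README.
DATA_FILES = [
    "dataset_chapter1.pkl",
    "dataset_chapter2.pkl",
    "options_chapter3.pkl",
    "spx_index.csv",
    "welch_goyal_controls.csv",
    "sentiment_raw.csv",
]


def load_data_file(path):
    """Load one data file, choosing the reader from the file extension."""
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == ".pkl":
        # Assumption: the pickle holds a pandas object. Unpickling can run
        # arbitrary code, so only load files obtained from a trusted source.
        return pd.read_pickle(path)
    if suffix == ".csv":
        # Assumption: the first row of each CSV file is a header.
        return pd.read_csv(path)
    raise ValueError(f"Unsupported file type: {path.name}")


def load_available(data_dir):
    """Return {file name: loaded object} for listed files found in data_dir."""
    data_dir = Path(data_dir)
    return {
        name: load_data_file(data_dir / name)
        for name in DATA_FILES
        if (data_dir / name).exists()
    }
```

For example, `data = load_available("path/to/unzipped/dataset")` returns a dictionary keyed by file name; files that are missing from the folder are skipped rather than raising an error, since the archive contents may differ from the list above.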
These datasets provide the foundation for the empirical analysis in all chapters, ensuring consistency and reproducibility throughout the research.

2.2 Documentation
• README.md (this file)

3. Data Sources and Acknowledgments

I sourced the data for the 25 Portfolios Formed on Size and Book-to-Market from the online data library of Kenneth R. French. I extend my gratitude to Professor Martin Lettau and Professor Sydney Ludvigson for making their respective datasets publicly available on their websites. The financial control variables used in this research are sourced from the dataset maintained by Welch and Goyal (2008), and I thank the authors for providing this data for academic use.

4. Associated Publication

This dataset is associated with the PhD thesis:

Yang, X. (2025). Essays in Macro Asset Pricing Models with News Sentiment. University of Southampton.

Any future working papers or journal publications based on the thesis will also reference this dataset.

⸻

5. Licensing and Reuse

The dataset is released under the Creative Commons Attribution (CC-BY 4.0) Licence.

Provided that appropriate credit is given, users may:
• Share: copy and redistribute the material
• Adapt: remix, transform, and build upon the material

6. Ethical Information

This dataset does not include any personal data.

7. How to Cite this Dataset

Please cite this dataset as:

Yang, X. (2025). Dataset for “Essays in Macro Asset Pricing Models with News Sentiment”. University of Southampton. DOI: 10.5258/SOTON/D3756

⸻

8. Contact

For questions regarding the dataset:
Email: xy1e22@soton.ac.uk
Alternative: researchdata@soton.ac.uk (Research Data Management Team)

⸻