README file for the dataset for Essays on Volatility Timing and Centrality-Driven Liquidity in Equity and Cryptocurrency Markets

README author: Yue Zhang, University of Southampton, ORCID ID 0009-0003-9501-8878

This dataset supports the thesis entitled Essays on Volatility Timing and Centrality-Driven Liquidity in Equity and Cryptocurrency Markets.
AWARDED BY: University of Southampton
DATE OF AWARD: 2025

Date of data collection: From September 2021 to September 2024

--------------------
DATA & FILE OVERVIEW
--------------------

C3_bab.csv contains the time-series return data for each subsample sorted into deciles using the betting-against-beta strategy, for making the betting-against-beta table in Chapter 3 of the thesis.
C3_doublesorting.csv contains the panel data of risk-neutral-volatility-scaled returns and stock characteristic classifications for making the double-sorting robustness check tables in Chapter 3 of the thesis.
C3_fs.csv contains the time-series unscaled and volatility-managed returns for making the direct performance ratio comparison tables and aggregate-level scaling benchmark tables in Chapter 3 of the thesis.
C3_fsreg.csv contains the panel data of stocks' unscaled and scaled returns and control variables in each cross-section in Chapter 3 of the thesis.
C3_idiovol.xlsx contains the time-series data of daily stock returns and control variables for each stock in Chapter 3 of the thesis.
C3_ost.xlsx contains the time-series unscaled and scaled stock return data with different training periods for making the out-of-sample test table in Chapter 3 of the thesis.
C3_portscaled.csv contains the panel data of cross-sectional average unscaled and scaled stock returns in each leverage quartile subset for running the leverage-sorted direct comparison and regression analyses in Chapter 3 of the thesis.
C3_risk-sorted.xlsx contains the time-series average volatility and firm characteristic classification data for each sample firm for making the sorting table in Chapter 3 of the thesis.
C3_spanning_sorted.csv contains the panel data of stocks' unscaled and scaled returns, control variables, and firm characteristic classification data in each cross-section in Chapter 3 of the thesis.
C3_transaction_cost.csv contains the panel data of stocks' unscaled and scaled returns with different transaction costs and leverage constraints, and control variables in each cross-section in Chapter 3 of the thesis.
C4_heatmap.xlsx contains the cross-sectional average centrality data for each industry of the sample stocks for making the heatmap in Chapter 4 of the thesis.
C4_assetcapmerged.csv contains the panel data of each mutual fund's ownership of each stock in the sample, for identifying the full set of pairwise combinations underlying mutual funds' common ownership in Chapter 4 of the thesis.
C4_conliqplot.csv contains the time-series average centrality and illiquidity data for the sample stocks and the market funding liquidity proxy data for making the time-series plot in Chapter 4 of the thesis.
C4_CONNECTPRICING.dta contains the panel data of stock characteristics for running pricing regressions in Chapter 4 of the thesis.
C4_FMB.dta contains the panel data of stock characteristics for running Fama-MacBeth pricing regressions in Chapter 4 of the thesis.
C4_logreg_stdfcapcen.csv contains the panel data of stock characteristics for running bi-directional panel regressions between stock centrality and illiquidity in Chapter 4 of the thesis.
C4_PCsharesmerged.csv contains the panel data for each pair of sample stocks owned by a common fund for computing the stock-level connectedness in Chapter 4 of the thesis.
C4_PVARGMM.dta contains the panel centrality, illiquidity, and return data for each sample stock for running the panel vector autoregression analyses and impulse response functions in Chapter 4 of the thesis.
C4_SCATTER.dta contains the time-series average data for each sample stock for making the scatter plot in Chapter 4 of the thesis.
C5_BDMI.csv contains the time-series data for the sorting and spanning regression analyses in Chapter 5 of the thesis.

--------------------------
METHODOLOGICAL INFORMATION
--------------------------

Detailed methodologies are elaborated in Chapter 2 and in the related sections of Chapters 3, 4, and 5. R and Stata code is available upon request.

C3_bab.csv: The detailed methods used for generating the betting-against-beta factor returns are discussed in Chapter 3 of the thesis.
C3_doublesorting.csv: The detailed methods used for sorting volatilities while controlling for firm characteristics are discussed in Chapter 3 of the thesis.
C3_fs.csv and C3_fsreg.csv: The different volatility-timing and regression methods are discussed in detail in Chapter 2 and Chapter 3 of the thesis.
C3_idiovol.xlsx: The panel regression approach used for deriving stocks' idiosyncratic volatilities is discussed in detail in Chapter 3 of the thesis.
C3_ost.xlsx: The detailed methods used for running out-of-sample tests are discussed in Chapter 3 of the thesis.
C3_portscaled.csv: The detailed methods used for constructing leverage quartiles, re-scaling returns in each leverage quartile subset, and running direct comparison and spanning regressions in each leverage quartile subset are discussed in Chapter 3 of the thesis.
C3_risk-sorted.xlsx: Each stock's time-series average volatilities are computed and sorted by different firm characteristics. The detailed sorting methods are discussed in Chapter 3 of the thesis.
C3_spanning_sorted.csv: The panel spanning regression methods applied to different subsamples constructed by a range of firm characteristics are discussed in detail in Chapter 3 of the thesis.
C3_transaction_cost.csv: The detailed methods of applying different levels of transaction costs and leverage constraints in panel regressions are discussed in Chapter 3 of the thesis.
C4_heatmap.xlsx: The detailed methods of computing the Centrality-Weighted-Eigenvector-Centrality (CWEC), the illiquidity measure, and the cross-sectional average observations in each industry are discussed in Chapter 2 and Chapter 4 of the thesis.
C4_assetcapmerged.csv: The detailed methods of identifying comprehensive pairwise combinations according to mutual funds' ownership of stocks are discussed in Chapter 4 of the thesis.
C4_conliqplot.csv: The detailed methods of computing the illiquidity measure and CWEC are discussed in Chapter 2 and Chapter 4 of the thesis.
C4_CONNECTPRICING.dta: The detailed methods of running pricing regressions are discussed in Chapter 4 of the thesis.
Specific software required: Stata 17 or above
C4_FMB.dta: The detailed methods of running Fama-MacBeth pricing regressions are discussed in Chapter 4 of the thesis.
Specific software required: Stata 17 or above
C4_logreg_stdfcapcen.csv: The detailed methods of computing the Centrality-Weighted-Eigenvector-Centrality (CWEC), the illiquidity measure, and other related control variables, together with the regression equations, are discussed in Chapter 2 and Chapter 4 of the thesis.
C4_PCsharesmerged.csv: The detailed methods of identifying comprehensive pairwise combinations according to mutual funds' ownership of stocks are discussed in Chapter 4 of the thesis.
C4_PVARGMM.dta: The detailed methods of the panel VAR and IRF analyses are discussed in Chapter 4 of the thesis.
Specific software required: Stata 17 or above
C4_SCATTER.dta: The detailed methods of computing the Centrality-Weighted-Eigenvector-Centrality (CWEC) are discussed in Chapter 2 and Chapter 4 of the thesis.
Specific software required: Stata 17 or above
C5_BDMI.csv: The detailed methods of computing volatility-managed returns are discussed in Chapter 2 and Chapter 5 of the thesis.

--------------------------
DATA-SPECIFIC INFORMATION
--------------------------

C3_bab.csv:
Number of variables: 17
Number of cases/rows: 312
Variable list, defining any abbreviations, units of measure, codes or symbols used:
    month: calendar month in YYYYMM format
    bab: the betting-against-beta factor return
    P1 to P10: average return of each decile subsample in each cross-section
    xr, smb, hml, umd: Fama-French three-factor returns and Carhart momentum factor returns
    skew: stock return skewness
Missing data codes: NA
Date that the file was created: July 2023

C3_doublesorting.csv:
Number of variables: 15
Number of cases/rows: 354744
Variable list and definition:
    firm: firm identifier
    month: calendar month in YYYYMM format
    xret: stock unscaled excess return
    realvar: square of stock realised volatility
    sqmw, sqglb, sqiv: square of stock risk-neutral volatility
    rvmgd, ivmgd, mwmgd, glbmgd: stock volatility-managed return
    leverage: firm leverage classification, sorted into quartiles
    marketcap: stock market capitalisation classification, sorted into quartiles
    industry: firm industry classification
    creditrating: stock credit rating classification
Missing data codes: NA
Date that the file was created: November 2023

C3_fs.csv:
Number of variables: 11
Number of cases/rows: 312
Variable list and definition:
    month: calendar month in YYYYMM format
    firmrvmgd, firmivmgd, firmmwmgd, firmglbmgd: cross-sectional average of firm-level volatility-managed return
    xret: cross-sectional average of stock unscaled excess return
    aggregateavgrvmgd, aggregateivmgd, aggregatemwmgd,
    aggregateglbmgd: aggregate-level volatility-managed return
Date that the file was created: June 2022

C3_fsreg.csv:
Number of variables: 14
Number of cases/rows: 354744
Variable list and definition:
    date: calendar month in YYYYMM format
    firm: firm identifier
    trend: time identifier
    trend2: time identifier for January effect
    excess_ret: stock unscaled return
    mw_ret, glb_ret, rvmgd, iv_ret: volatility-managed return
    xr, smb, hml, umd: Fama-French three-factor returns and Carhart momentum factor returns
Date that the file was created: July 2022

C3_idiovol.xlsx:
Number of variables: 7
Number of cases/rows: 6785
Variable list and definition:
    date: calendar date in YYYYMMDD format
    month: calendar month in YYYYMM format
    dailyreturn: stock daily return
    xr, smb, hml: Fama-French three-factor returns
    RF: risk-free rate
Date that the file was created: July 2023

C3_ost.xlsx:
Number of variables: 11
Number of cases/rows: 312
Variable list and definition:
    month: calendar month in YYYYMM format
    firmrvmgd, firmivmgd, firmmwmgd, firmglbmgd: cross-sectional average of firm-level volatility-managed return
    xret: cross-sectional average of stock unscaled excess return
    aggregateavgrvmgd, aggregateivmgd, aggregatemwmgd, aggregateglbmgd: aggregate-level volatility-managed return
Date that the file was created: November 2023

C3_portscaled.csv:
Number of variables: 6
Number of cases/rows: 1248
Variable list and definition:
    portfolio: leverage portfolio identifier
    month: calendar month in YYYYMM format
    rvmgd, ivmgd, mwmgd, glbmgd: stock volatility-managed return
Missing data codes: NA
Date that the file was created: April 2023

C3_risk-sorted.xlsx:
Number of variables: 11
Number of cases/rows: 1137
Variable list and definition:
    cusip, permno: stock identifier
    sqRV: squared realised volatility
    IV, SVIX, GLB: risk-neutral volatility
    Leverage: firm time-series average leverage ratio, computed as total debt over total assets
    Market cap: stock market
    capitalisation, computed as the natural logarithm of the time-series average of the product of market price and shares outstanding at the end of each month
    xret: stock unscaled excess return
    Industry: firm industry classification
    Credit rating: stock credit rating classification
Missing data codes: N/A
Date that the file was created: November 2022

C3_spanning_sorted.csv:
Number of variables: 19
Number of cases/rows: 354744
Variable list and definition:
    date: calendar month in YYYYMM format
    firm, permno: firm identifier
    trend: time identifier
    trend2: time identifier for January effect
    excess_ret: stock unscaled return
    mw_ret, glb_ret, rvmgd, iv_ret: volatility-managed return
    xr, smb, hml, umd: Fama-French three-factor returns and Carhart momentum factor returns
    leverage: firm leverage classification, sorted into quartiles
    marketcap: stock market capitalisation classification, sorted into quartiles
    industry: firm industry classification
    creditrating: stock credit rating classification
Date that the file was created: November 2023

C3_transaction_cost.csv:
Number of variables: 316
Number of cases/rows: 354744
Variable list and definition:
    month: calendar month in YYYYMM format
    firm: firm identifier
    rvmgd, ivmgd, mwmgd, glbmgd: stock volatility-managed return
    trend2: time identifier for January effect
    Prefixes formed as xret/rvmgd/ivmgd/mwmgd/glbmgd + 30/60/90/180 denote the unscaled and scaled returns at monthly, bi-monthly, quarterly, or semi-annual trading frequencies, respectively.
    Suffixes of 1/5/10/15/20 bps denote the transaction costs applied under the different trading frequencies.
    nolev/50lev denotes no leverage or 50% leverage allowed, respectively.
Date that the file was created: November 2023

C4_heatmap.xlsx:
Number of variables: 18
Number of cases/rows: 30
Variable list and definition:
    Industry and indcode: industry identifier
    Q1_2006 to Q4_2009: quarter identifier
Date that the file was created: September 2024

C4_assetcapmerged.csv:
Number of variables: 11
Number of cases/rows: 223650
Variable list and definition:
    fdate: quarter identifier
    cusip, stkname: stock identifier
    fundno, fundname: fund identifier
    assets: value of assets owned by the fund at the end of the quarter (*10000)
    shares: shares of the stock owned by the fund
    prc: end-of-quarter stock price
    shrout: end-of-quarter shares outstanding (*1000)
    cap: end-of-quarter stock market cap (*1000)
Date that the file was created: March 2023

C4_conliqplot.csv:
Number of variables: 4
Number of cases/rows: 94
Variable list and definition:
    quarter: calendar quarter in YYYYMM format
    avgamihud: cross-sectional average of the sample stocks' illiquidity measure
    avgeigcen: cross-sectional average of the sample stocks' centrality measure
    tight_CI: the proxy for market funding liquidity
Date that the file was created: September 2024

C4_CONNECTPRICING.dta:
Number of variables: 27
Number of cases/rows: 57152
Variable list and definition:
    qrank, fdate, quarter: quarter identifier
    cusip, firm: stock identifier
    angvol: stock idiosyncratic volatility
    vix: VIX index
    dm1: money supply change
    qxret: stock quarterly excess return
    ffc4alpha: the constant of regressions of stock returns against the Fama-French-Carhart four factors
    size: natural logarithm of stock market capitalisation
    stkfcap: stock cross-sectional average common ownership
    degcen: stock degree centrality
    logeigcen: natural logarithm of eigenvector centrality
    mktrf, smb, hml, mom: Fama-French three-factor returns and Carhart momentum factor returns
    cenlagone_quartile: quartile classification of the lagged centrality measure
Date that the file was created: December 2023

C4_FMB.dta:
Number of variables: 15
Number of cases/rows: 57152
Variable list and definition:
    qrank, quarter: quarter identifier
    cusip, firm: stock identifier
    qlogxret: quarterly natural logarithm of stock excess return
    eigcen: stock eigenvector centrality
    logeigcen: natural logarithm of stock eigenvector centrality
    angvol: stock idiosyncratic volatility
    vix: VIX index
    dm1: money supply change
    mktrf, smb, hml, mom: Fama-French three-factor returns and Carhart momentum factor returns
    fcap_decile: decile classification of stock cross-sectional average common ownership
Date that the file was created: June 2024

C4_logreg_stdfcapcen.csv:
Number of variables: 19
Number of cases/rows: 57152
Variable list and definition:
    qrank, quarter: quarter identifier
    cusip: stock identifier
    qlogxret: quarterly natural logarithm of stock excess return
    angvol: stock idiosyncratic volatility
    VIX: VIX index
    dm1: money supply change
    mktrf, smb, hml, mom: Fama-French three-factor returns and Carhart momentum factor returns
    cwec: stock centrality measure
    logcwec: natural logarithm of stock centrality measure
    degcen: stock degree centrality
    ffc4alpha: the constant of regressions of stock returns against the Fama-French-Carhart four factors
    size: natural logarithm of stock market capitalisation
    stkfcap: stock cross-sectional average common ownership
    logamihudsc: natural logarithm of the stock illiquidity measure
Missing data codes: NA
Date that the file was created: November 2023

C4_PCsharesmerged.csv:
Number of variables: 13
Number of cases/rows: 12230888
Variable list and definition:
    fdate: quarter identifier
    pair, pairno: pair identifier
    fundno: fund identifier
    stki, stkj: stocks within each pair
    stkishares, stkjshares: shares of the stocks within each pair owned by the fund
    stkiprc, stkjprc: end-of-quarter prices of the stocks within each pair
    stkishrout2, stkjshrout2: end-of-quarter shares outstanding of the stocks within each pair
    FCAP: common ownership
Date that the file was created: March 2023

C4_PVARGMM.dta:
Number of variables: 13
Number of cases/rows: 57152
Variable list and definition:
    fdate, qrank, quarter: quarter identifier
    cusip, firm: stock identifier
    ffc4alpha: the constant of regressions of stock returns against the Fama-French-Carhart four factors
    size: natural logarithm of stock market capitalisation
    logilliquidity: natural logarithm of stock illiquidity measure
    degcen: stock degree centrality
    logcenrality: natural logarithm of stock eigenvector centrality
    angvol: stock idiosyncratic volatility
    vix: VIX index
    dm1: money supply change
Date that the file was created: November 2023

C4_SCATTER.dta:
Number of variables: 4
Number of cases/rows: 608
Variable list and definition:
    cusip: stock identifier
    degcen: stock degree centrality
    eigcen: stock eigenvector centrality
    avgstkstdrankfcap: stock time-series average standardised rank-transformed common ownership
Date that the file was created: September 2024

C5_BDMI.csv:
Number of variables: 34
Number of cases/rows: 1761
Variable list, defining any abbreviations, units of measure, codes or symbols used:
    date: calendar date in YYYYMMDD format
    month: calendar month in YYYYMM format
    BDMI, BDMLCI, BDMXLCI: Broad Digital Market Index, Broad Digital Market Large-Cap Index, Broad Digital Market Excluding Large-Cap Index
    ffr: federal funds rate
    BDMI_logxret, BDMLCI_logxret, BDMXLCI_logxret: natural logarithm of daily excess returns of BDMI, BDMLCI, and BDMXLCI
    ps: penny stock index
    ps_logxret: natural logarithm of daily excess returns of ps
    BDMI_rvmgd, BDMLCI_rvmgd, BDMXLCI_rvmgd, ps_rvmgd: realised-volatility-managed excess returns of BDMI, BDMLCI, BDMXLCI, and ps
    BDMI_rvsort, BDMLCI_rvsort, BDMXLCI_rvsort, ps_rvsort: realised-volatility-sorted classifications for BDMI, BDMLCI, BDMXLCI, and ps
    BDMIxret_max, BDMLCIxret_max, BDMXLCIxret_max, psxret_max: the maximum daily excess return in the last rolling month of BDMI, BDMLCI, BDMXLCI, and ps
    BDMI_maxsort, BDMLCI_maxsort, BDMXLCI_maxsort, ps_maxsort: max-sorted classifications for BDMI, BDMLCI, BDMXLCI, and ps
    fng_value,
    fng_classification: fear & greed index and its classification
    SGX, phase: SG sentiment index and its classified phase
    VIX, VIX_sort: VIX index and its sorted classifications
Missing data codes: NA
Date that the file was created: February 2024
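
--------------------------
USAGE EXAMPLE
--------------------------

Several of the CSV files above share two conventions: months are stored as YYYYMM and missing values are coded as NA. The following is a minimal sketch, not part of the thesis code, of how such a file could be read with the Python standard library. The function names parse_yyyymm and read_monthly_csv are hypothetical, treating empty cells as missing is an assumption, and the sample rows are synthetic values that only mimic the column layout of C3_portscaled.csv.

```python
import csv
import io
from datetime import date

MISSING_CODE = "NA"  # missing-data code stated in this README


def parse_yyyymm(value):
    """Convert a YYYYMM string such as '200601' to the first day of that month."""
    text = value.strip()
    if len(text) != 6 or not text.isdigit():
        raise ValueError(f"expected YYYYMM, got {value!r}")
    return date(int(text[:4]), int(text[4:]), 1)


def read_monthly_csv(handle, month_column="month"):
    """Read a monthly CSV into a list of dicts.

    'NA' (and, as an assumption, empty cells) become None, the month column
    is parsed from YYYYMM, and other cells are converted to float when
    possible and kept as strings otherwise.
    """
    rows = []
    for record in csv.DictReader(handle):
        row = {}
        for name, raw in record.items():
            if raw is None or raw.strip() in ("", MISSING_CODE):
                row[name] = None
            elif name == month_column:
                row[name] = parse_yyyymm(raw)
            else:
                try:
                    row[name] = float(raw)
                except ValueError:
                    row[name] = raw.strip()
        rows.append(row)
    return rows


# Synthetic rows: the column names follow C3_portscaled.csv, the numbers are made up.
SAMPLE = """portfolio,month,rvmgd,ivmgd,mwmgd,glbmgd
1,200601,0.012,NA,0.010,0.009
2,200601,-0.004,0.003,NA,0.001
"""

rows = read_monthly_csv(io.StringIO(SAMPLE))
print(rows[0]["month"], rows[0]["ivmgd"])
```

To read an actual file, the same function could be called as read_monthly_csv(open("C3_portscaled.csv", newline="")). For files whose YYYYMM column is named date rather than month (for example C3_fsreg.csv), month_column="date" would be passed instead. The .xlsx and .dta files need other tools, such as a spreadsheet reader or Stata 17 or above.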