Three essays in high dimensional linear regression in economics
This thesis provides an overview of methods used for forecasting economic variables with high dimensional data sets. Building on these findings, it proposes new approaches that modify existing methods, justified by statistical theory and tested with simulated and real-world data.
The first essay explores how high dimensional data sets pose growing logistical and analytical challenges for any field relying on quantitative analysis. In economics, the main consequence is that least-squares based approaches become unreliable or infeasible in linear regression frameworks, on top of difficulties economists already face, such as significant predictor correlation and temporal dependence in time series covariates. Through finely tuned simulation experiments, the essay compares the prediction accuracy of existing high dimensional linear regression methods in a forecasting setting. As more correlation is induced between the covariates, Principal Components and Ridge Regression produce very similar results, with the former appearing to hold a slight edge. In the temporal dependence setting, Random Projections, from the machine learning literature, clearly dominates as the variables approach a unit root.
The second essay addresses the increasing prevalence of high dimensional data sets in econometrics, combined with predictor sets characterised by significant correlation amongst the covariates. While the Ridge Regression of Hoerl and Kennard (1970) provides a feasible alternative to Ordinary Least Squares, its parameter estimates can suffer from bias when the sign and magnitude of the true coefficients vary significantly. To overcome this, the essay proposes the Partial Ridge and Hybrid Estimation Procedure approaches, which vary which predictor coefficients are penalised, allowing a more desirable bias-variance tradeoff for prediction purposes. Through theoretical analysis, a Monte Carlo simulation study and an application to Iranian residential property price data, it is shown that while Partial Ridge alone is unable to universally dominate Ridge Regression, combining the two estimation procedures can yield predictive accuracy that outperforms Ridge alone.
The final essay focuses on Random Projections, a method seen as a computationally faster alternative to the widely used Principal Components Analysis for dimension reduction. While there is extensive work on producing predictions and forecasts of the dependent variable using Random Projections, there is very little focus on its use for estimating individual coefficient values. There is, however, a broad area within the statistics literature concerned with finding the non-parametric bootstrap distribution of the coefficients. By contrast, the distribution used in Random Projections is parametric, which allows additional key theoretical results to be used to improve estimation accuracy. Using a Frisch-Waugh style approach, the essay proposes Partial Random Projections as a way to obtain individual parameter estimates in high dimensional settings whilst remaining in the spirit of Random Projections, which performs very well in the first essay. The theoretical analysis shows that its bias can often improve upon that of many other techniques when the sum of all parameters other than the one of interest is small and its associated covariate is only weakly correlated with the others. Finally, a Monte Carlo simulation study replicating a causal inference study demonstrates how this approach can, in practice, provide more accurate estimates than competing models.
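As a rough illustration of the idea of penalising only some coefficients, the sketch below contrasts the standard closed-form Ridge estimator with a variant whose penalty matrix is a 0/1 diagonal. This is an assumption-laden sketch and not the thesis's own formulation of Partial Ridge or the Hybrid Estimation Procedure: the function name ridge_fit, the penalised mask, the simulated data and the penalty value are all hypothetical choices made for the example.

```python
import numpy as np


def ridge_fit(X, y, lam, penalised=None):
    """Solve (X'X + lam * D) b = X'y, where D is a 0/1 diagonal matrix.

    penalised=None penalises every coefficient (standard Ridge).
    A 0 entry in `penalised` leaves that coefficient unpenalised
    (an illustrative stand-in for a partially penalised estimator).
    """
    p = X.shape[1]
    d = np.ones(p) if penalised is None else np.asarray(penalised, dtype=float)
    return np.linalg.solve(X.T @ X + lam * np.diag(d), X.T @ y)


# Simulated high dimensional design: more predictors than observations.
rng = np.random.default_rng(0)
n, p = 50, 80
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[0] = 5.0  # one large true coefficient, the rest zero
y = X @ beta + rng.standard_normal(n)

lam = 10.0  # hypothetical penalty value, not tuned

# Standard Ridge: all coefficients shrunk towards zero.
b_ridge = ridge_fit(X, y, lam)

# Partially penalised fit: coefficient 0 is exempt from the penalty.
mask = np.ones(p)
mask[0] = 0.0
b_partial = ridge_fit(X, y, lam, penalised=mask)
```

Ordinary Least Squares is infeasible here because X'X is singular when p exceeds n; the penalty term restores invertibility as long as the unpenalised columns are not themselves collinear.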
University of Southampton
Thorburn, Richard John
15924cbc-a203-461c-8de5-ad4a996a7b8a
October 2024
Pitarakis, Jean-Yves
ee5519ae-9c0f-4d79-8a3a-c25db105bd51
Olmo, Jose
706f68c8-f991-4959-8245-6657a591056e
Thorburn, Richard John (2024) Three essays in high dimensional linear regression in economics. University of Southampton, Doctoral Thesis, 190pp.
Record type: Thesis (Doctoral)
Text
Southampton_PhD_Thesis_2
- Version of Record
Text
Final-thesis-submission-Examination-Mr-Richard-Thorburn (1)
Restricted to Repository staff only
More information
Published date: October 2024
Identifiers
Local EPrints ID: 495089
URI: http://eprints.soton.ac.uk/id/eprint/495089
PURE UUID: 529ebd9c-e6c9-4d25-b8ef-92aa9de3f186
Catalogue record
Date deposited: 29 Oct 2024 17:36
Last modified: 30 Oct 2024 02:45