The University of Southampton
University of Southampton Institutional Repository

Three essays in high dimensional linear regression in economics

University of Southampton
Thorburn, Richard John
15924cbc-a203-461c-8de5-ad4a996a7b8a
Pitarakis, Jean-Yves
ee5519ae-9c0f-4d79-8a3a-c25db105bd51
Olmo, Jose
706f68c8-f991-4959-8245-6657a591056e

Thorburn, Richard John (2024) Three essays in high dimensional linear regression in economics. University of Southampton, Doctoral Thesis, 190pp.

Record type: Thesis (Doctoral)

Abstract

This thesis provides an overview of methods for forecasting economic variables with high dimensional data sets. Building on these findings, it proposes new approaches that modify existing methods, justified by statistical theory and tested on simulated and real-world data.
The first essay explores how high dimensional data sets pose growing logistical and analytical challenges for any field relying on quantitative analysis. In economics, the main consequence is that least-squares based approaches become unreliable or infeasible in linear regression frameworks, on top of difficulties economists already face, such as significant predictor correlation and temporal dependence in time series covariates. Through finely tuned simulation experiments, this essay compares the prediction accuracy of existing high dimensional linear regression methods in a forecasting setting. As more correlation is induced between the covariates, Principal Components and Ridge Regression produce very similar results, with the former appearing to hold a slight edge. In the temporal dependence setting, Random Projections, from the machine learning literature, clearly dominates as the variables approach a unit root.
The second essay addresses the increasing prevalence in econometrics of high dimensional data sets whose predictors are characterised by significant correlation. While the well-known Ridge Regression of Hoerl and Kennard (1970) provides a neat, feasible alternative to Ordinary Least Squares, its parameter estimates can suffer from bias when the sign and magnitude of the true coefficients vary significantly. To overcome this, the essay proposes the Partial Ridge and Hybrid Estimation Procedure approaches, which vary which predictor coefficients are penalised, allowing a more desirable bias-variance tradeoff to be achieved for prediction purposes. Through theoretical analysis, a Monte Carlo simulation study and an application to Iranian residential property price data, it is shown that while Partial Ridge alone is unable to universally dominate Ridge Regression, combining the two estimation procedures can lead to predictive accuracy that outperforms Ridge alone.
The final essay focuses on Random Projections, a method seen as a computationally faster alternative to the widely used Principal Components Analysis for dimension reduction. While there is extensive work on creating dependent variable predictions and forecasts using Random Projections, there is very little focus on its use for constructing estimates of the individual coefficient values themselves. There is, however, a broad area within the statistics literature that involves finding the non-parametric bootstrap distribution of the coefficients. In contrast, the distribution used in Random Projections is a parametric one, which allows additional key theoretical results to be used to assist with estimation accuracy. Using a Frisch-Waugh style approach, the so-called Partial Random Projections is proposed as a way to obtain individual parameter estimates in high dimensional settings whilst remaining in the spirit of Random Projections, which is seen to perform very well in the first essay. The theoretical analysis shows that its bias can often improve upon that of many other techniques when the sum of all parameters other than the one of interest is small and the associated covariate is only weakly correlated with the others. Finally, a Monte Carlo simulation study replicating a causal inference study demonstrates how this approach can, in practice, provide more accurate estimates than competing models.
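
As an illustration of the idea of penalising only some coefficients, the following is a minimal sketch, not the thesis's own estimator. It assumes the standard closed-form ridge solution with the identity penalty replaced by a diagonal 0/1 selection matrix. The function name `partial_ridge`, the mask `penalised`, the penalty value and the simulated data are all hypothetical choices made for this example, and the sketch assumes the resulting linear system is invertible.

```python
import numpy as np


def partial_ridge(X, y, lam, penalised):
    """Ridge-type estimate that shrinks only the coefficients flagged in `penalised`.

    Solves (X'X + lam * D) b = X'y, where D is diagonal with 1 for penalised
    coefficients and 0 otherwise. With every entry flagged this is ordinary
    ridge regression; with lam = 0 it reduces to least squares.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    D = np.diag(np.asarray(penalised, dtype=float))
    return np.linalg.solve(X.T @ X + lam * D, X.T @ y)


# Hypothetical simulated data: two large coefficients and three small ones.
rng = np.random.default_rng(0)
n, p = 50, 5
X = rng.standard_normal((n, p))
beta_true = np.array([3.0, -2.0, 0.5, 0.0, 0.0])
y = X @ beta_true + 0.1 * rng.standard_normal(n)

lam = 10.0
all_mask = np.ones(p, dtype=bool)
some_mask = np.array([False, False, True, True, True])  # leave the large ones unpenalised

beta_ols = partial_ridge(X, y, 0.0, all_mask)        # no penalty: least squares
beta_ridge = partial_ridge(X, y, lam, all_mask)      # every coefficient shrunk
beta_partial = partial_ridge(X, y, lam, some_mask)   # only the flagged coefficients shrunk

print("OLS:    ", np.round(beta_ols, 3))
print("Ridge:  ", np.round(beta_ridge, 3))
print("Partial:", np.round(beta_partial, 3))
```

Which coefficients to leave unpenalised is the substantive modelling choice; here the mask is fixed by hand purely for demonstration, whereas the thesis develops its own procedures for this.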

Text
Southampton_PhD_Thesis_2 - Version of Record
Available under License University of Southampton Thesis Licence.
Download (854kB)
Text
Final-thesis-submission-Examination-Mr-Richard-Thorburn (1)
Restricted to Repository staff only

More information

Published date: October 2024

Identifiers

Local EPrints ID: 495089
URI: http://eprints.soton.ac.uk/id/eprint/495089
PURE UUID: 529ebd9c-e6c9-4d25-b8ef-92aa9de3f186
ORCID for Jean-Yves Pitarakis: orcid.org/0000-0002-6305-7421
ORCID for Jose Olmo: orcid.org/0000-0002-0437-7812

Catalogue record

Date deposited: 29 Oct 2024 17:36
Last modified: 30 Oct 2024 02:45

Contributors

Thesis advisor: Jean-Yves Pitarakis
Thesis advisor: Jose Olmo
