Bayesian approaches to model uncertainty in phylogenetics
When inferring a phylogeny in a probabilistic framework, one is faced with many choices of how to model the underlying processes that give rise to the observed data. This includes how to accommodate the across-site variation in the properties of the nucleotide substitution process and how to incorporate the across-time variation in the parameters of the tree-generating process.
To model across-site heterogeneity in the properties of the nucleotide substitution process, one commonly pre-defines the partition scheme of the alignment, thereby grouping sites into a number of categories, and then independently estimates the substitution model of each category. This practice ignores the uncertainty associated with the partition scheme, and the pre-defined partition scheme may not agree with the data.
This thesis first presents three new methods that accommodate the uncertainty associated with the partition scheme. They estimate the number of categories, the assignments of sites to the categories, the nucleotide substitution model and the site rate model for each category, and the uncertainty in these selections. These methods employ approaches of Bayesian model selection and/or Bayesian nonparametrics. They differ in the a priori assumptions on the assignments of sites to categories, and therefore provide different views on the across-site heterogeneity in the properties of the nucleotide substitution process. Analyses with all three methods have found statistical evidence for across-site heterogeneity in the nucleotide substitution process both within a single gene and among genes in various sets of empirical data.
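To illustrate what a prior over site-to-category assignments can look like, the sketch below draws assignments from a Chinese restaurant process, a standard construction in Bayesian nonparametrics. This is an assumption made for illustration: the abstract does not state which priors the three methods use, and the function name `sample_crp_assignments` and the concentration parameter `alpha` are hypothetical.

```python
import random


def sample_crp_assignments(n_sites, alpha, rng):
    """Draw a category label for each site from a Chinese restaurant process.

    Illustrative only: `alpha` (> 0) is a hypothetical concentration
    parameter; larger values tend to produce more categories.
    """
    assignments = []
    counts = []  # counts[k] = number of sites already placed in category k
    for _ in range(n_sites):
        # An existing category k is chosen with weight counts[k];
        # a brand-new category is chosen with weight alpha.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)  # open a new category
        else:
            counts[k] += 1
        assignments.append(k)
    return assignments


rng = random.Random(1)
labels = sample_crp_assignments(20, 1.0, rng)
n_categories = max(labels) + 1
```

Under such a prior, neither the number of categories nor the assignment of sites is fixed in advance; both are random quantities that the data can update.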
Recently proposed models based on the birth-death-sampling process allow the rate parameters of birth, death and sampling events to vary through time as piecewise constant functions, but require the number of rate shifts to be fixed a priori. This thesis presents a new method that employs a transdimensional sampling algorithm for Bayesian model selection to directly estimate the number of shifts in the parameters of the birth-death-sampling process (or epidemiological parameters in the case of a phylogeny of a rapidly evolving infectious disease).
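A piecewise constant rate function of the kind described above can be sketched as follows. The names `piecewise_constant_rate`, `shift_times` and `rates`, and the numerical values, are hypothetical and chosen only for illustration. The transdimensional sampler itself is not shown; in this representation it would correspond to treating the length of `shift_times` as an unknown to be estimated.

```python
from bisect import bisect_right


def piecewise_constant_rate(t, shift_times, rates):
    """Return the rate in force at time t.

    `shift_times` must be sorted in increasing order, and `rates` must have
    exactly one more entry than `shift_times`: rates[0] applies before the
    first shift, rates[i] applies from shift_times[i - 1] onwards.
    """
    if len(rates) != len(shift_times) + 1:
        raise ValueError("need exactly one more rate than shift times")
    if list(shift_times) != sorted(shift_times):
        raise ValueError("shift_times must be sorted in increasing order")
    # bisect_right counts the shift times that are <= t,
    # which is the index of the interval containing t.
    return rates[bisect_right(shift_times, t)]


# Hypothetical example: two shifts, hence three rate intervals.
shift_times = [1.0, 2.5]
birth_rates = [2.0, 0.8, 1.5]
rate_early = piecewise_constant_rate(0.5, shift_times, birth_rates)
rate_at_shift = piecewise_constant_rate(1.0, shift_times, birth_rates)
rate_late = piecewise_constant_rate(3.0, shift_times, birth_rates)
```

With a fixed number of shifts, only the entries of `shift_times` and `rates` are estimated; allowing the number of shifts to vary changes the dimension of the parameter space, which is why a transdimensional sampling algorithm is needed.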
In summary, I have developed a series of new phylogenetic methods based on Bayesian model selection and Bayesian nonparametrics to permit the direct inference of the across-site variation of the nucleotide substitution process and the across-time variation in the birth-death-sampling process. These new methods take into account the uncertainty associated with the alignment partition scheme and the tree-generating process, avoiding potential model misspecification.
Wu, Chieh-Hsi
November 2014
Wu, Chieh-Hsi (2014) Bayesian approaches to model uncertainty in phylogenetics. The University of Auckland, Doctoral Thesis, 234pp.
Record type: Thesis (Doctoral)
Identifiers
Local EPrints ID: 437651
URI: http://eprints.soton.ac.uk/id/eprint/437651
PURE UUID: e1d3ac0d-b584-475e-a866-e3cc70d9ae02
Catalogue record
Date deposited: 10 Feb 2020 17:30
Last modified: 17 Mar 2024 04:00