University of Southampton Institutional Repository

Bayesian approaches to model uncertainty in phylogenetics

Abstract

When inferring a phylogeny in a probabilistic framework, one faces many choices about how to model the underlying processes that give rise to the observed data. These include how to accommodate across-site variation in the properties of the nucleotide substitution process and how to incorporate across-time variation in the parameters of the tree-generating process.

To model across-site heterogeneity in the properties of the nucleotide substitution process, one commonly predefines a partition scheme for the alignment, thereby grouping sites into a number of categories, and then estimates the substitution model of each category independently. This practice ignores the uncertainty associated with the partition scheme, and the predefined scheme may not agree with the data.

This thesis first presents three new methods that accommodate the uncertainty associated with the partition scheme. They estimate the number of categories, the assignment of sites to categories, the nucleotide substitution model and site rate model for each category, and the uncertainty in each of these choices. The methods draw on Bayesian model selection, Bayesian nonparametrics, or both. Because they make different prior assumptions about how sites are assigned to categories, they offer different views of across-site heterogeneity in the properties of the nucleotide substitution process. Analyses of several empirical data sets with all three methods found statistical evidence for across-site heterogeneity in the nucleotide substitution process, both within a single gene and among genes.
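The abstract does not name the nonparametric prior placed on site assignments. As a hedged illustration only, assuming a Dirichlet process prior (a common Bayesian nonparametric choice when the number of categories is unknown), the sketch below draws a category label for each alignment site from the corresponding Chinese restaurant process. The function name `crp_site_assignments` and the concentration parameter `alpha` are hypothetical, and this is not the thesis' implementation: a full method would also attach a substitution model and site rate model to each category and sample the assignments jointly with them by MCMC.

```python
import random


def crp_site_assignments(n_sites, alpha, seed=None):
    """Draw a category label for each of n_sites alignment sites from a
    Chinese restaurant process with concentration alpha (hypothetical
    helper; assumes a Dirichlet process prior on site assignments).

    Labels are integers numbered in order of first appearance, starting at 0.
    """
    if n_sites < 0 or alpha <= 0:
        raise ValueError("n_sites must be >= 0 and alpha must be > 0")
    rng = random.Random(seed)
    counts = []  # counts[k] = number of sites already assigned to category k
    labels = []
    for _ in range(n_sites):
        # Existing category k has weight counts[k]; a new category has
        # weight alpha. random.choices normalises the weights itself.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)  # open a new category
        else:
            counts[k] += 1
        labels.append(k)
    return labels


labels = crp_site_assignments(n_sites=1000, alpha=1.0, seed=1)
print("categories in this draw:", len(set(labels)))
```

Under this prior the number of categories is itself random rather than fixed in advance, which is the contrast with a predefined partition scheme; larger `alpha` favours more categories.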

Recently proposed models based on the birth-death-sampling process allow the rate parameters of birth, death and sampling events to vary through time as piecewise-constant functions, but require the number of rate shifts to be fixed a priori. This thesis presents a new method that uses a transdimensional sampling algorithm for Bayesian model selection to estimate directly the number of shifts in the parameters of the birth-death-sampling process (or in the epidemiological parameters, for a phylogeny of a rapidly evolving infectious disease).
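To make the piecewise-constant parameterisation concrete, the sketch below evaluates one rate (for example the birth rate) at time t, given m shift times and m + 1 interval rates. The function name, the time direction and the left-closed intervals are assumptions for illustration, not the thesis' conventions. In the fixed-dimension models described above, m is set in advance; a transdimensional sampler would instead propose adding or removing shift times, changing the lengths of `shift_times` and `rates` together.

```python
import bisect


def piecewise_constant_rate(t, shift_times, rates):
    """Return the rate in force at time t (hypothetical helper).

    shift_times: sorted change points t_1 < ... < t_m.
    rates: m + 1 values; rates[i] applies on [t_i, t_{i+1}),
    with t_0 = -inf and t_{m+1} = +inf (an assumed convention).
    """
    if len(rates) != len(shift_times) + 1:
        raise ValueError("need exactly one more rate than shift times")
    # bisect_right counts the shift times <= t, which is the interval index.
    return rates[bisect.bisect_right(shift_times, t)]


# Two shifts (m = 2), hence three birth-rate intervals.
shifts = [2.0, 5.0]
birth_rates = [1.0, 0.5, 1.5]
for t in (1.0, 2.0, 6.0):
    print(t, piecewise_constant_rate(t, shifts, birth_rates))
```

The loop prints the rates 1.0, 0.5 and 1.5; a time equal to a shift point takes the rate of the interval that begins there.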

In summary, I have developed a series of new phylogenetic methods, based on Bayesian model selection and Bayesian nonparametrics, that permit direct inference of across-site variation in the nucleotide substitution process and across-time variation in the birth-death-sampling process. These methods account for the uncertainty associated with the alignment partition scheme and the tree-generating process, and so help avoid model misspecification.
University of Auckland
Wu, Chieh-Hsi

Wu, Chieh-Hsi (2014) Bayesian approaches to model uncertainty in phylogenetics. The University of Auckland, Doctoral Thesis, 234pp.

Record type: Thesis (Doctoral)

This record has no associated files available for download.

More information

Published date: November 2014

Identifiers

Local EPrints ID: 437651
URI: http://eprints.soton.ac.uk/id/eprint/437651
PURE UUID: e1d3ac0d-b584-475e-a866-e3cc70d9ae02
ORCID for Chieh-Hsi Wu: orcid.org/0000-0001-9386-725X

Catalogue record

Date deposited: 10 Feb 2020 17:30
Last modified: 17 Mar 2024 04:00
