An Information-Theoretic Definition of Cell Type
Individual cells are often classified into cell ‘types’ based on the expression of so-called marker genes. Such marker-based classification assumes that cells of a given type are (at least approximately) interchangeable with respect to the expression of their associated markers. This traditional approach to cellular classification has been disrupted by single-cell RNA-sequencing technologies, which are able to measure genome-wide gene expression across thousands of individual cells. While potentially providing a wealth of data for cellular classification, these technologies have revealed that cells ostensibly of the same type are often highly heterogeneous (i.e. not interchangeable) with respect to the expression of established marker genes.
A myriad of single-cell clustering methods has recently been developed to overcome the issue of heterogeneity with respect to marker gene expression and identify cell types directly from single-cell expression data. These methods typically proceed via: (1) unsupervised identification of clusters from single-cell expression data sets; (2) mapping of identified clusters to known cell types based on the expression of previously established marker genes. However, this two-step cluster-based approach to cellular classification is less biologically intuitive than the traditional marker-based approach, involving substantial mathematical and biological assumptions regarding the nature of cell type.
In this thesis, I formalise the traditional marker gene approach to cellular classification using notions from information theory, and show how this formalism can be applied to identifying cell types from single-cell RNA-sequencing data. Specifically, I develop a novel clustering method based on the assumption that cells of the same type should be minimally heterogeneous – i.e. approximately interchangeable – with respect to the measured expression of a set of genes. Thus, this work offers an intuitive, formal definition of cell type that unites the traditional and current approaches to cellular classification through the mathematics of information theory.
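As a rough illustration of what "minimally heterogeneous" could mean in information-theoretic terms, the sketch below scores a single gene by how far the distribution of its transcripts across cells departs from uniform, and then measures how much of that departure remains inside the clusters of a proposed grouping. The record does not state which measure the thesis uses, so the choice of Kullback-Leibler divergence from the uniform distribution, and the function names `heterogeneity` and `within_cluster_heterogeneity`, are assumptions made purely for illustration.

```python
import math


def heterogeneity(counts):
    """KL divergence (in bits) between a gene's distribution of transcripts
    over cells and the uniform distribution. Illustrative measure only.

    Returns 0 when every cell holds the same share of transcripts, i.e. when
    cells are interchangeable with respect to this gene.
    """
    total = sum(counts)
    n = len(counts)
    if total == 0 or n == 0:
        return 0.0
    score = 0.0
    for c in counts:
        if c > 0:
            p = c / total                  # share of transcripts in this cell
            score += p * math.log2(p * n)  # p * log2(p / (1 / n))
    return score


def within_cluster_heterogeneity(counts, labels):
    """Heterogeneity left inside clusters, weighting each cluster by its
    share of the gene's transcripts (hypothetical helper)."""
    total = sum(counts)
    if total == 0:
        return 0.0
    groups = {}
    for c, label in zip(counts, labels):
        groups.setdefault(label, []).append(c)
    return sum((sum(g) / total) * heterogeneity(g) for g in groups.values())


# A gene expressed in only two of four cells.
gene = [10, 10, 0, 0]
good = within_cluster_heterogeneity(gene, [0, 0, 1, 1])  # separates expressing cells
poor = within_cluster_heterogeneity(gene, [0, 1, 0, 1])  # mixes them
```

Under this assumed measure, a grouping that leaves little heterogeneity inside each cluster treats the cells within a cluster as approximately interchangeable for that gene, which is the property the abstract asks of a cell type.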
University of Southampton
Casey, Michael, John
3f316614-e401-4955-b400-0815e03af431
Macarthur, Benjamin
2c0476e7-5d3e-4064-81bb-104e8e88bb6b
Casey, Michael, John (2021) An Information-Theoretic Definition of Cell Type. University of Southampton, Doctoral Thesis, 166pp.
Record type: Thesis (Doctoral)
More information
Submitted date: October 2021
Identifiers
Local EPrints ID: 456818
URI: http://eprints.soton.ac.uk/id/eprint/456818
PURE UUID: 1e08cf57-7535-4abf-b5e9-b324f8fdb3bb
Catalogue record
Date deposited: 12 May 2022 16:33
Last modified: 17 Mar 2024 02:51
Contributors
Author: Michael, John Casey