The University of Southampton
University of Southampton Institutional Repository

An information-theoretic approach to single cell sequencing analysis

Background: single-cell sequencing (sc-Seq) experiments are producing increasingly large data sets. However, large data sets do not necessarily contain large amounts of information. 

Results: here, we formally quantify the information obtained from a sc-Seq experiment and show that it corresponds to an intuitive notion of gene expression heterogeneity. We demonstrate a natural relation between our notion of heterogeneity and that of cell type, decomposing heterogeneity into the component attributable to differential expression between cell types (inter-cluster heterogeneity) and the remaining component (intra-cluster heterogeneity). We test our definition of heterogeneity as the objective function of a clustering algorithm, and show that it is a useful descriptor for gene expression patterns associated with different cell types. 
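The abstract does not give the formula for heterogeneity, so the sketch below assumes one common information-theoretic choice: the heterogeneity of a gene is the Kullback-Leibler divergence of its normalised expression profile across cells from the uniform distribution (equivalently log N minus the Shannon entropy, also known as the Theil index). Under that assumption the chain rule for relative entropy splits the total exactly into a between-cluster term and a weighted sum of within-cluster terms, which mirrors the inter-/intra-cluster decomposition described above. The functions `heterogeneity` and `decompose` are hypothetical illustrations; they are not taken from the authors' R package and may differ from their exact definitions.

```python
import math
from typing import Dict, List, Sequence


def heterogeneity(x: Sequence[float]) -> float:
    """Assumed measure: KL divergence of p_j = x_j / sum(x) from the
    uniform distribution over the N cells, i.e. sum_j p_j * log(N * p_j).
    This equals log(N) - H(p), the Theil index of the counts."""
    total = sum(x)
    if total == 0:
        return 0.0  # a gene with no counts carries no information here
    n = len(x)
    # Zero counts are skipped, following the convention 0 * log(0) = 0.
    return sum((v / total) * math.log(n * v / total) for v in x if v > 0)


def decompose(x: Sequence[float], labels: Sequence[int]) -> Dict[str, float]:
    """Split heterogeneity(x) into inter- and intra-cluster terms for a
    given assignment of cells to clusters (labels[j] is the cluster of cell j)."""
    if len(x) != len(labels):
        raise ValueError("x and labels must have the same length")
    total = sum(x)
    n = len(x)
    clusters: Dict[int, List[float]] = {}
    for v, k in zip(x, labels):
        clusters.setdefault(k, []).append(v)
    inter = 0.0
    intra = 0.0
    if total > 0:
        for vals in clusters.values():
            share = sum(vals) / total  # s_k: fraction of the gene's counts in cluster k
            if share == 0:
                continue  # a cluster with no counts contributes to neither term
            # Between-cluster term: s_k * log(s_k / (N_k / N)).
            inter += share * math.log(share * n / len(vals))
            # Within-cluster term: s_k times the heterogeneity inside cluster k.
            intra += share * heterogeneity(vals)
    return {"total": heterogeneity(x), "inter": inter, "intra": intra}


if __name__ == "__main__":
    # Toy gene: expressed only in the second of two clusters, evenly within it.
    print(decompose([0, 0, 5, 5], [0, 0, 1, 1]))
```

For this toy gene all of the heterogeneity (ln 2) is inter-cluster and the intra-cluster term is zero; in general `total` equals `inter + intra` up to floating-point rounding, so a clustering that captures a gene's differential expression moves heterogeneity from the intra- to the inter-cluster term.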

Conclusions: thus, our definition of gene heterogeneity leads to a biologically meaningful notion of cell type, as groups of cells that are statistically equivalent with respect to their patterns of gene expression. Our measure of heterogeneity, and its decomposition into inter- and intra-cluster components, is non-parametric, intrinsic, unbiased, and requires no additional assumptions about expression patterns. Based on this theory, we develop an efficient method for the automatic unsupervised clustering of cells from sc-Seq data, and provide an R package implementation.

Casey, Michael J., Fliege, Joerg, Sanchez-Garcia, Ruben J. and Macarthur, Ben D. (2023) An information-theoretic approach to single cell sequencing analysis. BMC Bioinformatics, 24 (1), [311]. (doi:10.1186/s12859-023-05424-8).

Record type: Article

Text
s12859-023-05424-8 - Version of Record
Available under License Creative Commons Attribution.

More information

Accepted/In Press date: 18 July 2023
Published date: 12 August 2023
Additional Information: Funding Information: RSG and BDM were supported by The Alan Turing Institute under the EPSRC Grant EP/N510129/1. RSG was supported by the Alan Turing Institute—Roche strategic partnership under the project ‘Structured missingness in heterogeneous data’. Publisher Copyright: © 2023, BioMed Central Ltd., part of Springer Nature

Identifiers

Local EPrints ID: 482485
URI: http://eprints.soton.ac.uk/id/eprint/482485
ISSN: 1471-2105
PURE UUID: d7a6889e-ae7c-447f-aaa0-9768dc23e229
ORCID for Joerg Fliege: orcid.org/0000-0002-4459-5419
ORCID for Ruben J. Sanchez-Garcia: orcid.org/0000-0001-6479-3028
ORCID for Ben D. Macarthur: orcid.org/0000-0002-5396-9750

Catalogue record

Date deposited: 09 Oct 2023 16:42
Last modified: 18 Mar 2024 03:16

Contributors

Author: Michael J. Casey
Author: Joerg Fliege
