The University of Southampton
University of Southampton Institutional Repository

Classification of pmoA amplicon pyrosequences using BLAST and the lowest common ancestor method in MEGAN

Dumont, Marc
Lüke, Claudia
Deng, Yongcui
Frenzel, Peter

Dumont, Marc, Lüke, Claudia, Deng, Yongcui and Frenzel, Peter (2014) Classification of pmoA amplicon pyrosequences using BLAST and the lowest common ancestor method in MEGAN. Frontiers in Microbiology, 5 (34), 1-11. (doi:10.3389/fmicb.2014.00034).

Record type: Article

Abstract

The classification of high-throughput sequencing data of protein-encoding genes is not as well established as for 16S rRNA. The objective of this work was to develop a simple and accurate method of classifying large datasets of pmoA sequences, a common marker for methanotrophic bacteria. A taxonomic system for pmoA was developed based on a phylogenetic analysis of available sequences. The taxonomy incorporates the known diversity of pmoA present in public databases, including sequences from both cultivated and uncultivated organisms. Representative sequences from closely related genes, such as those encoding the bacterial ammonia monooxygenase, were also included in the pmoA taxonomy. In total, 53 low-level taxa (genus-level) are included. Using previously published datasets of high-throughput pmoA amplicon sequence data, we tested two approaches for classifying pmoA: a naïve Bayesian classifier and BLAST. Classification of pmoA sequences based on BLAST analyses was performed using the lowest common ancestor (LCA) algorithm in MEGAN, a software program commonly used for the analysis of metagenomic data. Both the naïve Bayesian and BLAST methods were able to classify pmoA sequences and provided similar classifications; however, the naïve Bayesian classifier was prone to misclassifying contaminant sequences present in the datasets. Another advantage of the BLAST/LCA method was that it provided a user-interpretable output and enabled novelty detection at various levels, from highly divergent pmoA sequences to genus-level novelty.
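
The lowest common ancestor idea mentioned in the abstract can be illustrated with a minimal sketch. This is not MEGAN's implementation and not the taxonomy from the article: the reference IDs, the three toy lineages, the function names, and the `min_score` and `top_percent` defaults are all assumptions chosen for illustration. The sketch keeps the BLAST hits whose bit score is close to the best hit and assigns the read to the deepest taxon shared by all of them.

```python
from typing import Dict, List, Tuple

# Toy lineages, root first. Illustrative only; NOT the taxonomy from the article.
LINEAGES: Dict[str, Tuple[str, ...]] = {
    "ref_A": ("root", "Gammaproteobacteria", "Methylobacter"),
    "ref_B": ("root", "Gammaproteobacteria", "Methylomonas"),
    "ref_C": ("root", "Alphaproteobacteria", "Methylocystis"),
}


def common_prefix(paths: List[Tuple[str, ...]]) -> Tuple[str, ...]:
    """Return the longest lineage prefix shared by every path."""
    prefix = []
    for ranks in zip(*paths):
        if all(rank == ranks[0] for rank in ranks):
            prefix.append(ranks[0])
        else:
            break
    return tuple(prefix)


def lca_assign(
    hits: List[Tuple[str, float]],
    lineages: Dict[str, Tuple[str, ...]],
    min_score: float = 50.0,    # assumed threshold, not a recommended value
    top_percent: float = 10.0,  # assumed window below the best bit score
) -> str:
    """Assign a read from its (reference_id, bit_score) hits using a simple LCA rule."""
    # Drop weak hits and hits to references missing from the taxonomy.
    kept = [(ref, score) for ref, score in hits
            if score >= min_score and ref in lineages]
    if not kept:
        return "unassigned"
    # Keep only hits scoring within top_percent of the best hit.
    best = max(score for _, score in kept)
    cutoff = best * (1.0 - top_percent / 100.0)
    paths = [lineages[ref] for ref, score in kept if score >= cutoff]
    # The deepest shared taxon is the last element of the common prefix.
    prefix = common_prefix(paths)
    return prefix[-1] if prefix else "unassigned"
```

In this sketch, a read whose near-best hits fall in two different genera of the same higher taxon is placed at that higher taxon, while a read with only weak hits is left unassigned. That behaviour is one way to picture how an LCA approach can flag possible novelty at different levels.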

Text
dumont-et-al-2014.pdf - Version of Record
Available under License Other.

More information

Published date: 18 February 2014
Organisations: Centre for Biological Sciences, Environmental

Identifiers

Local EPrints ID: 387930
URI: http://eprints.soton.ac.uk/id/eprint/387930
ISSN: 1664-302X
PURE UUID: f91fb53d-f21d-4447-a19b-9e3f614d8c8b
ORCID for Marc Dumont: orcid.org/0000-0002-7347-8668

Catalogue record

Date deposited: 09 Jun 2016 15:55
Last modified: 26 Nov 2019 01:32
