Classification of pmoA amplicon pyrosequences using BLAST and the lowest common ancestor method in MEGAN
Dumont, Marc, Lüke, Claudia, Deng, Yongcui and Frenzel, Peter
(2014)
Classification of pmoA amplicon pyrosequences using BLAST and the lowest common ancestor method in MEGAN.
Frontiers in Microbiology, 5 (34), 1-11.
(doi:10.3389/fmicb.2014.00034).
Abstract
The classification of high-throughput sequencing data of protein-encoding genes is not as well established as that of 16S rRNA. The objective of this work was to develop a simple and accurate method of classifying large datasets of pmoA sequences, a common marker for methanotrophic bacteria. A taxonomic system for pmoA was developed based on a phylogenetic analysis of available sequences. The taxonomy incorporates the known diversity of pmoA present in public databases, including sequences from both cultivated and uncultivated organisms. Representative sequences from closely related genes, such as those encoding the bacterial ammonia monooxygenase, were also included in the pmoA taxonomy. In total, the taxonomy includes 53 low-level (genus-level) taxa. Using previously published datasets of high-throughput pmoA amplicon sequence data, we tested two approaches for classifying pmoA: a naïve Bayesian classifier and BLAST. Classification of pmoA sequences based on BLAST analyses was performed using the lowest common ancestor (LCA) algorithm in MEGAN, a software program commonly used for the analysis of metagenomic data. Both the naïve Bayesian and BLAST methods were able to classify pmoA sequences and provided similar classifications; however, the naïve Bayesian classifier was prone to misclassifying contaminant sequences present in the datasets. Another advantage of the BLAST/LCA method was that it provided user-interpretable output and enabled novelty detection at various levels, from highly divergent pmoA sequences to genus-level novelty.
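The LCA step mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the authors' pipeline or MEGAN's implementation. It makes several assumptions:

* BLAST hits for each read have already been parsed into (subject ID, bit score) pairs.
* Each reference sequence maps to a root-to-genus taxon path.
* The reference IDs and the toy pmoA taxonomy are hypothetical.
* The `min_score`/`top_percent` defaults are illustrative values loosely modeled on MEGAN's "Min Score" and "Top Percent" LCA parameters, not the settings used in the paper.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    subject: str      # hypothetical reference sequence ID
    bitscore: float   # BLAST bit score of the alignment


def lca(paths):
    """Return the longest shared prefix of root-to-leaf taxon paths."""
    if not paths:
        return ()
    prefix = []
    for ranks in zip(*paths):  # walk all paths rank by rank
        if all(r == ranks[0] for r in ranks):
            prefix.append(ranks[0])
        else:
            break  # paths diverge here; stop at the last shared node
    return tuple(prefix)


def assign(hits, taxonomy, min_score=50.0, top_percent=10.0):
    """Assign a read to the LCA of its near-best hits (illustrative parameters)."""
    # Drop weak hits and hits to references missing from the taxonomy.
    kept = [h for h in hits if h.bitscore >= min_score and h.subject in taxonomy]
    if not kept:
        return ("Not assigned",)
    # Keep only hits within top_percent of the best bit score.
    best = max(h.bitscore for h in kept)
    cutoff = best * (1 - top_percent / 100)
    top = [h for h in kept if h.bitscore >= cutoff]
    return lca([taxonomy[h.subject] for h in top])


# Hypothetical toy taxonomy (not the 53-taxon pmoA taxonomy from the paper).
taxonomy = {
    "ref1": ("root", "pmoA", "Type Ia", "Methylobacter"),
    "ref2": ("root", "pmoA", "Type Ia", "Methylomonas"),
    "ref3": ("root", "pmoA", "Type IIa", "Methylocystis"),
}

print(assign([Hit("ref1", 300), Hit("ref2", 210)], taxonomy))  # genus-level
print(assign([Hit("ref1", 250), Hit("ref2", 240)], taxonomy))  # stops at Type Ia
```

A read whose near-best hits span several genera is placed at a higher node rather than forced into one genus. This is how LCA-based assignments can flag divergent or potentially novel sequences instead of silently misclassifying them.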
Full text: dumont-et-al-2014.pdf (Version of Record), available under License Other.
More information
Published date: 18 February 2014
Organisations:
Centre for Biological Sciences, Environmental
Identifiers
Local EPrints ID: 387930
URI: http://eprints.soton.ac.uk/id/eprint/387930
ISSN: 1664-302X
PURE UUID: f91fb53d-f21d-4447-a19b-9e3f614d8c8b
Catalogue record
Date deposited: 09 Jun 2016 15:55
Last modified: 15 Mar 2024 03:53
Contributors
Author: Marc Dumont
Author: Claudia Lüke
Author: Yongcui Deng
Author: Peter Frenzel