University of Southampton Institutional Repository

Tandem repeat copy-number variation in protein-coding regions of human genes

O'Dushlaine, Colm T., Edwards, Richard, Park, Stephen D. and Shields, Denis C. (2005) Tandem repeat copy-number variation in protein-coding regions of human genes. Genome Biology, 6 (8), R69. (doi:10.1186/gb-2005-6-8-r69).

Record type: Article

Abstract

BACKGROUND: Tandem repeat variation in protein-coding regions will alter protein length and may introduce frameshifts. Tandem repeat variants are associated with variation in pathogenicity in bacteria and with human disease. We characterized tandem repeat polymorphism in human proteins, using the UniGene database, and tested whether these were associated with host defense roles.

RESULTS: Protein-coding tandem repeat copy-number polymorphisms were detected in 249 tandem repeats found in 218 UniGene clusters; observed length differences ranged from 2 to 144 nucleotides, with unit copy lengths ranging from 2 to 57. This corresponded to 1.59% (218/13,749) of proteins investigated carrying detectable polymorphisms in the copy-number of protein-coding tandem repeats. We found no evidence that tandem repeat copy-number polymorphism was significantly elevated in defense-response proteins (p = 0.882). An association with the Gene Ontology term 'protein-binding' remained significant after covariate adjustment and correction for multiple testing. Combining this analysis with previous experimental evaluations of tandem repeat polymorphism, we estimate the approximate mean frequency of tandem repeat polymorphisms in human proteins to be 6%. Because 13.9% of the polymorphisms were not a multiple of three nucleotides, up to 1% of proteins may contain frameshifting tandem repeat polymorphisms.

CONCLUSION: Around 1 in 20 human proteins are likely to contain tandem repeat copy-number polymorphisms within coding regions. Such polymorphisms are not more frequent among defense-response proteins; their prevalence among protein-binding proteins may reflect lower selective constraints on their structural modification. The impact of frameshifting and longer copy-number variants on protein function and disease merits further investigation.
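
The percentages quoted in the abstract can be checked with a few lines of arithmetic. The sketch below is illustrative only: the variable and function names are hypothetical, and reading the "up to 1%" figure as the product of the 6% mean frequency and the 13.9% non-multiple-of-three share is an assumption about how the estimate was reached, not something the record states.

```python
# Figures quoted in the abstract (names are illustrative, not from the paper).
clusters_with_polymorphism = 218
clusters_investigated = 13_749
mean_polymorphism_frequency = 0.06   # combined estimate given in the abstract
share_not_multiple_of_three = 0.139  # polymorphisms whose length is not a multiple of 3 nt


def is_frameshifting(length_difference_nt: int) -> bool:
    """Return True if a length difference (in nucleotides) would shift the reading frame."""
    return length_difference_nt % 3 != 0


# Fraction of investigated proteins with a detectable polymorphism.
detected_fraction = clusters_with_polymorphism / clusters_investigated

# Assumed reading of the frameshift estimate: frequency multiplied by the frameshifting share.
frameshift_fraction = mean_polymorphism_frequency * share_not_multiple_of_three

print(f"Detected in UniGene: {detected_fraction:.2%}")
print(f"Estimated frameshifting: {frameshift_fraction:.2%}")
```

Under these assumptions the first figure reproduces the 1.59% reported, and the second stays below the 1% upper bound given in the abstract. The helper also shows why the smallest observed length difference (2 nucleotides) would be frameshifting while the largest (144 nucleotides) would not.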

Text
gb-2005-6-8-r69.pdf - Version of Record

More information

Published date: 2005

Identifiers

Local EPrints ID: 151149
URI: http://eprints.soton.ac.uk/id/eprint/151149
ISSN: 1465-6906
PURE UUID: 21a2492d-f335-4c87-bd69-fa908706450f

Catalogue record

Date deposited: 28 Jun 2010 14:43
Last modified: 14 Mar 2024 01:20

Contributors

Author: Colm T. O'Dushlaine
Author: Richard Edwards
Author: Stephen D. Park
Author: Denis C. Shields
