Data from: The architecture of an empirical genotype-phenotype map
Aguilar-Rodriguez, Jose
120e28d4-bb3a-4a22-9e74-a13750b802e9
Peel, Leto
502a7ee9-369e-4b4e-8a75-d1e8d97896e1
Stella, Massimo
37822c93-2522-4bc0-b840-ca32c75efbd7
Wagner, Andreas
e80cb93f-a8d7-44f9-b013-2c5621430124
Payne, Joshua L.
4d990a3c-504b-4a15-936a-9fddcd105467
Abstract
Recent advances in high-throughput technologies are bringing the study of empirical genotype-phenotype (GP) maps to the fore. Here, we use data from protein binding microarrays to study an empirical GP map of transcription factor (TF) binding preferences. In this map, each genotype is a DNA sequence. The phenotype of this DNA sequence is its ability to bind one or more TFs. We study this GP map using genotype networks, in which nodes represent genotypes with the same phenotype, and edges connect nodes if their genotypes differ by a single small mutation. We describe the structure and arrangement of genotype networks within the space of all possible binding sites for 525 TFs from three eukaryotic species encompassing three kingdoms of life (animal, plant, and fungi). We thus provide a high-resolution depiction of the architecture of an empirical GP map. Among a number of findings, we show that these genotype networks are “small-world” and assortative, and that they ubiquitously overlap and interface with one another. We also use polymorphism data from Arabidopsis thaliana to show how genotype network structure influences the evolution of TF binding sites in vivo. We discuss our findings in the context of regulatory evolution.

The architecture of an empirical genotype-phenotype map

This DRYAD package contains files from: Aguilar-Rodríguez, J., Peel, L., Stella, M., Wagner, A., and Payne, J. L. The architecture of an empirical genotype-phenotype map. This package contains the network files in GML format for the genotype space of transcription factor (TF) binding sites ('genotype_space.gml'), 525 genotype networks of TF binding sites, and 66 genotype networks of DNA binding domains. The genotype networks of TF binding sites are classified in three directories according to their species provenance ('Arabidopsis_thaliana', 'Mus_musculus', and 'Neurospora_crassa'). Each network file is named with the TF name. More information about these networks can be found in Table S1.

The genotype networks of DNA binding domains are within a 'domains' sub-folder that can be found inside each of the three species folders. Each file is named with the DNA binding domain class.

Each network file has the following vertex attributes:
- id: vertex identification number.
- sequence: the nucleotide sequence of the binding site.
- reversecomplement: the reverse complement of 'sequence'.

Genotype networks of TF binding sites have the following additional vertex attributes:
- Escore: the enrichment score of the sequence in protein binding microarrays.
- PartitionSBM: the stochastic block model partition group where the vertex is found: '0', '1', or 'None'. 'None' is for vertices not found in the dominant genotype network.
- PartitionBA: the binding affinity partition group where the vertex is found: '0', '1', or 'None'. 'None' is for vertices not found in the dominant genotype network.

For questions regarding these data, contact Joshua Payne at joshua.payne@env.ethz.ch or Andreas Wagner at andreas.wagner@ieu.uzh.ch.

dryad.zip
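To make the vertex attributes described above concrete, the following is a minimal Python sketch of how a 'reversecomplement' value relates to its 'sequence', and of how an edge between two genotypes could be tested. It is an illustration and not the authors' code. The three example vertices and both helper function names are hypothetical. The record only says that edges join genotypes differing by "a single small mutation"; the sketch assumes this means a single nucleotide substitution, which may be narrower than the definition used to build the deposited networks.

```python
from itertools import combinations

# Translation table for complementing DNA bases (A<->T, C<->G).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return sequence.upper().translate(COMPLEMENT)[::-1]


def differ_by_one_substitution(a: str, b: str) -> bool:
    """True if two equal-length sequences differ at exactly one position.

    Assumption: only point substitutions count as a 'single small mutation'.
    """
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1


# Hypothetical vertices mirroring the 'id' and 'sequence' attributes.
# These sequences are made up for illustration, not taken from the dataset.
vertices = [
    {"id": 0, "sequence": "AACCGGTT"},
    {"id": 1, "sequence": "AACCGGTA"},
    {"id": 2, "sequence": "TTTTGGTA"},
]

# Fill in the 'reversecomplement' attribute for each vertex.
for vertex in vertices:
    vertex["reversecomplement"] = reverse_complement(vertex["sequence"])

# Connect every pair of vertices whose sequences are one substitution apart.
edges = [
    (u["id"], v["id"])
    for u, v in combinations(vertices, 2)
    if differ_by_one_substitution(u["sequence"], v["sequence"])
]

for vertex in vertices:
    print(vertex["id"], vertex["sequence"], vertex["reversecomplement"])
print("edges:", edges)
```

With these made-up sequences, only vertices 0 and 1 are joined, because they differ at a single position. The deposited GML files already store both the attributes and the edges, so a sketch like this is mainly useful as a sanity check on what the attributes mean.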
This record has no associated files available for download.
More information
Published date: 6 April 2018
Keywords:
Phenotypic Plasticity, Molecular Evolution, Adaptation, Mutations
Identifiers
Local EPrints ID: 436506
URI: http://eprints.soton.ac.uk/id/eprint/436506
PURE UUID: 7c51e6ab-b7b4-4dc1-8dc7-23cfaf8b17d7
Catalogue record
Date deposited: 11 Dec 2019 17:31
Last modified: 25 Jul 2023 16:48
Contributors
Contributor: Jose Aguilar-Rodriguez
Contributor: Leto Peel
Contributor: Massimo Stella
Contributor: Andreas Wagner
Contributor: Joshua L. Payne