Metazoan Protein Data Page
Raw data were obtained from Ensembl (v.56) or EnsemblMetazoa (v.3). When length data were not available, protein lengths were calculated from the amino acid sequences.
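As an illustration of how a protein length can be derived from an amino acid sequence, here is a minimal Python sketch. It is not the source code shipped in the archive. It assumes the sequences are in FASTA format and that a trailing "*" (stop symbol) should not be counted; the function name protein_lengths is hypothetical.

```python
from typing import Dict, Iterable


def protein_lengths(fasta_lines: Iterable[str]) -> Dict[str, int]:
    """Return {sequence_id: number_of_residues} for FASTA-formatted lines.

    Assumptions (not taken from the original analysis code):
    - the ID is the first whitespace-delimited token after '>'
    - a trailing '*' marks a stop and is not counted as a residue
    """
    lengths: Dict[str, int] = {}
    current_id = None
    for raw in fasta_lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:].split()
            current_id = header[0] if header else ""
            lengths[current_id] = 0
        elif current_id is not None:
            # Sequences may wrap over several lines, so accumulate.
            lengths[current_id] += len(line.rstrip("*"))
    return lengths


example = [">protA some description", "MKV", "LLT*", ">protB", "MA"]
print(protein_lengths(example))  # {'protA': 6, 'protB': 2}
```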
"protein_data.zip" contents
- Source code: Contains all source code used in the analysis of raw data.
- Length data: Length data for 49 metazoan species as tab-delimited text files. Each file contains data from an individual species, with gene name, length, and all associated domains.
- Domain data: Domain data for 49 metazoan species as tab-delimited text files. Each file contains data from an individual species, split into proteins that contain repeats and proteins that do not.
- Function data: GO term data for 49 metazoan species as tab-delimited text files. Three master files are included (biological process, cellular component, molecular function). For each GO term, the files give the total number of proteins, average length, and average number of domains (includes of reach). The files also include the fractional domain distribution for each term.
NOTE: Some of the data files are large and can cause Microsoft Excel to crash, so please take care when opening them in it.
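One way to work with a large tab-delimited file without a spreadsheet is to stream it row by row. The Python sketch below does this with the standard csv module. The column position of the length value (second column) and the presence of a header row are assumptions, so check them against the actual files; the function name summarize_lengths is hypothetical.

```python
import csv
import io


def summarize_lengths(handle, length_column=1):
    """Stream a tab-delimited file and return (row_count, mean_length).

    Assumption: the length sits in column index `length_column`
    (0-based). Rows whose value is not an integer, such as a header
    row, are skipped rather than treated as errors.
    """
    reader = csv.reader(handle, delimiter="\t")
    count = 0
    total = 0
    for row in reader:
        if len(row) <= length_column:
            continue  # row too short to hold a length value
        try:
            length = int(row[length_column])
        except ValueError:
            continue  # header line or non-numeric entry
        count += 1
        total += length
    mean = total / count if count else 0.0
    return count, mean


# Small in-memory stand-in for one of the length files.
sample = io.StringIO("gene\tlength\tdomains\ngeneA\t120\tPF00001\ngeneB\t80\t\n")
print(summarize_lengths(sample))  # (2, 100.0)
```

For a file on disk, pass an open file handle (opened with newline="") in place of the in-memory sample; only one row is held in memory at a time.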
All source code is provided as free software for academic use. You can redistribute it and/or modify it under the terms of the
GNU General Public License as published by the Free Software Foundation (version 3).
Report bugs to nayak@tcnj.edu