Published by Barbara Hood. Modified over 8 years ago.
1
The Biologist’s Wishlist
A complete and accurate set of all genes and their genomic positions
A set of all the transcripts produced by each gene
The location and timing of expression of each transcript
The protein produced from each transcript
The location and timing of each protein’s expression
The complete structure of each protein
The functions of each protein
5
GOALS AND STRATEGIES
Coverage of Fold Space
Discover new protein folds
Strategy: Targets are selected for which no protein fold assignment can be predicted by current methods. These targets are sought initially in the genomes of our target model organisms. Problem targets are salvaged through the use of orthologous targets from bacteria and archaea ("Bacterialization") or from yeast ("Yeastization").
Populate the structural coverage of protein families (Pfam)
Strategy: Representatives of protein families with limited or no structural coverage are selected as targets. When solved, these structures can subsequently be used as templates to model other members of the family.
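The family-coverage strategy above can be sketched in a few lines of Python. This is only an illustration: the function name, the two input dictionaries, and the family names are hypothetical, and ranking larger families first is an assumption based on the remark that a solved structure can serve as a template for other family members.

```python
def pick_family_targets(structures_per_family, family_sizes, max_structures=0):
    """Return families with at most `max_structures` solved structures.

    Larger families come first (an assumed priority rule): one solved
    structure can then act as a modelling template for more members.
    """
    uncovered = [
        fam for fam, n_solved in structures_per_family.items()
        if n_solved <= max_structures
    ]
    return sorted(uncovered, key=lambda fam: family_sizes.get(fam, 0), reverse=True)


# Toy data with made-up family names (not real Pfam accessions).
structures_per_family = {"FAM_A": 0, "FAM_B": 12, "FAM_C": 0, "FAM_D": 1}
family_sizes = {"FAM_A": 40, "FAM_B": 900, "FAM_C": 350, "FAM_D": 75}

# Families with no structural coverage at all.
no_coverage = pick_family_targets(structures_per_family, family_sizes)

# Families with limited coverage (here assumed to mean at most one structure).
limited_coverage = pick_family_targets(
    structures_per_family, family_sizes, max_structures=1
)
```

Raising `max_structures` widens the selection from "no coverage" to "limited coverage"; where that line is actually drawn is not stated in the slides.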
6
In both cases, the attempt is to identify open reading frames that are likely to specify unknown folds. The total target selection process occurs in several stages. The protocol involves filtering for sequences that have no obvious homology to known structures. Within this population, there is an attempt to identify sequences that are likely to have a unique structure that will be soluble in aqueous solution, by removing targets with predicted transmembrane helices, long coiled-coils, or signal peptides. Additionally, targets that contain intrinsically disordered regions are removed, and filtering parameters derived experimentally from analysis of the pipeline flow are applied, with the expectation that the remaining targets will have the best probability of success.
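The staged filtering described above might look roughly like the following Python sketch. The `Target` record, its field names, and both numeric cutoffs are assumptions made for illustration; the slides do not give the actual thresholds or the prediction tools used.

```python
from dataclasses import dataclass


@dataclass
class Target:
    """Hypothetical per-ORF annotations; field names are illustrative."""
    name: str
    has_known_homolog: bool     # obvious homology to a known structure
    tm_helices: int             # number of predicted transmembrane helices
    longest_coiled_coil: int    # residues in the longest predicted coiled-coil
    has_signal_peptide: bool
    disordered_fraction: float  # fraction of residues predicted disordered


# Assumed cutoffs, chosen only to make the example concrete.
MAX_COILED_COIL = 30
MAX_DISORDER = 0.2


def passes_filters(t: Target) -> bool:
    # Stage 1: keep only sequences with no obvious homology to known structures.
    if t.has_known_homolog:
        return False
    # Stage 2: favour soluble proteins with a unique structure.
    if t.tm_helices > 0 or t.has_signal_peptide:
        return False
    if t.longest_coiled_coil > MAX_COILED_COIL:
        return False
    # Stage 3: drop targets dominated by intrinsic disorder.
    return t.disordered_fraction <= MAX_DISORDER


def select_targets(candidates):
    return [t for t in candidates if passes_filters(t)]


candidates = [
    Target("orf1", False, 0, 0, False, 0.05),
    Target("orf2", True, 0, 0, False, 0.00),   # has a known homolog
    Target("orf3", False, 2, 0, False, 0.00),  # transmembrane helices
    Target("orf4", False, 0, 45, False, 0.10), # long coiled-coil
    Target("orf5", False, 0, 0, True, 0.00),   # signal peptide
    Target("orf6", False, 0, 0, False, 0.50),  # mostly disordered
]
selected = select_targets(candidates)
```

Only `orf1` survives all stages in this toy set. The experimentally derived pipeline parameters mentioned in the text would enter as further checks of the same kind.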
7
Target Strategy
http://www.jcsg.org/scripts/prod/TargetStrategy1.html
8
About Pfam
Pfam is a collection of protein families and domains. Pfam contains multiple protein alignments and profile-HMMs of these families. Pfam is a semi-automatic protein family database, which aims to be comprehensive as well as accurate.
http://www.sanger.ac.uk/Software/Pfam/
9
What is Pfam?
Domains can be considered as building blocks of proteins. Some domains can be found in many proteins with different functions, while others are only found in proteins with a certain function. The presence of a particular domain can be indicative of the function of the protein.
Pfam is a domain database, comprising two parts: Pfam-A and Pfam-B.
Uses
Pfam is used by many different groups in many different ways. It was originally set up to aid the annotation of the C. elegans genome.
10
The PFAM Database
Pfam is a large collection of multiple sequence alignments and hidden Markov models covering many common protein domains and families. For each family in Pfam you can:
Look at multiple alignments
View protein domain architectures
Examine species distribution
Follow links to other databases
View known protein structures
Search with Hidden Markov Model (HMM) for each alignment
11
The PFAM Database
Pfam is a database of two parts. The first is the curated part, Pfam-A, which contains over 5193 protein families. Pfam-A comprises manually crafted multiple alignments and profile-HMMs. To give Pfam more comprehensive coverage of known proteins, a supplement called Pfam-B is generated automatically. It contains a large number of small families taken from the ProDom database that do not overlap with Pfam-A. Although of lower quality, Pfam-B families can be useful when no Pfam-A families are found.
12
The PFAM Database
Sequence coverage Pfam-A: 75% (Gr)
Sequence coverage Pfam-B: 19% (Bl)
Other (Grey)
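The coverage figures above leave 100 - 75 - 19 = 6% of sequences in the "Other" slice. As a rough sketch of how such a breakdown could be computed from sets of sequence identifiers, consider the Python below. The function name and inputs are hypothetical, and counting a sequence matched by both parts under Pfam-A is an assumption, not something the slide states.

```python
def coverage_breakdown(all_ids, pfam_a_ids, pfam_b_ids):
    """Percentage of sequences matched by Pfam-A, by Pfam-B only, or by neither.

    Assumption: a sequence with both Pfam-A and Pfam-B matches is counted
    once, under Pfam-A.
    """
    in_a = all_ids & pfam_a_ids
    in_b_only = (all_ids & pfam_b_ids) - in_a
    other = all_ids - in_a - in_b_only
    total = len(all_ids)
    return {
        "Pfam-A": round(100 * len(in_a) / total, 1),
        "Pfam-B only": round(100 * len(in_b_only) / total, 1),
        "other": round(100 * len(other) / total, 1),
    }


# Toy identifiers arranged to reproduce the percentages quoted on the slide.
all_ids = set(range(100))
pfam_a_ids = set(range(75))       # 75 sequences with a Pfam-A match
pfam_b_ids = set(range(70, 94))   # 5 of these overlap with Pfam-A
breakdown = coverage_breakdown(all_ids, pfam_a_ids, pfam_b_ids)
```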
14
ProDom
ProDom is a comprehensive database of protein domain families generated from the global comparison of all available protein sequences.
15
A domain is a:
Compact, semi-independent unit (Richardson, 1981).
Stable unit of a protein structure that can fold autonomously (Wetlaufer, 1973).
Recurring functional and evolutionary module (Bork, 1992).
16
Identification of domains is essential for:
High resolution structures
Sequence analysis
Multiple alignment methods
Sequence database searches
Prediction algorithms
Fold recognition
Structural/functional genomics
17
Domain size
The size of individual structural domains varies widely, from 36 residues in E-selectin to 692 residues in lipoxygenase-1 (Jones et al., 1998). The majority (90%) have fewer than 200 residues (Siddiqui and Barton, 1995), with an average of about 100 residues (Islam et al., 1995). Small domains (fewer than 40 residues) are often stabilised by metal ions or disulphide bonds. Large domains (more than 300 residues) are likely to consist of multiple hydrophobic cores (Garel, 1992).
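The size ranges quoted above can be expressed as a small helper. The cutoffs of 40 and 300 residues come from the text; the function name and the "typical" label for everything in between are assumptions added for this sketch.

```python
def size_class(n_residues: int) -> str:
    """Bin a structural domain by length using the cutoffs quoted in the text."""
    if n_residues < 40:
        return "small"    # often stabilised by metal ions or disulphide bonds
    if n_residues > 300:
        return "large"    # likely to consist of multiple hydrophobic cores
    return "typical"      # label assumed here; the text does not name this range


# The two extremes mentioned in the text.
e_selectin = size_class(36)
lipoxygenase_1 = size_class(692)
```

With these cutoffs the 36-residue E-selectin domain is classed as small and the 692-residue lipoxygenase-1 domain as large.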
19
Domain characteristics
Domains are genetically mobile units, and multidomain families are found in all three kingdoms (Archaea, Bacteria and Eukarya), underlining the finding that ‘Nature is a tinkerer and not an inventor’ (Jacob, 1977). The majority of genomic proteins, 75% in unicellular organisms and more than 80% in metazoa, are multidomain proteins created as a result of gene duplication events (Apic et al., 2001). Domains in multidomain structures are likely to have once existed as independent proteins, and many domains in eukaryotic multidomain proteins can be found as independent proteins in prokaryotes (Davidson et al., 1993).
20
ProDom
http://protein.toulouse.inra.fr/prodom/current/html/home.php
22
JCSG http://www.jcsg.org
BSGC http://www.strgen.org
SPINE http://www.spineurope.org