Biology 224 Dr. Tom Peavy Sept 28 & 30

Protein Structure & Analysis
<Images from Bioinformatics and Functional Genomics by Jonathan Pevsner>

Protein families
Protein localization
Protein function
Gene ontology (GO):
-- cellular component
-- biological process
-- molecular function
Physical properties

The Human Proteome Organisation (HUPO) Proteomics Standards Initiative (PSI)
Work groups:
-- Gel Electrophoresis
-- Mass Spectrometry
-- Molecular Interactions
-- Protein Modifications
-- Proteomics Informatics
-- Sample Processing
Themes:
-- Controlled vocabularies
-- MIAPE: Minimum information about a proteomics experiment

The Human Proteome Organisation (HUPO) Proteomics Standards Initiative (PSI) http://www.psidev.info/

Protein domains, motifs & signatures

Definitions
Signature: a protein category such as a domain or motif (a defining property of the protein or family)
Domain: a region of a protein that can adopt a 3D structure
-- a fold
-- a family is a group of proteins that share a domain
-- examples: zinc finger domain, immunoglobulin domain
Motif (or fingerprint): a short, conserved region of a protein
-- typically 10 to 20 contiguous amino acid residues

Definition of a domain
According to InterPro at EBI (http://www.ebi.ac.uk/interpro/): A domain is an independent structural unit, found alone or in conjunction with other domains or repeats. Domains are evolutionarily related.
According to SMART (http://smart.embl-heidelberg.de): A domain is a conserved structural entity with distinctive secondary structure content and a hydrophobic core. Homologous domains with common functions usually show sequence similarities.

15 most common domains (human)
Domain                         Number of proteins
Zn finger, C2H2 type           1093
Immunoglobulin                 1032
EGF-like                       471
Zn-finger, RING                458
Homeobox                       417
Pleckstrin-like                405
RNA-binding region RNP-1       400
SH3                            394
Calcium-binding EF-hand        392
Fibronectin, type III          300
PDZ/DHR/GLGF                   280
Small GTP-binding protein      261
BTB/POZ                        236
bHLH                           226
Cadherin                       226

Varieties of protein domains
-- Extending along the length of a protein
-- Occupying a subset of a protein sequence
-- Occurring one or more times

Example of a protein with domains: Methyl CpG binding protein 2 (MeCP2)
The protein includes a methylated DNA binding domain (MBD) and a transcriptional repression domain (TRD). MeCP2 is a transcriptional repressor. Mutations in the gene encoding MeCP2 cause Rett syndrome, a neurological disorder that primarily affects girls.

Result of an MeCP2 blastp search: A methyl-binding domain shared by several proteins

Are proteins that share only a domain homologous?

Proteins can have both domains and patterns (motifs)
[Figure labels: Pattern (several residues); Pattern (several residues); Domain (aspartyl protease); Domain (reverse transcriptase)]

The SwissProt entry for any protein provides highly useful information…

SwissProt entry for HIV-1 pol links to many databases

Definition of a motif
A motif (or fingerprint) is a short, conserved region of a protein. Its size is often 10 to 20 amino acids.
Simple motifs include transmembrane domains and phosphorylation sites. These do not imply homology when found in a group of proteins.
PROSITE (www.expasy.org/prosite) is a dictionary of motifs (there are currently 1600 entries). In PROSITE, a pattern is a qualitative motif description (a protein either matches a pattern, or not). In contrast, a profile is a quantitative motif description. Profiles are found in Pfam, ProDom, SMART, and other databases.
Page 231-233
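The qualitative, match-or-no-match nature of a PROSITE pattern can be sketched in a few lines of Python. This is an illustration only, not the PROSITE scanning software: the function names `prosite_to_regex` and `find_matches` are hypothetical, the test sequence is made up, and only the core pattern syntax is handled (`x` for any residue, `[..]` for allowed residues, `{..}` for forbidden residues, `(n)` or `(n,m)` for repeats, `<` and `>` for terminal anchors). The example pattern is the N-glycosylation site pattern N-{P}-[ST]-{P}.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression.

    Sketch only: anchors placed inside square brackets and other less
    common syntax are not handled.
    """
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        prefix = suffix = ""
        # '<' pins the match to the N-terminus, '>' to the C-terminus.
        if element.startswith("<"):
            prefix, element = "^", element[1:]
        if element.endswith(">"):
            suffix, element = "$", element[:-1]
        # Optional repeat count such as x(2) or x(2,4).
        repeat = ""
        counted = re.fullmatch(r"(.+?)\((\d+(?:,\d+)?)\)", element)
        if counted:
            element, repeat = counted.group(1), "{" + counted.group(2) + "}"
        if element == "x":
            core = "."                          # any residue
        elif element.startswith("{") and element.endswith("}"):
            core = "[^" + element[1:-1] + "]"   # any residue except these
        else:
            core = element                      # one residue or an [ABC] set
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)


def find_matches(pattern: str, sequence: str):
    """Return (0-based start, matched text) for each non-overlapping hit."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start(), m.group()) for m in regex.finditer(sequence.upper())]


if __name__ == "__main__":
    # Made-up sequence: "NGSA" fits the pattern, "NPTQ" does not (P follows N).
    print(find_matches("N-{P}-[ST]-{P}", "MKNGSAANPTQ"))
```

A sequence either contains a match or it does not; there is no score. A profile, by contrast, would assign a weight to each residue at each position, which a plain regular expression cannot express.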

http://www.ebi.ac.uk/Databases/
InterPro: InterPro is a database of protein families, domains and functional sites in which identifiable features found in known proteins can be applied to unknown protein sequences. http://www.ebi.ac.uk/interpro/
ExPASy Proteomics Server: The ExPASy (Expert Protein Analysis System) proteomics server of the Swiss Institute of Bioinformatics (SIB) is dedicated to the analysis of protein sequences and structures as well as 2-D PAGE. http://ca.expasy.org/

PROSITE: Database of protein families and domains. http://ca.expasy.org/prosite/
Pfam is a large collection of multiple sequence alignments and hidden Markov models covering many common protein domains. http://www.sanger.ac.uk/Software/Pfam/index.shtml
PRINTS is a compendium of protein fingerprints. http://umber.sbs.man.ac.uk/dbbrowser/PRINTS/
The ProDom protein domain database consists of an automatic compilation of homologous domains. http://prodes.toulouse.inra.fr/prodom/current/html/home.php

ProDom entry for HIV-1 pol shows many related proteins Page 231

SMART (a Simple Modular Architecture Research Tool) allows the identification and annotation of genetically mobile domains and the analysis of domain architectures. http://smart.embl-heidelberg.de/
PIR houses the PIRSF, iProClass and iProLINK databases. http://pir.georgetown.edu/

www.uniprot.org
Three protein databases recently merged to form UniProt:
-- SwissProt
-- TrEMBL (translated European Molecular Biology Lab)
-- Protein Information Resource (PIR)
You can search for information on your favorite protein there; a BLAST server is provided.

1. Go to ExPASy (http://www.expasy.ch/)
2. If you know the SwissProt accession of your protein, enter it at top.
3. Otherwise go into Swiss-Prot/TrEMBL, click SRS (Sequence Retrieval System), click Start, then click continue, then search for your protein of interest.
Page 230

Protein family classification and databases
PIRSF          http://pir.georgetown.edu/iproclass/
TIGRFAMs       http://www.tigr.org/TIGRFAMs/index.shtml
SUPERFAMILY    http://supfam.mrc-lmb.cam.ac.uk/SUPERFAMILY/
Gene3D         http://www.biochem.ucl.ac.uk/bsm/cath/Gene3D/
PANTHER        http://www.pantherdb.org/

Physical properties of proteins
Many websites are available for the analysis of individual proteins. ExPASy and ISREC are two excellent resources.
The accuracy of these programs is variable. Predictions based on primary amino acid sequence (such as molecular weight prediction) are likely to be more trustworthy. For many other properties (such as posttranslational modification of proteins by specific sugars), experimental evidence may be required rather than prediction algorithms.
Page 236
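Molecular weight is a good example of a prediction that follows directly from the primary sequence: sum the residue masses and add one water for the free termini. The sketch below is a minimal illustration, not the code behind any ExPASy tool. The function name `molecular_weight` is hypothetical, and the average residue masses are commonly tabulated values that should be treated as assumptions and checked against a reference table before serious use. Posttranslational modifications are ignored.

```python
# Average residue masses in daltons (amino acid minus one water).
# Commonly tabulated values, included here as assumptions for illustration.
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.01528  # one H2O for the free N- and C-termini


def molecular_weight(sequence: str) -> float:
    """Estimate the average molecular weight (Da) of an unmodified protein."""
    seq = sequence.strip().upper()
    unknown = set(seq) - set(AVERAGE_RESIDUE_MASS)
    if unknown:
        raise ValueError(f"unsupported residue codes: {sorted(unknown)}")
    if not seq:
        return 0.0
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in seq) + WATER_MASS


if __name__ == "__main__":
    # Sanity check on a single residue: free glycine is about 75.07 Da.
    print(round(molecular_weight("G"), 2))
```

Because the calculation uses nothing but the sequence, its main sources of error are the things it leaves out, such as signal peptide cleavage and covalent modifications.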

http://www.expasy.ch/ Page 230

Access a variety of protein analysis programs from the top right of the ExPASy home page Page 235

Page 244

Page 244

Proteomics: High throughput protein analysis
Proteomics is the study of the entire collection of proteins encoded by a genome.
“Proteome” refers to all the proteins in a cell and/or all the proteins in an organism.
Large-scale protein analysis:
-- 2D protein gels
-- Yeast two-hybrid
-- Rosetta Stone approach
-- Pathways
Page 247

Two-dimensional protein gels
First dimension: isoelectric focusing
Second dimension: SDS-PAGE
Page 248

Two-dimensional protein gels
First dimension: isoelectric focusing
-- Electrophorese ampholytes to establish a pH gradient
-- Can use a pre-made strip
-- Proteins migrate to their isoelectric point (pI) then stop (net charge is zero)
-- Range of pI typically 4-9 (5-8 most common)
Page 248
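The idea that a protein stops migrating where its net charge is zero can be made concrete with a rough pI estimate. The sketch below computes net charge from the Henderson-Hasselbalch relation and finds the zero crossing by bisection. The function names are hypothetical and the pKa values are rounded, illustrative assumptions; published tools use different pKa sets and will give somewhat different answers.

```python
# Illustrative, rounded pKa values (assumptions; real tools use other sets).
POSITIVE_PKA = {"K": 10.5, "R": 12.5, "H": 6.0}             # +1 when protonated
NEGATIVE_PKA = {"D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1}    # -1 when deprotonated
N_TERM_PKA = 8.0
C_TERM_PKA = 3.1


def net_charge(sequence: str, ph: float) -> float:
    """Approximate net charge of an unmodified polypeptide at a given pH."""
    charge = 1.0 / (1.0 + 10 ** (ph - N_TERM_PKA))    # protonated N-terminus
    charge -= 1.0 / (1.0 + 10 ** (C_TERM_PKA - ph))   # deprotonated C-terminus
    for aa in sequence.upper():
        if aa in POSITIVE_PKA:
            charge += 1.0 / (1.0 + 10 ** (ph - POSITIVE_PKA[aa]))
        elif aa in NEGATIVE_PKA:
            charge -= 1.0 / (1.0 + 10 ** (NEGATIVE_PKA[aa] - ph))
    return charge


def isoelectric_point(sequence: str, tol: float = 1e-4) -> float:
    """Find the pH where net charge crosses zero, by bisection on [0, 14]."""
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2.0
        # Net charge falls as pH rises, so a positive charge means pI is higher.
        if net_charge(sequence, mid) > 0:
            low = mid
        else:
            high = mid
    return (low + high) / 2.0


if __name__ == "__main__":
    for peptide in ("KKKK", "DDDD"):   # made-up basic and acidic peptides
        print(peptide, round(isoelectric_point(peptide), 2))
```

Bisection works here because net charge decreases monotonically with pH, so there is exactly one zero crossing in the interval.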

Two-dimensional protein gels
Second dimension: SDS-PAGE
-- Electrophorese proteins through an acrylamide matrix
-- Proteins are charged and migrate through an electric field
-- Conditions are denaturing (SDS) and reducing (2-mercaptoethanol)
-- Can resolve hundreds to thousands of proteins
Page 248

Proteins identified on 2D gels (IEF/SDS-PAGE)
Direct protein microsequencing by Edman degradation
-- done at many core facilities (e.g. UC Davis)
-- typically need 5 picomoles
-- often get 10 to 20 amino acids sequenced
Protein mass analysis by MALDI-TOF
-- matrix-assisted laser desorption/ionization time-of-flight mass spectrometry
-- done at core facilities
-- often detect posttranslational modifications
Page 250-251

Page 252

Evaluation of 2D gels (IEF/SDS-PAGE)
Advantages:
-- Visualize hundreds to thousands of proteins
-- Improved identification of protein spots
Disadvantages:
-- Limited number of samples can be processed
-- Mostly abundant proteins visualized
-- Technically difficult
Page 251

Gene Ontology (GO) Consortium

The Gene Ontology Consortium
An ontology is a description of concepts. The GO Consortium compiles a dynamic, controlled vocabulary of terms related to gene products.
There are three organizing principles:
-- Molecular function
-- Biological process
-- Cellular component

GO terms are assigned to Entrez Gene entries Page 241

Page 241

Example
Gene product: cytochrome c
GO entry terms:
-- molecular function = electron transporter activity
-- biological process = oxidative phosphorylation and induction of cell death
-- cellular component = mitochondrial matrix and mitochondrial inner membrane

GO consortium (http://www.geneontology.org)
There is no centralized GO database. Instead, curators of organism-specific databases assign GO terms to gene products for each organism.
AmiGO is the searchable portion of the GO
-- Gene symbol, name, UniProt accession numbers, and text searches can be used to find GO entries

The Gene Ontology Consortium: Evidence Codes
IC     Inferred by curator
IDA    Inferred from direct assay
IEA    Inferred from electronic annotation
IEP    Inferred from expression pattern
IGI    Inferred from genetic interaction
IMP    Inferred from mutant phenotype
IPI    Inferred from physical interaction
ISS    Inferred from sequence or structural similarity
NAS    Non-traceable author statement
ND     No biological data
TAS    Traceable author statement
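Evidence codes matter in practice because they let you decide how much to trust an annotation. The sketch below stores the codes listed above in a lookup table and filters a list of annotations to those backed by experimental evidence. The gene names, the tuple layout, and the evidence codes attached to each example annotation are all invented for illustration and are not real GO records. Grouping IDA, IEP, IGI, IMP and IPI as experimental follows the usual GO convention, but treat that grouping as an assumption to check against current GO documentation.

```python
# Evidence codes and descriptions as listed on the slide.
EVIDENCE_CODES = {
    "IC": "Inferred by curator",
    "IDA": "Inferred from direct assay",
    "IEA": "Inferred from electronic annotation",
    "IEP": "Inferred from expression pattern",
    "IGI": "Inferred from genetic interaction",
    "IMP": "Inferred from mutant phenotype",
    "IPI": "Inferred from physical interaction",
    "ISS": "Inferred from sequence or structural similarity",
    "NAS": "Non-traceable author statement",
    "ND": "No biological data",
    "TAS": "Traceable author statement",
}

# Assumed grouping: codes that rest on a laboratory experiment.
EXPERIMENTAL_CODES = frozenset({"IDA", "IEP", "IGI", "IMP", "IPI"})


def describe(code: str) -> str:
    """Return the description for an evidence code (case-insensitive)."""
    return EVIDENCE_CODES[code.upper()]


def filter_by_evidence(annotations, allowed=EXPERIMENTAL_CODES):
    """Keep annotations whose evidence code (4th field) is in `allowed`."""
    return [ann for ann in annotations if ann[3] in allowed]


# Hypothetical annotations: (gene product, GO aspect, GO term, evidence code).
EXAMPLE_ANNOTATIONS = [
    ("geneA", "molecular function", "electron transporter activity", "IDA"),
    ("geneA", "cellular component", "mitochondrial inner membrane", "IEA"),
    ("geneB", "biological process", "oxidative phosphorylation", "IMP"),
]

if __name__ == "__main__":
    for ann in filter_by_evidence(EXAMPLE_ANNOTATIONS):
        print(ann[0], "|", ann[2], "|", describe(ann[3]))
```

Passing a different `allowed` set, for example `{"IEA"}`, selects only the electronically inferred annotations, which are the ones most often excluded from stringent analyses.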