What can (many) sequences tell us? Nuclear receptor function.

What can (many) sequences tell us?

Nuclear receptor function

Nuclear receptor family NR1C1-PPAR NR1C2-PPAS NR1C3-PPAT NR1D1-EAR1 NR1D2-BD73 NR1I3-MB67 NR1I4-CAR1-MOUSE- NR1H2-NER NR1H3-LXR NR1H4-FAR NR4A2-NOT NR4A3-NOR1 NR4A1-NGFI NR2F1-COTF NR2F2-ARP1 NR2F6-EAR2 NR2E3-PNR NR2B1-RRXA NR2B2-RRXB NR2A2-HN4G NR3C1-GCR NR3C4-ANDR NR3C3-PRGR NR3A1-ESTR NR3A2-ERBT NR3B1-ERR1 NR3B2-ERR2 NR5A1-SF1 NR5A2-FTF NR1I1-VDR NR1B3-RRG1 NR2E1-TLX NR2C1-TR2-11 NR2C2-TR4 NR6A1-GCNF NR2B3-RRXG NR2A1-HNF4 NR2A5-HN4 d? NR0B1-DAX1 NR0B2-SHP NR3C2-MCR NR1F3-RORG NR1F2-RORB NR1F1-ROR1 NR1A2-THB1 NR1A1-THA1 NR1I2-PXR NR1B2-RRB2 NR1B1-RRA1

Nuclear receptor structure
Domains: A/B (AF-1), C (DNA binding domain), D, E (ligand binding domain, LBD), F
Ligand binding domain: conserved protein fold, > 20% sequence similarity
DNA binding domain: highly conserved, > 90% similarity

The questions How do ligands relate to activity? What is the role of each amino acid in the NR LBD? Which data handling / bioinformatics is needed to answer these questions?

3D structure LBD (hER  )

Available NR data
56 structures in the PDB (>200 now*)
>500 sequences (scattered) (>1500 now)
>1000 mutations (very scattered)
>10000 ligand-binding studies (secret)
Disease patterns, expression, >1000 SNPs, genetic localization, etc.
This data must be integrated, sorted, combined, validated, understood, and used to answer our questions.
* "Now" was 2007.

Step 1 The first important step is a common numbering scheme, because every structure uses its own numbering and the insertions and deletions between species confuse any numbering. Whoever solves that problem once and for all should get three Nobel prizes.

Large data volumes Large data volumes allow us to develop new data analysis techniques. Entropy-variability analysis is a novel technique for looking at very large multiple sequence alignments. It requires ‘better’ alignments than are routinely obtained with ‘standard’ multiple sequence alignment programs.

Part of the big alignment We see correlations between columns and between ‘things’.

Vriend’s first rule of sequence analysis If it is conserved, it is important

Vriend’s second rule of sequence analysis If it is very conserved, it is very important

Consequence: If something is conserved in each sub-family, it is involved in a sub-family specific function.

QWERTYASDFGRGH
QWERTYASDTHRPM
QWERTNMKDFGRKC
QWERTNMKDTHRVW
Red = conserved Green = variable Blue = correlated
Example: (chymo)trypsin
What is CMA?
A function is never just one residue.
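The colouring of the four toy sequences above can be reproduced with a few lines of Python. This is only a minimal sketch: the function names are made up for illustration, and "correlated" is taken to mean that two or more columns change residue in exactly the same sequences, which is an assumption about how the slide was coloured.

```python
# Toy alignment copied from the slide.
ALIGNMENT = [
    "QWERTYASDFGRGH",
    "QWERTYASDTHRPM",
    "QWERTNMKDFGRKC",
    "QWERTNMKDTHRVW",
]

def pattern(column):
    """Relabel residues by order of first appearance, so that
    'YYNN' and 'AAMM' both become (0, 0, 1, 1)."""
    seen = {}
    return tuple(seen.setdefault(res, len(seen)) for res in column)

def classify(seqs):
    """Return 1-based positions that are conserved, variable, or correlated."""
    n = len(seqs)
    conserved, variable, groups = [], [], {}
    for pos, column in enumerate(zip(*seqs), start=1):
        kinds = len(set(column))
        if kinds == 1:
            conserved.append(pos)      # same residue in every sequence
        elif kinds == n:
            variable.append(pos)       # every sequence differs: no pattern to share
        else:
            groups.setdefault(pattern(column), []).append(pos)
    # Columns that share a change pattern with another column are correlated.
    correlated = [g for g in groups.values() if len(g) > 1]
    variable += [g[0] for g in groups.values() if len(g) == 1]
    return conserved, sorted(variable), correlated

conserved, variable, correlated = classify(ALIGNMENT)
print("conserved :", conserved)
print("variable  :", variable)
print("correlated:", correlated)
```

With only four sequences, two columns can share a pattern by chance; a real analysis needs many more sequences before such a correlation means anything.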

Correlations Residues can correlate with residues, and when that happens we have found a function, no matter the conservation or variability. Residues that have a function correlate with that function.

Correlations with wavelength Residues can also correlate with something else. Example: optimal wavelength for opsin excitation.
Wavelength   Loop1   Loop2
UV           Gln     His
Blue         Asn     Gln
Red/Green    Leu     Gln

Correlations with drug binding (so no longer evolution-based…) Wilma Kuipers, thesis

Correlation analysis Correlate sequences with ligand binding affinities.
Figure rows: receptor, affinity, residue 386 (NNNNTTTTAAAVVLLNNNYYYYTT...)
Receptors: 1 = 5HT-1a, 2 = 5HT-1b, 3 = 5HT-1d, ...
Alignments showed a 100% correlation between the affinity for pindolol and the presence or absence of Asn386. Obviously, Asn386 plays an important role in ligand binding.
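The kind of check behind such a statement can be sketched as follows. The receptor data below is invented purely for illustration (the affinity row of the slide did not survive), and the function name is hypothetical; the sketch only shows what "100% correlation between a residue and binding" means in practice.

```python
def agreement(residues, binds, residue_type):
    """Fraction of receptors where 'has residue_type at this position'
    agrees with 'binds the ligand'. 1.0 means a perfect correlation."""
    if len(residues) != len(binds):
        raise ValueError("need one residue and one binding flag per receptor")
    matches = sum((res == residue_type) == bound
                  for res, bound in zip(residues, binds))
    return matches / len(residues)

# HYPOTHETICAL data: residue at one alignment position for nine receptors,
# and whether each receptor binds the ligand with high affinity.
residues = list("NNNTTAAVL")
binds = [True, True, True, False, False, False, False, False, False]

print(agreement(residues, binds, "N"))   # perfect agreement in this made-up set
print(agreement(residues, binds, "T"))   # a residue that does not track binding
```

Running the same function over every position of an alignment, and every residue type, would point at the positions worth a closer look; it does not replace the experiment.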

Summary correlation If it is conserved, it is important; if it is important, it remains conserved. If residue positions show correlation with ‘something’, they are involved in that ‘something’. ‘Something’ can be any of a very large number of functions.

Example correlation: Which cysteines form a pair in this protein family? Shown are aligned peptides from five different bacteria.
ASDFGCHIKLMCNPQRSCTVW
YSDYGCNIKLFCQPQRSCT--
ATDYPVQIKLMCNPQKSCSMW
YTDFGCHVKLLVQPNRSVTVW
-TDFGVHVKLMCNPQKSCSFW
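The question can be answered mechanically: cysteines that form a pair should be present together and absent together. A minimal sketch, assuming that "present/absent in exactly the same sequences" is the criterion; the function name is made up for illustration.

```python
from itertools import combinations

# Aligned peptides copied from the slide.
PEPTIDES = [
    "ASDFGCHIKLMCNPQRSCTVW",
    "YSDYGCNIKLFCQPQRSCT--",
    "ATDYPVQIKLMCNPQKSCSMW",
    "YTDFGCHVKLLVQPNRSVTVW",
    "-TDFGVHVKLMCNPQKSCSFW",
]

def cysteine_pairs(seqs):
    """Return pairs of 1-based columns whose cysteines co-occur perfectly."""
    # For every column that contains a Cys, record which sequences have it.
    presence = {
        pos: tuple(res == "C" for res in column)
        for pos, column in enumerate(zip(*seqs), start=1)
        if "C" in column
    }
    return [
        (i, j)
        for i, j in combinations(sorted(presence), 2)
        # Same presence pattern, and at least one sequence lacks the Cys;
        # two fully conserved Cys columns carry no pairing information.
        if presence[i] == presence[j] and not all(presence[i])
    ]

print(cysteine_pairs(PEPTIDES))
```

The cysteine column that loses its Cys in different sequences than the other two is left without a partner.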

Conserved or very conserved? Recalcitrant.
ASDFGCHIKLMCNPQRSCTVW
YSDYGCNIKLFCNPQRSCT--
ATDLPVQIKLMANPQKSCSVW
LSDFGCHIKLMCNPQRSCTVW
YTDFGCHVKLLVQPNRSVAFW
-SDAGVHVKLMVQPNKSVSF-
YTDFGCHVKLLVQPNRSVVFW
-TDSGVHVKLMIQPNKSVSFW

Conclusion from recalcitrance The more exceptions you find in other (homologous) families, the less important the residue is in your family.

Entropy and variability So far we saw that conservation and correlation can help us find functionally important residues. Can variability patterns also tell us something?

Entropy Sequence entropy $E_i$ at position $i$ is calculated from the frequencies $p_{i,a}$ of the twenty amino acid types $a$ at position $i$:
$$E_i = -\sum_{a=1}^{20} p_{i,a} \ln(p_{i,a})$$

Variability Sequence variability V i is the number of amino acid types observed at position i in more than 0.5% of all sequences.
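Both quantities are easy to compute per alignment column. A minimal sketch of the two definitions above; the function names are hypothetical, and ignoring gap characters ('-') when counting frequencies is an assumption, not something the slides specify.

```python
import math
from collections import Counter

def _frequencies(column):
    """Residue frequencies in one alignment column, gaps ignored (assumption)."""
    residues = [res for res in column if res != "-"]
    total = len(residues)
    return [count / total for count in Counter(residues).values()] if total else []

def entropy(column):
    """Sequence entropy: minus the sum of p * ln(p) over observed residue types."""
    return abs(sum(p * math.log(p) for p in _frequencies(column)))

def variability(column, min_fraction=0.005):
    """Number of residue types seen in more than 0.5% of the sequences."""
    return sum(1 for p in _frequencies(column) if p > min_fraction)

# A fully conserved column, a 50/50 column, and a column with one rare outlier.
print(entropy("AAAA"), variability("AAAA"))
print(entropy("AACC"), variability("AACC"))
print(variability("A" * 999 + "C"))
```

The 0.5% cut-off keeps a single odd sequence (often a sequencing or alignment error) from inflating the variability, while the entropy still registers it, weakly.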

Intermezzo It is common practice in bioinformatics to create a hypothesis. But every hypothesis must be tested against real data from real experiments.

Ras Entropy-Variability Box colours: 11 = red, 12 = orange, 22 = yellow, 23 = green, 33 = blue

GPCR Entropy-Variability; signalling path GPCR boxes: 11 = G protein, 12 = support, 22 = signalling, 23 = ligand in, 33 = ligand out

NR LBD Entropy-Variability Boxes: 11 = main function, 12 = first shell around main function, 22 = core residues (signal transduction), 23 = modulator, 33 = mainly surface

Example: role of Asp 351 EV and correlation. But the correlation would never have been found from sequence analyses. Figure panels: antagonist, agonist

Summary variability analysis Variability patterns hold information. Entropy and Variability are two (of the) ways to measure variability patterns. Entropy and Variability patterns can say something about the type of function, and thus add detail to correlation studies.

Conclusions: Data is difficult, but we need it (sic); life would be so nice if we could do without it. PDB files are the worst. Nomenclature is not homogeneous. Ontologies…. Much data has been carefully hidden in the literature, where it can only be retrieved with great difficulty. Residue numbering is difficult but very necessary. Variability-entropy analysis is powerful, but requires very 'good' alignments.

A short break for a word from our sponsors
Laerte Oliveira
Our industrial sponsor:
Florence Horn
Wilma Kuipers, Weesp
Bob Bywater, Copenhagen
Nora vd Wenden, The Hague
Mike Singer, New Haven
Ad IJzerman, Leiden
Margot Beukers, Leiden
Fabien Campagne, New York
Øyvind Edvardsen, Tromsø
Simon Folkertsma, Frisia
Henk-Jan Joosten, Wageningen
Joost van Durma, Brussels
David Lutje Hulsik, Utrecht
Tim Hulsen, Goffert
Manu Bettler, Lyon
Elmar Krieger