Structural Genomics of Pathogenic Protozoa Christopher Mehlin Protein Production and Crystallization Workshop 2004


The SGPP is focused on protozoa which cause human disease
Malaria – Plasmodium falciparum, P. vivax
Leishmaniasis – Leishmania major + 8 others
African sleeping sickness – Trypanosoma brucei
Chagas’ disease – Trypanosoma cruzi
These diseases afflict ~500 million people per year; roughly half the world’s population is at risk.

These targets are challenging!
Eukaryotic organisms
Leishmania
–Only L. major sequence is known (more coming…)
Plasmodium falciparum
–80% AT-rich genome
–Requires cDNA – intron prediction difficult
–Floppy loops, e.g. CDK-2 has 83 asparagines in a row

Primers-to-Protein: Normally ~5% Overall Yield
Data from 1318 L. major and 368 P. falciparum targets
L. major: 5.2%
P. falciparum: 4.9%
>85% of our effort is put into cloning, screening, and expressing this 5%

Protein Variants Increase the Odds
Multiple species variants
–Especially Leishmania
“Chunking”
–Computational domain prediction
–Random truncation

Primers designed for L. major can fish out homologues from other species
[Figure: species tree listing L. major, L. aethiopica, L. infantum, L. donovani, L. tropica, L. mexicana, L. guyanensis, L. naiffi, L. braziliensis, L. tarentolae, E. schneideri. Homology scale: 97% to 60%. Label: Human pathogens.]

Primers designed for L. major can fish out homologues from other species
[Figure: species tree listing L. major, L. aethiopica, L. infantum, L. donovani, L. tropica, L. mexicana, L. guyanensis, L. naiffi, L. braziliensis, L. tarentolae, E. schneideri. Homology scale: 97% to 60%. PCR success using L. major primers: 83% to 10%.]

Multiple species targeted with a list of 40 high-value targets (enzymes with known inhibitors)

Organism         Number of targets
P. falciparum    4
L. major         4

Two species gave us eight proteins and 7/40 (18%) of the targets.
(Slide callout: HOMOLOGUES)

Multiple species targeted with a list of 40 high-value targets (enzymes with known inhibitors)

Organism         Number of targets
P. falciparum    4
L. major         4
L. infantum      3

L. major and L. infantum are 95% IDENTICAL, yet there is no overlap in the targets obtained!
Small changes in sequence make an enormous difference in the behavior of the protein.

Multiple species targeted with a list of 40 high-value targets (enzymes with known inhibitors)

Organism         Number of targets
P. falciparum    4
L. major         4
L. infantum      3
L. mexicana      3
L. guyanensis    2
L. tarentolae    1
L. braziliensis  2

TOTAL: 19 proteins, 14 of 40 (35%) of targets
10 targets would not have been obtained otherwise

Multiple species variants help crystallization, too!

     1                                                         60
Lmaj MSRLMPHYSKGKTAFLCVDLQEAFSKRIENFANCVFVANRLARLHELVPENTKYIVTEHY
Ldon MSRLMPHYSKGKTAFLCVDLQEAFSKRIENFANCVFVANRLARLHEVVPENTKYIVTEHY

Lmaj PKGLGRIVPGITLPQTAHLIEKTRFSCIVPQVEELLEDVDNAVVFGIEGHACILQTVADL
Ldon PKGLGRIVPEITLPKTAHLIEKTRFSCVVPQVEELLEDVDNAVVFGIEGHACILQTVADL

Lmaj LDMNERVFLPKDGLGSQKKTDFKAAMKLMGSWSPNCEITTSESILLQMTKDAMDPDFKKI
Ldon LDMNKRVFLPKDGLGSQKKTDFKAAIKLMSSWGPNCEITTSESILLQMTKDAMDPNFKRI

Lmaj SKLLKEEPPIPL.
Ldon SKLLKEEPPIPL.

95% IDENTITY
Lmaj001686AAA: nice crystals, no diffraction
Ldon001686AAA: “huge” crystals, 2.7 Å diffraction
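The 95% identity figure can be checked directly from the alignment on this slide. The following is a minimal sketch in Python, assuming the two sequences are already aligned position by position with no gaps (as they appear on the slide); the function name `percent_identity` is illustrative and not part of any SGPP tool.

```python
# Sequences copied from the slide (trailing "." stop marker omitted).
lmaj = (
    "MSRLMPHYSKGKTAFLCVDLQEAFSKRIENFANCVFVANRLARLHELVPENTKYIVTEHY"
    "PKGLGRIVPGITLPQTAHLIEKTRFSCIVPQVEELLEDVDNAVVFGIEGHACILQTVADL"
    "LDMNERVFLPKDGLGSQKKTDFKAAMKLMGSWSPNCEITTSESILLQMTKDAMDPDFKKI"
    "SKLLKEEPPIPL"
)
ldon = (
    "MSRLMPHYSKGKTAFLCVDLQEAFSKRIENFANCVFVANRLARLHEVVPENTKYIVTEHY"
    "PKGLGRIVPEITLPKTAHLIEKTRFSCVVPQVEELLEDVDNAVVFGIEGHACILQTVADL"
    "LDMNKRVFLPKDGLGSQKKTDFKAAIKLMSSWGPNCEITTSESILLQMTKDAMDPNFKRI"
    "SKLLKEEPPIPL"
)


def percent_identity(a: str, b: str) -> float:
    """Percent of identical positions in two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / len(a)


# 1-based positions where the two species differ, with both residues.
differences = [(i + 1, x, y) for i, (x, y) in enumerate(zip(lmaj, ldon)) if x != y]
identity = percent_identity(lmaj, ldon)

print(f"{len(lmaj)} residues, {len(differences)} differences, {identity:.1f}% identity")
for pos, x, y in differences:
    print(f"  position {pos}: Lmaj {x} -> Ldon {y}")
```

For these two sequences the count is 182 identical positions out of 192, which rounds to the 95% quoted on the slide. Only ten substitutions separate the construct that did not diffract from the one that diffracted to 2.7 Å.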

The concept of chunking…
Consider a 3-domain protein: Standard chunks would be the entire protein, each individual domain, and any contiguous series of domains. A 3-domain protein therefore becomes 6 chunks. In general, N domains give N(N+1)/2 chunks.
[Figure labels: Full length; Adjacent domains; Single domains]
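The chunk count can be reproduced by enumerating every contiguous run of domains. This is a minimal Python sketch, assuming domains are given as an ordered list of labels; the function name and the labels `D1`, `D2`, `D3` are hypothetical and used only for illustration.

```python
from typing import List


def contiguous_chunks(domains: List[str]) -> List[List[str]]:
    """Return every contiguous run of domains: singles, adjacent groups, full length."""
    n = len(domains)
    return [domains[i:j] for i in range(n) for j in range(i + 1, n + 1)]


# Hypothetical 3-domain protein.
chunks = contiguous_chunks(["D1", "D2", "D3"])
for chunk in chunks:
    print("-".join(chunk))

# The count agrees with the closed form N(N+1)/2.
n_domains = 3
expected = n_domains * (n_domains + 1) // 2
print(len(chunks), expected)
```

Non-contiguous combinations such as D1 plus D3 are deliberately excluded, since a chunk must be a single stretch of the original sequence.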

Domain Parsing using GINZU
Step 1: PSI-BLAST against the PDB
Step 2: Use consensus fold recognition methods to find remote PDB matches
Step 3: Search Pfam database for preassigned modular “chunks”
Step 4: Identify new modular “chunk” regions in multiple sequence alignment (MSA)
Step 5: Identify parse points in Rosetta structure predictions
Final Step: Select cut points in linker regions using assigned boundaries and coil predictions
[Figure: target sequence progressively covered by PDB, fold recognition, Pfam, MSA and Rosetta assignments, ordered by confidence, leading to chunk generation]
David Kim, UW
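The stepwise idea above, where each later method only fills in sequence that earlier methods left unassigned, can be sketched as follows. This is an illustrative Python sketch and not the Ginzu implementation: the function names, the hit coordinates, and the rule of rejecting any hit that overlaps an earlier assignment are all assumptions made for the example (a real parser would likely trim overlapping hits rather than discard them).

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive residue range


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two inclusive residue ranges share at least one residue."""
    return a[0] <= b[1] and b[0] <= a[1]


def hierarchical_assign(
    hits_by_method: List[Tuple[str, List[Interval]]]
) -> List[Tuple[str, Interval]]:
    """Accept hits method by method, in the order given.

    A hit is kept only if it does not overlap a region already claimed
    by an earlier method (simplifying assumption for this sketch).
    """
    assigned: List[Tuple[str, Interval]] = []
    for method, hits in hits_by_method:
        for hit in hits:
            if not any(overlaps(hit, prev) for _, prev in assigned):
                assigned.append((method, hit))
    return sorted(assigned, key=lambda item: item[1][0])


# Hypothetical hits for a 500-residue target, listed in step order.
example_hits = [
    ("PDB", [(120, 300)]),
    ("Pfam", [(100, 280), (350, 500)]),
    ("MSA", [(1, 90)]),
]
assignment = hierarchical_assign(example_hits)
for method, (start, end) in assignment:
    print(f"{start:>4}-{end:<4} {method}")
```

In this toy input the Pfam hit at 100-280 is dropped because the PDB step already claimed 120-300, while the Pfam hit at 350-500 and the MSA region at 1-90 fill the remaining sequence.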

Pfal006650AAA Example - tRNA Synthetase
Ginzu Parse Results w/ Multiple Sequence Alignment (PSI-BLAST against Non-redundant (NR) sequence database)
PFAM, PDB, and MSA coverage
Ginzu Domains
1. No assignment but still based on MSA (remaining region)
2. PFAM hit to PF01411 tRNA synthetases class II (A)
3. PDB hit to 1nyqA (Threonyl-tRNA Synthetase)
4. MSA based assignment
[Figure legend: PFAM; PDB; MSA; Remaining Region]
David Kim, UW

CHUNKING L. major PROTEINS
71 ORFs, run through GINZU, gave 205 chunks (not counting full length)
5 ORFs solubly expressed (7%)
15 chunks solubly expressed (7%)
11 ORFs had 1 soluble chunk
2 ORFs had 2 chunks soluble
2/16 chunks of soluble ORFs soluble (both of the same ORF)
1 chunk of non-crystallizing, soluble ORF crystallized
12/66 inaccessible proteins have had at least one soluble chunk (18%)
17/71 proteins accessible via this technique (24%)

Superchunking: for high-value targets
Step 1: Determine functional domain of protein by comparison to known protein.
Step 2: Determine 10 truncation sites on each side of functional domain; make 20 primers.
Step 3: Run 10x10=100 PCRs, clone products, screen for soluble expression, crystallizability.
[Figure label: Functional Domain]
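The primer arithmetic in Steps 2 and 3 (20 primers, 100 constructs) follows from pairing every N-terminal truncation site with every C-terminal one. A minimal Python sketch is given below; the residue positions and the functional-domain boundaries (residues 100 to 400) are hypothetical values chosen only to make the example run, and the function name is illustrative.

```python
from itertools import product
from typing import List, Tuple


def superchunk_constructs(
    n_term_sites: List[int], c_term_sites: List[int]
) -> List[Tuple[int, int]]:
    """Pair every N-terminal start site with every C-terminal end site."""
    return [(start, end) for start, end in product(n_term_sites, c_term_sites) if start < end]


# Hypothetical protein with a functional domain spanning residues 100-400.
# Ten start sites upstream of the domain and ten end sites downstream of it.
n_term_sites = list(range(1, 100, 10))     # 1, 11, ..., 91
c_term_sites = list(range(410, 510, 10))   # 410, 420, ..., 500

constructs = superchunk_constructs(n_term_sites, c_term_sites)
primer_count = len(n_term_sites) + len(c_term_sites)

print(f"{primer_count} primers -> {len(constructs)} PCR constructs")
```

Because every construct starts before the functional domain and ends after it, each one keeps the domain intact while varying only the flanking sequence. The cost grows as the sum of the site counts (primers) while the number of variants grows as their product.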

Superchunking Thioredoxin Reductase from P. falciparum
► 20 different soluble proteins from 90 cloned constructs.
► PCR success 100% (used template of full-length PCR)
► TR is a 60.7 kDa enzyme with a high degree of domain interaction
Erica Boni

Superchunking Thioredoxin Reductase
[Figure label: NATIVE]
Erica Boni

Superchunking Thioredoxin Reductase
[Figure panels: 18 off N-terminus & 8 off C-terminus; 16 off C-terminus; 7 off N-terminus]
Erica Boni

Conclusions: Relatively small changes in protein sequence can have dramatic effects on the behavior of proteins in expression and crystallization. Multiple species and chunking are two promising methods for obtaining protein variants.

Acknowledgements:
University of Washington
–Jamie Andreyka, Erica Boni, Tiffany Feist, Lutfiyah Haji, Colleen Liu, Natascha Mueller
–Fred Buckner, Mike Gelb, Wes Van Voorhis, Kevin Bauer
–David Baker, David Kim, Erkang Fan, Stan Fields Group
–Wim Hol and Hol group
Seattle Biomedical Research Institute
–Liz Worthey, Ellen Sisk, Peter Myler
Hauptman-Woodward Medical Research Institute
–George DeTitta, Joe Luft, Nancy Fehrman, Angela Lauricella et al.
Seattle Crystallization and Structure Determination Units
–Oleksandr Kalyuzhniy, Lori Anderson
–Ethan Merritt, Isolde Le Trong, Mark Robien
Collaborators:
–SSRL Stanford
–ALS Berkeley
NIH/NIGMS/NIAID