Bioinformatics and Computational Molecular Biology Geoff Barton

Practical Tutorial: Dr David Martin will give a practical tutorial on the use of the PyMOL molecular graphics software. In this lecture I will show lots of protein structures – use to find them, and/or the SCOP domains database (find with Google).

Similarities in Proteins
Lecture 1
–Overview of data in molecular biology
–Protein modelling
–Similarities of Protein Sequence, Structure, Function

Introduction to Sequence Comparison
Lecture 2:
–Why compare sequences?
–Methods for sequence comparison/alignment.
–Multiple alignment
–Database searching - FASTA/BLAST
–Iterative searching - PSI-BLAST

Practical/WWW references Organised by Drs Martin –Good preparation would be to look at: and –Look at BLAST and FASTA on these sites as well as database access facilities.

Traditional biological research
Private Data: Past experiments. Lab notebooks. Group discussions.
Public Data: Journals. Conferences.
Analysis: Reading. Talking. Thinking.
Hypothesis!
Experiment: Design. Execution.
Publish!

Bioinformatics/Computational Biology and biological research
Private Data: Past experiments. Lab notebooks. Group discussions. DNA sequences, protein sequences, genetic maps, transcripts, 3D structures, proteomics results, SNP data, etc.
Public Data: Journals. Conferences. DNA sequences, protein sequences, genetic maps, transcripts, 3D structures, proteomics results, SNP data, etc.
Analysis: Reading. Talking. Thinking. Computational analysis. Software development.
Hypothesis! Computer aided.
Experiment: Design. Execution. Computational experiments. Simulation.
Publish! Database submission. Database management.

EMBL Nucleotide Sequence Database Growth (to 2nd Oct 2006) Taken from:

Protein Sequences Approx 3,500,000 known for all species (Oct ) 25,000 for Human (not counting splice variants and post-translational modifications)

Protein 3D Structures Approx 39,000 known (much duplication)

Biological data in context

Overview of Biological Hierarchy... Molecular Levels
Ecosystem: many different organisms
Population: group of the same type of organism
Family: group with known common lineage
Whole organism: animal, plant, etc.
Tissue/organ: brain, heart, lungs, blood,...
Cell: nerve, muscle, etc.
Organelle: nucleus, mitochondria, etc.
Nucleus
Chromosome
Gene
DNA
RNA
Protein Sequence
Protein 3D structure
Molecular function

Technology and data in biology
Expression Data (Transcriptomics)
Which of the genes are switched on in which cells/tissues and when?
What are the effects of drugs and disease on expression patterns?
Technology: DNA ‘chip’ technology.

Technology and data in biology
Protein Expression Data (Proteomics)
Which proteins are being produced in which cells/tissues when?
Which modified forms are present?
What are the effects of drugs and disease on these patterns?
Technology: 2D gels + mass spectrometry.

Technology and data in biology
Protein 3D Structure - the bridge to chemistry (Structural Genomics)
What is the atomic level structure of the protein?
What other molecules does it interact with?
What small molecules - potential drugs - does it interact with?
What are the effects of point mutations on the structure?
Technology: X-ray crystallography, NMR spectroscopy, single particle, cryo-electron microscopy.

Overview of Biological Hierarchy... Macroscopic Levels
Ecosystem: many different organisms
Population: group of the same type of organism
Family: group with known common lineage
Whole organism: animal, plant, etc.
Tissue/organ: brain, heart, lungs, blood,...
Cell: nerve, muscle, etc.
Organelle: nucleus, mitochondria, etc.
Nucleus
Chromosome
Gene
DNA
RNA
Protein Sequence
Protein 3D structure
Molecular function

Biology is now a data-intensive science. To do good science, you need to know how to use (and not abuse) computational tools.

Protein Structure Prediction ‘Homology’ modelling –Relies on the fact that similarity of sequence implies similarity of 3D structure.

Lysozyme (1lz1) α-lactalbumin (1alc) ? Imagine we don’t know the 3D structure of α-lactalbumin, but we do know its amino acid sequence and that of lysozyme.

Lysozyme (1lz1) α-lactalbumin (1alc) 37.7% Identity, Z=17.6 ?
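The two numbers quoted on this slide can be illustrated with a small sketch. The slides do not say how the 37.7% or Z=17.6 values were computed, so everything below is an assumption: percent identity is taken over aligned, non-gap columns (other denominators are also in use), and the Z-score compares a simple global alignment score against scores obtained after shuffling one sequence. The function names and scoring values are hypothetical.

```python
import random
import statistics


def percent_identity(aligned_a, aligned_b):
    """Percent of aligned (non-gap) column pairs that hold identical residues."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def global_score(a, b, match=2, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment score (score only, linear gap cost).

    The match/mismatch/gap values are illustrative assumptions, not a
    real substitution matrix.
    """
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def shuffle_z_score(a, b, n_shuffles=100, seed=0):
    """Z-score of the real score against scores for shuffled versions of b."""
    rng = random.Random(seed)
    real = global_score(a, b)
    letters = list(b)
    background = []
    for _ in range(n_shuffles):
        rng.shuffle(letters)  # keeps composition, destroys residue order
        background.append(global_score(a, "".join(letters)))
    spread = statistics.pstdev(background)
    if spread == 0:
        return 0.0  # degenerate background; no meaningful Z-score
    return (real - statistics.mean(background)) / spread
```

A large Z-score means the real alignment scores far better than alignments of sequences with the same composition but random order, which is the kind of evidence that a similarity is not due to chance.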

Protein structure prediction (Homology Modelling)
–Align sequence of protein of unknown structure to sequence of protein of known structure.
–In ‘conserved core’ of protein, substitute the amino acid types into the known structure.
–Deal with ‘loops’ between the core elements of structure.
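The steps above can be sketched as a bookkeeping exercise over a pairwise alignment. This is only an illustration under stated assumptions: the alignment is already given as two gapped strings, template residues are numbered consecutively from a hypothetical `first_template_resnum`, and no coordinates are built. The function name and action labels are made up for this example.

```python
def thread_onto_template(aligned_target, aligned_template, first_template_resnum=1):
    """Turn a target/template alignment into a per-column modelling plan.

    Each plan entry is (template_resnum, template_residue, target_residue, action):
      keep        same residue type, reuse the template position as is
      substitute  different residue type placed at the template position
      build_loop  target residue with no template position (gap in the template)
      delete      template residue with no target counterpart (gap in the target)
    """
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    plan = []
    resnum = first_template_resnum - 1
    for target_res, template_res in zip(aligned_target, aligned_template):
        if template_res != "-":
            resnum += 1  # advance only when the template has a residue here
        if target_res != "-" and template_res != "-":
            action = "keep" if target_res == template_res else "substitute"
            plan.append((resnum, template_res, target_res, action))
        elif target_res != "-":
            plan.append((None, "-", target_res, "build_loop"))
        elif template_res != "-":
            plan.append((resnum, template_res, "-", "delete"))
    return plan
```

The `build_loop` and `delete` entries mark where the known structure gives no direct guidance, which is where most of the modelling effort and uncertainty lies.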

Lysozyme (1lz1)  -lactalbumin (1alc) 37.7% Identity, Z=17.6

Protein structure prediction (Homology modelling)
Problems:
–Need protein of known structure that is similar in sequence.
–Building loops where there are deletions.
–Verifying model.
Key is getting a good alignment in the first place
–Bad alignment => bad model.

Good alignment on its own can:
–Identify key residues (absolutely conserved)
–Identify likely protein core (conserved hydrophobic residues)
–Help predict protein secondary structure (not this lecture).
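The first two points can be illustrated by scanning the columns of a multiple alignment. This is a rough sketch, not a published conservation measure: the hydrophobic residue set below is an assumption (definitions vary), columns containing gaps are simply skipped, and the function name is hypothetical.

```python
# Assumed hydrophobic set for illustration only; other definitions exist.
HYDROPHOBIC = set("AVILMFWC")


def classify_columns(alignment):
    """Return (identical_columns, hydrophobic_columns) as 0-based column indices.

    identical_columns   every sequence has the same residue
    hydrophobic_columns residues differ but all belong to HYDROPHOBIC
    Columns containing a gap character '-' are ignored.
    """
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("need at least one sequence, all of equal aligned length")
    identical, hydrophobic = [], []
    for index, column in enumerate(zip(*alignment)):
        residues = set(column)
        if "-" in residues:
            continue
        if len(residues) == 1:
            identical.append(index)
        elif residues <= HYDROPHOBIC:
            hydrophobic.append(index)
    return identical, hydrophobic
```

For example, with the toy alignment `["GWLAK", "GWIVR", "GWV-K"]` the first two columns are absolutely conserved, the third is a conserved hydrophobic position, the fourth is skipped because of the gap, and the fifth is neither.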

Sequence alignment is a fundamental technique in molecular biology. May predict proteins of common function even when no 3D structure is known. May be used to predict 3D structure and so help understanding of mutants. Some examples of where this is right and wrong...

Prediction of structure and function by similarity to known sequences and structures Assumption is that similar sequence implies similar structure and function. But what do we mean by “similar”? Does similarity of sequence really imply similarity of function?

Protein Sequence/Structure/Function Network: Sequence, 3D Structure, Function (Similar / Different)

Similar Sequence, Similar Structure, Similar Function.
e.g. Trypsin-like Serine Proteinases: same fold, same catalytic mechanism. But DIFFERENT specificity.
e.g. Immunoglobulin variable domains: same fold, similar binding function. But DIFFERENT specificity.
True of all examples. Similarities only give clues to function; differences in specificity can be regarded as differences of function.

Immunoglobulin Variable Domains e.g. see: 1a2y

Tryptophan at core of Ig variable domain

Protein Sequence/Structure/Function Network: Sequence, 3D Structure, Function (Similar / Different)

Lysozyme (1lz1) α-lactalbumin (1alc) 37.7% Identity, Z=17.6

ε-crystallin / L-Lactate Dehydrogenase

Protein Sequence/Structure/Function Network: Sequence, 3D Structure, Function (Similar / Different)

Trypsin (3ptn) Subtilisin (2sec)

Trypsin (3ptn): His-57, Asp-102, Ser-195
Subtilisin (2sec): Asp-32, His-64, Ser-221
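A small sketch makes the point of these residue numbers explicit: both enzymes use the same three residue types, but they occur in a different order along the chain. The residue numbers are those given on the slide; the dictionary layout and function name are assumptions made for illustration.

```python
# Residue numbers as listed on the slide.
TRIADS = {
    "trypsin (3ptn)": {"His": 57, "Asp": 102, "Ser": 195},
    "subtilisin (2sec)": {"Asp": 32, "His": 64, "Ser": 221},
}


def order_along_chain(triad):
    """Residue types sorted by their position in the sequence."""
    return [name for name, _ in sorted(triad.items(), key=lambda item: item[1])]


for enzyme, triad in TRIADS.items():
    print(enzyme, "->", "-".join(order_along_chain(triad)))
```

Trypsin comes out as His-Asp-Ser and subtilisin as Asp-His-Ser, so the shared residue types cannot be lined up by a simple sequence alignment.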

Protein Sequence/Structure/Function Network: Sequence, 3D Structure, Function (Similar / Different)

Nature 398, 84-90, 1999. PDB: 1b47

11% sequence identity, RMSD 1.47 Å over 70 residues. PDB: 1b47
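For readers unfamiliar with the RMSD figure, it is the root-mean-square distance between equivalent atoms in the two structures. The sketch below is a minimal illustration and assumes the two coordinate lists are already superposed and paired residue by residue (for example, one Cα atom per residue); finding the optimal superposition is a separate step not shown here, and the function name is hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two paired lists of (x, y, z) points.

    Assumes the structures have already been superposed and that
    coords_a[i] corresponds to coords_b[i].
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))
```

An RMSD of about 1.5 Å over 70 residues means the two backbones follow almost the same path, even though only 11% of the aligned residues are identical.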

Protein Sequence/Structure/Function Network: Sequence, 3D Structure, Function (Similar / Different)

Russell, R. B. and Barton, G. J. (1993), "An SH2-SH3 Domain hybrid", Nature, 364, 765. PDB: 1bia PDB: 2ptk

PDB:2aai PDB:1bas

Matthews, S., et al. (1994), "The p17 Matrix Protein from HIV-1 is Structurally Similar to Interferon-gamma", Nature, 370,

Protein Sequence/Structure/Function Network: Sequence, 3D Structure, Function (Similar / Different). Does this ever happen?

HIV Reverse Transcriptase (RT)

HIV Reverse Transcriptase (RT) - domain linkers

Protein Sequence and Structural Similarity

Barton, G. J. et al. (1992), "Human Platelet Derived Endothelial Cell Growth Factor is Homologous to E. coli Thymidine Phosphorylase", Prot. Sci., 1,

Protein Sequence and Structural Similarity

Barton, G. J., Cohen, P. T. C. and Barford, D. (1994), "Conservation Analysis and Structure Prediction of the Protein Serine/Threonine Phosphatases: Sequence Similarity with Diadenosine Tetra-phosphatase from E. coli Suggests Homology to the Protein Phosphatases", Eur. J. Biochem., 220,

Protein Sequence and Structural Similarity

Russell, R. B. and Barton, G. J. (1993), "An SH2-SH3 Domain hybrid", Nature, 364, 765.

Reading material for this lecture:
–This lecture itself.
–pdf’s for “Barton” papers:
–Database statistics:
–Structure of the amino-terminal domain of Cbl complexed to its binding site on ZAP-70 kinase. Wuyi Meng, Sansana Sawasdikosol, Steven J. Burakoff, Michael J. Eck. Nature 398, (04 March 1999) (available on-line at - search for ZAP-70 kinase - republished in December on-line)
–Protein recognition: An SH2 domain in disguise. John Kuriyan, James E. Darnell. Nature 398, (04 March 1999) (news and views article for above paper)
–Russell, R. B. and Barton, G. J. (1993), "An SH2-SH3 Domain hybrid", Nature, 364, 765.
–Matthews, S., et al. (1994), "The p17 Matrix Protein from HIV-1 is Structurally Similar to Interferon-gamma", Nature, 370,
–Barton, G. J., Cohen, P. T. C. and Barford, D. (1994), "Conservation Analysis and Structure Prediction of the Protein Serine/Threonine Phosphatases: Sequence Similarity with Diadenosine Tetra-phosphatase from E. coli Suggests Homology to the Protein Phosphatases", Eur. J. Biochem., 220,

The end of Lecture 1. Lecture 2 will be on sequence comparison methods.