Bio/Chem-informatics

Bio/Chem-informatics © José R. Valverde, 2014 CC-BY-NC-SA

From sequence to atoms Cheminformatics

Index: Goals; Obtaining protein structures; Obtaining protein sequences; Comparing structures and sequences; Obtaining ligand structures; Limits

Goal: Learn as much as you can about your protein. Identify relevant properties: function, active site(s), modifications, conserved features, relevant amino acids. Cheminformatics: the application of informatic methods to solve chemical problems.

Read the bibliography: http://www.ncbi.nlm.nih.gov/pubmed Should be the initial step in all cases. Should have been done already. Likely to be neglected: it is more fun to play from the start. Guides all subsequent analysis and experiment. Allows taking a decision. It IS worth the trouble!

Sequence analysis: Compare sequences and look for similarities and differences. Match to experimental observation.

Predict, predict, predict... Secondary structure. Properties (ProSite, Pfam, InterPro...)

ProSiteDoc
{PS00433; PHOSPHOFRUCTOKINASE}
{BEGIN}
*********************************
* Phosphofructokinase signature *

Phosphofructokinase (EC 2.7.1.11) (PFK) [1,2] is a key regulatory enzyme in the glycolytic pathway. It catalyzes the phosphorylation by ATP of fructose 6-phosphate to fructose 1,6-bisphosphate. In bacteria PFK is a tetramer of identical 36 Kd subunits. In mammals it is a tetramer of 80 Kd subunits. Each 80 Kd subunit consist of two homologous domains which are highly related to the bacterial 36 Kd subunits. In Human there are three, tissue-specific, types of PFK isozymes: PFKM (muscle), PFKL (liver), and PFKP (platelet). In yeast PFK is an octamer composed of four 100 Kd alpha chains (gene PFK1) and four 100 Kd beta chains (gene PFK2); like the mammalian 80 Kd subunits, the yeast 100 Kd subunits are composed of two homologous domains.

As a signature pattern for PFK we selected a region that contains three basic residues involved in fructose-6-phosphate binding.

-Consensus pattern: [RK]-x(4)-G-H-x-Q-[QR]-G-G-x(5)-D-R
 [The R/K, the H and the Q/R are involved in fructose-6-P binding]
-Sequences known to belong to this class detected by the pattern: ALL.
-Other sequence(s) detected in Swiss-Prot: NONE.

-Note: Escherichia coli has two phosphofructokinase isozymes which are encoded by genes pfkA (major) and pfkB (minor). The pfkB isozyme is not evolutionary related to other prokaryotic or eukaryotic PFK's (see <PDOC00504>).
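A PROSITE consensus pattern like the PFK signature in this entry maps almost directly onto a regular expression. What follows is a minimal Python sketch, not PROSITE's own scanning software: the function names `prosite_to_regex` and `find_pattern` and the test sequence are invented for illustration, and only the common syntax elements (x, [..], {..}, repeat counts, and the < and > anchors) are handled.

```python
import re

# One pattern element: optional "<", a residue spec, optional repeat, optional ">".
_ELEMENT = re.compile(
    r"(<?)(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?(>?)"
)


def prosite_to_regex(pattern):
    """Translate a PROSITE pattern string into a Python regular expression."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        m = _ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError("unsupported pattern element: " + element)
        start, residue, low, high, end = m.groups()
        if residue == "x":                 # any amino acid
            regex = "."
        elif residue.startswith("{"):      # any amino acid except those listed
            regex = "[^" + residue[1:-1] + "]"
        else:                              # single residue or [allowed set]
            regex = residue
        if low is not None:                # x(4) -> .{4}, x(2,5) -> .{2,5}
            regex += "{" + low + ("," + high if high else "") + "}"
        parts.append(("^" if start else "") + regex + ("$" if end else ""))
    return "".join(parts)


def find_pattern(pattern, sequence):
    """Return (1-based start, matched text) for each non-overlapping hit."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start() + 1, m.group()) for m in regex.finditer(sequence)]


# Consensus pattern taken from the PS00433 entry.
PFK_PATTERN = "[RK]-x(4)-G-H-x-Q-[QR]-G-G-x(5)-D-R"

# Hypothetical sequence built by hand to contain the motif; not a real PFK.
TOY_SEQUENCE = "MAAAKAAAAGHAQRGGAAAAADRLLL"

if __name__ == "__main__":
    print(prosite_to_regex(PFK_PATTERN))
    print(find_pattern(PFK_PATTERN, TOY_SEQUENCE))
```

Because `re.finditer` reports non-overlapping matches only, overlapping hits would need a lookahead-based search; for a single signature scan this is usually not a concern.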

InterPro: Database of protein families, domains and functional sites. Integrates other databases: PROSITE, PRINTS, ProDom, SMART, TIGRFAMs, PIRSF, SUPERFAMILY, PANTHER, GENE3D... http://www.ebi.ac.uk/interpro/

InterProScan

PredictProtein: Automatic prediction of structural and functional properties of proteins. Runs a battery of tests and gives a detailed report.

Look for known structure

Search for homologs. Search in structural databases: PDB/RCSB. Search in sequence databases: Blast against SwissProt, Blast against EMBL/GenBank/DDBJ.

Blast vs. PDB (EBI) Search for sequence-related structures

NCBI BlastPDB: Search for structures of sequence-related proteins

ModBase Search for possible 3-D models of the protein

Nature's SBKB Search for models from a number of servers

Alignment of mt ATP6: Spot a few well-preserved amino acids with a major role.
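Spotting well-preserved residues can be automated once an alignment is available. The following is a minimal Python sketch, assuming the sequences are already aligned to equal length. The function name `conserved_columns` and the toy alignment are hypothetical; the toy sequences are not real ATP6 data.

```python
from collections import Counter


def conserved_columns(alignment, threshold=1.0, gap="-"):
    """Return (1-based column, residue, fraction) for conserved columns.

    A column counts as conserved when its most common residue is not a gap
    and is present in at least `threshold` of the sequences.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    hits = []
    for i in range(length):
        column = [seq[i] for seq in alignment]
        residue, count = Counter(column).most_common(1)[0]
        fraction = count / len(column)
        if residue != gap and fraction >= threshold:
            hits.append((i + 1, residue, fraction))
    return hits


# Hypothetical toy alignment, for illustration only.
TOY_ALIGNMENT = [
    "MKLRE",
    "MALRD",
    "MQLRE",
]

if __name__ == "__main__":
    # Strictly invariant columns.
    print(conserved_columns(TOY_ALIGNMENT))
    # Columns where at least 60% of the sequences agree.
    print(conserved_columns(TOY_ALIGNMENT, threshold=0.6))
```

Lowering `threshold` trades specificity for sensitivity, which matters when the aligned sequences are either very close or very distant relatives.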

Multiple Alignment Problems. Homologous proteins: risk of too high conservation. Same family: risk of too little conservation.

Analyze coevolution: Co-evolving amino acids highlight interactions. See review at CNB.
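One common way to quantify co-evolution between two alignment columns is their mutual information. The slide does not say which method the cited review favours, so this is only an illustrative choice. A minimal Python sketch, with a hypothetical function name (`mutual_information`) and a toy alignment that is not real data:

```python
import math
from collections import Counter


def mutual_information(col_a, col_b):
    """Mutual information (in bits) between two alignment columns."""
    if len(col_a) != len(col_b):
        raise ValueError("columns must have the same number of sequences")
    n = len(col_a)
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))

    mi = 0.0
    for (a, b), c in count_ab.items():
        p_ab = c / n
        p_a = count_a[a] / n
        p_b = count_b[b] / n
        mi += p_ab * math.log2(p_ab / (p_a * p_b))
    return mi


def get_column(alignment, index):
    """Extract one column (0-based index) from a list of aligned sequences."""
    return [seq[index] for seq in alignment]


# Hypothetical toy alignment: columns 0 and 1 change together,
# column 2 is invariant, column 3 varies independently of column 0.
TOY_ALIGNMENT = [
    "AKLD",
    "AKLE",
    "GRLD",
    "GRLE",
]

if __name__ == "__main__":
    c0, c1, c2, c3 = (get_column(TOY_ALIGNMENT, i) for i in range(4))
    print("cols 1-2:", mutual_information(c0, c1))  # co-varying pair
    print("cols 1-3:", mutual_information(c0, c2))  # invariant column
    print("cols 1-4:", mutual_information(c0, c3))  # independent pair
```

Raw mutual information is sensitive to small alignments and to shared phylogeny, so real analyses usually apply corrections or use dedicated tools; the sketch only shows the basic quantity.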

Structural matching. Protein Function Prediction Server: uses structural data from known files to make predictions. Catalytic Site Atlas: uses structural models of active sites.

Compare, compare, compare... The answer may already be there. If not, similarities and differences allow you to scan genomes for useful targets, and proteins for target sites. There are many tools. There are “supertools” combining many tools, e.g. STING Millennium. Information is often cheaper than calculation.

Limits: Knowledge of 3-D structures is still limited. Prediction accuracy needs to be assessed. Check the database metadata. Available models may be outdated or incorrect. Too high or too low conservation precludes specific assignment. New, unknown proteins and functions are possible.

But, wait! There is more... much more! Image by geralt. CC0. http://pixabay.com/en/ball-http-www-crash-administrator-63527/