Tools to analyze protein characteristics

- Protein sequence
  - Family members
  - Multiple alignments
- Identification of conserved regions
- Evolutionary relationship (phylogeny)
- 3-D fold model
- Protein sorting and sub-cellular localization
- Anchoring into the membrane
- Signal sequences (tags)
- Protein modifications

Some nascent proteins contain a specific signal, or targeting sequence, that directs them to the correct compartment (ER, mitochondria, chloroplasts, lysosomes, vacuoles, Golgi, or cytosol).

Questions
- Can we train computers:
  - to detect signal sequences and predict a protein's destination?
  - to identify conserved domains (or patterns) in proteins?
  - to predict the membrane-anchoring type of a protein (transmembrane domain, GPI anchor...)?
  - to predict the 3-D structure of a protein?
- Learning algorithms are well suited to pattern-recognition problems because they can be trained on a sample data set.
- Classes of learning algorithms:
  - Artificial neural networks (ANNs)
  - Hidden Markov models (HMMs)

Artificial neural networks (ANN)
- Machine-learning algorithms that mimic the brain. Real brains, however, are orders of magnitude more complex than any ANN.
- ANNs, like people, learn by example: they are not explicitly programmed to perform a specific task but are trained on examples of it.
- An ANN is composed of a large number of highly interconnected processing elements (neurons) working in parallel to solve a specific problem.
- The first artificial neuron was developed in 1943 by the neurophysiologist Warren McCulloch and the logician Walter Pitts.
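
To make "learning by example" concrete, here is a minimal sketch of one artificial neuron: a threshold unit in the spirit of the McCulloch-Pitts model, whose weights are learned with the perceptron rule (a later learning rule due to Rosenblatt, not part of the 1943 model). The toy training set (logical AND) and all function names are illustrative assumptions; real ANNs for protein analysis use many neurons arranged in layers.

```python
def step(x):
    """Threshold activation: the neuron fires (1) only if its net input is positive."""
    return 1 if x > 0 else 0


def predict(weights, bias, inputs):
    """Output of a single artificial neuron for one input vector."""
    net = sum(w * x for w, x in zip(weights, inputs)) + bias
    return step(net)


def train_perceptron(samples, n_inputs, lr=1, epochs=50):
    """Learn weights from labelled (inputs, target) examples with the perceptron rule."""
    weights = [0] * n_inputs
    bias = 0
    for _ in range(epochs):
        mistakes = 0
        for inputs, target in samples:
            error = target - predict(weights, bias, inputs)
            if error:
                # Nudge the weights toward the correct answer for this example.
                mistakes += 1
                weights = [w + lr * error * x for w, x in zip(weights, inputs)]
                bias += lr * error
        if mistakes == 0:  # every training example is now classified correctly
            break
    return weights, bias


# Toy training set (hypothetical): the logical AND function, which is linearly separable.
and_samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(and_samples, n_inputs=2)
print([predict(weights, bias, x) for x, _ in and_samples])  # [0, 0, 0, 1]
```

Nobody writes the AND rule into the code; the neuron arrives at suitable weights only by correcting its mistakes on the examples.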

Hidden Markov Models (HMM)
- An HMM is a probabilistic process over a set of states in which the states themselves are "hidden"; only the outcome is visible to the observer. Hence the name Hidden Markov Model.
- HMMs have many uses in genomics:
  - Gene prediction (GENSCAN)
  - Signal peptide prediction (SignalP)
  - Finding periodic patterns
- Used to answer questions like:
  - What is the probability of obtaining a particular outcome?
  - What is the best model among many combinations?
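
The first question above ("what is the probability of obtaining a particular outcome?") is answered by the forward algorithm. The sketch below uses a hypothetical two-state model (AT-rich vs GC-rich regions emitting DNA bases); all probabilities are made up for illustration and are not taken from GENSCAN or SignalP. A brute-force sum over every hidden path is included only to show what the forward algorithm computes efficiently.

```python
from itertools import product

# Hypothetical two-state HMM: hidden region type -> emitted DNA base.
STATES = ("AT_rich", "GC_rich")
START = {"AT_rich": 0.5, "GC_rich": 0.5}
TRANS = {
    "AT_rich": {"AT_rich": 0.9, "GC_rich": 0.1},
    "GC_rich": {"AT_rich": 0.1, "GC_rich": 0.9},
}
EMIT = {
    "AT_rich": {"A": 0.4, "T": 0.4, "G": 0.1, "C": 0.1},
    "GC_rich": {"A": 0.1, "T": 0.1, "G": 0.4, "C": 0.4},
}


def forward(observed):
    """P(observed | model), summed over all hidden paths in O(N * K^2) time."""
    # alpha[s] = P(symbols seen so far, current hidden state = s)
    alpha = {s: START[s] * EMIT[s][observed[0]] for s in STATES}
    for symbol in observed[1:]:
        alpha = {
            s: EMIT[s][symbol] * sum(alpha[r] * TRANS[r][s] for r in STATES)
            for s in STATES
        }
    return sum(alpha.values())


def brute_force(observed):
    """Same probability, by explicitly enumerating every hidden path (exponential)."""
    total = 0.0
    for path in product(STATES, repeat=len(observed)):
        p = START[path[0]] * EMIT[path[0]][observed[0]]
        for prev, cur, symbol in zip(path, path[1:], observed[1:]):
            p *= TRANS[prev][cur] * EMIT[cur][symbol]
        total += p
    return total


print(forward("GGCAT"), brute_force("GGCAT"))
```

Only the emitted bases are observed; the forward algorithm sums over the hidden states instead of requiring them to be known, which is exactly the "hidden" part of the model.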

ExPASy server
- The ExPASy (Expert Protein Analysis System) server is dedicated to the analysis of protein sequences and structures.
- Sequence analysis tools include:
  - DNA -> Protein [Translate]
  - Pattern and profile searches
  - Post-translational modification and topology prediction
  - Primary structure analysis
  - Structure prediction (2D and 3D)
  - Alignment
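
The DNA -> Protein step (what the ExPASy Translate tool does) amounts to reading codons against the genetic code. Below is a minimal sketch assuming the standard genetic code (NCBI translation table 1) and a single forward reading frame; it is not ExPASy's implementation, and the function name and the `to_stop` parameter are illustrative.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1); "*" marks stop codons.
# Codons are listed in TCAG order: TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna, frame=0, to_stop=False):
    """Translate DNA (or RNA) in one forward reading frame (frame = 0, 1 or 2)."""
    seq = dna.upper().replace("U", "T")
    protein = []
    for i in range(frame, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")  # "X" for codons with ambiguous bases
        if aa == "*" and to_stop:
            break
        protein.append(aa)
    return "".join(protein)


print(translate("ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"))  # MAIVMGR*KGAR*
```

Calling `translate(seq, frame=1)` or `frame=2` gives the other forward frames; the reverse-strand frames would additionally need the reverse complement.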

- PredictProtein: a service for sequence analysis and structure prediction
- TMpred
- TMHMM: predicts transmembrane helices in proteins (CBS, Denmark)
- big-PI: predicts GPI-anchor sites
- DGPI: predicts GPI-anchor sites
- SignalP: predicts signal peptides
- PSORT: predicts sub-cellular localization
- TargetP: predicts sub-cellular localization
- NetNGlyc: predicts N-glycosylation sites
- PTS1: predicts peroxisomal targeting sequences
- MITOPROT: predicts mitochondrial targeting sequences
- Hydrophobicity
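
The "Hydrophobicity" entry refers to hydropathy analysis, the classic idea behind simple transmembrane predictions: average a hydrophobicity scale over a sliding window and flag long hydrophobic stretches. The sketch assumes the Kyte-Doolittle scale with the commonly cited 19-residue window and 1.6 cutoff; tools such as TMpred and TMHMM use more sophisticated models, and the function names and example sequence here are hypothetical.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein, window=19):
    """Mean hydropathy of each window, listed by window start (0-based)."""
    scores = [KYTE_DOOLITTLE[aa] for aa in protein.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def candidate_tm_windows(protein, window=19, threshold=1.6):
    """(start, end) of windows, 1-based and inclusive, whose mean exceeds threshold."""
    return [
        (i + 1, i + window)
        for i, mean in enumerate(hydropathy_profile(protein, window))
        if mean > threshold
    ]


# Hypothetical sequence: a polar stretch, 19 hydrophobic residues, another polar stretch.
example = "MKKDESRKNQ" + "LLIVALLAVLFVAILGLVV" + "KRDEQSNKDE"
print(candidate_tm_windows(example))
```

Overlapping flagged windows mark one hydrophobic stretch roughly long enough to span a lipid bilayer as a helix.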

Prediction servers
- NetNGlyc: predicts N-glycosylation sites
- NetPhos: predicts phosphorylation sites
- NetPhosK: predicts recognition sites for specific kinases
- NetAcet: predicts N-terminal acetylation in eukaryotic proteins
- NetCGlyc: predicts C-mannosylation sites in mammalian proteins
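
N-glycosylation predictors such as NetNGlyc start from the N-X-S/T sequon (X any residue except proline) and then use trained neural networks to decide which sequons are actually glycosylated. The sketch below performs only the first, motif-matching step with a regular expression; it does not reproduce NetNGlyc's predictions, and the example sequence is made up.

```python
import re

# N, then any residue except P, then S or T; the lookahead lets matches overlap.
SEQUON = re.compile(r"(?=(N[^P][ST]))")


def find_sequons(protein):
    """Return (1-based position of N, sequon) for every N-X-S/T motif with X != P."""
    return [(m.start() + 1, m.group(1)) for m in SEQUON.finditer(protein.upper())]


print(find_sequons("MANLTGNPSKNNSTQ"))  # [(3, 'NLT'), (11, 'NNS'), (12, 'NST')]
```

The "NPS" at position 7 is skipped because of the proline. A plain pattern without the lookahead would also miss the overlapping "NST" at position 12.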

Multiple alignment
- Used for phylogenetic analysis:
  - Same protein from different species
  - Evolutionary relationship: history
- Used to find conserved regions:
  - Local multiple alignment reveals conserved regions
  - Conserved regions are usually key functional regions
  - These regions are prime targets for drug development
  - Protein domains are often conserved across many species
- Algorithm for finding conserved regions: Block Maker
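
Conserved positions can be spotted directly in a multiple alignment by scoring how variable each column is. Below is a minimal sketch using Shannon entropy per column (0 bits means every sequence has the same residue), with gaps counted as an extra symbol. This is a simpler, swapped-in technique rather than the Block Maker algorithm, and the alignment and the 0.5-bit cutoff are hypothetical.

```python
from collections import Counter
from math import log2


def column_entropy(column):
    """Shannon entropy (bits) of one alignment column; '-' counts as a symbol."""
    counts = Counter(column)
    n = len(column)
    return sum((c / n) * log2(n / c) for c in counts.values())


def conserved_columns(alignment, max_entropy=0.5):
    """1-based indices of columns whose entropy is at or below max_entropy."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("sequences must be aligned to the same length")
    return [
        i + 1
        for i in range(length)
        if column_entropy([seq[i] for seq in alignment]) <= max_entropy
    ]


# Hypothetical aligned protein fragments.
alignment = [
    "MKT-LLVAG",
    "MKSALLVAG",
    "MRT-LIVSG",
    "MKTALLVAG",
]
print(conserved_columns(alignment))  # [1, 5, 7, 9]
```

Runs of adjacent low-entropy columns would be the candidate conserved regions; with real data, weighting similar sequences and using a substitution-aware score would make the measure less sensitive to redundant sequences.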

Multiple alignment tools
- Free programs (phylogenetic analysis):
  - PHYLIP and PAUP
  - PhyML
- The most used websites (T-COFFEE and ClustalW)
- ClustalW:
  - Standard, popular software
  - It performs a progressive alignment: it aligns two sequences first and then keeps adding new sequences to the growing alignment
  - Problem: it is only a heuristic, so the final alignment is not guaranteed to be optimal
- Motif discovery: use your own motif to search databases
  - PatternFind
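
Progressive aligners such as ClustalW build on pairwise alignment. The sketch below is a minimal Needleman-Wunsch global alignment of two sequences with a hypothetical scoring scheme (match +1, mismatch -1, linear gap -2). ClustalW itself uses substitution matrices, affine gap penalties, and a guide tree to decide the order in which sequences are added, none of which is reproduced here.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment of sequences a and b; returns (score, aligned_a, aligned_b)."""

    def pair(x, y):
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]),  # (mis)match
                score[i - 1][j] + gap,  # gap in b
                score[i][j - 1] + gap,  # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))


print(needleman_wunsch("ACGT", "AGT"))  # (1, 'ACGT', 'A-GT')
```

Pairwise alignment is exact for two sequences. The heuristic part of progressive methods is the decision to fix earlier alignments while adding later sequences.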

Phylogenetic analysis
- Phylogenetic trees describe evolutionary relationships between sequences
- Major modes that drive evolution:
  - Point mutations modify existing sequences
  - Duplications (re-use of existing sequence)
  - Rearrangements
- Two most common methods:
  - Maximum parsimony
  - Maximum likelihood
- The most useful software
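
Maximum parsimony prefers the tree that explains the data with the fewest changes. The sketch below scores a given rooted binary tree with Fitch's small-parsimony algorithm, one alignment column at a time; a real parsimony program (e.g. PAUP or PHYLIP) also searches over many tree topologies. Taxa, sequences, and the nested-tuple tree format are hypothetical.

```python
def fitch(tree, states):
    """Fitch small parsimony for one site.

    tree:   nested 2-tuples of taxon names, e.g. (("A", "B"), ("C", "D"))
    states: dict mapping each taxon to its character at this site
    Returns (candidate ancestral states at the root, minimum number of changes).
    """
    if isinstance(tree, str):  # a leaf: its observed state, no changes needed
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    shared = left_set & right_set
    if shared:  # children can agree: no extra change at this node
        return shared, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1  # one extra change


def parsimony_score(tree, alignment):
    """Total Fitch score of a tree over all columns of {taxon: aligned sequence}."""
    length = len(next(iter(alignment.values())))
    return sum(
        fitch(tree, {taxon: seq[i] for taxon, seq in alignment.items()})[1]
        for i in range(length)
    )


# Hypothetical aligned sequences for four taxa.
taxa_alignment = {"A": "AAG", "B": "AAA", "C": "GGA", "D": "GAA"}
for tree in [(("A", "B"), ("C", "D")), (("A", "C"), ("B", "D"))]:
    print(tree, parsimony_score(tree, taxa_alignment))
```

Here the ((A,B),(C,D)) topology needs 3 changes versus 4 for ((A,C),(B,D)), so parsimony would prefer the first tree.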

Definitions
- Homologous: sharing a common ancestor. Homology is a yes-or-no relationship and cannot be measured; what can be measured is sequence similarity.
- Orthologous: the same gene in different species, resulting from speciation (the genes descend from a common ancestral gene).
- Paralogous: related genes (already diverged) within the same species, resulting from gene duplication or genomic rearrangement.

Determining protein structure-function
- Direct measurement of structure
  - X-ray crystallography
  - NMR spectroscopy
- Site-directed mutagenesis
- Computer modeling
  - Prediction of structure
  - Comparative protein-structure modeling

Comparative protein-structure modeling
- Goal: construct a 3-D model of a protein of unknown structure (target), based on its sequence similarity to proteins of known structure (templates).
- Procedure:
  - Template selection
  - Template-target alignment
  - Model building
  - Model evaluation

[Figure: model predicted by PROSPECT (blue) superimposed on the NMR structure (red).]
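
For the model-evaluation step, a common measure of how close a predicted model is to an experimental structure (for example, the PROSPECT model versus the NMR structure above) is the root-mean-square deviation over matched atoms, RMSD = sqrt((1/N) * sum of |x_i - y_i|^2). The sketch below assumes the two coordinate sets are already matched atom by atom (e.g. C-alpha atoms) and optimally superimposed; in practice a superposition step such as the Kabsch algorithm comes first. The coordinates are made up.

```python
from math import sqrt


def rmsd(model, reference):
    """Root-mean-square deviation between two matched, pre-superimposed coordinate lists."""
    if len(model) != len(reference) or not model:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(
        (mx - rx) ** 2 + (my - ry) ** 2 + (mz - rz) ** 2
        for (mx, my, mz), (rx, ry, rz) in zip(model, reference)
    )
    return sqrt(squared / len(model))


# Hypothetical C-alpha coordinates (in angstroms) for a three-residue fragment.
model = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 3.0, 0.0)]
print(rmsd(model, reference))  # sqrt(9 / 3), about 1.732
```

Lower is better: identical coordinates give 0, and the value is in the same units as the coordinates.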

The Protein 3-D Database
- The Protein Data Bank (PDB) contains 3-D structural data for proteins.
- Founded in 1971 with a dozen structures.
- As of June 2004, there were 25,760 structures in the database. All structures are reviewed for accuracy and data uniformity.
- Structural data from the PDB can be freely accessed online.
- Of these structures, 80% come from X-ray crystallography, 16% from NMR, and 2% from theoretical modeling.

High-throughput methods

Most used websites for 3-D structure prediction
- Protein Homology/analogY Recognition Engine (Phyre)
- PredictProtein
- UCLA Fold Recognition

Commercial bioinformatics software
- CLC Genomics Workbench
  - Genomics: 454, Illumina Genome Analyzer, and SOLiD sequencing data; de novo assembly of genomes of any size; advanced visualization, scrolling, and zooming tools; SNP detection using advanced quality filtering
  - Transcriptomics: RNA-seq, including paired data and transcript-level expression; small RNA analysis; expression profiling by tags
  - Epigenetics: chromatin immunoprecipitation sequencing (ChIP-seq) analysis; peak finding and peak refinement; graph and table of background distribution; false discovery rate; peak table and annotations
- Vector NTI: sequence analysis and illustration; restriction mapping; recombinant molecule design and cloning; in silico gel electrophoresis; synthetic biology workflows
- AlignX
- BioAnnotator
- ContigExpress
- GenomBench

Bioinformatics topics not covered in this class
- Comparative genomics and genome browsers
- Genome annotation: rast.nmpdr.org/
- Metagenomics
- Systems biology tools