Tools to analyze protein characteristics Protein sequence -Family member -Multiple alignments Identification of conserved regions Evolutionary relationship.

Tools to analyze protein characteristics
Protein sequence
- Family member
- Multiple alignments
- Identification of conserved regions
- Evolutionary relationship (Phylogeny)
- 3-D fold model
- Protein sorting and sub-cellular localization
- Anchoring into the membrane
- Signal sequence (tags)
Some nascent proteins contain a specific signal, or targeting sequence, that directs them to the correct destination (ER, mitochondria, chloroplast, lysosome, vacuoles, Golgi, or cytosol).

Questions
Can we train computers:
- To detect signal sequences and predict protein destination?
- To identify conserved domains (or a pattern) in proteins?
- To predict the membrane-anchoring type of a protein (transmembrane domain, GPI anchor…)?
- To predict the 3D structure of a protein?
Learning algorithms are good for solving problems in pattern recognition because they can be trained on a sample data set.
Classes of learning algorithms:
- Artificial neural networks (ANNs)
- Hidden Markov Models (HMMs)

Artificial neural networks (ANN)
- Machine learning algorithms that mimic the brain. Real brains, however, are orders of magnitude more complex than any ANN so far considered.
- ANNs, like people, learn by example. They are trained rather than explicitly programmed to perform a specific task.
- An ANN is composed of a large number of highly interconnected processing elements (neurons) working simultaneously to solve specific problems.
- The first artificial neuron was developed in 1943 by the neurophysiologist Warren McCulloch and the logician Walter Pitts.
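
To make "learning by example" concrete, here is a minimal sketch of a single threshold neuron trained with the classic perceptron update rule. The toy data set (logical AND), the learning rate, and the function names are illustrative assumptions, not taken from the slides.

```python
def step(x):
    """Threshold activation: the neuron fires (1) when its input is >= 0."""
    return 1 if x >= 0 else 0


def predict(weights, bias, inputs):
    """Output of one artificial neuron for a single input vector."""
    return step(sum(w * x for w, x in zip(weights, inputs)) + bias)


def train_perceptron(samples, epochs=20, lr=1):
    """Adjust weights from labelled examples using the perceptron rule."""
    weights = [0] * len(samples[0][0])
    bias = 0
    for _ in range(epochs):
        for inputs, target in samples:
            error = target - predict(weights, bias, inputs)
            # Nudge each weight in the direction that reduces the error.
            weights = [w + lr * error * x for w, x in zip(weights, inputs)]
            bias += lr * error
    return weights, bias


# Toy training set (an assumption for illustration): logical AND.
AND_SAMPLES = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(AND_SAMPLES)
predictions = [predict(weights, bias, x) for x, _ in AND_SAMPLES]
```

The neuron is never told the rule for AND; it only sees input/output pairs. Real predictors combine many such units and use encoded amino-acid windows as input.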

Hidden Markov Models (HMM)
- An HMM is a probabilistic process over a set of states, in which the states are "hidden". Only the outcome is visible to the observer, hence the name Hidden Markov Model.
- HMMs have many uses in genomics:
  - Gene prediction (GENSCAN)
  - Signal peptide prediction (SignalP)
  - Finding periodic patterns
- Used to answer questions like:
  - What is the probability of obtaining a particular outcome?
  - What is the best model from many combinations?
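
The two questions above correspond to two standard HMM computations: the forward algorithm (probability of an observed outcome) and the Viterbi algorithm (most likely hidden state path). The sketch below uses a hypothetical two-state model with made-up probabilities; the state names, symbols, and numbers are assumptions for illustration only.

```python
# Hypothetical model: hidden state "H" (hydrophobic-rich region) or "L" (other),
# emitting observed symbols "h" (hydrophobic residue) or "p" (polar residue).
STATES = ("H", "L")
START = {"H": 0.5, "L": 0.5}
TRANS = {"H": {"H": 0.8, "L": 0.2}, "L": {"H": 0.3, "L": 0.7}}
EMIT = {"H": {"h": 0.7, "p": 0.3}, "L": {"h": 0.2, "p": 0.8}}


def forward(obs):
    """Probability of the observed sequence, summed over all hidden paths."""
    alpha = {s: START[s] * EMIT[s][obs[0]] for s in STATES}
    for symbol in obs[1:]:
        alpha = {
            s: EMIT[s][symbol] * sum(alpha[r] * TRANS[r][s] for r in STATES)
            for s in STATES
        }
    return sum(alpha.values())


def viterbi(obs):
    """Most likely hidden state path and its probability."""
    best = {s: (START[s] * EMIT[s][obs[0]], [s]) for s in STATES}
    for symbol in obs[1:]:
        new_best = {}
        for s in STATES:
            # Pick the best predecessor state for s.
            prob, path = max(
                ((best[r][0] * TRANS[r][s], best[r][1]) for r in STATES),
                key=lambda item: item[0],
            )
            new_best[s] = (prob * EMIT[s][symbol], path + [s])
        best = new_best
    prob, path = max(best.values(), key=lambda item: item[0])
    return path, prob


likelihood = forward("hhpp")
path, path_prob = viterbi("hhhh")
```

In a real application the transition and emission probabilities are estimated from training sequences rather than written by hand.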

The ExPASy (Expert Protein Analysis System) server (http://au.expasy.org) is dedicated to the analysis of protein sequences and structures.
Sequence analysis tools include:
- DNA -> Protein [Translate]
- Pattern and profile searches
- Post-translational modification and topology prediction
- Primary structure analysis
- Structure prediction (2D and 3D)
- Alignment
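
As an illustration of the "DNA -> Protein" step, the following sketch translates a DNA string with the standard genetic code. It is not the ExPASy Translate implementation; the function name and the choice to mark unknown codons with "X" are assumptions.

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna, frame=0):
    """Translate a DNA (or RNA) string in one forward reading frame.

    Stop codons are written as '*'; codons with ambiguous bases become 'X'.
    """
    dna = dna.upper().replace("U", "T")
    protein = []
    for i in range(frame, len(dna) - 2, 3):
        protein.append(CODON_TABLE.get(dna[i:i + 3], "X"))
    return "".join(protein)


peptide = translate("ATGGCCTAA")
```

Calling `translate` with `frame` set to 0, 1 and 2 gives the three forward reading frames; the reverse frames would additionally need the reverse complement of the input.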

- PredictProtein: a service for sequence analysis and structure prediction: http://www.predictprotein.org/newwebsite/submit.html
- TMpred: http://www.ch.embnet.org/software/TMPRED_form.html
- TMHMM: predicts transmembrane helices in proteins (CBS; Denmark): http://www.cbs.dtu.dk/services/TMHMM-2.0/
- big-PI: predicts GPI-anchor site: http://mendel.imp.univie.ac.at/sat/gpi/gpi_server.html
- DGPI: predicts GPI-anchor site: http://129.194.185.165/dgpi/index_en.html
- SignalP: predicts signal peptide: http://www.cbs.dtu.dk/services/SignalP/
- PSORT: predicts sub-cellular localization: http://www.psort.org/
- TargetP: predicts sub-cellular localization: http://www.cbs.dtu.dk/services/TargetP/
- NetNGlyc: predicts N-glycosylation sites: http://www.cbs.dtu.dk/services/NetNGlyc/
- PTS1: predicts peroxisomal targeting sequences: http://mendel.imp.univie.ac.at/mendeljsp/sat/pts1/PTS1predictor.jsp
- MITOPROT: predicts mitochondrial targeting sequences: http://ihg.gsf.de/ihg/mitoprot.html
- Hydrophobicity: http://www.vivo.colostate.edu/molkit/hydropathy/index.html
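
Hydrophobicity plots of the kind listed last are usually sliding-window averages over a per-residue scale. The sketch below uses the Kyte-Doolittle scale; the window sizes and the 1.6 cutoff for candidate membrane-spanning windows are commonly quoted rules of thumb, included here as assumptions rather than as the settings of any server above.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=9):
    """Mean hydropathy of every window along the sequence.

    Raises KeyError for non-standard residue letters.
    """
    scores = [KYTE_DOOLITTLE[residue] for residue in sequence.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def candidate_tm_windows(sequence, window=19, threshold=1.6):
    """Start indices (0-based) of windows hydrophobic enough to be
    candidate transmembrane segments (rule-of-thumb cutoff)."""
    profile = hydropathy_profile(sequence, window)
    return [i for i, score in enumerate(profile) if score >= threshold]


profile = hydropathy_profile("LLLLL", window=5)
```

A dedicated predictor such as TMHMM models helix ends, loops and orientation as well, so a plain hydropathy scan should be read as a first look only.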

Multiple alignment
- Used to do phylogenetic analysis:
  - Same protein from different species
  - Evolutionary relationship: history
- Used to find conserved regions:
  - Local multiple alignment reveals conserved regions
  - Conserved regions usually are key functional regions
  - These regions are prime targets for drug development
  - Protein domains are often conserved across many species
- Algorithm for search of conserved regions:
  - Block maker: http://blocks.fhcrc.org/blocks/make_blocks.html
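
One simple way to see how an alignment "reveals conserved regions" is to score each column by the fraction of sequences sharing its most common residue, then report runs of high-scoring columns. This is a minimal sketch under that assumption, not the Block Maker algorithm; the toy alignment and thresholds are hypothetical.

```python
from collections import Counter


def column_conservation(alignment):
    """Fraction of sequences sharing the most common residue, per column."""
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("aligned sequences must have equal length")
    scores = []
    for column in zip(*alignment):
        residues = [c for c in column if c != "-"]  # gaps never count
        if not residues:
            scores.append(0.0)
            continue
        top_count = Counter(residues).most_common(1)[0][1]
        scores.append(top_count / len(column))
    return scores


def conserved_blocks(alignment, min_score=1.0, min_length=3):
    """Half-open (start, end) column ranges where every column is conserved."""
    scores = column_conservation(alignment)
    blocks, start = [], None
    # A trailing 0.0 sentinel closes a block that runs to the last column.
    for i, score in enumerate(scores + [0.0]):
        if score >= min_score and start is None:
            start = i
        elif score < min_score and start is not None:
            if i - start >= min_length:
                blocks.append((start, i))
            start = None
    return blocks


# Hypothetical toy alignment of one protein from three species.
TOY_ALIGNMENT = ["MKTAYIAKQR", "MKSAYIA-QR", "MRTAYIAKQK"]
blocks = conserved_blocks(TOY_ALIGNMENT)
```

Here the only fully conserved run of at least three columns is "AYIA" (columns 3 to 6).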

Multiple alignment tools
- Free programs:
  - Phylip and PAUP: http://evolution.genetics.washington.edu/phylip.html
  - Phyml: http://atgc.lirmm.fr/phyml/
- The most used websites:
  - http://align.genome.jp/
  - http://prodes.toulouse.inra.fr/multalin/multalin.html
  - http://www.ch.embnet.org/index.html (T-COFFEE and ClustalW)
- ClustalW:
  - Standard popular software
  - It aligns two sequences, then keeps adding one new sequence at a time to the alignment
  - Problem: it is simply a heuristic, so the result is not guaranteed to be the optimal alignment
- Motif discovery: use your own motif to search databases:
  - PatternFind: http://myhits.isb-sib.ch/cgi-bin/pattern_search
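
Searching with your own motif can be sketched by converting a PROSITE-style pattern to a regular expression and scanning a sequence. The converter below handles only the common syntax elements (x, [..], {..}, repeat counts, < and > anchors) and is an illustrative assumption, not the PatternFind implementation; the test sequence is made up.

```python
import re


def prosite_to_regex(pattern):
    """Convert a simple PROSITE-style pattern, e.g. 'N-{P}-[ST]-{P}'."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        element = element.replace("{", "[^").replace("}", "]")  # exclusions
        element = element.replace("(", "{").replace(")", "}")   # repeats
        element = element.replace("x", ".")                     # any residue
        element = element.replace("<", "^").replace(">", "$")   # anchors
        parts.append(element)
    return "".join(parts)


def find_motif(pattern, sequence):
    """Return (1-based position, matched text) for every hit.

    A lookahead is used so that overlapping hits are also reported.
    """
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence.upper())]


# N-glycosylation sequon in PROSITE notation, scanned over a toy sequence.
hits = find_motif("N-{P}-[ST]-{P}", "MKNGSAANPSTNATQ")
```

The asparagine at position 8 is skipped because it is followed by proline, which the pattern excludes.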

Phylogenetic analysis
- Phylogenetic trees
  - Describe evolutionary relationships between sequences
- Major modes that drive evolution:
  - Point mutations (modify existing sequences)
  - Duplications (re-use existing sequences)
  - Rearrangements
- Two most common methods:
  - Maximum parsimony
  - Maximum likelihood

Parsimony vs Maximum likelihood
- Parsimony is the most popular method; the simplest answer is always the preferred one.
  - It involves evaluating the number of mutations needed to explain the observed data.
  - The best tree is the one that requires the fewest evolutionary changes.
- Likelihood generally performs better than parsimony.
  - In contrast to parsimony, maximum likelihood does not minimize the number of changes. It attempts to answer the question: which parameters of the evolutionary process were most likely to produce the current data set?
  - This is computationally difficult to do, and it is typically the slowest of the methods.
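
Counting "the fewest evolutionary changes" for a given tree can be done with Fitch's algorithm, sketched below. The nested-tuple tree encoding, the taxon names, and the toy sequences are assumptions for illustration; a real parsimony program also has to search over tree topologies efficiently.

```python
def fitch(tree, leaf_states):
    """Fitch's algorithm for one alignment column.

    `tree` is a leaf name (str) or a pair of subtrees (binary tree).
    Returns (possible ancestral states, minimum number of changes).
    """
    if isinstance(tree, str):
        return {leaf_states[tree]}, 0
    left_set, left_cost = fitch(tree[0], leaf_states)
    right_set, right_cost = fitch(tree[1], leaf_states)
    common = left_set & right_set
    if common:
        # Children can agree on a state: no extra change needed here.
        return common, left_cost + right_cost
    # Children disagree: one change is required on this branch pair.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, alignment):
    """Total minimum number of changes over all alignment columns."""
    length = len(next(iter(alignment.values())))
    return sum(
        fitch(tree, {name: seq[i] for name, seq in alignment.items()})[1]
        for i in range(length)
    )


# Hypothetical aligned sequences for four taxa.
ALIGNMENT = {"A": "AAG", "B": "AAG", "C": "ACG", "D": "ACT"}
TREES = [(("A", "B"), ("C", "D")), (("A", "C"), ("B", "D")), (("A", "D"), ("B", "C"))]
scores = [parsimony_score(tree, ALIGNMENT) for tree in TREES]
best_tree = min(TREES, key=lambda tree: parsimony_score(tree, ALIGNMENT))
```

For these toy data the grouping ((A,B),(C,D)) needs two changes while the other two topologies need three, so parsimony prefers it.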

Definitions
- Homologous: have a common ancestor. Homology cannot be measured (sequences either share an ancestor or they do not).
- Orthologous: the same gene in different species. It is the result of speciation (common ancestor).
- Paralogous: related genes (already diverged) in the same species. It is the result of genomic rearrangement or duplication.

Determining protein structure
- Direct measurement of structure
  - X-ray crystallography
  - NMR spectroscopy
- Site-directed mutagenesis
- Computer modeling
  - Prediction of structure
  - Comparative protein-structure modeling

Comparative protein-structure modeling
- Goal: construct a 3-D model of a protein of unknown structure (target), based on similarity of sequence to proteins of known structure (templates)
- [Figure: superimposed structures. Blue: model predicted by PROSPECT. Red: NMR structure.]
- Procedure:
  - Template selection
  - Template–target alignment
  - Model building
  - Model evaluation
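
The template-selection step relies on sequence similarity between target and candidate templates. A minimal sketch, assuming the target has already been aligned to each candidate, ranks templates by percent identity over aligned (non-gap) positions. The template names and sequences are hypothetical, and real pipelines also weigh coverage, structure quality and profile-based scores.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identical residues over positions where neither has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(a == b for a, b in pairs) / len(pairs)


def select_template(alignments):
    """Pick the template whose alignment to the target has the highest identity.

    `alignments` maps a template name to (aligned_target, aligned_template).
    """
    return max(alignments, key=lambda name: percent_identity(*alignments[name]))


# Hypothetical pairwise alignments of one target against two templates.
CANDIDATES = {
    "template_1": ("MKTAYIAKQR", "MKSAYLAKNR"),
    "template_2": ("MKTAYIAKQR", "MKTAYIA-QK"),
}
chosen = select_template(CANDIDATES)
```

In this toy case template_1 is 70% identical and template_2 about 89% identical, so template_2 is chosen.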

The Protein 3-D Database
- The Protein Data Bank (PDB) contains 3-D structural data for proteins.
- Founded in 1971 with a dozen structures.
- As of June 2004, there were 25,760 structures in the database. All structures are reviewed for accuracy and data uniformity.
- Structural data from the PDB can be freely accessed at http://www.rcsb.org/pdb/
- 80% come from X-ray crystallography
- 16% come from NMR
- 2% come from theoretical modeling

High-throughput methods

Most used websites for 3-D structure prediction
- Protein Homology/analogY Recognition Engine (Phyre) at http://www.sbg.bio.ic.ac.uk/phyre/html/index.html
- PredictProtein at http://www.predictprotein.org/newwebsite/submit.html
- UCLA Fold Recognition at http://www.doe-mbi.ucla.edu/Services/FOLD/
