Download presentation
Presentation is loading. Please wait.
1
Tools to analyze protein characteristics Protein sequence -Family member -Multiple alignments Identification of conserved regions Evolutionary relationship (Phylogeny) 3-D fold model Protein sorting and sub-cellular localization Anchoring into the membrane Signal sequence (tags) Some nascent proteins contain a specific signal, or targeting sequence that directs them to the correct organelle. ( ER, mitochondrial, chloroplast, lysosome, vacuoles, Golgi, or cytosol )
2
Can we train the computers: To detect signal sequences and predict protein destination? T o identify conserved domains (or a pattern) in proteins? To predict the membrane-anchoring type of a protein? ( Transmembrane domain, GPI anchor… ) T o predict the 3D structure of a protein? Learning algorithms are good for solving problems in pattern recognition because they can be trained on a sample data set. Classes of learning algorithms: -Artificial neural networks (ANNs) -Hidden Markov Models (HMM) Questions
3
Artificial neural networks (ANN) Machine learning algorithms that mimic the brain. Real brains, however, are orders of magnitude more complex than any ANN so far considered. ANNs, like people, learn by example. ANNs cannot be programmed to perform a specific task. ANN is composed of a large number of highly interconnected processing elements (neurons) working simultaneously to solve specific problems. The first artificial neuron was developed in 1943 by the neurophysiologist Warren McCulloch and the logician Walter Pits.
4
Hidden Markov Models (HMM) HMM is a probabilistic process over a set of states, in which the states are “hidden”. It is only the outcome that visible to the observer. Hence, the name Hidden Markov Model. HMM has many uses in genomics: Gene prediction (GENSCAN) SignalP Finding periodic patterns Used to answer questions like: What is the probability of obtaining a particular outcome? What is the best model from many combinations?
5
Expasy server (http://au.expasy.org) is dedicated to the analysis of protein sequences and structures. The ExPASy (Expert Protein Analysis System) Sequence analysis tools include: DNA -> Protein [ Translate ] Pattern and profile searches Post-translational modification and topology prediction Primary structure analysis Structure prediction (2D and 3D) Alignment
6
PredictProtein: A service for sequence analysis, and structure prediction http://www.predictprotein.org/newwebsite/submit.html TMpred: http://www.ch.embnet.org/software/TMPRED_form.html http://www.ch.embnet.org/software/TMPRED_form.html TMHMM: Predicts transmembrane helices in proteins (CBS; Denmark) http://www.cbs.dtu.dk/services/TMHMM-2.0/ big-PI : Predicts GPI-anchor site : http://mendel.imp.univie.ac.at/sat/gpi/gpi_server.html http://mendel.imp.univie.ac.at/sat/gpi/gpi_server.html DGPI: Predicts GPI-anchor site : http://129.194.185.165/dgpi/index_en.html http://129.194.185.165/dgpi/index_en.html SignalP : Predicts signal peptide : http://www.cbs.dtu.dk/services/SignalP/ http://www.cbs.dtu.dk/services/SignalP/ PSORT: Predicts sub-cellular localization: http://www.psort.org/ http://www.psort.org/ TargetP: Predicts sub-cellular localization: http://www.cbs.dtu.dk/services/TargetP/ http://www.cbs.dtu.dk/services/TargetP/ NetNGlyc: Predicts N-glycosylation sites : http://www.cbs.dtu.dk/services/NetNGlyc/ http://www.cbs.dtu.dk/services/NetNGlyc/ PTS1: Predicts peroxisomal targeting sequences http://mendel.imp.univie.ac.at/mendeljsp/sat/pts1/PTS1predictor.jsp MITOPROT: Predicts of mitochondrial targeting sequences http://ihg.gsf.de/ihg/mitoprot.html Hydrophobicity : http://www.vivo.colostate.edu/molkit/hydropathy/index.htmlhttp://www.vivo.colostate.edu/molkit/hydropathy/index.html
7
Multiple alignment Used to do phylogenetic analysis: Same protein from different species Evolutionary relationship: history Used to find conserved regions Local multiple alignment reveals conserved regions Conserved regions usually are key functional regions These regions are prime targets for drug developments Protein domains are often conserved across many species Algorithm for search of conserved regions: Block maker : http://blocks.fhcrc.org/blocks/make_blocks.html http://blocks.fhcrc.org/blocks/make_blocks.html
8
Multiple alignment tools Free programs: Phylip and PAUP : http://evolution.genetics.washington.edu/phylip.html http://evolution.genetics.washington.edu/phylip.html Phyml : http://atgc.lirmm.fr/phyml/ http://atgc.lirmm.fr/phyml/ The most used websites : http://align.genome.jp/ http://align.genome.jp/ http://prodes.toulouse.inra.fr/multalin/multalin.html http://prodes.toulouse.inra.fr/multalin/multalin.html http://www.ch.embnet.org/index.html (T-COFFEE and ClustalW) http://www.ch.embnet.org/index.html ClustalW: Standard popular software It aligns 2 and keep on adding a new sequence to the alignment Problem: It is simply a heuristics. Motif discovery: use your own motif to search databases : PatternFind: http://myhits.isb-sib.ch/cgi-bin/pattern_searchhttp://myhits.isb-sib.ch/cgi-bin/pattern_search
9
Phylogenetic analysis Phylogenetic trees Describe evolutionary relationships between sequences Major modes that drive the evolution: Point mutations modify existing sequences Duplications (re-use existing sequence) Rearrangement Two most common methods Maximum parsimony Maximum likelihood
10
Parsimony vs Maximum likelihood Parsimony is the most popular method in which the simplest answer is always the preferred one. It involves statistical evaluation of the number of mutations need to explain the observed data. The best tree is the one that requires the fewest number of evolutionary changes. Likelihood generally performs better than parsimony I n contrast, maximum likelihood does not necessarily satisfy any optimality criterion. It attempts to answer the question: What parameters of evolutionary events was likely to produce the current data set? This is computationally difficult to do. This is the slowest of all methods.
11
Definitions Homologous: Have a common ancestor. Homology cannot be measured. Orthologous: The same gene in different species. It is the result of speciation (common ancestral) Paralogous : Related genes (already diverged) in the same species. It is the result of genomic rearrangements or duplication
12
Determining protein structure Direct measurement of structure X-ray crystallography NMR spectroscopy Site-directed mutagenesis Computer modeling Prediction of structure Comparative protein-structure modeling
13
Comparative protein-structure modeling Goal: Construct 3-D model of a protein of unknown structure (target), based on similarity of sequence to proteins of known structure (templates) Blue : predicted model by PROSPECT Red : NMR structure Procedure: Template selection Template–target alignment Model building Model evaluation
14
The Protein 3-D Database The Protein DataBase (PDB) contains 3-D structural data for proteins Founded in 1971 with a dozen structures As of June 2004, there were 25,760 structures in the database. All structures are reviewed for accuracy and data uniformity. Structural data from the PDB can be freely accessed at http://www.rcsb.org/pdb/ 80% come from X-ray crystallography 16% come from NMR 2% come from theoretical modeling
15
High-throughput methods
16
Most used websites for 3-D structure prediction Protein Homology/analogY Recognition Engine (Phyre) at http://www.sbg.bio.ic.ac.uk/phyre/html/index.html PredictProtein at http://www.predictprotein.org/newwebsite/submit.html UCLA Fold Recognition at http://www.doe-mbi.ucla.edu/Services/FOLD/
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.