Intro to Bioinformatics Summary

What did we learn: Pairwise alignment
- Local and global alignments. When? How?
- Tools: blast2seq for local alignment; for global alignment, best use MSA tools such as Clustal X or MUSCLE
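The difference between global and local alignment can be sketched with a small dynamic-programming scorer. This is only an illustration: the function name `align_score` and the match/mismatch/gap values are assumptions chosen for the example, not the scoring used by blast2seq, Clustal X or MUSCLE.

```python
def align_score(a, b, match=1, mismatch=-1, gap=-2, local=False):
    """Best alignment score of sequences a and b with a linear gap penalty.

    local=False: global alignment (Needleman-Wunsch recurrence).
    local=True:  local alignment (Smith-Waterman recurrence).
    The scoring values are illustrative defaults, not those of any tool.
    """
    # First row: leading gaps cost nothing in a local alignment.
    prev = [0 if local else j * gap for j in range(len(b) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0 if local else i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score = max(diag, prev[j] + gap, curr[j - 1] + gap)
            if local:
                # A local alignment may restart anywhere, so never go below 0.
                score = max(score, 0)
                best = max(best, score)
            curr.append(score)
        prev = curr
    # Global: score of aligning both sequences end to end.
    # Local: best score of any pair of substrings.
    return best if local else prev[-1]


if __name__ == "__main__":
    print(align_score("ACGT", "AGT"))                          # global
    print(align_score("TTTACGTTTT", "GGACGGG", local=True))    # local
```

Global alignment fits sequences expected to be related over their whole length; local alignment fits cases where only a region (for example a shared domain) is similar.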

What did we learn: Multiple alignments (MSA)
- When? How?
- MSAs are needed as input for many different purposes: searching for motifs, phylogenetic analysis, protein and RNA structure prediction, and assessing the conservation of specific nucleotides/residues
- Tools: Clustal X (for DNA and RNA), MUSCLE (for proteins)
- Tools for phylogenetic trees: PHYLIP …
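As one example of what an MSA is used for, the conservation of each alignment column can be estimated directly from the aligned sequences. The sketch below assumes the alignment is already available as equal-length strings with "-" for gaps; `column_conservation` is a hypothetical helper, not part of Clustal X or MUSCLE.

```python
from collections import Counter


def column_conservation(aligned):
    """For each column, the fraction of sequences that carry the most
    common non-gap symbol. Gaps count against conservation."""
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    result = []
    for column in zip(*aligned):
        counts = Counter(c for c in column if c != "-")
        top = counts.most_common(1)[0][1] if counts else 0
        result.append(top / len(column))
    return result


if __name__ == "__main__":
    # Toy alignment made up for illustration only.
    msa = ["ACG-T", "ACGAT", "TCG-T"]
    for position, value in enumerate(column_conservation(msa), start=1):
        print(position, round(value, 2))
```

Dedicated conservation scores usually also weight similar residues and sequence redundancy; this version only counts identical symbols.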

What did we learn: Searching a sequence against a database
- When? How?
- BLAST: remember the different BLAST programs (blastp, blastn, …) and make sure to search the right database! Do not forget that you can change the scoring matrix, gap penalties, etc.
- PSI-BLAST: searching for remote homologs
- PHI-BLAST: searching for a short pattern within a protein

What did we learn: Motif search
- When? How?
- Searching for known motifs in a given promoter (JASPAR)
- Searching for overrepresented unknown regulatory motifs in a set of sequences, e.g. promoters of genes that have similar expression patterns (MEME)
- Tools: MEME, sequence logos
- Databases of motifs: JASPAR (transcription factor binding sites), PRATT in PROSITE (searching for motifs in protein sequences)
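Searching a promoter for a known motif usually comes down to scoring every window of the sequence against a position-specific scoring matrix (PSSM). The sketch below builds a log-odds matrix from a few aligned sites and scans a sequence with it. The binding sites, the pseudocount and the uniform background are assumptions made for the example; JASPAR and MEME use their own matrix formats and statistics.

```python
import math


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Log2-odds matrix from equal-length DNA sites (uppercase A/C/G/T).

    pseudocount and the uniform background are illustrative choices."""
    total = len(sites) + 4 * pseudocount
    pwm = []
    for pos in range(len(sites[0])):
        column = [site[pos] for site in sites]
        pwm.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return pwm


def best_hit(pwm, sequence):
    """Return (score, start) of the highest-scoring window in sequence."""
    width = len(pwm)
    hits = [
        (sum(pwm[k][sequence[i + k]] for k in range(width)), i)
        for i in range(len(sequence) - width + 1)
    ]
    return max(hits)


if __name__ == "__main__":
    # Made-up sites for illustration, not taken from any database.
    sites = ["TATAAT", "TATAAT", "TACAAT", "TATACT"]
    score, start = best_hit(build_pwm(sites), "GGCGTATAATGCGC")
    print(start, round(score, 2))
```

In practice a score threshold or a p-value is needed to decide whether the best window is a plausible site at all; this sketch only ranks windows.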

What did we learn: Protein function prediction
- When? How?
- Pfam: database for searching protein motifs/domains (Pfam-A/Pfam-B)
- PROSITE
- Protein annotations in UniProt (Swiss-Prot/TrEMBL)
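PROSITE patterns are written in a compact syntax that maps almost one-to-one onto regular expressions, which makes it easy to see what a pattern search does. The converter below is a minimal sketch covering the common elements (`x`, `[..]`, `{..}`, `(n)`, `(n,m)`, `<`, `>`); it is a hypothetical helper, not PROSITE's own scanning software, and it does not handle anchors placed inside square brackets.

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regular expression."""
    regex = pattern.rstrip(".").replace("-", "")
    regex = regex.replace("{", "[^").replace("}", "]")   # {P} -> any residue but P
    regex = regex.replace("(", "{").replace(")", "}")    # x(2,4) -> repetition
    regex = regex.replace("x", ".")                      # x -> any residue
    regex = regex.replace("<", "^").replace(">", "$")    # N-/C-terminal anchors
    return regex


def find_sites(pattern, protein):
    """0-based start positions of non-overlapping matches in protein."""
    return [m.start() for m in re.finditer(prosite_to_regex(pattern), protein)]


if __name__ == "__main__":
    # N-{P}-[ST]-{P} is the commonly cited N-glycosylation site pattern;
    # the protein string is a made-up example.
    print(prosite_to_regex("N-{P}-[ST]-{P}"))
    print(find_sites("N-{P}-[ST]-{P}", "MKNGSAANPTQ"))
```

Short patterns like this one match by chance very often, so a hit is a hint to follow up with annotations and domain searches rather than a function assignment.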

What did we learn: Protein secondary structure prediction
- When? How?
- Helix/beta/coil (PHDsec, PSIPRED)
- Transmembrane helices (PHDhtm, TMHMM)
- Solvent accessibility, important for the prediction of ligand binding sites (PHDacc)

What did we learn: Protein tertiary structure prediction
- When? How?
- First we must look at the sequence identity to a sequence with a known structure!
- Homology modeling/threading
- ModBase: database of models
- Remember: low-quality models can be misleading!
- Tools: SWISS-MODEL, GenTHREADER, ModBase

What did we learn: RNA structure and function prediction
- When? How?
- RNAfold: good for local interactions; gives several predictions of low-energy structures
- Alifold: adds information from an MSA
- Rfam: specific database and search tools (tRNA, microRNA, …)

What did we learn: Gene expression
- When? How?
- Many databases of gene expression: GEO …
- Clustering analysis: EPClust (different clustering methods such as K-means and hierarchical clustering; transformations of rows/columns/both …)
- GO annotation (analysis of gene clusters …)
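The clustering step can be illustrated with a bare-bones K-means on expression profiles, preceded by a row transformation. This is a sketch under simplifying assumptions: the initial centroids are just the first k profiles, the distance is squared Euclidean, and the function names are hypothetical; EPClust offers more methods and distance measures.

```python
from statistics import mean, pstdev


def standardize_row(profile):
    """Row transformation: subtract the mean and divide by the standard
    deviation, so clustering compares the shape of a profile, not its level."""
    m, s = mean(profile), pstdev(profile)
    return [(x - m) / s if s else 0.0 for x in profile]


def kmeans(profiles, k, iterations=20):
    """Plain K-means; returns one cluster label per profile.

    Uses the first k profiles as starting centroids (a naive, deterministic
    choice made for this example only)."""
    centroids = [list(p) for p in profiles[:k]]
    labels = [0] * len(profiles)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k), key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])))
            for p in profiles
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(profiles, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


if __name__ == "__main__":
    # Toy expression profiles (genes x conditions), made up for illustration.
    genes = [[1, 2, 3], [3, 2, 1], [1.1, 2.1, 2.9], [2.9, 2.2, 1.0]]
    print(kmeans([standardize_row(g) for g in genes], k=2))
```

K-means results depend on the starting centroids and on k, so it is worth rerunning with different settings before interpreting clusters with GO annotation.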

So how do we start? Given a hypothetical sequence, predict its function. What should we do?

Example
Amyloids are proteins which tend to aggregate in solution. Abnormal accumulation of amyloid in organs is assumed to play a role in various neurodegenerative diseases.
Question: can we predict whether a protein X is an amyloid?

Research Plan
1. Building a database
- Search the protein database (Swiss-Prot) for proteins annotated as amyloids
- Select a set of amyloid proteins related to human diseases
2. Analyzing the unique properties of the family
- Use tools learned in class to calculate, for each protein, properties which can be related to amyloidosis (5-10 different sequence/predicted structural features). For example:
- Fact: amyloids tend to aggregate via beta sheets. Calculate: the percentage of each secondary structure state (H, E, C)
- Fact: amyloids tend to aggregate via aromatic residues. Calculate: the percentage of the different amino acids in the proteins
- …
3. Summarize the results and use basic statistical tools to evaluate whether there are features which are characteristic of this group; suggest a model to predict whether a new protein is related to this group of proteins.
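Steps 2 and 3 of the plan can be sketched in a few lines: compute a feature per protein, then compare the amyloid set with a control set. The helper names are hypothetical, aromatic residues are taken here as F, W and Y (an assumption; some definitions also include H), and all sequences and secondary structure strings are made-up placeholders, not real amyloids or real predictions.

```python
import math
from statistics import mean, variance

AROMATIC = set("FWY")  # assumption: histidine is not counted as aromatic


def aromatic_percent(sequence):
    """Percentage of aromatic residues in a protein sequence."""
    return 100.0 * sum(residue in AROMATIC for residue in sequence) / len(sequence)


def ss_percent(prediction):
    """Percentage of helix (H), strand (E) and coil (C) in a predicted
    secondary structure string, e.g. the output of a predictor."""
    return {state: 100.0 * prediction.count(state) / len(prediction) for state in "HEC"}


def welch_t(group_a, group_b):
    """Welch's t statistic for two groups with possibly unequal variances.
    A p-value additionally needs the t distribution (not computed here)."""
    se = math.sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    return (mean(group_a) - mean(group_b)) / se


if __name__ == "__main__":
    # Placeholder sequences for illustration only.
    candidates = ["FWYAAKFLVF", "GFYWVVAFKE", "AAFFYKLWGG"]
    controls = ["AAKLGGSAEE", "MKTLAAGSDF", "GGSAEELKKY"]
    t = welch_t([aromatic_percent(s) for s in candidates],
                [aromatic_percent(s) for s in controls])
    print(round(t, 2))
    print(ss_percent("HHEECCCCCC"))
```

With 5-10 features per protein, the same comparison is repeated per feature, which means multiple-testing effects should be kept in mind when deciding which features are characteristic of the group.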