Protein Analysis Tools 2 nd April, 2012 Ansuman Chattopadhyay, PhD, Head Molecular Biology Information Service Health Sciences Library System University.


Protein Analysis Tools
2nd April, 2012
Ansuman Chattopadhyay, PhD, Head, Molecular Biology Information Service, Health Sciences Library System, University of Pittsburgh

What we’ll do:
- Brief overview of CLC Main Workbench
- Find the genomic context of a protein sequence
- Search for the presence of conserved domains
- Create a multiple sequence alignment plot

What we’ll do:
- Analyze primary structure: hydrophobicity, hydrophilicity, antigenicity, repeat sequence detection, etc.
- Predict secondary structure
- Predict post-translational modifications such as phosphorylation and glycosylation
- Search for interacting partners
- Predict domain-driven protein-protein interactions

Workshop Resources

HSLS MolBio Videos

Sequence Analysis Software Suites
- Wisconsin GCG
- VectorNTI
- DNASTAR Lasergene
- Geneious
- CLC Main

Why CLC Main?
- Runs on Windows, Mac and Linux
- DNA, RNA, protein and microarray data analysis
- Regular updates
- HSLS licensed

CLC Main Access
- HSLS CLC Main registration. Link:
- Access via Pitt Network Connect. Instruction video:

CLC Main Workbench Overview
- Graphical User Interface
- Protein sequences: import
- Sequence navigation

CLC Main Graphical User Interface (GUI)

CLC Main

Navigate a protein sequence

Videos
- CLC Main, getting started (basic navigation steps): http://media.hsls.pitt.edu/media/molbiovideos/clc-navigation-ac0312.swf
- CLC Main Workbench Walkthrough (Part 1): clcmain-walkthrough-part1-ac0112.swf
- CLC Main Workbench Walkthrough (Part 2): clcmain-walkthrough-part2-ac0112.swf

Import a Protein Sequence

Protein Sequence: Human PLCg1
- RefSeq no: NP_
- UniProt Accession Number: P19174
- FASTA file
- Raw sequence
CLC features: Search, Import, Create new sequence

Videos
- Import a DNA/Protein sequence into CLC Main (Part 1): http://media.hsls.pitt.edu/media/molbiovideos/clc-import-part1-ac0112.swf
- Import a DNA/Protein sequence into CLC Main (Part 2): http://media.hsls.pitt.edu/media/molbiovideos/clc-import-part2-ac0112.swf

CLC protein sequence

Protein sequence manipulation: create a new protein with the PLCg1 SH2-SH2-SH3 domains

Sequence Alignment
- Pair-wise alignment: global, local
- Multiple sequence alignment

Sequence Alignment

Pair-wise Sequence Alignment
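To make the global/local distinction concrete, here is a minimal sketch of the two classic dynamic-programming scores: Needleman-Wunsch (global) and Smith-Waterman (local). The match/mismatch/gap values are illustrative assumptions only; alignment tools normally score with a substitution matrix and more elaborate gap penalties, and this is not a description of how CLC Main computes its alignments.

```python
def global_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch: best score aligning ALL of a against ALL of b."""
    prev = [j * gap for j in range(len(b) + 1)]  # row 0: leading gaps in a
    for i in range(1, len(a) + 1):
        curr = [i * gap]                         # column 0: leading gaps in b
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def local_score(a, b, match=1, mismatch=-1, gap=-2):
    """Smith-Waterman: best score over any pair of substrings of a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cell = max(0, diag, prev[j] + gap, curr[j - 1] + gap)  # never below 0
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


if __name__ == "__main__":
    print(global_score("MATWL", "MATPL"))      # 3: four matches, one mismatch
    print(global_score("XXMATWLYY", "MATWL"))  # -3: the flanking residues cost four gaps
    print(local_score("XXMATWLYY", "MATWL"))   # 5: only the shared core is scored
```

The last two calls show why local alignment is the usual choice when a short peptide is compared against a longer protein: the unmatched flanks are simply ignored instead of being penalised.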

Multiple Sequence Alignment

Tools: ClustalW and T-coffee

PLCg1 Orthologous Sequences
- Mouse: NP_
- Rat: NP_
- Cow: NP_
- Dog: XP_
- Zebrafish: NP_
- Human: NP_
Accession list: NP_067255,NP_037319,NP_776850,XP_542998,NP_919388,NP_002651

Videos
- Create a multiple sequence alignment plot using CLC (Part 1): part1.swf
- Create a multiple sequence alignment plot using CLC (Part 2): part2.swf
- Create a multiple sequence alignment plot:
- Compare two peptide sequences:

Starting with a short peptide sequence, find:
- the whole protein sequence
- orthologs in other species (nematode)
Tools: UCSC BLAT; NCBI BLAST against SwissProt

Peptide to whole protein Peptide seq: SPEGCWGPEPRDCVSCRNVSRGRECVDKCNLLEGEPR

Videos
- Place an mRNA or peptide sequence into the human genome (BLAT):
- Find homologous sequences:

Find homologous sequence SPEGCWGPEPRDCVSCRNVSRGRECVDKCNLLEGEPR

Sequence Manipulation & Format Conversion Sequence Manipulation Suite  Readseq  GenePept FASTA
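As a rough illustration of what a format conversion involves, the sketch below turns a raw sequence into FASTA and reads FASTA back into plain sequences. The helper names `to_fasta` and `parse_fasta` are hypothetical, and the 60-residue line width is an assumption; this does not reproduce the behaviour of Readseq or the Sequence Manipulation Suite.

```python
def to_fasta(name, raw, width=60):
    """Format a raw sequence (whitespace allowed) as one FASTA record."""
    seq = "".join(raw.split()).upper()           # drop spaces and line breaks
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return ">" + name + "\n" + "\n".join(lines) + "\n"


def parse_fasta(text):
    """Return {header: sequence} for every record in a FASTA string."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:]                      # header without the '>'
            records[name] = []
        elif name is not None:
            records[name].append("".join(line.split()))
    return {n: "".join(parts) for n, parts in records.items()}


if __name__ == "__main__":
    fasta = to_fasta("peptide", "SPEGCWGPEP RDCVSCRNVS RGRECVDKCN LLEGEPR", width=20)
    print(fasta)
    print(parse_fasta(fasta))
```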

Hands-On
- Retrieve the amino acid sequence between positions 25 and 45 of Sequence A (MS Word doc)
- Identify the rat gene that encodes this peptide fragment and retrieve its whole protein sequence
- Find the fruit fly homolog of this protein. What % identity does the fruit fly protein share with its rat homolog?
- Predict potential MAPK phosphorylation sites present in the fruit fly protein
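Two steps of this exercise are easy to get wrong by one residue, so a small sketch may help. It assumes sequence positions are 1-based and inclusive (the usual database convention) and computes % identity as identical columns divided by alignment length, which is how BLAST reports "Identities". The function names and the toy sequences are hypothetical; Sequence A itself is not reproduced here.

```python
def subsequence(seq, start, end):
    """Residues start..end, counted from 1 and including both ends."""
    if not 1 <= start <= end <= len(seq):
        raise ValueError("positions out of range for this sequence")
    return seq[start - 1:end]


def percent_identity(aligned_a, aligned_b):
    """Identical columns as a percentage of the alignment length.

    Both inputs must be rows of the same alignment, with '-' for gaps.
    """
    if len(aligned_a) != len(aligned_b) or not aligned_a:
        raise ValueError("aligned rows must be non-empty and of equal length")
    same = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100.0 * same / len(aligned_a)


if __name__ == "__main__":
    toy = "ABCDEFGHIJKLMNOPQRSTUVWXYZ" * 2          # placeholder, not a real protein
    print(subsequence(toy, 25, 45))                 # 21 residues, positions 25..45
    print(percent_identity("MATWL", "MATPL"))       # 80.0
```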

Protein Domain Search: InterProScan
InterPro is a database of protein families, domains, regions, repeats and sites in which identifiable features found in known proteins can be applied to new protein sequences.
>gi| |ref|NP_ | B-cell lymphoma protein 2 alpha isoform
MAHAGRTGYDNREIVMKYIHYKLSQRGYEWDAGDVGAAPPGAAPAPGIFSSQPGHTPHPAASRDPVARTSPLQTPAAPGAAAGPALSPVPPVVHLTLRQAGDDFSRRYRRDFAEMSSQLHLTPFTARGRFATVVEELFRDGVNWGRIVAFFEFGGVMCVESVNREMSPLVDNIALWMTEYLNRHLHTWIQDNGGWDAFVELYGPSMRPLFDFSWLSLKTLLSLALVGACITLGAYLGHK

Videos
- Find protein domains, PTMs, secondary structure, etc.: t.swf
- Start with a protein pattern and find which proteins possess that domain: rosite.swf
- Search for protein domains, repeats and sites: o.swf

Protein Domain Search: ScanProsite
>gi| |ref|NP_ | B-cell lymphoma protein 2 alpha isoform
MAHAGRTGYDNREIVMKYIHYKLSQRGYEWDAGDVGAAPPGAAPAPGIFSSQPGHTPHPAASRDPVARTSPLQTPAAPGAAAGPALSPVPPVVHLTLRQAGDDFSRRYRRDFAEMSSQLHLTPFTARGRFATVVEELFRDGVNWGRIVAFFEFGGVMCVESVNREMSPLVDNIALWMTEYLNRHLHTWIQDNGGWDAFVELYGPSMRPLFDFSWLSLKTLLSLALVGACITLGAYLGHK

Pattern Search
- [AC]-x-V-x(4)-{ED}
  This pattern is translated as: [Ala or Cys]-any-Val-any-any-any-any-{any but Glu or Asp}
- F-[GSTV]-P-R-L-[G>]
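One way to see how such patterns are read is to translate them into regular expressions. The sketch below is a hypothetical helper, `prosite_to_regex`, that covers only a subset of the PROSITE pattern syntax: `x`, `[..]`, `{..}`, repeats `(n)` and `(n,m)`, the `<` and `>` anchors, and `>` inside square brackets (meaning "or end of sequence"). It is not ScanProsite's own implementation.

```python
import re

# One pattern element: optional '<', a residue / [set] / {excluded set},
# optional (n) or (n,m) repeat, optional '>'.
_ELEMENT = re.compile(
    r"(<?)(\[[^\]]+\]|\{[^}]+\}|[A-Za-z])(?:\((\d+)(?:,(\d+))?\))?(>?)"
)


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regex string."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        m = _ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError("unsupported pattern element: " + element)
        start, core, lo, hi, end = m.groups()
        if core in ("x", "X"):
            rx = "."                                   # any residue
        elif core.startswith("{"):
            rx = "[^" + core[1:-1] + "]"               # anything but these
        elif core.startswith("[") and ">" in core:
            rx = "(?:[" + core[1:-1].replace(">", "") + "]|$)"  # residue or C-terminus
        else:
            rx = core                                  # single residue or [set]
        if lo:
            rx += "{" + lo + ("," + hi if hi else "") + "}"
        parts.append(("^" if start else "") + rx + ("$" if end else ""))
    return "".join(parts)


if __name__ == "__main__":
    rx = prosite_to_regex("[AC]-x-V-x(4)-{ED}")
    print(rx)                                          # [AC].V.{4}[^ED]
    print(bool(re.search(rx, "AGVAAAAK")))             # True
    print(prosite_to_regex("F-[GSTV]-P-R-L-[G>]"))     # F[GSTV]PRL(?:[G]|$)
```

Read this way, the second pattern on the slide means: Phe, then one of Gly/Ser/Thr/Val, then Pro-Arg-Leu, followed either by Gly or by the end of the sequence.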

Pattern Search

Protein Primary Structure Analysis
Tool: ExPASy from SIB
- Calculated molecular weight
- Theoretical pI
- Extinction coefficients
- Estimated half-life
Hydropathicity plot: Kyte & Doolittle
Hydrophilicity plot: Hopp T.P., Woods K.R.
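Two of these quantities are simple enough to sketch by hand. The block below computes a Kyte & Doolittle hydropathy profile as a sliding-window average, and a molar extinction coefficient at 280 nm from the Tyr, Trp and cystine counts (1490, 5500 and 125 M^-1 cm^-1 respectively, the values used by ProtParam). The window of 9 and the function names are assumptions for illustration; the ExPASy tools offer further options (window weighting, other scales) that are not reproduced here.

```python
# Kyte & Doolittle (1982) hydropathy index for the 20 standard residues.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=9):
    """Mean hydropathy of every full window along the sequence.

    Element i is the average over residues i .. i+window-1 (0-based);
    positive values indicate hydrophobic stretches.
    """
    scores = [KYTE_DOOLITTLE[res] for res in seq.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def extinction_coefficient(seq, cystines=0):
    """Estimated molar extinction coefficient at 280 nm (M^-1 cm^-1).

    `cystines` is the number of disulfide-bonded Cys PAIRS; leave it at 0
    to assume all cysteines are reduced.
    """
    seq = seq.upper()
    return 1490 * seq.count("Y") + 5500 * seq.count("W") + 125 * cystines


if __name__ == "__main__":
    peptide = "SPEGCWGPEPRDCVSCRNVSRGRECVDKCNLLEGEPR"
    print(hydropathy_profile(peptide, window=9))
    print(extinction_coefficient(peptide))   # 5500: one Trp, no Tyr, Cys assumed reduced
```

For transmembrane helices a longer window (around 19 residues) is commonly used, since that is roughly the length needed to span a membrane.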

Antigenic Site Prediction
Tool: EMBOSS Antigenic
>gi| |ref|NP_ | B-cell lymphoma protein 2 alpha isoform
MAHAGRTGYDNREIVMKYIHYKLSQRGYEWDAGDVGAAPPGAAPAPGIFSSQPGHTPHPAASRDPVARTSPLQTPAAPGAAAGPALSPVPPVVHLTLRQAGDDFSRRYRRDFAEMSSQLHLTPFTARGRFATVVEELFRDGVNWGRIVAFFEFGGVMCVESVNREMSPLVDNIALWMTEYLNRHLHTWIQDNGGWDAFVELYGPSMRPLFDFSWLSLKTLLSLALVGACITLGAYLGHK

EMBOSS Antigenic
Antigenic predicts potentially antigenic regions of a protein sequence, using the method of Kolaskar and Tongaonkar. Analysis of data from experimentally determined antigenic sites on proteins has revealed that the hydrophobic residues Cys, Leu and Val, if they occur on the surface of a protein, are more likely to be part of antigenic sites. Kolaskar and Tongaonkar developed a semi-empirical method to predict antigenic determinants on proteins; it uses the physicochemical properties of amino acid residues and their frequencies of occurrence in experimentally known segmental epitopes. Applied to a large number of proteins, the method predicts antigenic determinants with about 75% accuracy, which is better than most other known methods. Because it is based on a single parameter, it is very simple to use.

Transmembrane Region prediction

Transmembrane Site Prediction
Tool: TMHMM Server
>gi| |ref|NP_ | B-cell lymphoma protein 2 alpha isoform
MAHAGRTGYDNREIVMKYIHYKLSQRGYEWDAGDVGAAPPGAAPAPGIFSSQPGHTPHPAASRDPVARTSPLQTPAAPGAAAGPALSPVPPVVHLTLRQAGDDFSRRYRRDFAEMSSQLHLTPFTARGRFATVVEELFRDGVNWGRIVAFFEFGGVMCVESVNREMSPLVDNIALWMTEYLNRHLHTWIQDNGGWDAFVELYGPSMRPLFDFSWLSLKTLLSLALVGACITLGAYLGHK

Protein Secondary Structure
>gi| |ref|NP_ | B-cell lymphoma protein 2 alpha isoform
MAHAGRTGYDNREIVMKYIHYKLSQRGYEWDAGDVGAAPPGAAPAPGIFSSQPGHTPHPAASRDPVARTSPLQTPAAPGAAAGPALSPVPPVVHLTLRQAGDDFSRRYRRDFAEMSSQLHLTPFTARGRFATVVEELFRDGVNWGRIVAFFEFGGVMCVESVNREMSPLVDNIALWMTEYLNRHLHTWIQDNGGWDAFVELYGPSMRPLFDFSWLSLKTLLSLALVGACITLGAYLGHK

Protein-Protein Interaction Prediction
Tool: STRING
>gi| |ref|NP_ | B-cell lymphoma protein 2 alpha isoform
MAHAGRTGYDNREIVMKYIHYKLSQRGYEWDAGDVGAAPPGAAPAPGIFSSQPGHTPHPAASRDPVARTSPLQTPAAPGAAAGPALSPVPPVVHLTLRQAGDDFSRRYRRDFAEMSSQLHLTPFTARGRFATVVEELFRDGVNWGRIVAFFEFGGVMCVESVNREMSPLVDNIALWMTEYLNRHLHTWIQDNGGWDAFVELYGPSMRPLFDFSWLSLKTLLSLALVGACITLGAYLGHK

Hands-on
Take the human BCL2 protein sequence and:
- Find its domain architecture
- Predict the topology of its transmembrane region
- Design a suitable antigenic site for antibody generation
- What are its calculated molecular weight and extinction coefficient?
- Predict its secondary structure. What % of this protein is alpha-helical?
- Predict its potential interacting partners

Hands-on
Predict the potential phosphorylation sites present in a protein sequence.
Sequence: human BCL2
>gi| |ref|NP_ | B-cell lymphoma protein 2 alpha isoform
MAHAGRTGYDNREIVMKYIHYKLSQRGYEWDAGDVGAAPPGAAPAPGIFSSQPGHTPHPAASRDPVARTSPLQTPAAPGAAAGPALSPVPPVVHLTLRQAGDDFSRRYRRDFAEMSSQLHLTPFTARGRFATVVEELFRDGVNWGRIVAFFEFGGVMCVESVNREMSPLVDNIALWMTEYLNRHLHTWIQDNGGWDAFVELYGPSMRPLFDFSWLSLKTLLSLALVGACITLGAYLGHK

Phosphorylation Site Prediction
Tool: NetPhos
>gi| |ref|NP_ | B-cell lymphoma protein 2 alpha isoform
MAHAGRTGYDNREIVMKYIHYKLSQRGYEWDAGDVGAAPPGAAPAPGIFSSQPGHTPHPAASRDPVARTSPLQTPAAPGAAAGPALSPVPPVVHLTLRQAGDDFSRRYRRDFAEMSSQLHLTPFTARGRFATVVEELFRDGVNWGRIVAFFEFGGVMCVESVNREMSPLVDNIALWMTEYLNRHLHTWIQDNGGWDAFVELYGPSMRPLFDFSWLSLKTLLSLALVGACITLGAYLGHK

Phosphorylation Site Prediction
Tool: GPS
>gi| |ref|NP_ | B-cell lymphoma protein 2 alpha isoform
MAHAGRTGYDNREIVMKYIHYKLSQRGYEWDAGDVGAAPPGAAPAPGIFSSQPGHTPHPAASRDPVARTSPLQTPAAPGAAAGPALSPVPPVVHLTLRQAGDDFSRRYRRDFAEMSSQLHLTPFTARGRFATVVEELFRDGVNWGRIVAFFEFGGVMCVESVNREMSPLVDNIALWMTEYLNRHLHTWIQDNGGWDAFVELYGPSMRPLFDFSWLSLKTLLSLALVGACITLGAYLGHK
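Before turning to the prediction servers, a quick motif scan gives a feel for what a candidate site looks like. MAP kinases are proline-directed: the minimal consensus is Ser/Thr followed by Pro, and P-x-[ST]-P is usually described as the preferred context. The sketch below simply lists such positions. It is an assumption-laden illustration with a hypothetical function name, and it is not how NetPhos or GPS work; those tools use trained models and report scores.

```python
import re


def proline_directed_sites(seq):
    """List Ser/Thr residues followed by Pro, with 1-based positions.

    Each hit is (position, residue, context) where context is
    'P-x-[ST]-P' if a Pro sits two residues upstream, else '[ST]-P'.
    """
    seq = seq.upper()
    sites = []
    for m in re.finditer(r"[ST](?=P)", seq):     # lookahead keeps overlapping hits
        i = m.start()
        preferred = i >= 2 and seq[i - 2] == "P"
        sites.append((i + 1, seq[i], "P-x-[ST]-P" if preferred else "[ST]-P"))
    return sites


if __name__ == "__main__":
    toy = "AAPLSPAATPA"                          # made-up peptide for illustration
    for position, residue, context in proline_directed_sites(toy):
        print(position, residue, context)
    # 5 S P-x-[ST]-P
    # 9 T [ST]-P
```

A motif match only marks a residue as a candidate; whether it is actually phosphorylated depends on accessibility, kinase docking and cellular context, which is why the servers' scores and experimental evidence still matter.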

Thank you! Any questions?
Carrie Iwema, Ansuman Chattopadhyay