Prosite and UCSC Genome Browser Exercise 3. Protein motifs and Prosite.


Prosite and UCSC Genome Browser Exercise 3

Protein motifs and Prosite

Turning information into knowledge
- The outcome of a sequencing project is masses of raw data
- The challenge is to turn this raw data into biological knowledge
- A valuable tool for this challenge is an automated diagnostic pipeline through which newly determined sequences can be streamlined

From sequence to function
- Nature tends to innovate rather than invent
- Proteins are composed of functional elements: domains and motifs
- Domains are structural units that carry out a certain function
- The same domains are shared between different proteins
- Motifs are shorter sequences with certain biological activity

What is a motif?
- A sequence motif = a certain sequence that is widespread and conjectured to have biological significance
- Examples:
  KDEL: ER-lumen retention signal
  PKKKRKV: an NLS (nuclear localization signal)

More loosely defined motifs
- KDEL (usually) + HDEL (rarely) = [HK]-D-E-L: H or K at the first position
- This is called a pattern (in biology) or a regular expression (in computer science)
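The correspondence between the pattern and a regular expression can be sketched in Python. This is only an illustration: the three test sequences are made up, and the helper name `has_motif` is hypothetical.

```python
import re

# [HK]-D-E-L as a regular expression: H or K, followed by D, E, L.
MOTIF = re.compile(r"[HK]DEL")

def has_motif(seq):
    """Return True if the sequence contains KDEL or HDEL anywhere."""
    return MOTIF.search(seq) is not None

# Made-up sequences, used only to exercise the pattern.
for seq in ["MSTAAKDEL", "MSTAAHDEL", "MSTAARDEL"]:
    print(seq, has_motif(seq))
```

The first two sequences end in KDEL and HDEL and are reported as hits; the third ends in RDEL and is not.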

Syntax of a pattern  Example: W-x(9,11)-[FYV]-[FYW]-x(6,7)-[GSTNE]

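Patterns
- W-x(9,11)-[FYV]-[FYW]-x(6,7)-[GSTNE]
- x(9,11): any amino acid, between 9 and 11 times
- [FYV]: F or Y or V
- Example sequences: WOPLASDFGYVWPPPLAWS, ROPLASDFGYVWPPPLAWS, WOPLASDFGYVWPPPLSQQQ
- Only the first sequence matches: the second has R in place of the leading W, and the third has Q where [GSTNE] is required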

Patterns - syntax
- The standard IUPAC one-letter codes are used for the amino acids.
- 'x' : any amino acid.
- '[]' : residues allowed at the position.
- '{}' : residues forbidden at the position.
- '()' : repetition of a pattern element is indicated in parentheses: x(n) for a fixed number, x(n,m) for a range of repetitions.
- '-' : separates each pattern element.
- '<' : indicates an N-terminal restriction of the pattern.
- '>' : indicates a C-terminal restriction of the pattern.
- '.' : the period ends the pattern.
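The syntax rules above map almost one-to-one onto regular-expression syntax. The sketch below is an illustrative translator, not Prosite's own code: the function name `prosite_to_regex` is hypothetical, and it assumes the terminal markers appear only at the start or end of an element (not inside square brackets).

```python
import re

def prosite_to_regex(pattern):
    """Translate a Prosite-style pattern into a Python regular expression."""
    pattern = pattern.strip().rstrip(".")        # the period ends the pattern
    parts = []
    for element in pattern.split("-"):           # '-' separates pattern elements
        prefix = suffix = ""
        if element.startswith("<"):              # N-terminal restriction
            prefix, element = "^", element[1:]
        if element.endswith(">"):                # C-terminal restriction
            suffix, element = "$", element[:-1]
        # Split an element into its core and an optional (n) or (n,m) repetition.
        m = re.fullmatch(r"([^()]+)(?:\((\d+)(?:,(\d+))?\))?", element)
        if m is None:
            raise ValueError("cannot parse element: %r" % element)
        core, lo, hi = m.groups()
        if core in ("x", "X"):                   # any amino acid
            core = "."
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"       # forbidden residues
        # '[...]' and single letters are already valid regex.
        repeat = ""
        if lo is not None:
            repeat = "{%s,%s}" % (lo, hi) if hi is not None else "{%s}" % lo
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)

regex = prosite_to_regex("W-x(9,11)-[FYV]-[FYW]-x(6,7)-[GSTNE]")
print(regex)
for seq in ["WOPLASDFGYVWPPPLAWS", "ROPLASDFGYVWPPPLAWS", "WOPLASDFGYVWPPPLSQQQ"]:
    print(seq, bool(re.search(regex, seq)))
```

Applied to the three example sequences from the Patterns slide, only WOPLASDFGYVWPPPLAWS is reported as a match.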

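Profile-pattern-consensus
- Multiple alignment: GTTCAA, GCTGAA, CTTCAC
- Consensus: GTTCAA (or NNTNAN, with N at every variable position)
- Pattern: [GC]-[TC]-T-[GC]-A-[AC]
- Profile: [table of per-position scores for A, T, C and G, shown as a figure]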
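As a rough sketch of how the three representations relate, each can be derived column by column from the same three-sequence alignment. The function names are hypothetical, and the profile here is a plain frequency table, which is an assumption; real Prosite profiles use weighted scores.

```python
from collections import Counter

alignment = ["GTTCAA", "GCTGAA", "CTTCAC"]

def consensus(seqs):
    """Most frequent letter in each column."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*seqs))

def strict_consensus(seqs):
    """Letter where the column is fully conserved, N otherwise."""
    return "".join(col[0] if len(set(col)) == 1 else "N" for col in zip(*seqs))

def pattern(seqs):
    """Prosite-style pattern: the set of letters seen in each column."""
    elements = []
    for col in zip(*seqs):
        letters = "".join(sorted(set(col)))
        elements.append(letters if len(letters) == 1 else "[" + letters + "]")
    return "-".join(elements)

def profile(seqs, alphabet="ACGT"):
    """Frequency of each letter at each column (a simple, unweighted profile)."""
    n = len(seqs)
    return {a: [col.count(a) / n for col in zip(*seqs)] for a in alphabet}

print(consensus(alignment))
print(strict_consensus(alignment))
print(pattern(alignment))
for letter, freqs in profile(alignment).items():
    print(letter, [round(f, 2) for f in freqs])
```

The letters inside each bracket are printed in alphabetical order; the order inside a bracket does not change the meaning of the pattern.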

Prosite
- A method for determining the function of uncharacterized translated protein sequences
- A database of annotated protein families and functional sites, together with the associated patterns and profiles used to identify them

Prosite
- Entries are represented with patterns or profiles
- Pattern: [GC]-[TC]-T-[GC]-A-[AC]
- Profile: [table of per-position scores for A, T, C and G, shown as a figure]
- Profiles are used in Prosite when the motif is relatively divergent and difficult to represent as a pattern

Scanning Prosite
- Query: sequence. Result: all patterns found in the sequence
- Query: pattern. Result: all sequences which adhere to this pattern
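The two scanning directions can be illustrated with a toy example. Everything here is an assumption made for illustration: the two-entry "pattern database" reuses the KDEL/HDEL and NLS motifs from the earlier slides, the sequences are made up, and the conversion handles only simple patterns without x, repetitions or anchors.

```python
import re

# Toy stand-ins for the Prosite pattern collection and a sequence database.
PATTERNS = {
    "ER_RETENTION": "[HK]-D-E-L",
    "NLS": "P-K-K-K-R-K-V",
}
SEQUENCES = {
    "seqA": "MAPKKKRKVGS",
    "seqB": "MLLSVPLLLGKDEL",
    "seqC": "MGSSHHHHHH",
}

def to_regex(pattern):
    """Simplified conversion: valid only for patterns made of letters and [...]."""
    return pattern.replace("-", "")

def scan_sequence(seq):
    """Query: sequence. Result: names of all patterns found in it."""
    return [name for name, pat in PATTERNS.items() if re.search(to_regex(pat), seq)]

def scan_database(pattern):
    """Query: pattern. Result: ids of all sequences that contain it."""
    return [sid for sid, seq in SEQUENCES.items() if re.search(to_regex(pattern), seq)]

print(scan_sequence(SEQUENCES["seqB"]))
print(scan_database("[HK]-D-E-L"))
```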

Prosite sequence query

Prosite profile

Prosite profile to sequence logo

Sequence logo

WebLogo

Searching Prosite with a sequence

Patterns with a high probability of occurrence
- Entries describing commonly found post-translational modifications or compositionally biased regions
- Found in the majority of known protein sequences
- High probability of occurrence

Searching Prosite with a pattern

Prosite pattern query

Searching Prosite with a Prosite AC

UCSC Genome Browser

UCSC Genome Browser - Gateway
- Reset all settings of the previous user

UCSC Genome Browser query results

UCSC Genome Browser - Annotation tracks
- Tracks: Base position, UCSC Genes, RefSeq, mRNA (GenBank), Vertebrate conservation, Single species compared, SNPs, Repeats
- Gene display: Direction of transcription (<), CDS, Intron, UTR

UCSC Gene

UCSC Genome Browser - movement
- Zoom x3 + Center

UCSC Genome Browser – Base view

Annotation track options
- dense, squish, pack, full

Annotation track options
- Another option to toggle between 'pack' and 'dense' view is to click on the track title
- [Figure labels: sickle-cell anemia distribution, malaria distribution]

BLAT
- BLAT = BLAST-Like Alignment Tool
- BLAT is designed to find similarity of >95% for DNA and >80% for protein
- Rapid search by indexing the entire genome
- Good for:
  1. Finding genomic coordinates of cDNA
  2. Determining exons/introns
  3. Finding human (or chimp, dog, cow…) homologs of another vertebrate sequence
  4. Finding upstream regulatory regions
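The idea of searching rapidly by indexing the genome can be sketched with a toy k-mer index. This is a simplified illustration and not BLAT's implementation: it assumes non-overlapping genome k-mers looked up with overlapping query k-mers, requires exact k-mer matches, and skips the later alignment and stitching steps. The genome, query, k value and function names are all made up.

```python
from collections import defaultdict

def build_index(genome, k):
    """Map each non-overlapping k-mer of the genome to its start positions."""
    index = defaultdict(list)
    for start in range(0, len(genome) - k + 1, k):
        index[genome[start:start + k]].append(start)
    return index

def find_hits(query, index, k):
    """Return (query_pos, genome_pos) pairs for query k-mers found in the index."""
    hits = []
    for q in range(len(query) - k + 1):          # every overlapping query k-mer
        for g in index.get(query[q:q + k], []):
            hits.append((q, g))
    return hits

genome = "ACGTACGGTTCAGGCTAACC"   # toy genome
query = "GGTTCAGG"                # toy query, a substring of the genome
k = 4

index = build_index(genome, k)    # built once, reused for every query
for q, g in find_hits(query, index, k):
    # A hit suggests the query may align with its start at genome position g - q.
    print("query pos", q, "genome pos", g, "candidate start", g - q)
```

Because the index is built once and each query k-mer is a dictionary lookup, the genome is never rescanned per query; each hit only seeds a candidate region that a real tool would then align in full.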

BLAT on UCSC Genome Browser

BLAT Results

[Figure legend: Match; Non-match (mismatch/indel); Indel boundaries]

BLAT Results

BLAT Results on the browser

Getting DNA sequence of region