Motif discovery Tutorial 5

Agenda
– Motif discovery
  – MEME: creates a motif PSSM de novo (unknown motif)
  – MAST: searches for a PSSM in a sequence database (DB)
  – TOMTOM: searches for a PSSM in motif DBs
– Cool story of the day: How NOT to be a bioinformatician

Motif – definition
Motif: a widespread pattern with biological significance.
Sequence motif examples:
– PTB (RNA-binding protein): UCUU
– CAP (DNA-binding protein): TGTGAXXXXXXTCACAXT (X = any nucleotide)

Sequence motif – definition
[Figure: aligned motif instances and the position-specific frequency matrix derived from them (rows A, D, E, G, H, N, Y; entries are fractions of 6 aligned sequences); the matrix is not recoverable from the extraction]
..YDEEGGDAEE..
..YGEEGADYED..
..YDEEGADYEE..
..YNDEGDDYEE..
..YHDEGAADEE..
Motif: a nucleotide or amino-acid sequence pattern that is widespread and has biological significance.
PSSM: position-specific scoring matrix.
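To show how a PSSM is derived from aligned instances, here is a minimal sketch in Python. It uses only the five aligned instances that survive on the slide (the original figure used six, so its fractions differ), assumes a uniform amino-acid background, and adds a pseudocount so that unseen residues do not produce log2(0). The function names are hypothetical, not part of MEME.

```python
import math
from collections import Counter

# Five aligned instances recovered from the slide (toy input; the original
# figure used six sequences).
SITES = ["YDEEGGDAEE", "YGEEGADYED", "YDEEGADYEE", "YNDEGDDYEE", "YHDEGAADEE"]
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def frequency_matrix(sites, alphabet=AMINO_ACIDS, pseudocount=0.0):
    """One dict per motif column, mapping residue -> relative frequency."""
    width = len(sites[0])
    if any(len(s) != width for s in sites):
        raise ValueError("sites must be aligned (equal length)")
    total = len(sites) + pseudocount * len(alphabet)
    matrix = []
    for col in range(width):
        counts = Counter(s[col] for s in sites)
        matrix.append({a: (counts[a] + pseudocount) / total for a in alphabet})
    return matrix


def log_odds_pssm(freqs, background=None):
    """Turn frequencies into log2(frequency / background) scores.

    Assumes a uniform background unless one is given; frequencies must be
    non-zero (use a pseudocount), otherwise log2(0) is undefined."""
    alphabet = list(freqs[0])
    bg = background or {a: 1 / len(alphabet) for a in alphabet}
    return [{a: math.log2(col[a] / bg[a]) for a in alphabet} for col in freqs]


ppm = frequency_matrix(SITES)
print(ppm[0]["Y"], ppm[5]["A"])  # 1.0 0.6  (Y fully conserved; A in 3 of 5 sites at column 6)
pssm = log_odds_pssm(frequency_matrix(SITES, pseudocount=0.1))
```

Positive scores mark residues seen more often than the background predicts (Y in column 1); residues never observed at a position get negative scores.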

Can we find motifs using multiple sequence alignment (MSA)? In principle, yes. In practice, no: local multiple sequence alignment is a hard problem to solve.

Motif search: from de novo motifs to motif annotation
[Figure: labels include "gapped motifs" and "large DNA data"]

MEME

MEME – Multiple EM* for Motif Elicitation
– Motif discovery from unaligned sequences (genomic or protein)
– Flexible model of motif presence: a motif can be absent from some sequences or appear several times in one sequence
*EM = expectation-maximization
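To make the E and M steps concrete, here is a minimal sketch in Python. It assumes DNA input (A, C, G, T only), a fixed motif width, exactly one occurrence per sequence (the OOPS model, the simplest of MEME's models), a background taken from the input's base composition, and a seed string supplied by the user; the function `em_oops` and its parameters are hypothetical. MEME itself also models absent or repeated sites, tries many starting points, and reports E-values, none of which is reproduced here.

```python
import math

ALPHABET = "ACGT"


def em_oops(seqs, width, seed, iterations=50, pseudocount=0.1):
    """Simplified EM motif finder, one occurrence per sequence (OOPS).

    Hypothetical helper for illustration: DNA only, fixed width, seeded from
    a user-supplied string. Returns (pwm, most likely start per sequence)."""
    if len(seed) != width or any(len(s) < width for s in seqs):
        raise ValueError("seed must have length `width`; sequences must be at least that long")
    # Background model: base composition of the input, with a pseudocount.
    joined = "".join(seqs)
    bg = {b: (joined.count(b) + pseudocount) / (len(joined) + 4 * pseudocount) for b in ALPHABET}
    # Initial motif model: 0.7 for the seed base, 0.1 for each other base.
    pwm = [{b: 0.7 if b == c else 0.1 for b in ALPHABET} for c in seed]
    z = []
    for _ in range(iterations):
        # E-step: posterior probability that the motif starts at each position,
        # proportional to the motif/background likelihood ratio of that window.
        z = []
        for s in seqs:
            llr = [sum(math.log(pwm[k][s[j + k]] / bg[s[j + k]]) for k in range(width))
                   for j in range(len(s) - width + 1)]
            top = max(llr)  # subtract the max before exponentiating, for numerical stability
            weights = [math.exp(x - top) for x in llr]
            total = sum(weights)
            z.append([w / total for w in weights])
        # M-step: re-estimate the PWM from expected base counts in each motif column.
        counts = [{b: pseudocount for b in ALPHABET} for _ in range(width)]
        for s, zs in zip(seqs, z):
            for j, p in enumerate(zs):
                for k in range(width):
                    counts[k][s[j + k]] += p
        pwm = [{b: n / sum(col.values()) for b, n in col.items()} for col in counts]
    best = [max(range(len(zs)), key=zs.__getitem__) for zs in z]
    return pwm, best


seqs = ["GCGCTATAATGCGC", "CCTATAATGGCCGG", "GGGCCCGTATAATC", "TATAATCGCGCGCG"]
pwm, best = em_oops(seqs, width=6, seed="TATAAT")
consensus = "".join(max(col, key=col.get) for col in pwm)
print(consensus, best)  # TATAAT [4, 2, 7, 0]
```

EM only climbs to a local optimum, so a poor seed can converge to a different motif; trying many starting points, as MEME does, is the usual remedy.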

MEME – Input
– Input file (FASTA format)
– How many times in each sequence?
– How many motifs?
– How many sites?
– Range of motif lengths
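The same input choices are available as options of the `meme` command-line program. The sketch below only assembles the argument list; it assumes the flags described in the MEME Suite documentation (`-dna`/`-protein`, `-oc`, `-mod oops|zoops|anr`, `-nmotifs`, `-minw`, `-maxw`, `-minsites`, `-maxsites`), which should be checked against the documentation for the installed version. The helper `build_meme_command` and the file names are hypothetical.

```python
import shlex


def build_meme_command(fasta_path, out_dir, alphabet="dna", mode="zoops",
                       n_motifs=3, min_width=6, max_width=20,
                       min_sites=None, max_sites=None):
    """Assemble a `meme` argument list from the input choices on the slide.

    Hypothetical helper; flag names follow the MEME Suite documentation and
    should be verified for the installed version."""
    if alphabet not in ("dna", "protein"):
        raise ValueError("alphabet must be 'dna' or 'protein'")
    # How many times in each sequence? oops = exactly once,
    # zoops = zero or once, anr = any number of repetitions.
    if mode not in ("oops", "zoops", "anr"):
        raise ValueError("mode must be 'oops', 'zoops' or 'anr'")
    if not 0 < min_width <= max_width:
        raise ValueError("need 0 < min_width <= max_width")
    cmd = ["meme", fasta_path, f"-{alphabet}", "-oc", out_dir,
           "-mod", mode,
           "-nmotifs", str(n_motifs),                         # how many motifs?
           "-minw", str(min_width), "-maxw", str(max_width)]  # range of motif lengths
    # How many sites? Optional bounds on the number of occurrences per motif.
    if min_sites is not None:
        cmd += ["-minsites", str(min_sites)]
    if max_sites is not None:
        cmd += ["-maxsites", str(max_sites)]
    return cmd


cmd = build_meme_command("seqs.fa", "meme_out")
print(shlex.join(cmd))
# meme seqs.fa -dna -oc meme_out -mod zoops -nmotifs 3 -minw 6 -maxw 20
```

If MEME is installed locally, the list can be passed to `subprocess.run(cmd, check=True)`; building it as a list avoids shell-quoting problems with file names.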

MEME – Output
[Figure: MEME output summary; labels: motif, motif E-value]

MEME – Sequence logo
[Figure: MEME motif summary; labels: motif length, number of appearances, motif E-value]
The sequence logo is a graphical representation of the sequence motif.

MEME – Sequence logo
– High information content = high confidence
– The relative sizes of the letters indicate their frequency in the sequences
– The total height of the letters depicts the information content of the position, in bits
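The letter heights in a logo can be computed directly from a column's frequencies. A minimal sketch, assuming the standard definition IC = log2(alphabet size) - H with a uniform background and no small-sample correction (MEME's own logos may differ in these details); the function name is hypothetical:

```python
import math


def column_logo_heights(freqs, alphabet_size=4):
    """Information content (bits) of one motif column and per-letter logo heights.

    Assumes IC = log2(alphabet_size) - H (uniform background, no small-sample
    correction). `freqs` maps each letter to its frequency; values sum to 1."""
    entropy = -sum(p * math.log2(p) for p in freqs.values() if p > 0)
    info = math.log2(alphabet_size) - entropy
    # Each letter's height is its frequency scaled by the column's information content.
    heights = {letter: p * info for letter, p in freqs.items()}
    return info, heights


print(column_logo_heights({"A": 1.0, "C": 0.0, "G": 0.0, "T": 0.0})[0])      # 2.0: fully conserved
print(column_logo_heights({"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25})[0])  # 0.0: uniform
print(column_logo_heights({"A": 0.5, "C": 0.0, "G": 0.5, "T": 0.0}))
# (1.0, {'A': 0.5, 'C': 0.0, 'G': 0.5, 'T': 0.0})
```

For DNA the maximum is 2 bits per position; for proteins (`alphabet_size=20`) it is log2(20), about 4.32 bits.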

MEME – Sequence logo
Multilevel consensus

Patterns can be presented as regular expressions, e.g. [AG]-x-V-x(2)-{YW}
– [] : either residue
– x : any residue
– x(2) : any residue in the next 2 positions
– {} : any residue except these
Examples: AYVACM, GGVGAA
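This pattern syntax is the PROSITE convention, and it maps directly onto ordinary regular expressions. A sketch that translates it into a Python regex, assuming only the elements listed above plus `(n,m)` repetition ranges; PROSITE anchors such as `<` and `>` are not handled, and `prosite_to_regex` is a hypothetical name:

```python
import re


def prosite_to_regex(pattern):
    """Translate PROSITE-style syntax into a Python regular expression.

    Handles [..] (either residue), {..} (any residue except these), x (any
    residue) and (n) / (n,m) repeat counts. Anchors (< >) are not handled."""
    parts = []
    for element in pattern.split("-"):
        m = re.fullmatch(r"(\[[A-Z]+\]|\{[A-Z]+\}|[A-Zx])(?:\((\d+)(?:,(\d+))?\))?", element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, lo, hi = m.groups()
        if core == "x":
            regex = "."                       # any residue
        elif core.startswith("{"):
            regex = "[^" + core[1:-1] + "]"   # any residue except these
        else:
            regex = core                      # single residue or [..] class is already valid
        if lo is not None:
            regex += "{" + lo + ("," + hi if hi else "") + "}"
        parts.append(regex)
    return "".join(parts)


regex = prosite_to_regex("[AG]-x-V-x(2)-{YW}")
print(regex)  # [AG].V.{2}[^YW]
for peptide in ("AYVACM", "GGVGAA", "AYVACW"):
    print(peptide, bool(re.fullmatch(regex, peptide)))  # True, True, False
```

AYVACW fails only at the last position, because W is in the excluded set {YW}.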

MEME – motif alignment
[Figure: labels: sequence names, position in sequence, strength of match, motif within sequence]

MEME – motif locations
[Figure: labels: sequence names, overall strength of motif matches, motif location in the input sequence]

What can we do with motifs?
– MAST: search for them in non-annotated sequence databases (protein and DNA)
– TOMTOM: compare them with databases of known motifs, e.g. to find the protein that binds a DNA motif

MAST

Searches for one or more motifs in sequence databases:
– Like BLAST, but with motifs as input
– Similar to iterations of PSI-BLAST: the profile defines the strength of a match
– Multiple motif matches per sequence
MEME uses MAST to summarize results:
– Each MEME result is accompanied by the MAST result of searching the discovered motifs against the input sequences.
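The core operation behind a MAST search is scoring every window of a sequence against a PSSM. A minimal sketch, assuming DNA, a uniform background, log2-odds scores with a pseudocount of 0.5, and a user-chosen score threshold; MAST itself converts scores to p-values and combines several motifs into a sequence E-value, which this sketch does not attempt. The toy sites, the sequence and the function names are hypothetical.

```python
import math

ALPHABET = "ACGT"


def log_odds_pssm(sites, pseudocount=0.5):
    """log2-odds PSSM from aligned DNA sites, assuming a uniform background."""
    width = len(sites[0])
    total = len(sites) + pseudocount * len(ALPHABET)
    pssm = []
    for k in range(width):
        column = [s[k] for s in sites]
        # Frequency with pseudocount, divided by the uniform background 1/4.
        pssm.append({b: math.log2((column.count(b) + pseudocount) / total / 0.25)
                     for b in ALPHABET})
    return pssm


def scan(sequence, pssm, threshold):
    """Score every window; return (start, score) pairs >= threshold, best first.

    Several hits per sequence are kept, since MAST reports multiple matches."""
    width = len(pssm)
    hits = []
    for i in range(len(sequence) - width + 1):
        score = sum(pssm[k][sequence[i + k]] for k in range(width))
        if score >= threshold:
            hits.append((i, score))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


pssm = log_odds_pssm(["TATAAT", "TATAAT", "TAAAAT", "TATACT"])
hits = scan("GGTATAATCCTAAAATGG", pssm, threshold=5.0)
print([(i, round(s, 2)) for i, s in hits])  # [(2, 8.78), (10, 7.56)]
```

The exact consensus at position 2 scores highest, and the weaker TAAAAT match at position 10 is still reported, illustrating multiple matches per sequence.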

MAST – Input
– Input file (motifs)
– Database

If you wish to use motifs discovered by MEME

MAST – Output
– Input motifs
– Presence of the motifs in a given database

MAST – Output (another example, global view)

TOMTOM

TOMTOM searches one or more query DNA motifs against one or more databases of target motifs and reports, for each query, a list of target motifs ranked by p-value. The output contains the results for each query in the order in which the queries appear in the input file.
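Comparing two motifs comes down to aligning their columns at every relative offset and scoring the overlap. A minimal sketch, assuming DNA probability matrices (columns in A, C, G, T order), Pearson correlation as the column similarity (one of the measures TOMTOM offers), both strands, and a minimum overlap of 4 columns; TOMTOM additionally converts scores into p-values and q-values, which this sketch omits. All function names and the toy motifs are hypothetical.

```python
import math

ALPHABET = "ACGT"


def pearson(u, v):
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv) if su and sv else 0.0


def reverse_complement(ppm):
    """Reverse column order and swap A<->T, C<->G (columns are [pA, pC, pG, pT])."""
    return [[col[3], col[2], col[1], col[0]] for col in reversed(ppm)]


def motif_similarity(query, target, min_overlap=4):
    """Best mean column-wise Pearson correlation over all offsets and both strands."""
    best = -1.0
    for tgt in (target, reverse_complement(target)):
        # Query column i is paired with target column i - offset.
        for offset in range(min_overlap - len(tgt), len(query) - min_overlap + 1):
            pairs = [(query[i], tgt[i - offset]) for i in range(len(query))
                     if 0 <= i - offset < len(tgt)]
            if len(pairs) >= min_overlap:
                best = max(best, sum(pearson(q, t) for q, t in pairs) / len(pairs))
    return best


def rank_targets(query, targets):
    """Target motif names sorted from most to least similar to the query."""
    return sorted(targets, key=lambda name: motif_similarity(query, targets[name]), reverse=True)


def ppm_from_consensus(seq, p=0.97):
    """Hypothetical helper: a near-one-hot probability matrix for a consensus string."""
    q = (1 - p) / 3
    return [[p if base == b else q for b in ALPHABET] for base in seq]


query = ppm_from_consensus("TGTGA")
targets = {"gc_box": ppm_from_consensus("GGGGCGGGG"), "cap_like": ppm_from_consensus("TTGTGAC")}
print(rank_targets(query, targets))  # ['cap_like', 'gc_box']
```

Averaging over the overlap lets short, perfectly matching overlaps score as well as long ones; the p-values that TOMTOM computes are what make matches of different lengths comparable.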

TOMTOM – Input
– Input motif
– Background frequencies
– Database

TOMTOM – Output
– Input motif
– Matching motifs

TOMTOM – Output
Wrong input (RNA sequence of the RNA-binding protein NOVA1) still gives “OK”-looking results.

MAST vs. TOMTOM
             MAST                  TOMTOM
Comparison   Profile against DB    Profile against profile DB
Databases    General DBs           Known motif DBs

Cool story of the day: How NOT to be a bioinformatician