Special Topics in Genomics: Motif Analysis

Sequence motif: a pattern of nucleotide or amino acid sequences. Motifs occur in both DNA and protein sequences.

DNA motif example: transcription factor binding sites (TFBS). A transcription factor (TF) binds a similar site in each of the following sequences:
GTATGTACTTACTATGGGTGGTCAACAAATCTATGTATGA
TAACATGTGACTCCTATAACCTCTTTGGGTGGTACATGAA
CTGGGAGGTCCTCGGTTCAGAGTCACAGAGCAGATAATCA
TTAGAGGCACAATTGCTTGGGTGGTGCACAAAAAAACAAG
AACAGCCTTGGATTAGCTGCTGGGGGGGTGAGTGGTCCAC
ATCAGAATGGGTGGTCCATATATCCCAAAGAAGAGGGTAG

The binding sites, one per sequence in the same order:
TGGGTGGTC
TGGGTGGTA
TGGGAGGTC
TGGGTGGTG
TGAGTGGTC
TGGGTGGTC

Motif representation

Consensus sequence. Example: CACSTG, where S stands for C or G.

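Sequence logo (Schneider & Stephens, Nucleic Acids Res. 18: (1990))
Entropy (Shannon) is a measurement of uncertainty. For position $j$ with base frequencies $p_j(b)$, $H_j = -\sum_{b \in \{A,C,G,T\}} p_j(b) \log_2 p_j(b)$.
The amount of uncertainty reduced by observing the sequences is the amount of information (or information content) obtained. For DNA, where the uncertainty before observing is $\log_2 4 = 2$ bits, this is $I_j = 2 - H_j$.
$I_j$ is the total height of position $j$ in the logo plot. Within a position, the height of each nucleotide is proportional to its frequency.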
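As a small illustration of the logo heights, the sketch below computes the information content per position for the six binding sites shown earlier. It assumes a uniform prior over A, C, G, T (2 bits per position) and leaves out any small-sample correction; the function name `information_content` is made up for this example.

```python
import math
from collections import Counter

def information_content(sites, alphabet="ACGT"):
    """Per-position information content (bits) of equal-length aligned sites."""
    n = len(sites)
    max_entropy = math.log2(len(alphabet))  # 2 bits for DNA
    result = []
    for j in range(len(sites[0])):
        counts = Counter(site[j] for site in sites)
        # Shannon entropy of the observed base frequencies at position j
        entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
        result.append(max_entropy - entropy)
    return result

sites = ["TGGGTGGTC", "TGGGTGGTA", "TGGGAGGTC",
         "TGGGTGGTG", "TGAGTGGTC", "TGGGTGGTC"]
ic = information_content(sites)
for j, bits in enumerate(ic, start=1):
    print(f"position {j}: {bits:.2f} bits")
```

Fully conserved positions reach the maximum of 2 bits; positions where the sites disagree get a shorter stack.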

Two questions in motif analysis
1. Known motif mapping: finding occurrences of a motif in nucleotide or amino acid sequences.
2. De novo motif discovery: finding motifs that are previously unknown.

Known motif mapping: consensus mapping
STEP 1: provide a motif (e.g. CACSTG = CAC[C,G]TG);
STEP 2: specify the number of mismatches allowed (e.g. <=1);
STEP 3: scan the sequence.
Example sequence: CGCCGGGACCAGATCAACGCCGAGATCCGGCACATGAAGGAGCT
A window with 3 mismatches (m=3) is not reported; a window with 1 mismatch (m=1) is.
A useful tool: CisGenome
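The three steps of consensus mapping can be sketched as follows. This is only an illustration, not how CisGenome implements it: it scans the forward strand only, and the `IUPAC` table and the function `consensus_scan` are names introduced here.

```python
# Subset of IUPAC nucleotide codes; S means C or G.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "S": "CG", "W": "AT", "R": "AG", "Y": "CT",
    "K": "GT", "M": "AC", "N": "ACGT",
}

def consensus_scan(seq, motif, max_mismatch=1):
    """Return (start, word, mismatches) for every window within the mismatch limit."""
    k = len(motif)
    hits = []
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        # Count positions where the base is not allowed by the consensus code
        m = sum(base not in IUPAC[code] for base, code in zip(word, motif))
        if m <= max_mismatch:
            hits.append((i, word, m))
    return hits

seq = "CGCCGGGACCAGATCAACGCCGAGATCCGGCACATGAAGGAGCT"
hits = consensus_scan(seq, "CACSTG", max_mismatch=1)
for start, word, m in hits:
    print(start, word, f"m={m}")
```

In the example sequence the window CACATG differs from CACSTG at one position (A where C or G is expected), so it is reported with m=1.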

Known motif mapping: motif matrix mapping (CisGenome)
STEP 1: provide a motif and a background model;
STEP 2: specify a likelihood ratio cutoff (e.g. LR>=500);
STEP 3: scan the sequence.
Example sequence: GTATGTACTTACTATGGGTGGTCAACAAATCTATGTATGACTGGGAGGTCCTCGGTTCAGAGTCACAGAGCA
Windows with LR>=500 are reported as motif sites; windows with LR<500 are not.
[Figure: motif matrix and background model, each given as probabilities over A, C, G, T]
Another tool for matrix mapping: MAST
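A minimal sketch of likelihood ratio scanning is given below. The matrix on the slide is not preserved in this transcript, so the motif matrix here is built from the six binding sites shown earlier with a pseudocount of 0.5, and the background is taken as uniform; both are assumptions for illustration, and `build_pwm` and `lr_scan` are made-up names rather than CisGenome or MAST functions.

```python
def build_pwm(sites, pseudocount=0.5, alphabet="ACGT"):
    """Position-specific base probabilities from aligned sites (with pseudocounts)."""
    pwm = []
    for j in range(len(sites[0])):
        col = {b: pseudocount for b in alphabet}
        for site in sites:
            col[site[j]] += 1
        total = sum(col.values())
        pwm.append({b: col[b] / total for b in alphabet})
    return pwm

def lr_scan(seq, pwm, background, cutoff=500.0):
    """Return (start, word, LR) for windows whose likelihood ratio meets the cutoff."""
    w = len(pwm)
    hits = []
    for i in range(len(seq) - w + 1):
        word = seq[i:i + w]
        lr = 1.0
        for j, base in enumerate(word):
            # P(base | motif position j) / P(base | background)
            lr *= pwm[j][base] / background[base]
        if lr >= cutoff:
            hits.append((i, word, lr))
    return hits

sites = ["TGGGTGGTC", "TGGGTGGTA", "TGGGAGGTC",
         "TGGGTGGTG", "TGAGTGGTC", "TGGGTGGTC"]
pwm = build_pwm(sites)
background = {b: 0.25 for b in "ACGT"}  # uniform zero-order background (assumed)
seq = ("GTATGTACTTACTATGGGTGGTCAACAAATCTATGTATGA"
       "CTGGGAGGTCCTCGGTTCAGAGTCACAGAGCA")
lr_hits = lr_scan(seq, pwm, background, cutoff=500.0)
for start, word, lr in lr_hits:
    print(start, word, round(lr))
```

In practice the product is usually computed as a sum of log ratios to avoid numerical underflow for long motifs; the plain product is kept here so that the cutoff matches the LR>=500 example directly.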

De novo motif discovery
Two major classes of methods:
1. Word enumeration
2. Matrix updating

Word enumeration
Example: Sinha & Tompa, Nucleic Acids Res. 30: (2002)
STEP 1: enumerate possible words;
STEP 2: count word occurrences;
STEP 3: compare observed word count with random expectation.
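The three steps can be illustrated with a toy scorer. This is not the method of Sinha & Tompa: it only scores words that actually occur, takes the random expectation from independent base frequencies estimated on the input, and uses a simple Poisson-style z-score. The function name `word_scores` is hypothetical.

```python
import math
from collections import Counter

def word_scores(seqs, k):
    """Map each observed k-mer to (observed count, expected count, z-score)."""
    # Background: independent base frequencies estimated from the input (assumed)
    base_counts = Counter("".join(seqs))
    total = sum(base_counts.values())
    freq = {b: c / total for b, c in base_counts.items()}

    # STEP 1 and 2: enumerate the k-mers that occur and count them
    counts = Counter(s[i:i + k] for s in seqs for i in range(len(s) - k + 1))
    n_windows = sum(max(len(s) - k + 1, 0) for s in seqs)

    # STEP 3: compare the observed count with the random expectation
    scores = {}
    for word, obs in counts.items():
        expected = n_windows * math.prod(freq[b] for b in word)
        scores[word] = (obs, expected, (obs - expected) / math.sqrt(expected))
    return scores

seqs = [
    "GTATGTACTTACTATGGGTGGTCAACAAATCTATGTATGA",
    "TAACATGTGACTCCTATAACCTCTTTGGGTGGTACATGAA",
    "CTGGGAGGTCCTCGGTTCAGAGTCACAGAGCAGATAATCA",
    "TTAGAGGCACAATTGCTTGGGTGGTGCACAAAAAAACAAG",
    "AACAGCCTTGGATTAGCTGCTGGGGGGGTGAGTGGTCCAC",
    "ATCAGAATGGGTGGTCCATATATCCCAAAGAAGAGGGTAG",
]
scores = word_scores(seqs, k=9)
top = sorted(scores.items(), key=lambda kv: kv[1][2], reverse=True)[:5]
for word, (obs, expected, z) in top:
    print(word, obs, f"{expected:.4f}", f"{z:.1f}")
```

Exact word counting treats TGGGTGGTC and TGGGTGGTA as unrelated words, which is one reason practical tools allow degenerate symbols or mismatches when enumerating.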

Matrix updating: CONSENSUS (Stormo & Hartzell, PNAS, 86: , 1990)
STEP 1: use all k-mers in the first sequence as seeds;
STEP 2: find matches (often the best matches) of each seed in the second sequence;
STEP 3: update the seed matrices and exclude matrices with low information content;
STEP 4: repeat steps 2 and 3 for all sequences.
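The greedy procedure can be sketched roughly as below. It is a simplification, not the published CONSENSUS program: each matrix is represented by its list of aligned words, only the single best match per sequence is kept, information content is computed against a uniform background, and the `keep` parameter (how many matrices survive each round) is an assumption of this example.

```python
import math
from collections import Counter

def info_content(words):
    """Total information content (bits) of a set of aligned words, uniform background."""
    n = len(words)
    total = 0.0
    for j in range(len(words[0])):
        counts = Counter(w[j] for w in words)
        total += 2.0 + sum((c / n) * math.log2(c / n) for c in counts.values())
    return total

def greedy_consensus(seqs, k, keep=20):
    """Return the best alignment found: one k-mer per sequence, in input order."""
    # STEP 1: every k-mer of the first sequence seeds one alignment
    alignments = [[seqs[0][i:i + k]] for i in range(len(seqs[0]) - k + 1)]
    for seq in seqs[1:]:
        kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
        # STEP 2: extend each alignment with its best-matching k-mer
        extended = [max((aln + [w] for w in kmers), key=info_content)
                    for aln in alignments]
        # STEP 3: keep only the alignments with the highest information content
        extended.sort(key=info_content, reverse=True)
        alignments = extended[:keep]
    return alignments[0]

seqs = [
    "GTATGTACTTACTATGGGTGGTCAACAAATCTATGTATGA",
    "TAACATGTGACTCCTATAACCTCTTTGGGTGGTACATGAA",
    "CTGGGAGGTCCTCGGTTCAGAGTCACAGAGCAGATAATCA",
    "TTAGAGGCACAATTGCTTGGGTGGTGCACAAAAAAACAAG",
    "AACAGCCTTGGATTAGCTGCTGGGGGGGTGAGTGGTCCAC",
    "ATCAGAATGGGTGGTCCATATATCCCAAAGAAGAGGGTAG",
]
best = greedy_consensus(seqs, k=9)
print(best, round(info_content(best), 2))
```

Because the search is greedy, the result depends on the order of the sequences, and a good seed can be pruned early if `keep` is too small.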

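Matrix updating: mixture model
Each position of the sequence S is generated either by the background model θ0 or by the motif matrix θ of width W. The vector q = [q0, q1] gives the mixing proportions of background and motif, and A marks where the motif sites are.
S: GTATGTACTTACTATGGGTGGTCAACAAATCTATGTATGACTGGGAGGTCCTCGGTTCAGAGTCACAGAGCA
A: motif site locations in S (unobserved)
[Figure: motif matrix and background model, each given as probabilities over A, C, G, T]
The parameters (θ0, θ, W, q) and the site indicators A are inferred by iterative estimation or sampling:
EM: Lawrence and Reilly (1990), Bailey and Elkan (1994), etc.
Gibbs sampler: Lawrence et al. (1993), Liu (1994), Liu et al. (1995), etc.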
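To make the idea of inference by iterative sampling concrete, here is a bare-bones Gibbs site sampler. It is a sketch under simplifying assumptions rather than any of the cited algorithms: it assumes exactly one site per sequence instead of the mixture proportions q, a fixed motif width, a background estimated from base frequencies of the input, and a fixed random seed. The name `gibbs_site_sampler` and its parameters are introduced here.

```python
import random
from collections import Counter

def gibbs_site_sampler(seqs, w, sweeps=200, pseudocount=0.5, seed=0):
    """Sample one motif start per sequence; returns the starts after the last sweep."""
    rng = random.Random(seed)
    # Zero-order background estimated from all sequences (assumed)
    counts = Counter("".join(seqs))
    total = sum(counts.values())
    bg = {b: counts[b] / total for b in "ACGT"}
    # A: one motif start per sequence, initialised at random
    starts = [rng.randrange(len(s) - w + 1) for s in seqs]
    for _ in range(sweeps):
        for i, seq in enumerate(seqs):
            # Motif matrix from the current sites of all other sequences
            others = [s[a:a + w]
                      for j, (s, a) in enumerate(zip(seqs, starts)) if j != i]
            denom = len(others) + 4 * pseudocount
            pwm = [{b: (sum(site[c] == b for site in others) + pseudocount) / denom
                    for b in "ACGT"} for c in range(w)]
            # Draw a new start with probability proportional to its likelihood ratio
            weights = []
            for a in range(len(seq) - w + 1):
                lr = 1.0
                for c, base in enumerate(seq[a:a + w]):
                    lr *= pwm[c][base] / bg[base]
                weights.append(lr)
            starts[i] = rng.choices(range(len(weights)), weights=weights)[0]
    return starts

seqs = [
    "GTATGTACTTACTATGGGTGGTCAACAAATCTATGTATGA",
    "TAACATGTGACTCCTATAACCTCTTTGGGTGGTACATGAA",
    "CTGGGAGGTCCTCGGTTCAGAGTCACAGAGCAGATAATCA",
    "TTAGAGGCACAATTGCTTGGGTGGTGCACAAAAAAACAAG",
    "AACAGCCTTGGATTAGCTGCTGGGGGGGTGAGTGGTCCAC",
    "ATCAGAATGGGTGGTCCATATATCCCAAAGAAGAGGGTAG",
]
starts = gibbs_site_sampler(seqs, w=9)
found = [s[a:a + 9] for s, a in zip(seqs, starts)]
print(found)
```

The sampler is stochastic, so different seeds can give different or shifted sites; in practice several runs are compared and the alignment with the best score is reported.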

Other issues
1. Dependencies within motif
2. Functions of novel motifs