Setting Up a Replica Exchange Approach to Motif Discovery in DNA
Jeffrey Goett
Advisor: Professor Sengupta



Protein Synthesis from DNA
Figure labels: gene, binding sites, binding proteins, RNA polymerase, transcription, regulation, translation to proteins.

Binding Sites
Sequence A: A - A - C - G - A - C - T - T - G - C - T - G - T - T - C - A - A - C - C - A - A - A - G - T - T - G - G - T - [code for protein]
Sequence B: A - A - G - G - A - C - T - T - C - C - T - G - C - G - T - T - G - C - T - C - G - C - A - A - C - G - A - G - [code for protein]
Figure labels: binding site, binding protein "A" (shown on both sequences).

Discovering New Binding Motifs
…ATCG GCTCAG CTAG…
…CACT GATCAG AGTA…
…TTCC GCTCTG TAAC…
…GCTA GCTCAA ATCG…
Motif probability model
Motif: GCTCAG

Modeling Motifs in Sequences
ATATCCGTA
AATCGAGAC
TCGATGTGT
CCACCTGCA
Assume:
  The data are broken into N sequences.
  Each sequence has one instance of the motif embedded in random background.
  Variations of the motif arise by point mutation, but not by insertion or deletion.
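The three assumptions above can be turned into a small generator of synthetic test data. This is a minimal sketch, not the author's code: the function name `plant_motif`, the uniform background, and the `mutation_rate` parameter are all assumptions introduced for illustration.

```python
import random

ALPHABET = "ACGT"

def plant_motif(motif, n_seqs, length, mutation_rate, rng):
    """Build n_seqs random sequences, each with one (possibly mutated) motif copy.

    Returns the sequences and the 1-based start position of the motif in each.
    """
    seqs, starts = [], []
    for _ in range(n_seqs):
        # Uniform random background (an assumption; any background model could be used).
        letters = [rng.choice(ALPHABET) for _ in range(length)]
        # Point mutations only: each motif letter is independently replaced by a
        # uniformly random letter (which may happen to equal the original).
        instance = [rng.choice(ALPHABET) if rng.random() < mutation_rate else c
                    for c in motif]
        start = rng.randrange(length - len(motif) + 1)   # 0-based start
        letters[start:start + len(motif)] = instance     # no insertion or deletion
        seqs.append("".join(letters))
        starts.append(start + 1)
    return seqs, starts

seqs, starts = plant_motif("GCTCAG", n_seqs=4, length=20, mutation_rate=0.1,
                           rng=random.Random(0))
for s, a in zip(seqs, starts):
    print(a, s)
```

Keeping the true start positions alongside the sequences makes it possible to check later whether a sampler recovered the planted alignment.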

Modeling Motifs in Sequences
AT ATC CGTA
A ATC GAGAC
TCG ATG TGT
CC ACC TGCA
The "alignment": the starting position of the motif in each sequence.
The "motif probability distribution": the probability of each letter occurring at each motif position.
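Given an alignment, the motif probability distribution can be estimated by counting letters column by column. The sketch below does this for the alignment shown on this slide (1-based starts 3, 2, 4, 3). The function name and the optional `pseudocount` argument are assumptions for illustration, not part of the slides.

```python
from collections import Counter

ALPHABET = "ACGT"

def motif_distribution(seqs, starts, width, pseudocount=0.0):
    """Return probs[j][letter] for motif position j (0-based list index).

    `starts` holds 1-based motif start positions, one per sequence.
    """
    probs = []
    for j in range(width):
        # Letters observed at motif position j across all sequences.
        column = [s[a - 1 + j] for s, a in zip(seqs, starts)]
        counts = Counter(column)
        total = len(column) + pseudocount * len(ALPHABET)
        probs.append({c: (counts[c] + pseudocount) / total for c in ALPHABET})
    return probs

seqs = ["ATATCCGTA", "AATCGAGAC", "TCGATGTGT", "CCACCTGCA"]
starts = [3, 2, 4, 3]          # windows ATC, ATC, ATG, ACC
pwm = motif_distribution(seqs, starts, width=3)
for j, column in enumerate(pwm, start=1):
    print(j, column)
```

With no pseudocount, letters never seen in a column get probability zero; a small positive pseudocount avoids that when the distribution is later used inside a logarithm.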

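Scoring a Model
"Log-likelihood" score: the logarithm of the product of the factors below, where p_{j,c} is the probability of letter c at motif position j and p^0_c is the background probability of letter c.
A TAT CCGTA:  p^0_A · p_{1,T} p_{2,A} p_{3,T} · p^0_C p^0_C p^0_G p^0_T p^0_A
AATCG AGA C:  p^0_A p^0_A p^0_T p^0_C p^0_G · p_{1,A} p_{2,G} p_{3,A} · p^0_C
TCG ATG TGT:  p^0_T p^0_C p^0_G · p_{1,A} p_{2,T} p_{3,G} · p^0_T p^0_G p^0_T
CCA CCTGCA:   p_{1,C} p_{2,C} p_{3,A} · p^0_C p^0_C p^0_T p^0_G p^0_C p^0_A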
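The score can be computed by walking each sequence once and adding the log of the motif probability inside the motif window and the log of the background probability everywhere else. A minimal sketch follows; the function name, the uniform background, and the flat motif model used in the example are assumptions chosen only to keep the arithmetic easy to check.

```python
import math

def log_likelihood(seqs, starts, motif, background):
    """Log of the product of motif and background probabilities.

    motif[j][letter]   : probability of `letter` at motif position j (0-based index)
    background[letter] : probability of `letter` outside the motif window
    starts             : 1-based motif start position in each sequence
    """
    width = len(motif)
    score = 0.0
    for seq, start in zip(seqs, starts):
        for k, letter in enumerate(seq):
            j = k - (start - 1)                 # offset into the motif window
            if 0 <= j < width:
                p = motif[j][letter]            # inside the window: motif model
            else:
                p = background[letter]          # outside the window: background
            score += math.log(p)                # raises ValueError if p == 0
    return score

seqs = ["ATATCCGTA", "AATCGAGAC", "TCGATGTGT", "CCACCTGCA"]
starts = [2, 6, 4, 1]                           # windows TAT, AGA, ATG, CCA
background = {c: 0.25 for c in "ACGT"}
flat_motif = [dict(background) for _ in range(3)]   # hypothetical, uninformative motif
print(log_likelihood(seqs, starts, flat_motif, background))
```

With a flat motif every letter contributes log(0.25), so the score is the same for any alignment; a sharper motif model is what makes some alignments score higher than others.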

Example Models
First model, alignment {2, 4, 7, 3}:
A TAT CCGTA
AAT CGA GAC
TCGATG TGT
CC ACC TGCA
Second model, alignment {3, 2, 4, 3}:
AT ATC CGTA
A ATC GAGAC
TCG ATG TGT
CC ACC TGCA

The Gibbs Sampler
We want to find the model that maximizes the log-likelihood score.

The Gibbs Sampler
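The slide text does not spell out the update rule, so the sketch below uses one common form of the motif Gibbs sampler (the site sampler): hold one sequence out, estimate the motif from the remaining sequences, then resample the held-out start position in proportion to a motif-to-background likelihood ratio. All function names, the pseudocount, and the uniform background are assumptions for illustration. Start positions are 0-based here.

```python
import random
from collections import Counter

ALPHABET = "ACGT"

def estimate_motif(seqs, starts, width, skip, pseudocount=1.0):
    """Motif probabilities from every sequence except index `skip`."""
    others = [(s, a) for i, (s, a) in enumerate(zip(seqs, starts)) if i != skip]
    total = len(others) + pseudocount * len(ALPHABET)
    motif = []
    for j in range(width):
        counts = Counter(s[a + j] for s, a in others)
        motif.append({c: (counts[c] + pseudocount) / total for c in ALPHABET})
    return motif

def position_weights(seq, width, motif, background):
    """Motif-to-background likelihood ratio for each possible start in `seq`."""
    weights = []
    for a in range(len(seq) - width + 1):
        ratio = 1.0
        for j in range(width):
            letter = seq[a + j]
            ratio *= motif[j][letter] / background[letter]
        weights.append(ratio)
    return weights

def gibbs_sweep(seqs, starts, width, background, rng):
    """Resample the start position of every sequence once, in place."""
    for i, seq in enumerate(seqs):
        motif = estimate_motif(seqs, starts, width, skip=i)
        weights = position_weights(seq, width, motif, background)
        starts[i] = rng.choices(range(len(weights)), weights=weights)[0]
    return starts

seqs = ["ATATCCGTA", "AATCGAGAC", "TCGATGTGT", "CCACCTGCA"]
background = {c: 0.25 for c in ALPHABET}
rng = random.Random(0)
starts = [rng.randrange(len(s) - 3 + 1) for s in seqs]   # random initial alignment
for _ in range(100):
    gibbs_sweep(seqs, starts, 3, background, rng)
print(starts)
```

The pseudocount keeps every weight strictly positive, so every start position remains reachable and the sampler can leave a poor alignment.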

Figure axis: times visited.
Over time, the frequency distribution of the sampler's visits approaches the distribution being sampled.

Optimization Technique
If we assume areas of local maximization contribute the most during "integration" to the local maximizations of the score, then biasing our search to these areas may discover the p_{j,ρ} values which maximize the score faster.

Multiple Gibbs Samplers
By combining results from Gibbs samplers begun at random positions, we may find the maximizing model sooner.

Replica Exchange/Parallel Tempering
"Low-sensitivity" samplers, which scout out the area, periodically swap with "high-sensitivity" samplers, which are good at focused searches, if the swap appears promising.
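The slide does not say how "promising" is judged. A minimal sketch of the swap step, assuming the standard parallel-tempering (Metropolis) criterion: each sampler's sensitivity is a number `beta`, each sampler is taken to target a distribution proportional to exp(beta × score), and a swap between samplers i and j is accepted with probability min(1, exp((beta_i - beta_j)(score_j - score_i))). The dictionary layout and function names are hypothetical.

```python
import math
import random

def swap_probability(beta_i, score_i, beta_j, score_j):
    """Metropolis acceptance probability for exchanging two samplers' states."""
    exponent = (beta_i - beta_j) * (score_j - score_i)
    return 1.0 if exponent >= 0 else math.exp(exponent)

def maybe_swap(replicas, i, j, rng):
    """Exchange the states of replicas i and j in place; return True if swapped.

    Each replica is a dict with keys 'beta', 'state', and 'score'.
    The betas stay put; only the states (and their scores) move.
    """
    p = swap_probability(replicas[i]["beta"], replicas[i]["score"],
                         replicas[j]["beta"], replicas[j]["score"])
    if rng.random() < p:
        for key in ("state", "score"):
            replicas[i][key], replicas[j][key] = replicas[j][key], replicas[i][key]
        return True
    return False

replicas = [
    {"beta": 0.5, "state": [3, 2, 4, 3], "score": -40.0},   # low sensitivity: scout
    {"beta": 2.0, "state": [2, 4, 7, 3], "score": -46.0},   # high sensitivity: focused
]
print(maybe_swap(replicas, 0, 1, random.Random(0)), replicas)
```

Under this criterion a swap is always accepted when the scout holds the better-scoring state, and is accepted only occasionally otherwise, which matches the slide's description of swapping when it appears promising.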

Controlling Sensitivity
Adjust the relative probability of sampling an x_i by adjusting a new parameter, beta, in the distribution:
  Small beta: search the breadth of the space
  Large beta: focused search of a region
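The formula on this slide did not survive extraction. One common way to introduce such a parameter, assumed here, is to raise each sampling weight to the power beta before normalizing. The sketch shows how that choice flattens the distribution for small beta and sharpens it for large beta; the function name and the example weights are hypothetical.

```python
def tempered_probabilities(weights, beta):
    """Normalize weights**beta into a probability distribution."""
    powered = [w ** beta for w in weights]
    total = sum(powered)
    return [p / total for p in powered]

weights = [1.0, 2.0, 4.0]          # hypothetical sampling weights for three positions
for beta in (0.0, 0.5, 1.0, 2.0):
    print(beta, tempered_probabilities(weights, beta))
```

At beta = 0 every position is equally likely (pure exploration); at beta = 1 the original weights are recovered; larger beta concentrates the probability on the highest-weight position.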

Testing the Sensitivity
Running on randomly generated sequences to see which motifs are found, samplers of different sensitivity converge to different scores.
Figure legend: betas.

Predicting Convergence Score
Measure of similarity: magnetization (m)
"Configuration score": energy (E)
Ex (configurations shown in figure): m=.5; m=.5, E=0; m=1, E=-6J; m=0, E=2J; m=0, E=2J; m=0, E=2J
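The figure with the spin configurations is not available, but the listed values are reproduced by one simple reading, assumed here: four spins of value +1 or -1, every pair coupled, energy E = -J × (sum over pairs of s_i s_j), and magnetization m = |sum of spins| / (number of spins). The sketch below checks that reading against the slide's numbers; it is an interpretation, not a definition taken from the slides.

```python
from itertools import combinations

def magnetization(spins):
    """Absolute mean spin: 1 when all spins agree, 0 when they cancel."""
    return abs(sum(spins)) / len(spins)

def energy(spins, J=1.0):
    """Fully connected Ising energy: -J for each aligned pair, +J for each opposed pair."""
    return -J * sum(a * b for a, b in combinations(spins, 2))

for spins in ([1, 1, 1, 1], [1, 1, 1, -1], [1, 1, -1, -1]):
    print(spins, "m =", magnetization(spins), "E =", energy(spins), "(in units of J)")
```

Under these assumptions, four aligned spins give m = 1 and E = -6J, three against one give m = .5 and E = 0, and two against two give m = 0 and E = 2J.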

Alignment Analogue
Figure labels: A, B, C
Values shown: m=.77, E=-5J; m=1, E=-9J; m=.77, E=-5J; m=.77, E=-5J
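The same idea carries over to aligned motif windows. The definitions below are an assumed reading that reproduces the slide's values for three sequences and a motif of width 3: within each column every pair of letters contributes -J if they match and +J if they differ, and m is (letters agreeing with the column's most common letter minus letters disagreeing) divided by the total number of letters. The example windows are hypothetical.

```python
from collections import Counter
from itertools import combinations

def alignment_energy(windows, J=1.0):
    """Sum over columns and letter pairs: -J for a match, +J for a mismatch."""
    total = 0.0
    for column in zip(*windows):
        for a, b in combinations(column, 2):
            total += -J if a == b else J
    return total

def alignment_magnetization(windows):
    """(agreeing letters - disagreeing letters) / all letters, per-column consensus."""
    agree = 0
    count = 0
    for column in zip(*windows):
        agree += Counter(column).most_common(1)[0][1]   # size of the majority
        count += len(column)
    return (2 * agree - count) / count

perfect = ["ACG", "ACG", "ACG"]      # hypothetical windows for sequences A, B, C
one_off = ["ACG", "ACG", "ACT"]      # a single point mutation in one window
for windows in (perfect, one_off):
    print(windows, alignment_magnetization(windows), alignment_energy(windows))
```

Under these assumptions a perfect alignment gives m = 1 and E = -9J, and a single mismatch gives m = 7/9 (about .77) and E = -5J.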

Test Results: L < |alphabet|^w

Test Results: L > |alphabet|^w

Test Results

Hidden Motifs: Gibbs Sampler
Panels: Beta = .1, Beta = .5, Beta = .9, Beta = 1.3, Beta = 1.7, Beta = 2
W = 5, l = 500

Hidden Motifs: Replica Exchange
Figure legend: betas.