Combining SELEX with quantitative assays to rapidly obtain accurate models of protein–DNA interactions Jiajian Liu and Gary D. Stormo Presented by Aliya.

Combining SELEX with quantitative assays to rapidly obtain accurate models of protein–DNA interactions Jiajian Liu and Gary D. Stormo Presented by Aliya Sadeque

Protein-DNA interactions
- Methods for measuring:
  - Yeast one-hybrid
  - ChIP-on-chip, DNA microarray
- Important distinction in terms of specificity:
  - Enzymes vs. transcription factors
Bioinformatics Jan;16(1):16-23.

TFBS: Transcription Factor Binding Sites
- Goal: knowing the specificity of a TF in order to locate its binding sites within the genome
- Sites represented as consensus sequences or weight matrices
- Databases: TRANSFAC, JASPAR
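
The two site representations named on this slide can be compared with a minimal Python sketch. The consensus string, the weight values and the scoring threshold are invented for illustration; they are not taken from TRANSFAC, JASPAR or the paper.

```python
# Hypothetical 4-bp weight matrix: one {base: weight} dict per site position.
WEIGHTS = [
    {"A": -1.0, "C": -1.0, "G": 2.0, "T": -1.0},
    {"A": -1.0, "C": -1.0, "G": 2.0, "T": 0.0},
    {"A": -1.0, "C": -1.0, "G": 2.0, "T": -1.0},
    {"A": -1.0, "C": -1.0, "G": 0.0, "T": 2.0},
]
CONSENSUS = "GGGT"  # hypothetical consensus, the best base at each position


def consensus_hits(sequence, consensus=CONSENSUS):
    """Return start positions where the sequence matches the consensus exactly."""
    width = len(consensus)
    return [i for i in range(len(sequence) - width + 1)
            if sequence[i:i + width] == consensus]


def matrix_scores(sequence, weights=WEIGHTS):
    """Score every window by summing one weight per position (additive model)."""
    width = len(weights)
    return [(i, sum(weights[j][sequence[i + j]] for j in range(width)))
            for i in range(len(sequence) - width + 1)]


def matrix_hits(sequence, threshold, weights=WEIGHTS):
    """Return start positions whose weight-matrix score reaches the threshold."""
    return [i for i, s in matrix_scores(sequence, weights) if s >= threshold]
```

The consensus search only reports exact matches, while the weight matrix also ranks near-matches, which is why a matrix is usually the more useful representation for locating weaker sites.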

SELEX: Systematic Evolution of Ligands by Exponential Enrichment

QuMFRA Nucleic Acids Res Jun 15;29(12): PMID:

Selling points
- Provides a general method that can be used for any DNA-binding protein even if nothing is known about its specificity.
- Can isolate a small set of specific binding sites from a very large pool of random sequences.
Nucleic Acids Res Jun 15;29(12):

Zif268

SELEX Procedure

SELEX Binding Model
- Assuming an additive model
- Frequency:
- Weight:
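
The formulas on this slide did not survive in the transcript. A common way to go from selected sites to an additive model is sketched below in Python: count bases per position, convert to frequencies, and take the log ratio against a background frequency. The pseudocount, the uniform background of 0.25, the natural log and the example sites are all assumptions, not values from the paper.

```python
import math

BASES = "ACGT"


def frequency_matrix(sites, pseudocount=1.0):
    """Per-position base frequencies from aligned, equal-length selected sites."""
    width = len(sites[0])
    matrix = []
    for i in range(width):
        counts = {b: pseudocount for b in BASES}  # pseudocount avoids log(0) later
        for site in sites:
            counts[site[i]] += 1
        total = sum(counts.values())
        matrix.append({b: counts[b] / total for b in BASES})
    return matrix


def weight_matrix(freqs, background=0.25):
    """Log-odds weights: ln(frequency / background) for each base and position."""
    return [{b: math.log(col[b] / background) for b in BASES} for col in freqs]


def score(site, weights):
    """Additive model: the site score is the sum of its per-position weights."""
    return sum(weights[i][b] for i, b in enumerate(site))


# Hypothetical aligned sites standing in for sequenced SELEX clones.
example_sites = ["GGGT", "GGGT", "GCGT", "GGGA"]
example_weights = weight_matrix(frequency_matrix(example_sites))
```

Under the additive assumption each position contributes independently, so a site's score is a simple sum and the highest-scoring site is the consensus.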

Sequence Logo obtained from SELEX

QuMFRA Procedure
- 15 sequences
- Cover the space of possible sequences
- Competitive binding assay

Uh oh, math.
- Intensities of each DNA in a separated band
- Obtained from emission matrix and output vector
- Relative binding constant of a test site with respect to a reference site
- Reference was GGGT
[Figure labels: bound, unbound]
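
The calculation outlined on this slide can be sketched in Python as follows. The measured channel signals in a band are treated as an emission matrix times the unknown per-DNA intensities, the linear system is solved for each band, and the relative binding constant is the bound/unbound ratio of the test site divided by that of the reference. The function names, the two-dye setup and the numbers are hypothetical; the equations on the original slide are not in the transcript.

```python
def solve_linear(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("emission matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def relative_binding_constant(emission, bound_output, unbound_output,
                              test=0, reference=1):
    """K_rel = (bound_test / unbound_test) / (bound_ref / unbound_ref)."""
    bound = solve_linear(emission, bound_output)      # per-DNA intensity, bound band
    unbound = solve_linear(emission, unbound_output)  # per-DNA intensity, unbound band
    return (bound[test] / unbound[test]) / (bound[reference] / unbound[reference])


# Hypothetical emission matrix: rows are detection channels, columns are dyes.
EMISSION = [[1.0, 0.1],
            [0.2, 1.0]]
# Hypothetical channel readings for the bound and unbound bands.
k_rel = relative_binding_constant(EMISSION, [31.0, 16.0], [24.0, 44.0])
```

Because the protein concentration cancels in the ratio, the relative constant depends only on band intensities, which is what makes the competitive assay convenient.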

QuMFRA Binding Model
- Weight matrix
- Consensus sequence
- Matrix values are proportional to binding energy (the log of binding affinity), according to the Berg and von Hippel theory.
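
Under an additive model, a weight matrix of per-position energy penalties predicts the affinity of any site relative to the consensus. The Python sketch below assumes energies expressed in units of RT and uses invented values; the real matrix from the slide is not reproduced in the transcript.

```python
import math

# Hypothetical energy penalties (units of RT) relative to the best base
# at each position; 0.0 marks the preferred base.
ENERGY = [
    {"A": 2.0, "C": 2.5, "G": 0.0, "T": 1.5},
    {"A": 1.0, "C": 2.0, "G": 0.0, "T": 0.5},
    {"A": 3.0, "C": 3.0, "G": 0.0, "T": 2.0},
    {"A": 1.0, "C": 1.5, "G": 0.5, "T": 0.0},
]


def consensus(energy):
    """The consensus is the lowest-energy base at each position."""
    return "".join(min(col, key=col.get) for col in energy)


def relative_affinity(site, energy):
    """Predicted affinity relative to the consensus: exp(-sum of penalties)."""
    total = sum(energy[i][b] for i, b in enumerate(site))
    return math.exp(-total)
```

With this normalization the consensus has a relative affinity of 1, and a double variant's predicted affinity is the product of the two single-variant affinities, which is exactly the additivity assumption being tested when predictions are compared with measurements.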

Comparing notes: between SELEX and QuMFRA
- Experimental Ka for 15 sequences
- Renormalized on consensus sequence

Comparing notes: between predictions and empirical binding affinities
- Found affinity measures for 8 variants of the consensus sequence with one or two changed positions

A Probabilistic Recognition Code
I promise those are all real words

Alternatives: SAGE-SELEX
- Improves on the number of binding sites found by SELEX alone
- Large sample size required for statistical significance
- Biases in SELEX

Alternatives: dsDNA chips
- Chips contain binding sites for a TF of interest
- High throughput quantitative data
- Almost all possible binding sites would have to be on the chip…that’s a lot.

Alternatives: Similar Sequences
- Optimized selection of DNA variants to be tested experimentally
- Quantitative protein-DNA binding assay
- Prediction of binding affinity for all variants using a statistical model
- Can be done in high throughput with high accuracy, provided the consensus sequence is known

Praises
- General: can be used for a TF of unknown specificity and size
- Efficient
- Parallel (different colours of fluorophore)
- First step narrows down sample size

Heckles
- Sequenced a sample of ~20: too small.
- Could skew data. Secondary preference?
- Why no A at position 1 in QuMFRA?
- Weakness/inherent bias in SELEX
- Empirical data: is an average appropriate?

Heckles
- Correlation values (0.54, 0.6, 0.77, 0.94): are these significant? No t-test or anything?
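
One way to address this heckle is the standard t-test for a Pearson correlation, t = r * sqrt((n - 2) / (1 - r^2)) with n - 2 degrees of freedom. The Python sketch below applies it to the four quoted values. The sample size n = 15 is an assumption borrowed from the 15 QuMFRA sequences; the slides do not say how many points lie behind each correlation, so the result is illustrative only.

```python
import math


def correlation_t_statistic(r, n):
    """t statistic for testing whether a Pearson correlation differs from zero."""
    if n < 3:
        raise ValueError("need at least 3 data points")
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# Two-sided 5% critical value of Student's t for 13 degrees of freedom
# (n = 15, an assumed sample size).
T_CRIT_DF13 = 2.160

for r in (0.54, 0.6, 0.77, 0.94):
    t = correlation_t_statistic(r, n=15)
    verdict = "significant" if abs(t) > T_CRIT_DF13 else "not significant"
    print(f"r = {r:.2f}, t = {t:.2f}: {verdict} at the 5% level (if n = 15)")
```

A smaller sample, such as the 8 consensus variants, would need its own critical value (more stringent, since there are fewer degrees of freedom), so the weaker correlations may not pass.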

Questions?