Babak Alipanahi, Andrew Delong, Matthew T Weirauch & Brendan J Frey

Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning. Babak Alipanahi, Andrew Delong, Matthew T Weirauch & Brendan J Frey. Presented by Zhengyang Wang, 04/24/2017

Definitions. DNA- and RNA-binding proteins: proteins that regulate many cellular processes, including transcription and translation. Sequence specificities: the motifs (patterns) in DNA or RNA sequences that these proteins bind.

Problem Setting. Input: DNA or RNA probe sequences labeled with binding scores (probe intensities). Goal: predict the labels of new sequences and locate the motifs within them.

Old Approach: Position Weight Matrix. Aligned binding-site sequences → position frequency matrix (PFM) → position probability matrix (PPM)
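
The sequences → PFM → PPM steps can be sketched in Python as follows. This is a minimal illustration that assumes the input sites are already aligned, all have the same length, and use only the letters A, C, G and T; the function names and toy sites are hypothetical. The PFM counts each base at each position, and the PPM divides every column by the number of sites so that it sums to 1.

```python
from collections import Counter

ALPHABET = "ACGT"

def position_frequency_matrix(sites):
    """Count each nucleotide at each position of equal-length aligned sites."""
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    pfm = []
    for i in range(length):
        counts = Counter(s[i] for s in sites)
        pfm.append({base: counts.get(base, 0) for base in ALPHABET})
    return pfm

def position_probability_matrix(pfm):
    """Normalize each PFM column so its entries sum to 1."""
    ppm = []
    for column in pfm:
        total = sum(column.values())
        ppm.append({base: column[base] / total for base in ALPHABET})
    return ppm

# Hypothetical aligned binding sites.
sites = ["GACGT", "GACCT", "TACGT", "GACGA"]
pfm = position_frequency_matrix(sites)
ppm = position_probability_matrix(pfm)
print(pfm[0])  # {'A': 0, 'C': 0, 'G': 3, 'T': 1}
print(ppm[0])  # {'A': 0.0, 'C': 0.0, 'G': 0.75, 'T': 0.25}
```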

Old Approach: Position Weight Matrix. PPM → position weight matrix (PWM)
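
A common way to turn a PPM into a PWM is the log-odds transform, PWM[i][b] = log2(p[i][b] / background[b]), with a uniform background of 0.25 per base. The sketch below assumes that convention; the function name is hypothetical, and a zero probability is mapped to negative infinity because log2(0) is undefined. In practice, pseudocounts are often added to the counts before normalizing so that no entry is exactly zero.

```python
import math

ALPHABET = "ACGT"

def pwm_from_ppm(ppm, background=None):
    """Convert a PPM (list of {base: probability}) into a log2-odds PWM."""
    if background is None:
        # Uniform background: an assumption, not a property of every genome.
        background = {base: 0.25 for base in ALPHABET}
    pwm = []
    for column in ppm:
        row = {}
        for base in ALPHABET:
            p = column[base]
            # A zero probability means the base was never observed at this position.
            row[base] = math.log2(p / background[base]) if p > 0 else float("-inf")
        pwm.append(row)
    return pwm

# Hypothetical two-position PPM.
ppm = [
    {"A": 0.0, "C": 0.0, "G": 0.75, "T": 0.25},
    {"A": 1.0, "C": 0.0, "G": 0.0, "T": 0.0},
]
pwm = pwm_from_ppm(ppm)
print(pwm[1]["A"])  # 2.0
```

Positive entries mark bases seen more often than the background predicts; negative entries mark bases seen less often.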

Old Approach: Position Weight Matrix. The score of a sequence is the sum of the PWM values for the observed base at each position. This score can also be interpreted physically as the binding energy of that sequence. Scanning the PWM over a genomic sequence and recording high-scoring hits detects potential binding sites. Problem: a PWM scores each position independently, so it ignores dependencies among positions, which limits its accuracy.
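
Under the independence assumption described above, scoring a candidate site is a sum over positions, and scanning is a sliding window. The sketch below scans only the forward strand against a user-chosen threshold (real scanners often also scan the reverse complement); the toy PWM values, function names, and threshold are hypothetical. Characters outside A, C, G and T raise a `KeyError`.

```python
def score_site(pwm, site):
    """Sum the PWM entry for the observed base at every position (positions independent)."""
    return sum(column[base] for column, base in zip(pwm, site))

def scan(pwm, sequence, threshold):
    """Slide the PWM along the sequence; return (start, score) for windows at or above threshold."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        s = score_site(pwm, window)
        if s >= threshold:
            hits.append((start, s))
    return hits

# Toy 3-position PWM (hypothetical log2-odds values) favoring "TAC".
pwm = [
    {"A": -1.0, "C": -1.0, "G": -1.0, "T": 1.5},
    {"A": 1.5, "C": -1.0, "G": -1.0, "T": -1.0},
    {"A": -1.0, "C": 1.5, "G": -1.0, "T": -1.0},
]
print(scan(pwm, "GGTACGTACA", threshold=3.0))  # [(2, 4.5), (6, 4.5)]
```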

New Approach: DeepBind. Use deep learning to capture sequence specificities, letting the model learn PWM-like motif detectors by itself. Advantages: it can handle data in different forms, it scales to large data sets, and it can handle data acquired with different experimental technologies.

DeepBind: Model Overview

DeepBind: Model Details
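
The model figures from these slides are not preserved in the transcript. The DeepBind paper describes the score as four stages applied to the encoded sequence: convolution with motif detectors (PWM-like weight matrices), rectification with a per-detector threshold, pooling over positions, and a neural network that combines the pooled values. The pure-Python sketch below follows that outline under simplifying assumptions: max pooling only, a single linear output layer with no hidden units, padding with uniform 0.25 rows, and hand-set weights instead of trained ones. `deepbind_score` and the other names are hypothetical, not the authors' code.

```python
ALPHABET = "ACGT"

def one_hot(sequence, motif_len):
    """Encode bases as 4-value rows (A, C, G, T); pad both ends with uniform 0.25 rows."""
    pad = [[0.25] * 4 for _ in range(motif_len - 1)]
    # Bases outside ACGT become all-zero rows in this sketch.
    rows = [[1.0 if base == b else 0.0 for b in ALPHABET] for base in sequence]
    return pad + rows + pad

def convolve(x, motif):
    """Slide one motif detector (motif_len rows of 4 weights) along the encoded sequence."""
    m = len(motif)
    return [
        sum(motif[j][b] * x[i + j][b] for j in range(m) for b in range(4))
        for i in range(len(x) - m + 1)
    ]

def deepbind_score(sequence, motifs, thresholds, weights, bias):
    """Hypothetical forward pass: convolve -> rectify -> max-pool -> linear output."""
    x = one_hot(sequence, len(motifs[0]))  # assumes all detectors share one length
    pooled = []
    for motif, threshold in zip(motifs, thresholds):
        conv = convolve(x, motif)
        rect = [max(0.0, c - threshold) for c in conv]  # rectification
        pooled.append(max(rect))                        # max pooling over positions
    return sum(w * y for w, y in zip(weights, pooled)) + bias

# One hand-set detector for "TAC": each row is a motif position, columns are A, C, G, T.
motifs = [[[0.0, 0.0, 0.0, 1.0],
           [1.0, 0.0, 0.0, 0.0],
           [0.0, 1.0, 0.0, 0.0]]]
thresholds = [2.0]
weights = [2.0]
bias = 0.5
print(deepbind_score("GGTACG", motifs, thresholds, weights, bias))  # 2.5
print(deepbind_score("GGGGGG", motifs, thresholds, weights, bias))  # 0.5
```

With these hand-set values the detector fires only on the TAC window, so the first score rises above the bias while the second stays at the bias. In DeepBind, the detector weights, thresholds, and network weights are learned from data rather than set by hand.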

Experiments and Results: PBM Data. PBM: protein binding microarray. A microarray is a grid of DNA segments of known sequence used to test and map DNA fragments, antibodies, or proteins.

Experiments and Results: PBM Data. Methods were evaluated using (1) the Pearson correlation between predicted and actual probe intensities and (2) the area under the receiver operating characteristic (ROC) curve (AUC), computed by labeling high-intensity probes as positives and the remaining probes as negatives.
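
Both evaluation measures can be computed directly. The sketch below uses the textbook Pearson formula and computes AUC as the fraction of (positive, negative) probe pairs in which the positive probe receives the higher prediction, with ties counting one half; this pairwise fraction equals the area under the ROC curve. The slide does not state the cutoff that defines "high-intensity" probes, so `high_intensity_labels` takes a hypothetical top fraction, and all probe values are made up.

```python
import math

def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def roc_auc(scores, labels):
    """AUC as the probability that a random positive outscores a random negative."""
    pos = [s for s, lab in zip(scores, labels) if lab]
    neg = [s for s, lab in zip(scores, labels) if not lab]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def high_intensity_labels(intensities, top_fraction):
    """Hypothetical cutoff: the top fraction of probes by measured intensity are positives."""
    k = max(1, int(len(intensities) * top_fraction))
    cutoff = sorted(intensities, reverse=True)[k - 1]
    return [v >= cutoff for v in intensities]

measured = [1.0, 2.0, 3.0, 4.0, 10.0, 12.0]   # hypothetical probe intensities
predicted = [1.2, 1.9, 3.5, 3.0, 9.0, 13.0]   # hypothetical model outputs
labels = high_intensity_labels(measured, top_fraction=0.5)
r = pearson(predicted, measured)
auc = roc_auc(predicted, labels)
print(round(auc, 3))  # 0.889
```

The pairwise AUC loop is quadratic in the number of probes, which is fine for illustration; a rank-based formula gives the same value faster on full microarrays.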

Questions? Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning. Zhengyang Wang, zhengyang.wang2@wsu.edu