Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning Babak Alipanahi, Andrew Delong, Matthew T Weirauch & Brendan J Frey Zhengyang Wang 04/24/2017
Definitions DNA- and RNA-binding proteins: proteins that regulate many cellular processes, including transcription and translation. Sequence specificities: motifs (patterns) in DNA or RNA sequences.
Problem Setting Input: DNA or RNA probe sequences with binding scores (probe intensities) as labels. Goal: Predict the labels of new sequences and the locations of motifs.
Old Approach: Position Weight Matrix sequences → position frequency matrix (PFM) → position probability matrix (PPM)
Old Approach: Position Weight Matrix PPM → position weight matrix (PWM)
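The chain from aligned sequences to a PWM can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure used in the paper: the aligned sites, the pseudocount of 1, the uniform background of 0.25, and all function names are hypothetical choices made for the example.

```python
import math

BASES = "ACGT"

def position_frequency_matrix(sequences):
    """Count how often each base occurs at each position (PFM)."""
    length = len(sequences[0])
    pfm = {base: [0] * length for base in BASES}
    for seq in sequences:
        for pos, base in enumerate(seq):
            pfm[base][pos] += 1
    return pfm

def position_probability_matrix(pfm, pseudocount=1.0):
    """Normalize each column of counts into probabilities (PPM).

    The pseudocount (an assumption of this sketch) avoids zero
    probabilities, which would give -infinity in the log step.
    """
    length = len(pfm["A"])
    ppm = {base: [0.0] * length for base in BASES}
    for pos in range(length):
        total = sum(pfm[base][pos] + pseudocount for base in BASES)
        for base in BASES:
            ppm[base][pos] = (pfm[base][pos] + pseudocount) / total
    return ppm

def position_weight_matrix(ppm, background=0.25):
    """Convert probabilities to log2 odds against a background (PWM)."""
    return {
        base: [math.log2(p / background) for p in ppm[base]]
        for base in BASES
    }

# Hypothetical aligned binding sites, all of the same length.
aligned_sites = ["ACGT", "ACGA", "ACGT", "TCGT"]

pfm = position_frequency_matrix(aligned_sites)
ppm = position_probability_matrix(pfm)
pwm = position_weight_matrix(ppm)

for base in BASES:
    print(base, [round(w, 2) for w in pwm[base]])
```

Positive entries mark bases that are enriched at a position relative to the background, and negative entries mark depleted ones.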
Old Approach: Position Weight Matrix The score of a sequence is calculated by adding the relevant values at each position in the PWM. The sequence score can also be interpreted in a physical framework as the binding energy for that sequence. Scan for hits over a genomic sequence to detect potential binding sites. Problem: A PWM is not accurate because it scores each position independently and so ignores the dependencies among positions.
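Scoring and scanning with a PWM can be sketched as follows. The 3-position matrix, the threshold, and the example sequence are made up for illustration; only the additive scoring rule comes from the slide.

```python
# Hypothetical 3-position PWM (log-odds values invented for this example).
PWM = {
    "A": [1.2, -1.0, -0.5],
    "C": [-0.8, 1.5, -1.0],
    "G": [-1.0, -0.7, 1.4],
    "T": [-0.3, -1.2, -0.9],
}
MOTIF_LENGTH = 3

def score_window(window):
    """Add the PWM entry for the observed base at each position."""
    return sum(PWM[base][pos] for pos, base in enumerate(window))

def scan(sequence, threshold):
    """Slide the PWM along the sequence and keep windows scoring at
    or above the threshold as candidate binding sites."""
    hits = []
    for start in range(len(sequence) - MOTIF_LENGTH + 1):
        window = sequence[start:start + MOTIF_LENGTH]
        score = score_window(window)
        if score >= threshold:
            hits.append((start, window, score))
    return hits

hits = scan("TTACGGACGT", threshold=3.0)
for start, window, score in hits:
    print(start, window, round(score, 2))
```

Because the score is a plain sum over positions, the contribution of a base at one position never depends on the base at another position, which is the limitation named on the slide.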
New Approach: DeepBind Use deep learning methods to capture sequence specificities and let the algorithm find PWM-like detectors by itself. Advantages: Can handle data in different forms. Can handle large data sets. Can handle data sets acquired in different ways.
DeepBind: Model Overview
DeepBind: Model Details
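A rough sketch of how a PWM-like detector can act as a convolution filter over a one-hot encoded sequence, followed by rectification, max pooling, and a linear output. This is an illustration only, not the authors' implementation: the weights are random, nothing is trained, and the number of detectors, the detector width, and all names are hypothetical.

```python
import random

BASES = "ACGT"

def one_hot(sequence):
    """Encode each base as a 4-element indicator vector."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in sequence]

def convolve(encoded, detector, bias):
    """Slide one motif detector (width x 4 weights) along the sequence."""
    width = len(detector)
    outputs = []
    for start in range(len(encoded) - width + 1):
        total = bias
        for offset in range(width):
            for channel in range(4):
                total += detector[offset][channel] * encoded[start + offset][channel]
        outputs.append(total)
    return outputs

def rectify(values):
    """Keep positive responses and set negative ones to zero."""
    return [max(0.0, v) for v in values]

def max_pool(values):
    """Summarize a detector by its strongest response in the sequence."""
    return max(values)

def predict(sequence, detectors, biases, output_weights, output_bias):
    """Combine the pooled detector responses into one binding score."""
    encoded = one_hot(sequence)
    features = [
        max_pool(rectify(convolve(encoded, d, b)))
        for d, b in zip(detectors, biases)
    ]
    return output_bias + sum(w * f for w, f in zip(output_weights, features))

# Random, untrained weights: placeholders for parameters that would be learned.
random.seed(0)
NUM_DETECTORS, WIDTH = 2, 3
detectors = [
    [[random.uniform(-1, 1) for _ in range(4)] for _ in range(WIDTH)]
    for _ in range(NUM_DETECTORS)
]
biases = [0.0] * NUM_DETECTORS
output_weights = [1.0, -0.5]

score = predict("TTACGGACGT", detectors, biases, output_weights, 0.0)
print(score)
```

In a trained model the detector weights would be learned from the probe sequences and their binding scores rather than drawn at random.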
Experiments and Results: PBM data PBM: Protein Binding Microarrays Microarray: a grid of DNA segments of known sequence that is used to test and map DNA fragments, antibodies, or proteins.
Experiments and Results: PBM data Methods were evaluated using the Pearson correlation between the predicted and actual probe intensities, and the area under the receiver operating characteristic (ROC) curve (AUC), computed by setting high-intensity probes as positives and the remaining probes as negatives.
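The two evaluation metrics can be sketched as below. The probe intensities, the predictions, and the cutoff that separates high-intensity probes from the rest are all invented for the example; the slide does not state how the cutoff was chosen.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def auc(scores, labels):
    """Area under the ROC curve, computed as the probability that a
    random positive is scored above a random negative (ties count half)."""
    positives = [s for s, is_pos in zip(scores, labels) if is_pos]
    negatives = [s for s, is_pos in zip(scores, labels) if not is_pos]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Hypothetical measured and predicted probe intensities.
actual = [9.1, 7.4, 3.2, 2.8, 1.5, 0.9]
predicted = [8.0, 6.5, 2.0, 3.5, 1.0, 1.2]

# Hypothetical cutoff: probes at or above it count as positives.
cutoff = 5.0
labels = [a >= cutoff for a in actual]

r = pearson(predicted, actual)
area = auc(predicted, labels)
print(round(r, 3), area)
```

The pairwise AUC computation is quadratic in the number of probes, which is fine for a small example; a rank-based formulation would be preferable for full microarray data.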
Questions Zhengyang Wang zhengyang.wang2@wsu.edu