Published by Moses Sutton. Modified over 9 years ago.
1
A HYBRID APPROACH FOR IDENTIFYING 5’ SPLICING JUNCTION WITH HIGHER ACCURACY
PRABINA KUMAR MEHER, SCIENTIST
DIVISION OF STATISTICAL GENETICS
INDIAN AGRICULTURAL STATISTICS RESEARCH INSTITUTE
INDIAN COUNCIL OF AGRICULTURAL RESEARCH
NEW DELHI-110012
2
THE CENTRAL DOGMA
DNA → (Transcription) → Pre-mRNA → (Splicing) → mRNA → (Translation) → Protein
Every GT in the gene is a possible donor site, and it needs to be predicted as either a true or a false splice site.
6th World Congress on Biotechnology
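The statement that every GT is a candidate donor site can be illustrated with a short sketch. The function name and the flank lengths are assumptions made for illustration. The defaults of 7 upstream and 6 downstream nucleotides are chosen only so that the window is 15 nt with GT at positions 8 and 9, as in the NN269 slide.

```python
def candidate_donor_sites(seq, upstream=7, downstream=6):
    """Return (index, window) for every GT whose full window fits inside seq.

    upstream/downstream are illustrative flank lengths (assumptions).
    index is the 0-based position of the G in the GT dinucleotide.
    """
    seq = seq.upper()
    sites = []
    start = seq.find("GT")
    while start != -1:
        left = start - upstream
        right = start + 2 + downstream
        # Keep only candidates with complete flanking context.
        if left >= 0 and right <= len(seq):
            sites.append((start, seq[left:right]))
        start = seq.find("GT", start + 1)
    return sites
```

Each returned window would then be passed to a classifier that labels it as a true or a false splice site.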
3
RATIONALE AND GENESIS
Probabilistic: WMM, WAM, MM1, MEM, SAE
Machine Learning: MM1-SVM, WD-SVM, LIK-SVM, MM1-SVM, DS-SVM
4
RATIONALE AND GENESIS…
Zhang et al. (Expert Systems with Applications, 2006)
Window size: 100 bp
5
RATIONALE AND GENESIS…
Huang et al. (Biochimie, 2006)
Window size: 88 bp
[Figure: encoding workflow. Training data set of TSS and FSS -> scoring matrix of TSS and scoring matrix of FSS -> difference -> difference matrix; training sites and test sites -> encoded training set and encoded test set.]
Difference matrix layout: rows are dinucleotides (AA), (AT), (AG), (AC), (…), (CG), (CC); columns are adjacent position pairs (-44,-43), (-43,-42), …, (42,43), (43,44).
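One possible reading of the workflow on this slide is sketched below. It assumes that each scoring matrix holds the relative frequency of every dinucleotide at every adjacent position pair, and that a site is encoded by looking up its dinucleotides in the TSS-minus-FSS difference matrix. The function names are hypothetical, and the details may differ from the procedure in Huang et al.

```python
from collections import Counter
from itertools import product

DINUCS = ["".join(p) for p in product("ACGT", repeat=2)]

def scoring_matrix(seqs):
    """Relative frequency of each dinucleotide at each adjacent position pair."""
    length = len(seqs[0])
    matrix = []
    for i in range(length - 1):
        counts = Counter(s[i:i + 2] for s in seqs)
        matrix.append({d: counts[d] / len(seqs) for d in DINUCS})
    return matrix

def difference_matrix(tss, fss):
    """Position-wise difference: TSS frequencies minus FSS frequencies."""
    m_t, m_f = scoring_matrix(tss), scoring_matrix(fss)
    return [{d: t[d] - f[d] for d in DINUCS} for t, f in zip(m_t, m_f)]

def encode(seq, diff):
    """Encode one site as a numeric vector (assumes only A, C, G, T occur)."""
    return [diff[i][seq[i:i + 2]] for i in range(len(seq) - 1)]
```

Under this reading, an 88 bp window yields 87 numeric features per site, one per adjacent position pair.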
6
RATIONALE AND GENESIS…
1. Difficult to determine threshold in probabilistic approaches
2. Threshold is easy in MLA
3. Most of the approaches are species-specific
4. Less accurate with sub-optimal window length
7
DATA for Validation
Species: Human, Bovine, Fish, Worm
TSS: 2796 90923 10000 1000 19000
Sources: HS3D, UCSC Genome Browser, Kamath et al. 2014
8
DATA for Comparison
NN269
            # TSS   # FSS
Training    1116    4140
Testing     208     782
Each sequence is 15 nt long, with the conserved GT at the 8th and 9th positions.
9
Sequence Encoding
POSITIONAL FEATURE
WMM
Shapiro and Senapathy: where M is the sum of the highest frequencies at positions 1 to L and N is the sum of the lowest frequencies at positions 1 to L, obtained from the frequency matrix of nucleotides.
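The formula itself is not preserved in the slide text. The sketch below assumes the commonly cited Shapiro and Senapathy form, score = 100 × (t − N) / (M − N), where t is the sum of the frequencies of the observed nucleotides at positions 1 to L. The function names are hypothetical, and the frequency matrix is estimated from whatever training sites are supplied.

```python
def frequency_matrix(seqs, alphabet="ACGT"):
    """Per-position nucleotide frequencies (the matrix a WMM is built on)."""
    length = len(seqs[0])
    return [
        {nt: sum(s[i] == nt for s in seqs) / len(seqs) for nt in alphabet}
        for i in range(length)
    ]

def shapiro_senapathy_score(seq, freq):
    """Assumed form: 100 * (t - N) / (M - N); undefined when M equals N."""
    t = sum(freq[i][nt] for i, nt in enumerate(seq))
    m = sum(max(col.values()) for col in freq)  # M: sum of highest frequencies
    n = sum(min(col.values()) for col in freq)  # N: sum of lowest frequencies
    return 100.0 * (t - n) / (m - n)
```

With this scaling, the consensus sequence scores 100 and the least likely sequence scores 0.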
10
CONTD…
DEPENDENCY FEATURE
SAE
WAM
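The formulas for the dependency features are not preserved in the slide text. As an illustration of WAM only, the sketch below assumes the usual weight array model: position-specific probabilities of each nucleotide conditional on the preceding one. The add-one pseudocount and the function names are assumptions. SAE is not attempted here because its definition cannot be recovered from the slide.

```python
import math
from collections import Counter

def train_wam(seqs, alphabet="ACGT", pseudo=1.0):
    """Estimate P(x_1) and position-specific P(x_i | x_{i-1}) with pseudocounts."""
    length = len(seqs[0])
    k = len(alphabet)
    first = Counter(s[0] for s in seqs)
    p_first = {a: (first[a] + pseudo) / (len(seqs) + pseudo * k) for a in alphabet}
    cond = []
    for i in range(1, length):
        pair = Counter((s[i - 1], s[i]) for s in seqs)
        prev = Counter(s[i - 1] for s in seqs)
        cond.append({
            (a, b): (pair[(a, b)] + pseudo) / (prev[a] + pseudo * k)
            for a in alphabet for b in alphabet
        })
    return p_first, cond

def wam_log_score(seq, model):
    """Log-probability of seq under the first-order, position-specific model."""
    p_first, cond = model
    score = math.log(p_first[seq[0]])
    for i in range(1, len(seq)):
        score += math.log(cond[i - 1][(seq[i - 1], seq[i])])
    return score
```

A common way to turn this into a single feature is the difference between the log scores under a TSS-trained model and an FSS-trained model. Whether the slides do this is not stated.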
11
CONTD…
COMPOSITIONAL FEATURE
Dimers
Triplets
Tetramers
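A minimal sketch of compositional features is given below, assuming they are raw occurrence counts of all dimers, triplets, and tetramers over the alphabet A, C, G, T. The slides do not say whether counts or frequencies are used, and the function name is hypothetical. This gives 16 + 64 + 256 = 336 values per sequence, matching the counts on the feature-selection slide.

```python
from itertools import product

def kmer_features(seq, ks=(2, 3, 4), alphabet="ACGT"):
    """Concatenate overlapping k-mer counts for each k, in lexicographic order."""
    seq = seq.upper()
    features = []
    for k in ks:
        counts = {"".join(p): 0 for p in product(alphabet, repeat=k)}
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if kmer in counts:  # skip k-mers containing other symbols, e.g. N
                counts[kmer] += 1
        features.extend(counts[key] for key in sorted(counts))
    return features
```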
12
Feature Selection
Feature type     Before selection    After selection
Positional       4                   4
Dependency       4                   4
Compositional    16+64+256           14+15+12
Total            344
13
Feature Selection…
Feature type     # Features    Features
Positional       4
Dependency       4
Compositional    41
14
Cross validation
[Figure: five-fold cross-validation. TSS and FSS are each split into folds 1 to 5; folds 1 to 4 form the training set and fold 5 the test set; classifiers are trained and then used for prediction.]
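The fold scheme in the diagram can be sketched as follows, assuming TSS and FSS are shuffled and partitioned separately so that every fold keeps both classes. The function name, the seed, and the label coding (1 for TSS, 0 for FSS) are assumptions.

```python
import random

def k_fold_splits(tss, fss, k=5, seed=0):
    """Yield (train, test) lists of (sequence, label) for each held-out fold."""
    rng = random.Random(seed)
    tss, fss = list(tss), list(fss)
    rng.shuffle(tss)
    rng.shuffle(fss)
    # Partition each class into k folds by striding through the shuffled list.
    tss_folds = [tss[i::k] for i in range(k)]
    fss_folds = [fss[i::k] for i in range(k)]
    for held_out in range(k):
        test = ([(s, 1) for s in tss_folds[held_out]]
                + [(s, 0) for s in fss_folds[held_out]])
        train = [
            (s, label)
            for j in range(k) if j != held_out
            for folds, label in ((tss_folds, 1), (fss_folds, 0))
            for s in folds[j]
        ]
        yield train, test
```

Each classifier would be fitted on `train` and evaluated on `test`, with the performance measures averaged over the five folds.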
15
Parameter Optimization
16
Performance measure
17
Performance measure…
            Balanced                           Imbalanced
Measure     Human   Bovine  Fish    Worm       Human   Bovine  Fish    Worm
AUC-ROC     96.05   96.94   96.95   96.24      97.21   97.45   97.41   98.06
AUC-PR      97.64   97.89   97.91   97.90      93.24   93.34   93.38   92.29
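The slides do not state how the two areas were computed. The sketch below shows one standard way to obtain each from labels and classifier scores: the rank-based (Mann-Whitney) estimate for AUC-ROC, and average precision as a step-wise estimate of AUC-PR. The function names are hypothetical, and the results are fractions, so they would be multiplied by 100 to compare with the table.

```python
def auc_roc(labels, scores):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def average_precision(labels, scores):
    """Mean precision at the rank of each positive; tied scores are not merged."""
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits
```

The pairwise loop in `auc_roc` is quadratic in the number of sites, which is acceptable for a sketch but slow on data sets the size of HS3D.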
18
Comparative Analysis
NN269       # TSS   # FSS
Training    1116    4140
Testing     208     782

Approaches  AUC-ROC  AUC-PR  References
MM1-SVM     97.62    89.58   Baten et al., 2006
LIK-SVM     98.04    92.65   Sonnenburg et al., 2007
WD-SVM      98.50    92.86
WDS-SVM     98.13    92.47
EFFECT      98.20    92.81   Kamath et al., 2014
Proposed    96.53    93.54
19
Prediction Server
HSplice
http://cabgrid.res.in:8080/hsplice
20
ACKNOWLEDGEMENT
DIRECTOR
INDIAN AGRICULTURAL STATISTICS RESEARCH INSTITUTE
NEW DELHI