PRABINA KUMAR MEHER, SCIENTIST, DIVISION OF STATISTICAL GENETICS, INDIAN AGRICULTURAL STATISTICS RESEARCH INSTITUTE, INDIAN COUNCIL OF AGRICULTURAL RESEARCH.


1 PRABINA KUMAR MEHER, SCIENTIST, DIVISION OF STATISTICAL GENETICS, INDIAN AGRICULTURAL STATISTICS RESEARCH INSTITUTE, INDIAN COUNCIL OF AGRICULTURAL RESEARCH, NEW DELHI-110012. A HYBRID APPROACH FOR IDENTIFYING 5’ SPLICING JUNCTION WITH HIGHER ACCURACY

2 THE CENTRAL DOGMA. DNA → (Transcription) → Pre-mRNA → (Splicing) → mRNA → (Translation) → Protein. Every GT in the gene is a possible donor site, and each one needs to be predicted as either a true or a false splice site. 6th World Congress on Biotechnology
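The statement that every GT is a candidate donor site can be illustrated with a minimal sketch. The function names `candidate_donor_sites` and `window_around`, and the example sequence, are hypothetical and not taken from the presentation; the flank sizes are free parameters.

```python
def candidate_donor_sites(seq):
    """Return the 0-based start position of every GT dinucleotide in seq."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "GT"]


def window_around(seq, pos, upstream, downstream):
    """Cut a window with `upstream` bases before the GT at `pos` and
    `downstream` bases after it; return None if it runs off the sequence."""
    start = pos - upstream
    end = pos + 2 + downstream
    if start < 0 or end > len(seq):
        return None
    return seq[start:end]


example = "AAGGTAAGTCCGTTA"            # hypothetical sequence
sites = candidate_donor_sites(example)  # three GT positions in this example
windows = [window_around(example, p, 7, 6) for p in sites]
```

Each extracted window would then be passed to a classifier that labels it as a true or a false splice site.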

3 RATIONALE AND GENESIS. Probabilistic approaches: WMM, WAM, MM1, MEM, SAE. Machine learning approaches: MM1-SVM, WD-SVM, LIK-SVM, DS-SVM.

4 RATIONALE AND GENESIS… Zhang et al. (Expert Systems with Applications, 2006). Window size: 100 bp.

5 RATIONALE AND GENESIS… Huang et al. (Biochimie, 2006). Window size: 88 bp. [Figure: workflow. Scoring matrices of TSS and FSS are built from the training data set of TSS and FSS. Their difference gives a difference matrix with one row per dinucleotide (AA, AT, AG, AC, …, CG, CC) and one column per adjacent position pair ((-44,-43), (-43,-42), …, (42,43), (43,44)). Training sites and test sites are encoded with the difference matrix to give the encoded training set and the encoded test set.]
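The difference-matrix encoding outlined on this slide can be sketched roughly as follows. This is an illustrative reading of the workflow labels only: the use of relative dinucleotide frequencies as the "scoring matrix" is an assumption, and all function names are hypothetical.

```python
from collections import Counter
from itertools import product

DINUCS = ["".join(p) for p in product("ACGT", repeat=2)]


def dinuc_frequency_matrix(seqs):
    """One dict per adjacent position pair (j, j+1): dinucleotide -> relative frequency.
    Assumption: the slide's 'scoring matrix' is taken to be these frequencies."""
    n_pairs = len(seqs[0]) - 1
    matrix = []
    for j in range(n_pairs):
        counts = Counter(s[j:j + 2] for s in seqs)
        matrix.append({d: counts[d] / len(seqs) for d in DINUCS})
    return matrix


def difference_matrix(tss, fss):
    """Scoring matrix of true sites minus scoring matrix of false sites."""
    t = dinuc_frequency_matrix(tss)
    f = dinuc_frequency_matrix(fss)
    return [{d: t[j][d] - f[j][d] for d in DINUCS} for j in range(len(t))]


def encode(seq, diff):
    """Replace each adjacent dinucleotide of seq by its difference-matrix entry."""
    return [diff[j][seq[j:j + 2]] for j in range(len(diff))]


# Toy sequences, far shorter than the 88 bp window on the slide
diff = difference_matrix(["AGGT", "AGGT"], ["CCGT", "ACGT"])
vector = encode("AGGT", diff)
```

The same `diff` matrix, estimated on the training set only, would be used to encode both training and test sites.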

6 RATIONALE AND GENESIS…
1. It is difficult to determine a threshold in probabilistic approaches.
2. Choosing a threshold is easy in machine learning approaches (MLA).
3. Most of the approaches are species specific.
4. Accuracy is lower with a sub-optimal window length.

7 DATA for Validation. Species: Human, Bovine, Fish, Worm. TSS: 2796 90923 10000 1000 19000. Sources: HS3D, UCSC Genome Browser, Kamath et al. 2014.

8 DATA for Comparison: NN269
          #TSS  #FSS
Training  1116  4140
Testing    208   782
Each sequence is 15 nt long, with the conserved GT at the 8th and 9th positions.
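The layout described for the NN269 sequences (15 nt, GT at the 8th and 9th positions) can be checked with a one-line sanity test. The function name `is_valid_nn269_donor` and the example sequences are hypothetical.

```python
def is_valid_nn269_donor(seq):
    """True if seq is 15 nt long with GT at 1-based positions 8 and 9."""
    return len(seq) == 15 and seq[7:9].upper() == "GT"


ok = is_valid_nn269_donor("AAGGTAAGTCCGTTA")   # GT sits at positions 8-9
bad = is_valid_nn269_donor("AAGGTAAGACCGTTA")  # position 9 is not T
```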

9 Sequence Encoding: POSITIONAL FEATURE. WMM; Shapiro and Senapathy score, where M is the sum of the highest frequencies at positions 1 to L and N is the sum of the lowest frequencies at positions 1 to L, obtained from the frequency matrix of nucleotides. [Equations not preserved in the transcript.]
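Since the slide's equation did not survive, the following is only a sketch of the Shapiro and Senapathy score in its commonly stated form, 100 × (t − N) / (M − N), where t (an assumption here, not defined on the slide) is the sum of the matrix frequencies of the nucleotides actually observed in the sequence. M and N follow the definitions given on the slide. Function names are hypothetical.

```python
def frequency_matrix(seqs):
    """Per-position relative frequency of A, C, G, T over aligned sequences."""
    length = len(seqs[0])
    return [
        {b: sum(s[i] == b for s in seqs) / len(seqs) for b in "ACGT"}
        for i in range(length)
    ]


def shapiro_senapathy_score(seq, freq):
    """Score in [0, 100]: 100 * (t - N) / (M - N)."""
    t = sum(freq[i][b] for i, b in enumerate(seq))   # observed nucleotides
    m = sum(max(col.values()) for col in freq)       # M: highest frequencies
    n = sum(min(col.values()) for col in freq)       # N: lowest frequencies
    if m == n:
        return 0.0  # degenerate matrix, no position is informative
    return 100.0 * (t - n) / (m - n)


# Toy alignment of true sites (hypothetical data)
freq = frequency_matrix(["AGGT", "AGGT", "CGGT", "AGGA"])
consensus_score = shapiro_senapathy_score("AGGT", freq)
weaker_score = shapiro_senapathy_score("CGGA", freq)
```

A sequence matching the most frequent nucleotide at every position reaches the maximum of 100, and any deviation lowers the score.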

10 Sequence Encoding, contd.: DEPENDENCY FEATURE. SAE, WAM.
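As background for the WAM entry, a weight array model scores a sequence using position-specific probabilities conditioned on the preceding nucleotide. The sketch below is a generic first-order WAM with add-one smoothing. It is not claimed to be the exact dependency feature used in the presentation; the pseudocount and all names are assumptions.

```python
import math
from collections import Counter

BASES = "ACGT"


def train_wam(seqs, pseudo=1.0):
    """Estimate P(x_1) and P(x_i | x_{i-1}) per position from aligned sequences."""
    first = Counter(s[0] for s in seqs)
    p_first = {b: (first[b] + pseudo) / (len(seqs) + 4 * pseudo) for b in BASES}
    p_cond = []
    for i in range(1, len(seqs[0])):
        pair = Counter((s[i - 1], s[i]) for s in seqs)
        prev = Counter(s[i - 1] for s in seqs)
        p_cond.append({
            (a, b): (pair[(a, b)] + pseudo) / (prev[a] + 4 * pseudo)
            for a in BASES for b in BASES
        })
    return p_first, p_cond


def wam_log_score(seq, model):
    """Log-probability of seq under the first-order WAM."""
    p_first, p_cond = model
    score = math.log(p_first[seq[0]])
    for i in range(1, len(seq)):
        score += math.log(p_cond[i - 1][(seq[i - 1], seq[i])])
    return score


model = train_wam(["AGGT", "AGGT", "CGGT"])  # toy true sites
site_like = wam_log_score("AGGT", model)
background_like = wam_log_score("TTTT", model)
```

One common use is to train one model on true sites and one on false sites and take the difference of the two log scores as a feature.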

11 Sequence Encoding, contd.: COMPOSITIONAL FEATURE. Dimers, Triplets, Tetramers.
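Compositional features over dimers, triplets and tetramers can be sketched as overlapping k-mer frequencies. With a four-letter alphabet this gives 16 + 64 + 256 = 336 values per sequence. Whether the presentation uses raw counts or relative frequencies is not stated, so the normalisation below is an assumption, and the function names are hypothetical.

```python
from itertools import product


def kmer_frequencies(seq, k):
    """Relative frequency of every possible k-mer over ACGT, in a fixed order."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    n_windows = len(seq) - k + 1
    for i in range(n_windows):
        word = seq[i:i + k]
        if word in counts:          # skip windows with non-ACGT symbols
            counts[word] += 1
    if n_windows <= 0:
        return [0.0] * len(kmers)
    return [counts[m] / n_windows for m in kmers]


def compositional_features(seq):
    """Concatenate dimer, triplet and tetramer frequencies."""
    features = []
    for k in (2, 3, 4):
        features.extend(kmer_frequencies(seq, k))
    return features


vector = compositional_features("AAGGTAAGTCCGTTA")  # hypothetical sequence
```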

12 Feature Selection
                  Positional  Dependency  Compositional      Total
Before selection  4           4           16+64+256 (= 336)  344
After selection   4           4           14+15+12 (= 41)

13 Feature Selection…
Feature Type   #Features  Features
Positional     4
Dependency     4
Compositional  41

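14 Cross validation. [Figure: TSS and FSS are each divided into 5 parts. Parts 1 to 4 of each form the training set and part 5 of each forms the test set. The training set is used to fit the classifiers, which then produce predictions for the test set.]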
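The five-part split shown on this slide can be sketched as a stratified 5-fold cross validation in which true and false sites are partitioned separately, so every fold keeps both classes. The shuffling, the seed and the function names are assumptions made for illustration.

```python
import random


def k_fold_indices(n, k=5, seed=0):
    """Shuffle indices 0..n-1 and deal them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validation_splits(tss, fss, k=5, seed=0):
    """Yield (train, test) lists of (sequence, label) pairs, label 1 = true site."""
    tss_folds = k_fold_indices(len(tss), k, seed)
    fss_folds = k_fold_indices(len(fss), k, seed + 1)
    for held_out in range(k):
        train, test = [], []
        for fold in range(k):
            target = test if fold == held_out else train
            target.extend((tss[i], 1) for i in tss_folds[fold])
            target.extend((fss[i], 0) for i in fss_folds[fold])
        yield train, test


# Placeholder site identifiers instead of real sequences
tss = ["T%d" % i for i in range(10)]
fss = ["F%d" % i for i in range(20)]
for train, test in cross_validation_splits(tss, fss):
    pass  # fit a classifier on `train`, predict on `test`
```

Every site appears in exactly one test fold, so the predictions over all five folds cover the whole data set once.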

15 Parameter Optimization

16 Performance measure

17 Performance measure…
Measure   Balanced                       Imbalanced
          Human  Bovine  Fish   Worm     Human  Bovine  Fish   Worm
AUC-ROC   96.05  96.94   96.95  96.24    97.21  97.45   97.41  98.06
AUC-PR    97.64  97.89   97.91  97.90    93.24  93.34   93.38  92.29
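For readers unfamiliar with the two measures in the table, here is a minimal pure-Python sketch. AUC-ROC is computed as the fraction of (true, false) pairs ranked correctly, and AUC-PR is approximated by average precision, which is one common estimator and not necessarily the one used in the presentation. The labels and scores are made-up toy values; the table reports the measures as percentages.

```python
def auc_roc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of the precision values at the rank of each positive (ties not handled)."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    true_positives = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            true_positives += 1
            total += true_positives / rank
    return total / true_positives


labels = [1, 1, 0, 1, 0, 0]               # toy data, 1 = true splice site
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]   # hypothetical classifier outputs
roc = 100 * auc_roc(labels, scores)
pr = 100 * average_precision(labels, scores)
```

AUC-PR depends on the ratio of true to false sites, so values from balanced and imbalanced data sets are not directly comparable.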

18 Comparative Analysis
NN269     #TSS  #FSS
Training  1116  4140
Testing    208   782
Approaches  AUC-ROC  AUC-PR  References
MM1-SVM     97.62    89.58   Baten et al., 2006
LIK-SVM     98.04    92.65   Sonnenburg et al., 2007
WD-SVM      98.50    92.86
WDS-SVM     98.13    92.47
EFFECT      98.20    92.81   Kamath et al., 2014
Proposed    96.53    93.54

19 Prediction Server. HSplice: http://cabgrid.res.in:8080/hsplice

20 ACKNOWLEDGEMENT. DIRECTOR, INDIAN AGRICULTURAL STATISTICS RESEARCH INSTITUTE, NEW DELHI

