
1 Exogean: an expert gene annotation framework based on directed acyclic coloured multigraphs ENCODE Gene Prediction Workshop - EGASP/2005 Sarah Djebali, Franck Delaplace, Hugues Roest Crollius

2 Human experts generate high quality gene annotations
Human experts generate reference gene annotations → automating human expertise could provide highly specific gene models
What do human experts do?
- Human experts combine biological objects using heuristic rules
- Both biological objects and heuristic rules evolve with time

3 Exogean: a highly flexible method that automates human expertise
Exogean is a generic framework based on directed acyclic coloured multigraphs (DACMs), made to allow the integration of any set of heuristic rules with any set of resources.
In Exogean DACMs:
- Nodes are biological objects (protein or mRNA alignments, etc.)
- Multiple edges between nodes are relations between objects
In terms of DACMs, the human expert annotation protocol corresponds to building, reading and reducing DACMs.
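To make the DACM idea concrete, here is a minimal Python sketch of a directed coloured multigraph with an acyclicity check. It is not Exogean's implementation: the class name `ColouredMultigraph`, its methods, and the relation names `same_molecule` and `precedes` are assumptions chosen for illustration. Nodes stand for HSPs, and each colour on an edge stands for one relation between two objects.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class ColouredMultigraph:
    """Directed multigraph; parallel edges are told apart by their colour."""
    nodes: set = field(default_factory=set)
    # edges[(src, dst)] is the set of colours (relations) going from src to dst
    edges: dict = field(default_factory=lambda: defaultdict(set))

    def add_edge(self, src, dst, colour):
        self.nodes.update((src, dst))
        self.edges[(src, dst)].add(colour)

    def colours(self, src, dst):
        return set(self.edges.get((src, dst), set()))

    def is_acyclic(self):
        # Kahn's algorithm: repeatedly remove nodes with no incoming edge.
        indegree = {n: 0 for n in self.nodes}
        successors = defaultdict(set)
        for (src, dst) in self.edges:
            successors[src].add(dst)
            indegree[dst] += 1
        ready = [n for n, d in indegree.items() if d == 0]
        removed = 0
        while ready:
            n = ready.pop()
            removed += 1
            for m in successors[n]:
                indegree[m] -= 1
                if indegree[m] == 0:
                    ready.append(m)
        # Every node was removed only if no cycle exists.
        return removed == len(self.nodes)


# Hypothetical relations between three HSPs
g = ColouredMultigraph()
g.add_edge("h1", "h2", "same_molecule")
g.add_edge("h1", "h2", "precedes")
g.add_edge("h2", "h3", "precedes")
```

Storing a set of colours per node pair keeps several relations between the same two objects without duplicating nodes, which is the "multiple edges" property described above.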

4 Exogean main steps
1. Information Collection (Blat, Spidey, Blast, … etc)
2. Filter → Protein and mRNA alignments called HSPs
3. Exogean core: DACM expert annotation
   - DACM1: Single Molecule Clustering, then Reduction
   - DACM2: Single Type Multi Molecule Clustering, then Reduction
   - DACM3: Multi Type Multi Molecule Clustering, then Reduction
4. CDS Identification
5. Filter output → Final gene models with multiple transcripts

5 Example: several mRNAs and proteins have been aligned to a specific locus
rm_i = mRNA molecule; pm_j = protein molecule; h_k = mRNA or protein HSP
[Figure: rm1 = h1-h4, rm2 = h5-h6, rm3 = h7-h9, pm1 = h10-h13, pm2 = h14-h16, pm3 = h17-h18]

6 Building and reducing DACM1 = the Single Molecule Clustering

7 Each DACM reduction produces more complex transcript models
DACM expert annotation: mRNA, protein HSPs → DACM1 building + reduction → Level1 transcript models → DACM2 building + reduction → Level2 transcript models → DACM3 building + reduction → Level3 transcript models

8 DACM3 reduction produces final transcript models
M_i = final multi type multi molecule transcript model in which Exogean searches for a CDS
[Figure: final transcript models M1 and M2]

9 Evaluation method
[Figure: Method_X CDSs compared with HAVANA CDSs, labelled TP, FP and FN]

10 Specificity on the 44 ENCODE regions

11 Sensitivity on the 44 ENCODE regions


13 Evaluation method
- TP = True Positive: each HAVANA CDS matched exactly by at least one CDS from method_X is counted as TP
- FP = False Positive: a virtual HAVANA CDS is defined as a method_X CDS that does not exactly match a HAVANA CDS and is counted as FP
- FN = False Negative: each HAVANA CDS that is not matched exactly by at least one method_X CDS is counted as FN
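The counting rules above can be sketched in Python as follows. This is an illustrative sketch, not the workshop's evaluation script: the function name `evaluate_cds`, the representation of a CDS as a tuple of (start, end) exon coordinates, and the example coordinates are all assumptions. The slides do not spell out the formulas, so the sketch assumes the usual gene-prediction convention, sensitivity = TP / (TP + FN) and specificity = TP / (TP + FP). Identical method_X CDSs are counted once.

```python
def evaluate_cds(reference, predicted):
    """Count exact CDS matches between a reference set and a predicted set."""
    reference = set(reference)
    predicted = set(predicted)
    tp = len(reference & predicted)   # reference CDSs matched exactly
    fn = len(reference - predicted)   # reference CDSs with no exact match
    fp = len(predicted - reference)   # predicted CDSs matching no reference CDS
    # Assumed convention: specificity here is TP / (TP + FP)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tp / (tp + fp) if tp + fp else 0.0
    return {"tp": tp, "fp": fp, "fn": fn,
            "sensitivity": sensitivity, "specificity": specificity}


# Made-up coordinates: each CDS is a tuple of (start, end) coding exons
havana = {
    ((100, 200), (300, 400)),
    ((100, 200), (500, 600)),
    ((900, 1000),),
}
method_x = {
    ((100, 200), (300, 400)),   # exact match with a HAVANA CDS
    ((100, 250), (300, 400)),   # one boundary differs, so no exact match
}

counts = evaluate_cds(havana, method_x)
```

Because a match must be exact, a predicted CDS with a single shifted exon boundary contributes both an FP (the prediction) and an FN (the reference CDS it failed to reproduce).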

14 DACM1 reduction produces level1 transcript models
r_i = mRNA level1 transcript model; p_j = protein level1 transcript model
[Figure: r1 = h1-h4, r2 = h5-h6, r3 = h7-h9, p1 = h10-h11, p2 = h12-h13, p3 = h14-h16, p4 = h17-h18]

15 Building and reducing DACM2 = the Single Type Multi Molecule Clustering

16 DACM2 reduction produces level2 transcript models
R_i = mRNA level2 transcript model; P_j = protein level2 transcript model
[Figure: R1 (r1, r3), R2 (r2, r3), P1 (p1, p3), P2 (p2), P3 (p4)]

17 Building and reducing DACM3 = the Multi Type Multi Molecule Clustering

