Published by Moses Cox. Modified over 9 years ago.
1
Exogean: an expert gene annotation framework based on directed acyclic coloured multigraphs
ENCODE Gene Prediction Workshop - EGASP/2005
Sarah Djebali, Franck Delaplace, Hugues Roest Crollius
2
Human experts generate reference gene annotations
Automating human expertise could provide highly specific gene models
What do human experts do?
Human experts combine biological objects using heuristic rules
Both biological objects and heuristic rules evolve with time
Human experts generate high quality gene annotations
3
Exogean is a generic framework based on directed acyclic coloured multigraphs (DACMs), designed to allow the integration of any set of heuristic rules with any set of resources
In Exogean DACMs:
Nodes are biological objects (protein or mRNA alignments, … etc.)
Multiple edges between nodes are relations between objects
In terms of DACMs, the human expert annotation protocol corresponds to building, reading and reducing DACMs
Exogean: a highly flexible method that automates human expertise
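The DACM structure described on this slide can be illustrated with a minimal Python sketch. It is not Exogean's implementation: the class name `DACM`, its methods, and the colour names `same_molecule` and `compatible_order` are hypothetical, and the sketch assumes that a colour identifies one kind of relation, so at most one edge of each colour links a given ordered pair of nodes.

```python
from collections import defaultdict


class DACM:
    """Toy directed acyclic coloured multigraph (illustrative only)."""

    def __init__(self):
        self.nodes = set()
        self.succ = defaultdict(set)   # node -> set of successor nodes
        self.edges = defaultdict(set)  # (src, dst) -> set of edge colours

    def _reachable(self, start, target):
        """Depth-first search: is `target` reachable from `start`?"""
        stack, seen = [start], set()
        while stack:
            node = stack.pop()
            if node == target:
                return True
            if node in seen:
                continue
            seen.add(node)
            stack.extend(self.succ[node])
        return False

    def add_edge(self, src, dst, colour):
        """Add a coloured edge, refusing any edge that would close a cycle."""
        if src == dst or self._reachable(dst, src):
            raise ValueError("edge would create a cycle")
        self.nodes.update((src, dst))
        self.succ[src].add(dst)
        self.edges[(src, dst)].add(colour)

    def colours(self, src, dst):
        """Return the colours of all edges going from src to dst."""
        return set(self.edges.get((src, dst), ()))


# Two HSPs linked by two relations (hypothetical colour names).
g = DACM()
g.add_edge("h1", "h2", "same_molecule")
g.add_edge("h1", "h2", "compatible_order")
```

Keeping the colours of each node pair in a set is what makes the graph a multigraph here: the same two nodes can carry several relations at once, and a rule can be written against one colour without touching the others.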
4
Exogean main steps
Information Collection: Blat, Spidey, Blast … etc
Protein and mRNA alignments, called HSPs
Filter
Exogean core: DACM expert annotation
DACM1: Single Molecule Clustering, then reduction
DACM2: Single Type Multi Molecule Clustering, then reduction
DACM3: Multi Type Multi Molecule Clustering, then reduction
CDS Identification
Filter output
Final gene models with multiple transcripts
5
Example: several mRNAs and proteins have been aligned to a specific locus
[Figure labels, in reading order: h1, h2, h3, h4, rm1, h5, h6, rm2, h7, h8, h9, rm3, h10, h11, h12, h13, pm1, h14, h15, h16, pm2, h17, h18, pm3]
rm_i = mRNA molecule
pm_j = protein molecule
h_k = mRNA or protein HSP
6
Building and reducing DACM1 = the Single Molecule Clustering
7
DACM expert annotation
mRNA, protein HSPs
DACM1 building + reduction: Level1 transcript models
DACM2 building + reduction: Level2 transcript models
DACM3 building + reduction: Level3 transcript models
Each DACM reduction produces more complex transcript models
8
DACM3 reduction produces final transcript models
[Figure: final transcript models M1 and M2]
M_i = final multi type multi molecule transcript model in which Exogean searches for a CDS
9
Evaluation method
[Figure: CDSs from Method_X compared with HAVANA CDSs, labelled TP, FP or FN]
10
Specificity on the 44 ENCODE regions
11
Sensitivity on the 44 ENCODE regions
13
Evaluation method
TP = True Positive: each HAVANA CDS matched exactly by at least one CDS from method_X is counted as TP
FP = False Positive: each method_X CDS that does not exactly match a HAVANA CDS is treated as a virtual HAVANA CDS and counted as FP
FN = False Negative: each HAVANA CDS that is not matched exactly by at least one method_X CDS is counted as FN
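The counting rules above can be sketched in Python as follows. Several points are assumptions rather than part of the slides: the function name `evaluate_cds` is hypothetical, a CDS is represented as a tuple of (start, end) exon coordinate pairs so that an exact match is plain equality, identical method_X predictions are counted once, and sensitivity and specificity follow the common gene-prediction convention TP / (TP + FN) and TP / (TP + FP).

```python
def evaluate_cds(havana, predicted):
    """Count TP, FP and FN between two collections of CDSs.

    Each CDS is assumed to be a hashable tuple of (start, end) exon
    coordinates; two CDSs match exactly when the tuples are equal.
    """
    havana, predicted = set(havana), set(predicted)
    tp = len(havana & predicted)   # HAVANA CDSs matched exactly
    fn = len(havana - predicted)   # HAVANA CDSs with no exact match
    fp = len(predicted - havana)   # method_X CDSs matching no HAVANA CDS
    # Assumed convention: specificity here is TP / (TP + FP).
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tp / (tp + fp) if tp + fp else 0.0
    return {"TP": tp, "FP": fp, "FN": fn,
            "sensitivity": sensitivity, "specificity": specificity}


# Hypothetical coordinates, for illustration only.
havana_cds = [((100, 200), (300, 400)), ((500, 600),), ((700, 800),)]
method_x_cds = [((100, 200), (300, 400)), ((100, 200), (300, 450))]
result = evaluate_cds(havana_cds, method_x_cds)
```

In this toy input the second method_X CDS differs from the first HAVANA CDS by a single exon boundary, which is enough for it to count as FP under the exact-match rule.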
14
DACM1 reduction produces level1 transcript models
[Figure labels, in reading order: h1, h2, h3, h4, r1, h5, h6, r2, h7, h8, h9, r3, h10, h11, p1, h12, h13, p2, h14, h15, h16, p3, h17, h18, p4]
r_i = mRNA level1 transcript model
p_j = protein level1 transcript model
15
Building and reducing DACM2 = the Single Type Multi Molecule Clustering
16
DACM2 reduction produces level2 transcript models
R_i = mRNA level2 transcript model
P_j = protein level2 transcript model
[Figure: R1 (r1, r3), R2 (r2, r3), P1 (p1, p3), P2 (p2), P3 (p4)]
17
Building and reducing DACM3 = the Multi Type Multi Molecule Clustering