UMR 1095 - ASP: Structural & Comparative Genomics in Bread Wheat. TriAnnotPipeline: A LifeGrid Project based on AUVERGRID. F. Giacomoni, M.


1 UMR 1095 - ASP: Structural & Comparative Genomics in Bread Wheat
TriAnnotPipeline: A LifeGrid Project based on AUVERGRID
F. Giacomoni, M. Reichstadt, P. Leroy
Génétique, Diversité & Ecophysiologie des Céréales (Genetics, Diversity & Ecophysiology of Cereals), Clermont-Ferrand, France
3rd EGEE User Forum, February 12th, 2008

2 Wheat as a challenge for genomics
* Important economic crop
* Large genome size, with a high proportion of repeat sequences:

  Species        Genome size   Repeat sequences
  Bread wheat    ~17,000 Mb    ~85%
  Barley         ~4,800 Mb     70-80%
  Maize          ~2,800 Mb     50-80%
  Rice           ~380 Mb       ~50%
  A. thaliana    ~140 Mb       ~10%
  Human          ~3,000 Mb
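A quick back-of-the-envelope check of why these sizes matter, assuming the slide's rounded figures (bread wheat ~17,000 Mb with ~85% repeats, rice ~380 Mb, human ~3,000 Mb). The helper names are illustrative only:

```python
# Approximate figures read from slide 2 (the slide marks them as "~").
WHEAT_MB = 17_000
RICE_MB = 380
HUMAN_MB = 3_000
WHEAT_REPEAT_FRACTION = 0.85


def fold_difference(larger_mb: float, smaller_mb: float) -> float:
    """How many times larger one genome is than another."""
    return larger_mb / smaller_mb


def non_repeat_mb(genome_mb: float, repeat_fraction: float) -> float:
    """Size of the genome left once repeat sequences are excluded."""
    return genome_mb * (1.0 - repeat_fraction)


print(f"wheat vs human: {fold_difference(WHEAT_MB, HUMAN_MB):.1f}x")  # 5.7x
print(f"wheat vs rice: {fold_difference(WHEAT_MB, RICE_MB):.1f}x")    # 44.7x
print(f"wheat without repeats: ~{non_repeat_mb(WHEAT_MB, WHEAT_REPEAT_FRACTION):,.0f} Mb")  # ~2,550 Mb
```

Even after setting the repeats aside, the remaining ~2,550 Mb of wheat sequence is several times the size of the whole rice genome.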

3 I.N.R.A. work on the wheat genome
* Sequencing
* Annotating:
  * Discover genes
  * Find transposable elements
  * Study other biological components
Figure: a DNA sequence (AAAATCGATATAGAGTATGTAGACAAATTTTAAACCCGGGGGAGAGAGAGA) and the results after annotation of that sequence.
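To make "annotation" concrete: the example sequence on this slide ends in a GA microsatellite, the kind of simple sequence repeat (SSR) the pipeline searches for. The sketch below is a toy exact-match scanner, assuming motifs of 1-6 nt and a minimum repeat length of 8 nt (both thresholds are assumptions for this short example). It is not the algorithm used by Tandem Repeat Finder (TRF), which relies on a probabilistic model and tolerates mismatches.

```python
from typing import List, NamedTuple, Optional


class SSR(NamedTuple):
    start: int   # 0-based, inclusive
    end: int     # 0-based, exclusive
    motif: str
    copies: int


def is_primitive(motif: str) -> bool:
    """True unless motif is itself a repeat of a shorter unit (e.g. 'GG', 'ATAT')."""
    return (motif + motif).find(motif, 1) == len(motif)


def find_ssrs(seq: str, min_unit: int = 1, max_unit: int = 6, min_length: int = 8) -> List[SSR]:
    """Greedy left-to-right scan for perfect tandem repeats."""
    seq = seq.upper()
    hits: List[SSR] = []
    i = 0
    while i < len(seq):
        best: Optional[SSR] = None
        for unit in range(min_unit, max_unit + 1):
            motif = seq[i:i + unit]
            if len(motif) < unit or not is_primitive(motif):
                continue
            # Count consecutive exact copies of the motif starting at i.
            copies = 1
            while seq[i + copies * unit:i + (copies + 1) * unit] == motif:
                copies += 1
            span = copies * unit
            if copies >= 2 and span >= min_length and (best is None or span > best.end - best.start):
                best = SSR(i, i + span, motif, copies)
        if best is not None:
            hits.append(best)
            i = best.end  # skip past the reported repeat
        else:
            i += 1
    return hits


EXAMPLE = "AAAATCGATATAGAGTATGTAGACAAATTTTAAACCCGGGGGAGAGAGAGA"
for ssr in find_ssrs(EXAMPLE):
    # Report in 1-based inclusive coordinates, as GFF does.
    print(f"({ssr.motif})x{ssr.copies} at {ssr.start + 1}-{ssr.end}")  # (GA)x5 at 42-51
```

The primitivity check stops a run such as GGGG from being reported a second time as (GG)x2.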

4 General pipeline structure of TriAnnot
* Input: DNA sequences
* TriAnnot Pipeline on the GRID, including the gene predictors Eugene, GenemarkHMM and GeneID
* Manually curated data: TriSet and GeneFarm (genes, training data set); TREPcons and REPET (TEs)
* Output: database (chado) & viewers (GBrowse)
* Manual curation of genes and TEs
http://urgi.versailles.inra.fr/projects/TriAnnot/

5 TriAnnotPipelineGRID Architecture
* Users access, with login/password:
  * WEB / Pipeline Production, with GBrowse
  * WEB / Pipeline Development
  * GnpDB On Line
* GRID & Cluster: RepeatMasker, est2genome, Gmap, BLAST, HMMPfam, run against the DataBanks
* Download: gff (for ARTEMIS) and gameXml (for APOLLO), for manual curation in APOLLO
* Upload (login/password) of GFF files to the local Gnp Genome database
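Results move between the pipeline, the viewers and the local database as GFF files. The deck does not say which GFF version is used, so the sketch below assumes GFF3 (nine tab-separated columns, `key=value` attributes). The sample record, including `BAC_0001` and `example_TE`, is hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, Optional
from urllib.parse import unquote


@dataclass
class Feature:
    seqid: str
    source: str
    type: str
    start: int            # 1-based, inclusive
    end: int              # 1-based, inclusive
    score: Optional[float]
    strand: str           # "+", "-", "." or "?"
    phase: Optional[int]  # 0, 1 or 2 for CDS features, else None
    attributes: Dict[str, str]


def parse_gff3(lines: Iterable[str]) -> Iterator[Feature]:
    """Yield features from GFF3 text, skipping comments and ## directives."""
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # embedded sequences follow; not handled in this sketch
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9:
            raise ValueError(f"expected 9 tab-separated columns: {line!r}")
        attributes = {}
        for pair in cols[8].split(";"):
            if pair:
                key, _, value = pair.partition("=")
                attributes[unquote(key)] = unquote(value)  # undo %-escaping
        yield Feature(
            seqid=cols[0], source=cols[1], type=cols[2],
            start=int(cols[3]), end=int(cols[4]),
            score=None if cols[5] == "." else float(cols[5]),
            strand=cols[6],
            phase=None if cols[7] == "." else int(cols[7]),
            attributes=attributes,
        )


SAMPLE = [
    "##gff-version 3",
    "BAC_0001\tRepeatMasker\trepeat_region\t1200\t3450\t2310\t+\t.\tID=rm1;Name=example_TE",
]
for feature in parse_gff3(SAMPLE):
    print(feature.type, feature.start, feature.end, feature.attributes["Name"])
```

Multi-valued attributes (comma-separated in GFF3) are kept as plain strings here for simplicity.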

6 TriAnnotPipelineGRID Detailed Architecture
Input: BAC sequence (FASTA format)
Panel 1: Transposable elements & repeats (Blocks 1a, 1b)
  * RepeatMasker vs TREPnr, TREPtotal, RepBase
  * BLASTx vs TREPprot
  * TRF (SSR)
  * Annotation and masking, giving a BAC with masked TEs
Panel 2: Gene annotation (Blocks 2, 3a, 3b, 3c, 4)
  * Gene structure prediction:
    * ab initio prediction: GeneMarkHMM, GeneID, EuGene, GENSCAN, GeneZilla
    * BLASTx vs SwissProt / TrEMBL
    * BLAST/Gmap with transcripts (FL-cDNA, EST, mRNA)
    * Gene model: EVM + PASA (US), RAP-like (Japan), EUGENE (France)
  * Gene function, following the IWGSC annotation guidelines, from best-hit proteins in At and Os (Arabidopsis thaliana, Oryza sativa): Known Protein, Putative Protein, Domain Containing Protein, Expressed Gene, Conserved Hypothetical Gene, Hypothetical Gene
  * Output: BAC with masked TEs & genes
Panel 3: Other biological target searches (Blocks 5a-5d)
  * BLASTn vs UGset / IRGSP / TIGR pseudo
  * … nt, sts, htgs, gss
  * tRNA, miRNA, mtDNA, cpDNA
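Panel 1 hands a "BAC with masked TEs" to gene annotation. The slide does not say whether masking is hard (bases replaced by N) or soft (bases lowercased), and RepeatMasker writes its own masked output. The sketch below only shows the general idea of applying repeat intervals, e.g. taken from a GFF of repeat hits, to a sequence. The function name is hypothetical.

```python
from typing import Iterable, Tuple


def mask_intervals(seq: str, intervals: Iterable[Tuple[int, int]], soft: bool = False) -> str:
    """Mask 1-based, inclusive (start, end) intervals of seq.

    Hard masking replaces bases with 'N'; soft masking lowercases them.
    Overlapping intervals are harmless because masking is idempotent.
    """
    chars = list(seq)
    for start, end in intervals:
        if not 1 <= start <= end <= len(seq):
            raise ValueError(f"interval ({start}, {end}) outside 1..{len(seq)}")
        region = slice(start - 1, end)  # convert to 0-based, half-open
        if soft:
            chars[region] = [c.lower() for c in chars[region]]
        else:
            chars[region] = ["N"] * (end - start + 1)
    return "".join(chars)


print(mask_intervals("ACGTACGTAC", [(3, 5)]))             # ACNNNCGTAC
print(mask_intervals("ACGTACGTAC", [(3, 5)], soft=True))  # ACgtaCGTAC
```

Soft masking keeps the bases readable for tools that can use them, while hard masking guarantees that downstream predictors never see the repeat.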

7 WEB INTERFACE PART, with:
* Upload of the BAC sequence in FASTA format
* Setting of the annotation parameters for the 5 blocks
* Production of a step.xml file
PIPELINE PART (input: the wheat sequence):
STEP_0:
  * 3 RepeatMasker vs 3 DataBanks
STEP_1:
  * 8 BLASTn vs 8 DataBanks
  * 1 BLASTx vs 1 DataBank
  * 1 Tandem Repeat Finder
STEP_2:
  * 1 EugeneIMM Rice
  * 1 GeneId
  * 4 GeneMarkHMM with 4 matrices
STEP_3:
  * 1 tBLASTx vs 1 DataBank
  * 1 BLASTn vs 1 DataBank
  * 1 BLASTx vs 1 DataBank
STEP_4:
  * 2 tBLASTn vs 2 DataBanks
RESULTS FILES (GFF format)
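The web interface turns the user's choices into a step.xml file. The real schema is not shown in the presentation, so the element and attribute names below (`annotation`, `step`, `program`, `runs`) and the file name `wheat_BAC.fasta` are hypothetical; only the step contents come from this slide.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

# Step contents as listed on the slide: (program, number of runs).
# The XML vocabulary used below is hypothetical.
STEPS: Dict[str, List[Tuple[str, int]]] = {
    "STEP_0": [("RepeatMasker", 3)],
    "STEP_1": [("BLASTn", 8), ("BLASTx", 1), ("TandemRepeatFinder", 1)],
    "STEP_2": [("EugeneIMM_Rice", 1), ("GeneId", 1), ("GeneMarkHMM", 4)],
    "STEP_3": [("tBLASTx", 1), ("BLASTn", 1), ("BLASTx", 1)],
    "STEP_4": [("tBLASTn", 2)],
}


def build_step_xml(sequence_file: str, steps: Dict[str, List[Tuple[str, int]]]) -> str:
    """Serialise the selected annotation steps into an XML document."""
    root = ET.Element("annotation", sequence=sequence_file)
    for step_id, programs in steps.items():
        step = ET.SubElement(root, "step", id=step_id)
        for name, runs in programs:
            ET.SubElement(step, "program", name=name, runs=str(runs))
    return ET.tostring(root, encoding="unicode")


xml_text = build_step_xml("wheat_BAC.fasta", STEPS)
print(xml_text)
```

Keeping the parameters in one declarative file means the same run can be replayed or split between execution back-ends without going back through the web form.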

8 TriAnnotPipelineGRID Architecture
WEB INTERFACE PART, with:
* Upload of the BAC sequence in FASTA format
* Setting of the annotation parameters for the 5 blocks
* Production of a step.xml file
PIPELINE_GRID PART I (STEP_1A)
PIPELINE LOCAL PART:
  STEP_1B:
    * 1 TRF
  STEP_2:
    * 1 EugeneIMM Rice
    * 1 GeneId
    * 4 GeneMarkHMM
  STEP_3C:
    * 3 Gene Modelling
PIPELINE_GRID PART II (STEP_1B, 3A, 3B, 4A, 4B, 5A and 5D)
Grid jobs shown on the diagram: 5 RM, 3 BLASTx, 8 GMap, 6 BLASTp, 1 PFAM, 1 tBLASTn, 14 BLASTn, 5 RepeatMasker (RM)
RESULTS FILES (GFF format)
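The split on this slide can be read as a routing table: database searches and masking go to the grid, while TRF, ab initio prediction and gene modelling stay local. The sketch below encodes that reading; the normalised tool names are assumptions, and the step-to-tool pairs in the example job list are illustrative, not taken from the pipeline's configuration.

```python
from typing import Dict, Iterable, List, Tuple

# Routing as read from the slide; names normalised for this sketch.
GRID_TOOLS = frozenset({"RepeatMasker", "BLASTn", "BLASTx", "tBLASTn", "BLASTp", "GMap", "PFAM"})
LOCAL_TOOLS = frozenset({"TRF", "EugeneIMM", "GeneId", "GeneMarkHMM", "GeneModelling"})


def route(tool: str) -> str:
    """Return the execution target ("grid" or "local") for a tool."""
    if tool in GRID_TOOLS:
        return "grid"
    if tool in LOCAL_TOOLS:
        return "local"
    raise KeyError(f"no execution target configured for {tool!r}")


def partition_jobs(jobs: Iterable[Tuple[str, str]]) -> Dict[str, List[Tuple[str, str]]]:
    """Split (step, tool) jobs into grid and local batches, keeping their order."""
    batches: Dict[str, List[Tuple[str, str]]] = {"grid": [], "local": []}
    for step, tool in jobs:
        batches[route(tool)].append((step, tool))
    return batches


# Illustrative job list (hypothetical step/tool pairing).
jobs = [("STEP_1A", "RepeatMasker"), ("STEP_1B", "TRF"), ("STEP_2", "GeneId"),
        ("STEP_3A", "BLASTx"), ("STEP_3B", "GMap"), ("STEP_3C", "GeneModelling")]
for target, batch in partition_jobs(jobs).items():
    print(target, [tool for _, tool in batch])
```

Raising on an unknown tool, rather than defaulting to one side, makes a missing routing entry visible instead of silently running a heavy search locally.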

9 Components, grouped into a server part and a grid part: Server, User Interface (UI), Computing Element (CE), Storage Element (SE), bioinformatic algorithms, bioinformatic databases, bioinformatic package, DB update service, JDL.

10 Processing flow across the Server, the UI and the Computing Element (CE), with JDL job descriptions and the bioinformatic algorithms:
* Get the parameters
* Create the XML step file
* Get the input (sequence) file
* Create the grid environment (JDL, shell scripts)
* Mask the repeated sequences
* Run RepeatMasker/BLAST/GMap/HMMer
* Retrieve the output
* Fill the database
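"Create the grid environment" means, among other things, writing a JDL file per job. The attribute names below (`Executable`, `Arguments`, `StdOutput`, `StdError`, `InputSandbox`, `OutputSandbox`) are standard gLite JDL attributes; the wrapper script `run_blastn.sh`, its arguments and the file names are hypothetical, since TriAnnot's actual JDL is not shown.

```python
from typing import Sequence


def _quote(value: str) -> str:
    # Keep the sketch simple: refuse values that would need escaping.
    if '"' in value:
        raise ValueError(f"double quote not supported in {value!r}")
    return f'"{value}"'


def _string_list(values: Sequence[str]) -> str:
    return "{" + ", ".join(_quote(v) for v in values) + "}"


def make_jdl(executable: str, arguments: Sequence[str],
             input_files: Sequence[str], output_files: Sequence[str]) -> str:
    """Build a minimal gLite-style JDL job description."""
    input_sandbox = [executable, *input_files]            # shipped to the worker node
    output_sandbox = ["std.out", "std.err", *output_files]  # brought back after the run
    lines = [
        f"Executable = {_quote(executable)};",
        f"Arguments = {_quote(' '.join(arguments))};",
        'StdOutput = "std.out";',
        'StdError = "std.err";',
        f"InputSandbox = {_string_list(input_sandbox)};",
        f"OutputSandbox = {_string_list(output_sandbox)};",
    ]
    return "\n".join(lines) + "\n"


print(make_jdl("run_blastn.sh", ["BAC.fasta", "nt"], ["BAC.fasta"], ["BAC_vs_nt.blastn"]))
```

Large shared inputs such as the bioinformatic databases would normally sit on a Storage Element rather than travel in the sandbox; this sketch only covers the per-job files.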

11 Job workflow (bioinformatic algorithms, CE, UI, JDL):
1- Parameters + input file
2- Creation of the XML file
3- Copy of the input files
4- Creation of the environment
5- Job submission
6- Job running (BLAST/HMMer/RepeatMasker/GMap)
7- Job output
8- Output transfer
9- DB filling
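The nine numbered steps can be sketched as a driver that calls one hook per step and polls the job status between submission and output retrieval. The hook names, the status strings (loosely modelled on the gLite WMS job states) and the polling interval are assumptions; real submission and status queries would go through the grid middleware's own tools.

```python
import time
from typing import Callable, Dict, List

# Terminal states, loosely modelled on the gLite WMS job life cycle.
TERMINAL_STATES = frozenset({"Done", "Aborted", "Cancelled"})


def wait_for_job(poll: Callable[[], str], interval_s: float = 30.0, max_polls: int = 1000) -> str:
    """Steps 6-7: poll the job status until it reaches a terminal state."""
    for _ in range(max_polls):
        status = poll()
        if status in TERMINAL_STATES:
            return status
        time.sleep(interval_s)
    raise TimeoutError(f"job not finished after {max_polls} polls")


def run_annotation(hooks: Dict[str, Callable[[], None]], poll: Callable[[], str],
                   interval_s: float = 30.0) -> List[str]:
    """Run the numbered steps in order and return a trace of what ran."""
    trace: List[str] = []
    # 1-5: parameters + input, XML step file, copy inputs, environment, submission.
    for step in ("parameters", "create_xml", "copy_inputs", "create_environment", "submit"):
        hooks[step]()
        trace.append(step)
    # 6-7: the job runs on the CE; wait until it finishes.
    status = wait_for_job(poll, interval_s)
    trace.append(f"job:{status}")
    if status != "Done":
        raise RuntimeError(f"grid job ended in state {status!r}")
    # 8-9: transfer the output back, then fill the database.
    for step in ("transfer_output", "fill_db"):
        hooks[step]()
        trace.append(step)
    return trace


STEP_NAMES = ["parameters", "create_xml", "copy_inputs", "create_environment",
              "submit", "transfer_output", "fill_db"]
noop_hooks = {name: (lambda: None) for name in STEP_NAMES}
fake_status = iter(["Submitted", "Scheduled", "Running", "Done"]).__next__
print(run_annotation(noop_hooks, fake_status, interval_s=0.0))
```

Stopping before step 9 when the job does not end in "Done" keeps partial or failed results out of the database.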

12 TriAnnotPipelineGRID Partners, 2007-2008: F. Giacomoni, C. Charpentier, N. Guilhot, F. Choulet, P. Leroy, C. Feuillet, T. Tanaka, H. Ikawa, H. Numa, T. Itoh, M. Alaux, T. Flutre, I. Blanc-Lenfle, S. Reboux, H. Quesneville, B. Haas, F. Legeai, B. Kronmiller, M. Reichstadt, A. Claude, M. Liauzu, A. Mahul

