Published by Brendan Henderson. Modified over 9 years ago.
1
UMR 1095 - ASP
Structural & Comparative Genomics in Bread Wheat
TriAnnotPipeline: A LifeGrid Project based on AUVERGRID
F. Giacomoni, M. Reichstadt, P. Leroy
Génétique, Diversité & Ecophysiologie des Céréales - Clermont-Ferrand, France
3rd EGEE User Forum, February 12th, 2008
2
Wheat as a challenge for Genomics
* Important economic crop
* Large genome size

Genome size:
Bread wheat     17,000 Mb
Barley           4,800 Mb
Human           ~3,000 Mb
Maize            2,800 Mb
Rice               380 Mb
A. thaliana        140 Mb

Repeat sequences (values as listed on the slide): 85%, 70-80%, 50-80%, 50%, 10%
3
I.N.R.A. Work on the Wheat Genome
* Sequencing
* Annotating
  * Discover genes
  * Find transposable elements
  * Study other biological components

DNA sequence: AAAATCGATATAGAGTATGTAGACAAATTTTAAACCCGGGGGAGAGAGAGA
Results after annotation of the DNA sequence
4
General Pipeline Structure of TriAnnot
http://urgi.versailles.inra.fr/projects/TriAnnot/

Diagram labels:
* DNA sequences
* TriAnnot Pipeline GRID
* Eugene, GenemarkHMM, GeneID
* Genes: TriSet, GeneFarm (manual curation, training data set)
* TEs: TREPcons, REPET (manual curation)
* DataBase (chado) & Viewers (GBrowse)
5
TriAnnotPipelineGRID Architecture

Diagram labels:
* Users
* WEB / Pipeline Production (Login/password)
* WEB / Pipeline Development (Login/password)
* GRID & Cluster: RepeatMasker, est2genome, Gmap, BLAST, HMMPfam
* DataBanks
* DownLoad: gff/ARTEMIS, gameXml/APOLLO
* Manual Curation: APOLLO
* UpLoad (Login/password)
* GnpDB On Line, GBrowse
* Local Gnp Genome (GFF)
6
TriAnnotPipelineGRID Detailed Architecture

Panel 1: Transposable Element & repeats
* Input: BAC sequence, FASTA format
* Block1a, Block1b: RepeatMasker (TREPnr, TREPtotal, RepBase); BLASTx / TREPprot; TRF (SSR)
* Annotation and masking
* Result: BAC with masked TE

Panel 2: Gene annotation
* Block2: Gene Structure Prediction, ab initio prediction (GeneMarkHMM, GeneID, EuGene, GENSCAN, GeneZilla)
* Block3a, Block3b: BLASTx vs SwissProt / TrEMBL; BLAST/Gmap with transcripts (FL-cDNA, EST, mRNA)
* Block3c: Gene Model
  * EVM + PASA (US)
  * RAP-like (Japan)
  * EUGENE (France)
* Block4: Gene Function (IWGSC annotation guideline), Best Hit proteins (At, Os)
  * Known Protein
  * Putative Protein
  * Domain Containing Protein
  * Expressed Gene
  * Conserved Hypothetical Gene
  * Hypothetical Gene
* Result: BAC with masked TEs & Genes

Panel 3: Other biological target searches
* Block5a: BLASTn vs UGset / IRGSP / TIGR pseudo
* Block5b, Block5c, Block5d: nt, sts, htgs, gss; tRNA, miRNA; mtDNA, cpDNA
* …
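Panel 1 ends with a BAC whose transposable elements are masked before gene prediction. The following is a minimal Python sketch of that masking idea, not TriAnnot's own code. It assumes repeat hits are available as 1-based inclusive (start, end) coordinates; the function name `mask_intervals` and the choice of `N` as mask character are illustrative assumptions.

```python
def mask_intervals(sequence, intervals, mask_char="N"):
    """Return a copy of `sequence` with every interval replaced by `mask_char`.

    `intervals` holds 1-based inclusive (start, end) pairs, the coordinate
    convention used by GFF. Overlapping intervals are allowed.
    """
    masked = list(sequence)
    for start, end in intervals:
        if start < 1 or end > len(sequence) or start > end:
            raise ValueError(f"interval out of range: {(start, end)}")
        # Convert the 1-based inclusive range to a 0-based half-open range.
        for i in range(start - 1, end):
            masked[i] = mask_char
    return "".join(masked)


if __name__ == "__main__":
    # Hypothetical toy sequence and repeat hits, for illustration only.
    print(mask_intervals("ACGTACGTACGT", [(2, 4), (9, 10)]))
```

Masking with a fixed character keeps sequence coordinates unchanged, so later gene predictions on the masked BAC still map directly onto the original sequence.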
7
WEB INTERFACE PART with:
* Upload of BAC FASTA format sequence (Wheat Seq)
* Programming parameters of the Annotation with 5 blocks
* Production of a step.xml

PIPELINE PART:
STEP_0:
* 3 RepeatMasker vs 3 DataBanks
STEP_1:
* 8 BLASTn vs 8 DataBanks
* 1 BLASTx vs 1 DataBank
* 1 Tandem Repeat Finder
STEP_2:
* 1 EugeneIMM Rice
* 1 GeneId
* 4 GeneMarkHMM with 4 matrices
STEP_3:
* 1 tBLASTx vs 1 DataBank
* 1 BLASTn vs 1 DataBank
* 1 BLASTx vs 1 DataBank
STEP_4:
* 2 tBLASTn vs 2 DataBanks

RESULTS FILES (GFF Format)
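The step plan on this slide can be written down as data and serialized to an XML step file. The sketch below is only an illustration: the slide names a step.xml file but does not show its schema, so the element and attribute names used here (`steps`, `step`, `task`, `program`, `count`) are hypothetical. The programs and run counts are copied from the slide.

```python
import xml.etree.ElementTree as ET

# (program, number of runs) per step, as listed on the slide.
STEP_PLAN = {
    "STEP_0": [("RepeatMasker", 3)],
    "STEP_1": [("BLASTn", 8), ("BLASTx", 1), ("TandemRepeatFinder", 1)],
    "STEP_2": [("EugeneIMM", 1), ("GeneId", 1), ("GeneMarkHMM", 4)],
    "STEP_3": [("tBLASTx", 1), ("BLASTn", 1), ("BLASTx", 1)],
    "STEP_4": [("tBLASTn", 2)],
}


def build_step_xml(plan):
    """Serialize the plan to an XML string (hypothetical schema)."""
    root = ET.Element("steps")
    for step_name, tasks in plan.items():
        step = ET.SubElement(root, "step", name=step_name)
        for program, count in tasks:
            ET.SubElement(step, "task", program=program, count=str(count))
    return ET.tostring(root, encoding="unicode")


def total_runs(plan):
    """Count every program run across all steps."""
    return sum(count for tasks in plan.values() for _, count in tasks)


if __name__ == "__main__":
    print(build_step_xml(STEP_PLAN))
    print("program runs per BAC:", total_runs(STEP_PLAN))
```

Keeping the plan as plain data makes it easy to see how many independent program runs a single BAC generates, which is the workload the grid is meant to absorb.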
8
TriAnnotPipelineGRID Architecture

WEB INTERFACE PART with:
* Upload of BAC FASTA format sequence (Wheat Seq)
* Programming parameters of the Annotation with 5 blocks
* Production of a step.xml

PIPELINE_GRID PART I (STEP_1A):
* 5 RepeatMasker (RM)

PIPELINE LOCAL PART:
STEP_1B:
* 1 TRF
STEP_2:
* 1 EugeneIMM Rice
* 1 GeneId
* 4 GeneMarkHMM
STEP_3C:
* 3 Gene Modelling

PIPELINE_GRID PART II (STEP_1B, 3A, 3B, 4A, 4B, 5A and 5D):
* 3 BLASTx
* 8 GMap
* 6 BLASTp
* 1 PFAM
* 1 tBLASTn
* 14 BLASTn

RESULTS FILES (GFF Format)
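Both the grid and the local parts deliver their results as GFF files. As a reminder of what one such record looks like, here is a short Python sketch that formats a feature as a nine-column, tab-separated GFF line (seqid, source, type, start, end, score, strand, phase, attributes). The slide does not say which GFF version TriAnnot writes; GFF3-style `key=value` attributes are assumed here, and the sample values are made up.

```python
def gff_line(seqid, source, ftype, start, end,
             score=".", strand="+", phase=".", attributes=None):
    """Format one feature as a GFF record (coordinates are 1-based, inclusive)."""
    if start > end:
        raise ValueError("start must not exceed end")
    # GFF3-style attribute column: key=value pairs joined by ';', or '.' if empty.
    attr_text = ";".join(f"{k}={v}" for k, v in (attributes or {}).items()) or "."
    fields = [seqid, source, ftype, str(start), str(end),
              str(score), strand, str(phase), attr_text]
    return "\t".join(fields)


if __name__ == "__main__":
    # Hypothetical repeat hit on a hypothetical BAC.
    print(gff_line("BAC1", "RepeatMasker", "repeat_region", 100, 250,
                   attributes={"ID": "rep1"}))
```

Because every program's output ends up in the same column layout, results from grid jobs and local jobs can be concatenated and loaded into viewers such as GBrowse without program-specific parsers downstream.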
9
Diagram labels:
* Server part: Server, User Interface (UI), DB update service
* Grid part: Computing Element (CE), SE
* SE: Bioinformatic databases, Bioinformatic algorithms, Bioinformatic package
* CE: Bioinformatic algorithms
* UI: JDL
10
Diagram labels: Server, UI (JDL), Computing Element (CE), Bioinformatic algorithms

* Get the parameter
* Create the XML step file
* Get the input (sequence) file
* Create the grid environment (JDL, shell scripts)
* Mask the repeated sequences
* RepeatMasker/Blast/GMap/HMMer
* Retrieve the output
* Fill the database
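The "create the grid environment" step produces a JDL file for each job. The sketch below shows how such a file could be rendered from Python; it is not the TriAnnot generator. It uses standard gLite JDL attributes (`Executable`, `Arguments`, `StdOutput`, `StdError`, `InputSandbox`, `OutputSandbox`), while the function names, the wrapper script name, and the file names are hypothetical.

```python
def jdl_list(items):
    """Render a Python list as a JDL list literal, e.g. {"a", "b"}."""
    return "{" + ", ".join(f'"{item}"' for item in items) + "}"


def render_jdl(executable, arguments, input_files, output_files):
    """Build the text of a minimal JDL job description.

    The executable is shipped in the input sandbox together with the input
    files; stdout and stderr come back with the declared output files.
    """
    lines = [
        f'Executable = "{executable}";',
        f'Arguments = "{arguments}";',
        'StdOutput = "std.out";',
        'StdError = "std.err";',
        f"InputSandbox = {jdl_list([executable] + list(input_files))};",
        f"OutputSandbox = {jdl_list(['std.out', 'std.err'] + list(output_files))};",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical wrapper script and file names.
    print(render_jdl("run_blast.sh", "bac.fasta", ["bac.fasta"], ["bac.gff"]))
```

Generating one small JDL file per program run keeps the jobs independent, so a failed run can be resubmitted without touching the others.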
11
Diagram labels: UI (JDL), CE, Bioinformatic algorithms

1 - Parameters + input file
2 - Creation XML file
3 - Copy input files
4 - Creation environment
5 - Job submission
6 - Job running (BLAST/HMMer/RepeatMasker/GMap)
7 - Job output
8 - Output transfer
9 - DB filling
12
TriAnnotPipelineGRID Partners, 2007-2008
F. Giacomoni, C. Charpentier, N. Guilhot, F. Choulet, P. Leroy, C. Feuillet, T. Tanaka, H. Ikawa, H. Numa, T. Itoh, M. Alaux, T. Flutre, I. Blanc-Lenfle, S. Reboux, H. Quesneville, B. Haas, F. Legeai, B. Kronmiller, M. Reichstadt, A. Claude, M. Liauzu, A. Mahul