TAIR: Bringing together data for the global plant biology community Philippe Lamesch Kate Dreher The Arabidopsis Information Resource www.arabidopsis.org.

1 TAIR: Bringing together data for the global plant biology community Philippe Lamesch Kate Dreher The Arabidopsis Information Resource www.arabidopsis.org contact us: curator@arabidopsis.org

2 Outline o Philippe Lamesch: Introducing TAIR and PMN; TAIR10 genome annotation; TAIR gene confidence ranking; TAIR tools o Kate Dreher

3 TAIR: The Arabidopsis Information Resource collect, curate and distribute information on Arabidopsis information freely available from arabidopsis.org

4 Slides available from TAIR www.arabidopsis.org

5 TAIR is used worldwide Visits per month (source: Google Analytics)

6 TAIR usage worldwide : July 2009-July 2010

7 What TAIR does: (1) Arabidopsis genome annotation

8 What TAIR does: (2) manual literature curation Controlled vocabulary annotations Gene Ontology (GO) http://www.geneontology.org/ Plant Ontology (PO) http://www.plantontology.org/ Gene name, symbol Allele, phenotype Summary statement composition

9 Who we partner with: PMN (Plant Metabolic Network) and PlantCyc A comprehensive plant biochemical pathway database, containing curated information from the literature and computational analyses about the genes, enzymes, compounds, reactions, and pathways involved in primary and secondary metabolism

10 Who we partner with: ABRC Distribution of biological research materials

11 A new approach for improving the Arabidopsis genome annotation for TAIR10 The Arabidopsis gene structure confidence ranking Arabidopsis genome annotation

12 Arabidopsis genome sequenced almost 10 years ago. High-quality sequence with few gaps. TIGR did the initial genome annotation; TAIR took over responsibility in 2005. Current TAIR9 stats: 27,379 protein-coding genes; 4,827 pseudogenes or transposable elements; 1,312 ncRNAs

13 Genome annotation at TAIR Add novel genes Update exon/intron structures of existing genes Delete mispredicted genes Merge and split genes Change gene types Add splice-variants

14 Genome annotation at TAIR Annotate atypical gene classes: short protein-coding genes, transposable element genes, pseudogenes, uORFs (genes within UTR of other genes). Add novel genes Update exon/intron structures of existing genes Delete mispredicted genes Merge and split genes Change gene types Add splice-variants

15 Arabidopsis gene structure annotation: A new approach TAIR6-TAIR9: Use ESTs and cDNAs and an assembly tool called PASA to improve gene structures TAIR10: Use new experimental data and new prediction tools to further improve gene structure predictions

16 Using PASA and ESTs/cDNAs Clustered transcripts NCBI Genome annotation TAIR6-TAIR9

17 Clustered transcripts Resulting gene model NCBI Using PASA and ESTs/cDNAs Genome annotation TAIR6-TAIR9

18 Clustered transcripts Resulting gene model Previous gene model NCBI comparison Novel genes New Splice-variants Gene structure updates Using PASA and ESTs/cDNAs Genome annotation TAIR6-TAIR9
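
The comparison step on this slide (resulting gene model versus previous gene model) can be sketched roughly as below. This is only an illustration, not PASA's actual logic: the function names, the exon-list representation, and the three outcome labels are assumptions, and the sketch ignores chromosome and strand. It does not try to separate "new splice-variant" from "gene structure update", because the slides do not say how that distinction is made.

```python
def span(exons):
    """Return the (start, end) footprint of a list of (start, end) exons."""
    return (min(s for s, _ in exons), max(e for _, e in exons))


def overlaps(a, b):
    """True if two closed intervals (start, end) share at least one base."""
    return a[0] <= b[1] and b[0] <= a[1]


def compare_model(new_exons, previous_models):
    """Classify a transcript-derived model against previous gene models.

    new_exons: list of (start, end) tuples for the new model.
    previous_models: dict mapping a model name to its list of exons.
    All inputs are assumed to lie on the same chromosome and strand.
    """
    new_chain = sorted(new_exons)
    overlapping = [
        name for name, exons in previous_models.items()
        if overlaps(span(new_chain), span(exons))
    ]
    if not overlapping:
        return "novel gene candidate"
    if any(sorted(previous_models[name]) == new_chain for name in overlapping):
        return "unchanged"
    # Could be a splice variant or a structure update; needs review.
    return "differs from previous model"


# Hypothetical coordinates, for illustration only.
previous = {"geneA.1": [(100, 200), (300, 400)]}
print(compare_model([(100, 200), (300, 400)], previous))
print(compare_model([(100, 200), (320, 400)], previous))
print(compare_model([(1000, 1100)], previous))
```

Models flagged as differing would still need manual review before being called a splice variant or a corrected structure.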

19 ESTs cDNAs Radish sequence alignments Eugene prediction dicot sequence alignments monocot sequence alignments Aceview gene predictions 2 gene isoforms Manual annotation at TAIR: Apollo Short MS peptide

20 TAIR10: using proteomics and RNA-seq data to improve genome annotation 4-step process: 1. Mapping RNA-seq & peptides 2. Assembly/gene build 3. Manual review 4. Integration (genome release/GBrowse)

21 Mapping and Assembly 1. Mapping: RNA-seq sequences (TopHat (C. Trapnell), Supersplat (T.C. Mockler)); peptides (6-frame translation, spliced exon graph) 2. Assembly approaches: Augustus (M. Stanke) o Uses spliced RNA-seq reads, peptides o Aim: Identify additional splice-variants, update existing genes; TAU (T.C. Mockler) o Uses spliced RNA-seq reads o Aim: Identify additional splice-variants; Cufflinks (C. Trapnell) o Uses spliced and unspliced RNA-seq data o Aim: Identify novel genes
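
The "6-frame translation" mentioned for peptide mapping can be illustrated with a minimal sketch: translate the genomic sequence in three frames on each strand, then search each translation for the peptide. The function names are hypothetical and this is not the pipeline used for TAIR10; in particular it handles only unspliced matches, whereas the slide also lists a spliced exon graph for peptides that cross introns.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq):
    """Reverse-complement an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons; unknown codons (e.g. with N) become X."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translation(seq):
    """Return a dict of frame label ('+1'..'+3', '-1'..'-3') to protein."""
    seq = seq.upper()
    frames = {}
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(strand_seq[offset:])
    return frames


def frames_containing(seq, peptide):
    """List the frames whose translation contains the peptide."""
    return [f for f, prot in six_frame_translation(seq).items() if peptide in prot]


# Toy sequence, for illustration only.
print(six_frame_translation("ATGGCCAAATAA"))
print(frames_containing("ATGGCCAAATAA", "WPN"))
```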

22 RNA-seq datasets (Mockler Lab, Ecker Lab): 200 million aligned RNA-seq reads; TopHat, SuperSplat: 203,000 clustered spliced RNA-seq junctions; 145,000 RNA-seq junctions based on >1 read; Augustus
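
The reduction from all clustered junctions to those "based on >1 read" amounts to a read-support filter. A minimal sketch, assuming each spliced read has already been reduced to a junction tuple of (chromosome, donor position, acceptor position, strand); that representation and the function name are illustrative assumptions, not the output format of TopHat or SuperSplat.

```python
from collections import Counter


def supported_junctions(spliced_read_junctions, min_reads=2):
    """Count reads per junction and keep junctions seen in >= min_reads reads.

    spliced_read_junctions: iterable with one junction tuple per spliced read.
    The default min_reads=2 corresponds to "based on >1 read".
    """
    counts = Counter(spliced_read_junctions)
    return {junction: n for junction, n in counts.items() if n >= min_reads}


# Hypothetical reads, for illustration only.
reads = [
    ("Chr1", 1200, 1350, "+"),
    ("Chr1", 1200, 1350, "+"),
    ("Chr1", 1200, 1410, "+"),  # seen once, filtered out
    ("Chr2", 500, 720, "-"),
    ("Chr2", 500, 720, "-"),
    ("Chr2", 500, 720, "-"),
]
for junction, n in supported_junctions(reads).items():
    print(junction, n)
```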

23 Augustus 145,000 RNA-seq junctions based on >1 read 260,000 peptides (Baerenfaller et al, Castellana et al) Augustus gene prediction + ESTs & cDNAs + AGI models 11% of RNA-seq junctions incorporated into Augustus models 64% of peptide sequences incorporated into Augustus models Predicted Augustus models: 5461 distinct models 1596 novel models

24 Categorisation/Review TAU Models RNA-seq Junctions Augustus Model TAIR confidence rank TAIR Model Peptides (Splice variants, NMD targets) (correction) (colour reflects matching model) Incorrect junction in TAIR model Unsupported exon

25 Example Augustus update

26 Example Augustus splice variant

27 Example 2 Augustus splice variant

28 Augustus/TAU/Cufflinks Augustus: Incorporates 64% of peptides not contained in TAIR, 11% of RNA-seq junctions; 5461 potential updated genes; 1596 potential novel genes. TAU: 30,083 junctions distinct to Augustus or TAIR models; 10,902 junctions incorporated into 10,491 TAU models. Cufflinks: 367 novel assemblies which fall above the 100 bp # TE-filter applied to Augustus and Cufflinks models

29 Preliminary TAIR 10 Results Novel genes Updated genes Splice-variants B-list Rejects

30 Preliminary TAIR 10 Results Novel genes 126 Updated genes 1182 Splice-variants 5885 (18% of all loci) B-list 1586 Rejects 2318

31 Gene Confidence Rank Attributes confidence scores to all exons and gene models based on different types of experimental and computational evidence
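
To make the idea of scoring exons and gene models from evidence concrete, here is a toy sketch. The slides do not give TAIR's actual ranking rules, so everything specific below is an assumption made for illustration: the evidence categories, their numeric strengths, and the rule that a model is only as good as its weakest exon.

```python
# Hypothetical evidence strengths (higher = stronger). Not TAIR's real scale.
EVIDENCE_STRENGTH = {
    "cDNA": 3,
    "EST": 3,
    "RNA-seq": 2,
    "peptide": 2,
    "prediction": 1,
}


def exon_score(evidence):
    """Score one exon by its strongest piece of evidence (0 = no support)."""
    return max((EVIDENCE_STRENGTH.get(e, 0) for e in evidence), default=0)


def model_confidence(exon_evidence):
    """Score a gene model from per-exon evidence sets.

    exon_evidence: list with one collection of evidence labels per exon.
    Assumed rule: the model score is the minimum exon score.
    """
    scores = [exon_score(ev) for ev in exon_evidence]
    return {
        "exon_scores": scores,
        "model_score": min(scores) if scores else 0,
        "unsupported_exons": [i for i, s in enumerate(scores) if s == 0],
    }


# Hypothetical three-exon model: the last exon has no supporting evidence.
print(model_confidence([{"cDNA"}, {"RNA-seq", "prediction"}, set()]))
```

Taking the minimum means a single unsupported exon lowers the confidence of the whole model, which matches the intuition of the "Full support / No support" contrast on the later slide, though the real scheme may combine evidence differently.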

32 Assigning A Confidence Rank E1 E4

33 Full support No support

34 New and updated tools at TAIR N-Browse GBrowse Synteny viewer

35 New and updated tools at TAIR: N-Browse (in collaboration with the Kris Gunsalus Lab, NYU) > 7,000 experimental interactions Interactions curated by TAIR, IntAct & BioGrid Tutorial at http://www.arabidopsis.org/tools/nbrowse.jsp#nb-tut

36 N-Browse

37 N-Browse: Finding information about edges (interactions)

38 N-Browse: How to select and move nodes

39 N-Browse: How to visualize GO terms from a selected set of nodes

40 N-Browse: How to load your own file and overlay it with the curated interaction data

41 N-Browse: How to save your session and export your data

42 New Tools at TAIR N-Browse GBrowse Synteny viewer

43 GBrowse Header Main Browser Window Track Menu

44 Alternative gene annotations Eugene (transcript, proteins +): Sebastien Aubourg; Gnomon (transcript, proteins): Souvorov (NCBI); Aceview (transcript): Thierry-Mieg (NCBI); Hanada et al 2007 (3633 predicted genes)

45 Proteomic Data High-density Arabidopsis proteome map (Baerenfaller et al. 2008) Incorrect start codon

46 VISTA plot Gbrowse track

47 Transcriptome data

48 Orthologs and Gene Families

49 Variation

50 Promoter Elements

51 Methylation

52 Decorated Fasta file

55 New Tools at TAIR N-Browse GBrowse Synteny viewer Data provided by Pedro Pattyn at the University of Ghent

56 AT5G48000 AT5G48010 AT5G47990

58 www.arabidopsis.org curator@arabidopsis.org www.plantcyc.org curator@plantcyc.org

60 Example 2 Augustus update

61 GBrowse Header Main Browser Window Track Menu

62 Gbrowse

