Carnegie Institution for Science, Department of Plant Biology.


Putting TAIR to work for you: Tips and Techniques for Accessing Arabidopsis Data for Plant Biology Research
Kate Dreher
TAIR, AraCyc, PMN
Carnegie Institution for Science

Overview
Part I: Presentation (with exercises)
- Finding a specific gene of interest in TAIR
- Looking at the data on the locus, gene model, and protein pages
- Getting to know GBrowse
- Creating and enhancing customized data sets
- Tips for working with Arabidopsis
Part II: Practice problems and individual help
- Hand-outs with practice problems to work on
- Questions from participants
- Individual help
All documents are available in electronic form:
- Resource guide
- Questions, answers, and practice data
- The "Bienvenidos a TAIR" ("Welcome to TAIR") presentation and this presentation

What is TAIR?
The Arabidopsis Information Resource (TAIR) maintains a database of genetic and molecular biology data for the model plant Arabidopsis.
Curators and programmers at TAIR:
- Collect, store, and organize Arabidopsis data
- Attach functional information to genes
- Improve gene structures
- Provide tools to analyze data
- Work with the ABRC to provide seeds and clones

Finding the gene you want
Case 1: You have a non-Arabidopsis gene and want to find its homolog
Case 2: You know exactly what Arabidopsis gene you want
- You know the AGI locus code (e.g. At2g46990)
- You know the gene symbol (e.g. PhyA)
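When a list of identifiers mixes AGI locus codes and gene symbols, a quick format check can separate them before searching. The sketch below is only an illustration: the pattern assumes that AGI codes consist of "AT", a chromosome (1-5, or C/M for the organelle genomes), "G", and five digits, with an optional ".N" suffix for a gene model. The function name `parse_agi` is hypothetical and is not part of any TAIR tool.

```python
import re

# Assumed AGI format: "AT" + chromosome (1-5, C, M) + "G" + five digits,
# optionally followed by ".N" for a specific gene model (splice variant).
AGI_PATTERN = re.compile(r"^AT([1-5CM])G(\d{5})(?:\.(\d+))?$", re.IGNORECASE)


def parse_agi(identifier):
    """Return (locus, chromosome, gene_model_number or None), or None if the
    identifier does not look like an AGI code (e.g. it is a gene symbol)."""
    match = AGI_PATTERN.match(identifier.strip())
    if match is None:
        return None
    chromosome, number, model = match.groups()
    locus = f"AT{chromosome.upper()}G{number}"
    return locus, chromosome.upper(), int(model) if model else None


if __name__ == "__main__":
    for name in ["At2g46990", "At1g11310.2", "PhyA"]:
        print(name, "->", parse_agi(name))
```

Identifiers that return `None` (such as PhyA) are gene symbols or other names and need a name search rather than a locus search.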

Finding a gene: practice problems
1. You are reading a paper about an interesting phenotype caused by a mutation in the AN gene. Find the AGI locus code of this gene.
2. You find an EST that is expressed at high levels in the seed of your Phaseolus vulgaris variety: GenBank: AB (To find the gene in GenBank, Google NCBI and you should find the page.)
- Find the AGI locus codes of the top three hits in TAIR using BLAST
- Is it the same if you BLAST with the transcript or the protein?
- Based on the transcript
- Based on the protein

Finding a gene: practice problems
1. You are reading a paper about an interesting phenotype caused by a mutation in the AN gene. Find the AGI locus code of this gene.
AT1G01510 (a.k.a. ANGUSTIFOLIA)
2. You find an EST that is expressed at high levels in the seed of your Phaseolus vulgaris variety: GenBank: AB
Find the AGI locus codes of the top three hits in TAIR using BLAST. Is it the same if you BLAST with the transcript or the protein?
Based on the transcript:
- AT1G | Symbols: GAI, RGA2 | GAI (GIBBERELLIC ACID IN e-08
- AT3G | Symbols: RGL2 | RGL2 (RGA-LIKE 2); transcript
- AT2G | Symbols: RGA1, RGA | RGA1 (REPRESSOR OF GA
Based on the protein:
- AT1G | Symbols: GAI, RGA2 | GAI (GIBBERELLIC ACID IN
- AT2G | Symbols: RGA1, RGA | RGA1 (REPRESSOR OF GA
- AT1G Symbols: RGL1, RGL | RGL1 (RGA-LIKE 1

Choosing the proper search result
- Locus
- Gene Model
- Protein

The Locus page: Lots of information

Looking at the locus page: practice problems 1
You're interested in learning more about a gene called PMR2 (Powdery Mildew Resistant 2).
- What is its AGI locus code?
- How many splice variants does it have?
- Which one has the shorter coding region?
- What is another name for this gene?
- What is the evidence for it being involved in the "defense response to fungus, incompatible interaction"?
- How many total loci are annotated to this term?
- Which paper provides experimental evidence that PMR2 is located in the plasma membrane?
- What is the title of that paper?

Looking at the locus page: practice problems 1
You're interested in learning more about a gene called PMR2 (Powdery Mildew Resistant 2).
- What is its AGI locus code? At1g11310
- How many splice variants does it have? 2
- Which one has the shorter coding region? At1g
- What is another name for this gene? Mildew Resistant Locus 2 (MLO2)
- What is the evidence for it being involved in the "defense response to fungus, incompatible interaction"? Inferred from Mutant Phenotype; analysis of visible trait; Consonni 2005
- How many total loci are annotated to this term? 44
- Which paper provides experimental evidence that PMR2 is located in the plasma membrane? Benschop 2007
- What is the title of that paper? Quantitative phospho-proteomics of early elicitor signalling in Arabidopsis.

Looking at the locus page: practice problems 2
You're interested in learning more about a gene called PMR2 (Powdery Mildew Resistant 2).
- How many cDNAs are associated with this locus?
- Which are available to order from the ABRC?
- What is the length of the full-length coding region?
- What is the isoelectric point of the protein?
- For the PERL polymorphism, what is the nucleotide difference between the Col and Bor-4 ecotypes?

Looking at the locus page: practice problems 2
You're interested in learning more about a gene called PMR2 (Powdery Mildew Resistant 2).
- How many cDNAs are associated with this locus? 3
- Which are available to order? none
- What is the length of the full-length coding region? 1722 bp
- What is the isoelectric point of the protein?
- For the PERL polymorphism, what is the nucleotide difference between the Col-0 and Bor-4 ecotypes? Col
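As a quick sanity check on a coding-region length such as the 1722 bp given above, the expected protein length can be worked out by counting codons. This is a minimal sketch under one stated assumption: that the reported coding-region length includes the stop codon. The function name is hypothetical.

```python
def protein_length_from_cds(cds_length_bp, includes_stop_codon=True):
    """Estimate the number of amino acids encoded by a coding region.

    Assumes the length is a whole number of codons; whether the stop codon
    is counted in the reported length is an assumption the caller must set.
    """
    if cds_length_bp % 3 != 0:
        raise ValueError("coding-region length is not a multiple of 3")
    codons = cds_length_bp // 3
    # The stop codon does not encode an amino acid.
    return codons - 1 if includes_stop_codon else codons


if __name__ == "__main__":
    # 1722 bp / 3 = 574 codons; minus the stop codon gives 573 residues.
    print(protein_length_from_cds(1722))
```

If the result disagrees with the length shown on the protein page, the most likely cause is that the two pages refer to different splice variants.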

Looking at the locus page: practice problems 3
You're interested in learning more about a gene called PMR2 (Powdery Mildew Resistant 2).
- Does the pmr2-1 mutant form lesions in response to powdery mildew attack?
- What is the putative location of the T-DNA insertion in mlo2-6?
- What is the ecotype of SAIL_878_H12?
- How many publications are available for this gene for 2007?
- Which paper also mentions the PMR3 gene?
- How many papers mention the mlo2 allele/mutant when you do a Textpresso search?

Looking at the locus page: practice problems 3
You're interested in learning more about a gene called PMR2 (Powdery Mildew Resistant 2).
- Does the pmr2-1 mutant form lesions in response to powdery mildew attack? no
- What is the putative location of the T-DNA insertion in mlo2-6? intron
- What is the ecotype of SAIL_878_H12? Col-0
- How many articles and how many abstracts are available for this gene for 2007? 2 abstracts, 1 article
- Which paper also mentions the PMR3 gene? Isolation and characterization of powdery mildew-resistant Arabidopsis mutants, PNAS 2000
- How many papers mention the mlo2 allele/mutant when you do a Textpresso search? 8

Locus page links: practice problems
You're interested in learning more about a gene called PMR2 (Powdery Mildew Resistant 2).
- According to the Genevestigator Gene Atlas, which organ has the highest level of expression?
- According to the Genevestigator Response viewer, was the level of PMR2 transcript higher 1 hr or 4 hrs after treatment with the fungal elicitor FL22?
- According to the eFP site, are the absolute levels of PMR2 expression higher in the root or the shoot of a seedling, 6 hours after a cold treatment?
- In the SUBA database, where does the MS/MS data indicate that this protein is located?
- According to InParanoid, how many poplar genes fall into the same group?
- On the ATTED-II page, how many genes are directly linked to PMR2 by co-expression analysis, and which has the strongest correlation?

Locus page links: practice problems
You're interested in learning more about a gene called PMR2 (Powdery Mildew Resistant 2).
- According to the Genevestigator Gene Atlas, which organ has the highest level of expression? senescent rosette leaf
- According to the Genevestigator Response viewer, was the level of PMR2 transcript higher 1 hr or 4 hrs after treatment with the fungal elicitor FL22? It is higher 1 hour after treatment
- According to the eFP site, are the absolute levels of PMR2 expression higher in the root or the shoot of a seedling, 6 hours after a cold treatment? They are higher in the root
- In the SUBA database, where does the MS/MS data indicate that this protein is located? plasma membrane
- According to InParanoid, how many poplar genes fall into the same group? 2
- On the ATTED-II page, how many genes are directly linked to PMR2 by co-expression analysis, and which has the strongest correlation? 5; At2g44180 is the strongest

Do we need anything besides the locus, gene model, and protein pages?

How many Papaya genes are found in the same cluster as PMR2 in Phytozome? How many Vitis vinifera genes?

Basic navigation and tools in GBrowse
- Use controls to zoom and scroll along chromosome
- Get sequence
- Enter locus, marker, etc.
***Many tracks now contain data from the TAIR9 release on Monday, June 22

GBrowse = Gobs of Information

GBrowse: practice problems
- How many papaya homologs are displayed from Phytozome? And how many amino acids are in the putative ortholog that has the Mlo domain?
- There are two regulatory regions located upstream of this gene. Which one has been linked to a cis element in rice?
- Which of the following has a longer transcript assembly aligning with PMR2? Saccharum officinarum or Triticum aestivum? Solanum tuberosum or Vitis vinifera?
- Are there any experimentally supported phosphorylation sites?
- What polymorphism appears to occur in the 5th intron?
- Is there peptide support for the third exon? The fourth exon? The fifth exon? And which gene model is supported by peptide evidence?
- Which exon structure seems to be better supported by the Brassica cDNA? By the Radish clones?

GBrowse: practice problems
- How many papaya homologs are displayed from Phytozome? And how many amino acids are in the putative ortholog that has the Mlo domain? 2; 350 amino acids
- There are two regulatory regions located upstream of this gene. Which one has been linked to a cis element in rice? AtREG417
- Which of the following has a longer transcript assembly aligning with PMR2? Saccharum officinarum or Triticum aestivum? Triticum aestivum. Solanum tuberosum or Vitis vinifera? Solanum tuberosum
- Are there any experimentally supported phosphorylation sites? Yes, from the motif: SVENYPSSPSPR
- What polymorphism appears to occur in the 5th intron? PERL
- Is there peptide support for the third exon? The fourth exon? The fifth exon? And which gene model is supported by peptide evidence? third: yes; fourth: no; fifth: yes; the At1g model is supported
- Which exon structure seems to be better supported by the Brassica cDNA? By the Radish clones? The At1g model is better supported by both types of transcripts

Sometimes, one gene isn't enough...
- Scientists often want to work with more than one gene or protein that are related through some common feature
- TAIR (and the PMN) offer some basic tools to create and/or enhance these customized data sets

Creating customized data sets
Data sets can be based on many different criteria:
- Overall sequence alignment (DNA or protein)
- Sequence motifs (DNA or protein)
- Protein domains and biochemical properties
- Gene/Protein function: subcellular location, molecular function, biological process
- Expression pattern
- Biochemical pathway
- Mapping region
- Phenotype
- Gene families
How do you generate these data sets?

Creating data sets: practice problems
- How many DNA stocks are associated with NPR1? Do any of them that are available from the ABRC have full-length cDNAs?
- How many keywords contain the term oxalate? How many of them have been used to annotate Arabidopsis genes?
- How many germplasms are associated with a reduced seed set phenotype?
- How many genes encode proteins that are found in the chloroplast stroma based on a direct assay?
- Try to get the calculated pIs for all the chloroplast stroma proteins and find the highest and lowest values.
- How many proteins have the following domain: Gly-Arg-Ala-Asn-hydrophobic residue (GRAN[hydrophilic])?

Creating data sets: practice problems
- How many DNA stocks are associated with NPR1? Do any of them that are available from the ABRC have full-length cDNAs? 11; yes, the two stocks available from the ABRC have full-length cDNAs
- How many keywords contain the term oxalate? How many of them have been used to annotate Arabidopsis genes? 11 keywords; two have been used for Arabidopsis
- How many germplasms are associated with a reduced seed set phenotype? 68
- How many genes encode proteins that are found in the chloroplast stroma based on a direct assay? 396 loci
- Try to get the calculated pIs for all the chloroplast stroma proteins and find the highest and lowest values. 4.25,
- How many proteins have the following domain: Gly-Arg-Ala-Asn-hydrophobic residue (GRAN[hydrophilic])? 32
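The motif question above comes down to a pattern match over protein sequences. The sketch below illustrates the idea with a regular expression; it is not how TAIR's motif analysis tools are implemented. It follows the "hydrophobic residue" wording of the question and assumes that "hydrophobic" means one of A, V, L, I, M, F, or W, which is only one of several common definitions. The sequences are made-up toy strings, not real Arabidopsis proteins, and `find_motif` is a hypothetical name.

```python
import re

# Assumption: the hydrophobic class is A, V, L, I, M, F, W. A different
# definition of hydrophobicity would change this character class.
HYDROPHOBIC = "AVLIMFW"
MOTIF = re.compile(f"GRAN[{HYDROPHOBIC}]")


def find_motif(sequences):
    """Map each sequence name to the 1-based start positions of the motif.

    Sequences without a match are left out of the result.
    """
    hits = {}
    for name, seq in sequences.items():
        positions = [m.start() + 1 for m in MOTIF.finditer(seq.upper())]
        if positions:
            hits[name] = positions
    return hits


if __name__ == "__main__":
    # Hypothetical toy sequences for illustration only.
    toy = {
        "protein_a": "MSTGRANLKK",     # GRANL starts at position 4
        "protein_b": "MGRANDEEGRANV",  # GRAND does not match; GRANV does
        "protein_c": "MKKLLSSA",       # no match
    }
    print(find_motif(toy))  # {'protein_a': [4], 'protein_b': [9]}
```

Counting the keys of the returned dictionary gives the number of proteins containing the motif, which is the quantity the practice problem asks for.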

Putting TAIR to work for you
Use TAIR to find detailed information for a specific gene / protein
- Locus page, gene model page, protein page: many sections, many data types, many external links
- GBrowse: many tracks
Use TAIR to create and enhance customized data sets
- Specific and Advanced Search pages
- Motif analysis tools
- FTP files with large data sets
Use TAIR for data visualization and analysis
- GO categorization (TAIR)
- OMICs viewer (PMN)
If you're having trouble getting any information you want from TAIR...

We are here to help:
- Please use our data
- Please use our tools
- Please use TAIR to help improve your research on IMPORTANT plants!
- Please contact us if we can be of any help!
- Make an appointment to meet with me during my visit (I can try to speak in Spanish)

Thank you!
TAIR, AraCyc, and the PMN
Current Curators:
- Tanya Berardini (lead curator – functional annotation)
- David Swarbreck (lead curator – structural annotation)
- Peifen Zhang (Director and lead curator – metabolism)
- A. S. Karthikeyan (curator)
- Philippe Lamesch (curator)
- Donghui Li (curator)
- Rajkumar Sasidharan (curator)
Recent Past Contributors:
- Debbie Alexander (curator)
- Christophe Tissier (curator)
- Hartmut Foerster (curator)
Tech Team Members:
- Bob Muller (Manager)
- Larry Ploetz (Sys. Administrator)
- Raymond Chetty
- Anjo Chi
- Vanessa Kirkup
- Cynthia Lee
- Tom Meyer
- Shanker Singh
- Chris Wilks
Metabolic Pathway Software:
- Peter Karp and SRI group
Eva Huala (Director and Co-PI)
Sue Rhee (PI and Co-PI)
