Bienvenidos a TAIR! (Welcome to TAIR!) Kate Dreher, curator, TAIR/PMN


TAIR = The Arabidopsis Information Resource Why Arabidopsis? What does TAIR do? What can you do with TAIR? Introduction to TAIR Arabidopsis

Introduction to Arabidopsis
Basic facts:
- small weed related to mustard
- also known as mouse-ear cress
- can grow to cm tall
- annual (or occasionally biennial) plant
- member of the Brassicaceae (broccoli, cauliflower, radish, cabbage)
- found around the northern hemisphere
Why do so many people study THIS plant?

Arabidopsis has good model organism traits
- Fast life cycle (6 weeks)
- Thousands of plants fit in a small space
- Fairly easy to grow
- Thousands of seeds produced by each plant
- Self-fertile (in-breeding)
- Many different subspecies/ecotypes
- Serves as a good model for crop plants
But why Arabidopsis instead of other plants?

Arabidopsis offers some advantages
- Good genome
  - very small: 125 Mb
  - diploid
  - 5 haploid chromosomes
  - fewer/smaller regions of repetitive DNA than many plants
- Quite easily transformable with Agrobacterium
  - NO tissue culture required
- Inertia!
  - A group of scientists lobbied for Arabidopsis
  - The genome was sequenced (2000)
  - MANY resources have been developed

Arabidopsis research can be successfully applied to real plants
- Over-expression of the HARDY gene from Arabidopsis can improve water use efficiency in rice (Karaba 2007)
- cDNAs from castor bean were over-expressed in Arabidopsis, and a high-throughput screen of fatty acid content in Arabidopsis seeds led to the identification of three cDNAs that increase hydroxy fatty acid levels (Lu 2006)
- Endosperm-specific over-expression of the Arabidopsis GTPCHI and ADCS biosynthetic genes can increase folate (vitamin B9) levels by up to 100-fold in rice (Storozhenko 2007)
- Studies on a sodium transporter (HKT1) in Arabidopsis helped to identify a durum wheat homolog. It has been introgressed into bread wheat lines and appears to improve their yield on saline soils (Hwang 2006; Byrt 2007, et al.)
Both basic and translational experiments using Arabidopsis continue...

Arabidopsis data explosion
TONS of data are generated about Arabidopsis
- Over 2400 Arabidopsis articles published each year are indexed in PubMed
- Tens of thousands of mutants have been generated
- Hundreds of microarray experiments have been performed
- Proteomics and metabolomics studies are becoming popular
- 1001 Arabidopsis genomes are being sequenced
- Large-scale phenotypic studies are scheduled to start soon
TAIR tries to bring data together to benefit scientists and society
That includes all of you...

What does TAIR do?
- Curators and computer tech team members work together under great directors
- TAIR develops internal data sets and resources
- TAIR links to external data sets and resources
- TAIR provides free on-line access to everyone
  - Funded by the National Science Foundation of the USA
  - Started in 1999
Dr. Eva Huala (Director), Dr. Sue Rhee (Co-PI), curators, computer tech team members

Internal TAIR data sets Structural curators try to correctly define gene sequences Functional curators try to correctly describe gene function

Structural curation at TAIR
Structural curators try to answer the question: What are ALL of the genes in Arabidopsis?
- Use many types of data
  - ESTs
  - full-length cDNAs
  - peptides
  - orthology
- Determine gene coordinates and features
  - Establish intron, exon, and UTR boundaries
  - Add alternative splice variants
- Classify genes
  - protein coding
  - miRNA
  - pseudogene
- Keep updating! (even though the genome was sequenced in 2000!)
  - TAIR9 – released June new loci and 739 new gene models
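To make "establish intron, exon, and UTR boundaries" concrete, here is a minimal sketch of how intron coordinates follow from a gene model's exon coordinates. The function name and the coordinates are hypothetical and used only for illustration; this is not TAIR's or Apollo's actual code.

```python
def introns_from_exons(exons):
    """Return the introns implied by a list of exons.

    exons: list of (start, end) pairs, 1-based and inclusive, all from one
    transcript on one strand. These coordinates are illustrative only.
    """
    ordered = sorted(exons)
    introns = []
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        if next_start <= prev_end:
            raise ValueError("exons overlap; not a valid single transcript")
        # Directly adjacent exons leave no gap, so no intron is emitted.
        if next_start - prev_end > 1:
            introns.append((prev_end + 1, next_start - 1))
    return introns


# Hypothetical three-exon gene model
EXAMPLE_EXONS = [(400, 520), (100, 250), (700, 900)]
print(introns_from_exons(EXAMPLE_EXONS))
```

An alternative splice variant would simply be a second exon list for the same locus, giving a different set of introns.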

Structural curation at TAIR Apollo is a program to assist with structural curation ESTs Protein similarity cDNAs

Functional curation at TAIR
Functional curators try to answer the questions:
- What does every gene/protein in Arabidopsis do?
- When and where does it act?
Functional curation requires controlled vocabularies
- Allow cross-species comparisons
- TAIR curators work to develop and agree upon common terms
Plant Ontology: Structure: PO:
- Term: FRUIT
- Definition: The seed-bearing structure in angiosperms, formed from the ovary after flowering
- Terms listed on the slide: achene, berry, capsule, caryopsis, circumscissile capsule, cypsela, drupe, follicle, grain, kernel, legume, loculicidal capsule, lomentum, nut, pod, pome, poricidal capsule, schizocarp, septicidal capsule, septifragal capsule, silique

Functional curation at TAIR
- Functional curators try to correctly describe gene function
- Functional curators try to help build controlled vocabularies
  - Allow cross-species comparisons
  - Develop and agree upon common terms
Gene Ontology: Molecular function: GO:
- Term: indole-3-acetate beta-glucosyltransferase activity
- Definition: Catalysis of the reaction: auxin + UDP-D-glucose = indole-3-acetyl-beta-1-D-glucose + UDP
- Other names listed on the slide:
  - IAA-Glu synthetase activity
  - IAA-glucose synthase activity
  - IAGlu synthase activity
  - indol-3-ylacetylglucose synthase activity
  - UDP-glucose:(indol-3-yl)acetate beta-D-glucosyltransferase activity
  - UDP-glucose:indol-3-ylacetate glucosyl-transferase activity
  - UDP-glucose:indol-3-ylacetate glucosyltransferase activity
  - UDPG-indol-3-ylacetyl glucosyl transferase activity
  - UDPglucose:indole-3-acetate beta-D-glucosyltransferase activity
  - uridine diphosphoglucose-indoleacetate glucosyltransferase activity
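The point of a controlled vocabulary is that many names used in the literature collapse to one agreed term. A minimal sketch of that idea follows, using a few of the names from this slide. The lookup structure and function names are assumptions made for illustration, not how the Gene Ontology or TAIR store their data, and the GO identifier is left out because the transcript does not give it.

```python
# One preferred term and a few of the alternative names listed on the slide.
PREFERRED = "indole-3-acetate beta-glucosyltransferase activity"
SYNONYMS = [
    "IAA-Glu synthetase activity",
    "IAA-glucose synthase activity",
    "IAGlu synthase activity",
    "indol-3-ylacetylglucose synthase activity",
]


def build_lookup(preferred, synonyms):
    """Map every lower-cased name (preferred or synonym) to the preferred term."""
    lookup = {preferred.lower(): preferred}
    for name in synonyms:
        lookup[name.lower()] = preferred
    return lookup


def normalize(label, lookup):
    """Return the preferred term for a label, or None if the label is unknown."""
    return lookup.get(label.strip().lower())


LOOKUP = build_lookup(PREFERRED, SYNONYMS)
print(normalize("IAA-glucose synthase activity", LOOKUP))
```

Because every paper's wording resolves to the same term, genes annotated by different groups, or in different species, can be compared directly.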

Functional curation at TAIR
Functional curators use controlled vocabularies to annotate genes
- Molecular function
- Subcellular localization
- Biological process
- Expression pattern
- Development stage
- Tissue / organ / cell type
Gene
- Enter common name, e.g. Nitrate Transporter 2.7, NRT2.7
- Prefer to track using AGI (Arabidopsis Genome Initiative) Locus Codes
  - AT5G14570: AT = Arabidopsis thaliana; 5 = Chromosome 5; G = Gene; 14570 = Position along chromosome (between and 14580)
Data Sources
- Published Literature
- Researchers
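The parts of an AGI locus code described on this slide can be pulled apart mechanically. The sketch below assumes the usual layout of "AT", a chromosome character (1 to 5, or C/M for the organelle genomes), "G", and five digits. The class and function names are hypothetical.

```python
import re
from typing import NamedTuple


class AgiLocus(NamedTuple):
    chromosome: str  # "1"-"5", or "C"/"M" for chloroplast/mitochondrion
    serial: int      # number reflecting relative order along the chromosome


# Assumed pattern: AT + chromosome + G + five digits, case-insensitive.
AGI_PATTERN = re.compile(r"^AT([1-5CM])G(\d{5})$", re.IGNORECASE)


def parse_agi(code: str) -> AgiLocus:
    """Split an AGI locus code such as AT5G14570 into chromosome and serial number."""
    match = AGI_PATTERN.match(code.strip())
    if match is None:
        raise ValueError(f"not an AGI locus code: {code!r}")
    return AgiLocus(match.group(1).upper(), int(match.group(2)))


print(parse_agi("AT5G14570"))
print(parse_agi("At5g63980"))
```

Tracking genes by this code rather than by common name avoids ambiguity when one gene has several names or one name is used for several genes.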

Functional curation at TAIR Functional curators capture mutant phenotypes alx8 mutant – mutation in gene At5g63980

External data sets
MANY different external data sets are linked to specific genes
- EST sequences (Arabidopsis and other species)
- Transcript expression data
- Peptide expression data
- Biochemical pathway data (... described in the PMN talk)
- Epigenetic features
- Ecotype-specific polymorphisms
- Publications
- Seed stocks
- DNA vectors
- Interaction partners
- Promoter elements
- Post-translational modifications
- Orthologs
New data types are frequently added

Providing Tools at TAIR Tech (computer) team members and curators Provide links to external databases from every gene page

Providing Tools at TAIR
Tech (computer) team members and curators load TAIR and external data sets into existing tools
- BLAST
- GBrowse
- Synteny Viewer (very new)
- NBrowse Interaction Viewer (coming soon...)
GenBank Green Plant

Providing Tools at TAIR
Tech (computer) team members and curators develop new tools and modify existing tools
- SeqViewer
- PatMatch
- ... several others

Providing Tools at TAIR Tech (computer) team members and curators Create advanced search pages

Other Resources at TAIR
- Ordering system for the Arabidopsis Biological Resource Center (ABRC)
  - DNA stocks
  - Seed stocks
- Community member information
- Arabidopsis lab protocols
- Gene Symbol Registry
- Information Portals

Are these data and tools useful?
[Figure: three charts (Visits per Month, Unique Visitors per Month, Bytes per Month), each labeled TAIR I and TAIR II]

Who uses TAIR? (June 4 – July 4, 2009)

Why might you use TAIR?
- Do you work with plants?
- Do you want to take advantage of the tremendous amount of Arabidopsis data?
- Do you want to know more about
  - a gene?
  - an enzyme?
  - a protein domain?
  - a DNA regulatory region?
  - an abnormal phenotype?
  - a chromosomal region?
  - a set of orthologous proteins?
  - a biological process?
  - natural variation across populations?
Then please come see if TAIR can help you

Putting TAIR to work for you...
- You are studying drought tolerance in potato plants
- You do a subtractive hybridization study to identify cDNAs that are up-regulated in the roots of drought-stressed plants
- You find that a number of the up-regulated cDNAs code for proteins with a new domain: Ser-x-Glu-x-Cys-x-Ala = (SxExCxA)
- One of the family members, SECA1, appears to be present at particularly high levels
How can TAIR help?

Putting TAIR to work for you...
- Are there any proteins with the SxExCxA domain in Arabidopsis?
- What do they do in Arabidopsis?
- Do they share additional domains?
- What is the closest homolog to SECA1?
- Are there any phenotypes when SECA1 is mutated?
- Can I get a cDNA of this homolog to over-express in my species?
- Are there putative SECA1 orthologs in other plant species?

Are there any SxExCxA proteins in Arabidopsis? Find all of the proteins that have the SxExCxA domain
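Outside of TAIR's own pattern-matching tool, the same kind of search can be sketched with a regular expression, where each "x" in SxExCxA becomes a single-residue wildcard. The protein names and sequences below are invented for illustration and are not from TAIR. The lookahead is used so that overlapping occurrences are also reported.

```python
import re

# Ser-x-Glu-x-Cys-x-Ala, where x is any single residue.
# The lookahead lets overlapping occurrences be found too.
MOTIF = re.compile(r"(?=(S.E.C.A))")

# Hypothetical sequences, for illustration only.
PROTEINS = {
    "hypothetical_1": "MKTSAEGCLAVV",
    "hypothetical_2": "MSEECAAKL",
    "hypothetical_3": "MGSLEVCRAQSWETCPA",
}


def find_motif(sequence):
    """Return (1-based start position, matched residues) for every occurrence."""
    return [(m.start() + 1, m.group(1)) for m in MOTIF.finditer(sequence.upper())]


for name, seq in PROTEINS.items():
    hits = find_motif(seq)
    if hits:
        print(name, hits)
```

Run against a full protein data set, the names that print are the candidate SxExCxA family members to carry into the next steps.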

What do the SxExCxA proteins do? Option 1: Get the individual GO annotations for each gene

What do SxExCxA proteins do? Option 2: Get an overview of the information for the set of genes

GO categorization
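An overview of this kind boils down to counting how many genes in the set carry each annotation, grouped by GO aspect. The sketch below shows that counting step on made-up annotation rows. The gene names, terms, and row layout are assumptions for illustration, not TAIR output.

```python
from collections import Counter, defaultdict

# Hypothetical (gene, aspect, term) annotation rows; not real TAIR data.
ANNOTATIONS = [
    ("gene_A", "biological process", "response to water deprivation"),
    ("gene_B", "biological process", "response to water deprivation"),
    ("gene_B", "biological process", "response to water deprivation"),  # duplicate row
    ("gene_C", "biological process", "root development"),
    ("gene_A", "molecular function", "protein binding"),
    ("gene_C", "cellular component", "nucleus"),
    ("gene_B", "cellular component", "nucleus"),
]


def categorize(annotations):
    """Count distinct genes per term, grouped by aspect."""
    counts = defaultdict(Counter)
    for gene, aspect, term in set(annotations):  # set() drops duplicate rows
        counts[aspect][term] += 1
    return counts


SUMMARY = categorize(ANNOTATIONS)
for aspect in sorted(SUMMARY):
    for term, n_genes in SUMMARY[aspect].most_common():
        print(f"{aspect}\t{term}\t{n_genes}")
```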

What do SxExCxA proteins do? Option 3: Get the description of each gene

What other domains do SxExCxA proteins share? Identify all of the other domains found in those proteins
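Once each protein's domains have been retrieved, finding the ones they all share is a set intersection. A minimal sketch follows. The protein names and domain assignments are hypothetical.

```python
# Hypothetical domain assignments, for illustration only.
PROTEIN_DOMAINS = {
    "protein_1": {"SxExCxA", "RING finger", "transmembrane region"},
    "protein_2": {"SxExCxA", "RING finger"},
    "protein_3": {"SxExCxA", "RING finger", "kinase"},
}


def shared_domains(domains_by_protein, exclude=()):
    """Return the domains present in every protein, minus any to ignore."""
    domain_sets = list(domains_by_protein.values())
    if not domain_sets:
        return set()
    return set.intersection(*domain_sets) - set(exclude)


# Exclude the query domain itself, since every protein has it by construction.
print(shared_domains(PROTEIN_DOMAINS, exclude={"SxExCxA"}))
```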

What is the closest homolog to SECA1? BLAST SECA1 against the TAIR9 protein data set
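If the BLAST results are available as the 12-column tab-separated table that NCBI BLAST+ writes with `-outfmt 6`, picking the closest homolog can be sketched as below. Whether the TAIR web BLAST page offers this export is an assumption here, and the subject names and scores are invented for illustration.

```python
import csv
import io

# Column order of the standard BLAST+ tabular output (-outfmt 6).
FIELDS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]

# Hypothetical hits for the SECA1 query; not real BLAST results.
EXAMPLE = (
    "SECA1\tsubject_2\t41.0\t100\t59\t0\t10\t109\t3\t102\t2e-15\t70\n"
    "SECA1\tsubject_1\t62.5\t120\t45\t0\t1\t120\t5\t124\t1e-40\t150\n"
    "SECA1\tsubject_3\t35.0\t80\t52\t0\t20\t99\t1\t80\t3e-06\t45\n"
)


def best_hit(tabular_text):
    """Return (subject id, e-value, bit score) of the highest-scoring hit, or None."""
    reader = csv.DictReader(io.StringIO(tabular_text), fieldnames=FIELDS, delimiter="\t")
    hits = [row for row in reader if not row["qseqid"].startswith("#")]
    if not hits:
        return None
    top = max(hits, key=lambda row: float(row["bitscore"]))
    return top["sseqid"], float(top["evalue"]), float(top["bitscore"])


print(best_hit(EXAMPLE))
```

Ranking by bit score rather than by file order keeps the choice stable even if the table has been re-sorted.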

Are there any mutant phenotypes associated with At2g04240? Use the Seed/Germplasm Search page... or look in the Germplasm section of the Locus page

Can I get a cDNA for At2g04240 to overexpress in potato? Use the DNA Clones Search page

Are there putative SECA1 orthologs in other plant species?
Look for putative orthologs and paralogs using GBrowse
- Phytozome (orthologs)
- InParanoid (paralogs)

We are here to help:
- Please use our data
- Please use our tools
- Please use TAIR to help improve your research on IMPORTANT plants!
- Please contact us if we can be of any help!
- Make an appointment to meet with me during my visit (I can try to speak in Spanish)

Acknowledgements
TAIR
Current Curators:
- Tanya Berardini (lead curator – functional annotation)
- David Swarbreck (lead curator – structural annotation)
- Peifen Zhang (Director and lead curator – metabolism)
- A. S. Karthikeyan (curator)
- Philippe Lamesch (curator)
- Donghui Li (curator)
- Rajkumar Sasidharan (curator)
Recent Past Contributors:
- Debbie Alexander (curator)
Eva Huala (Director)
Sue Rhee (Co-PI)
Tech Team Members:
- Bob Muller (Manager)
- Larry Ploetz (Sys. Administrator)
- Raymond Chetty
- Anjo Chi
- Vanessa Kirkup
- Cynthia Lee
- Tom Meyer
- Shanker Singh
- Chris Wilks


Why Arabidopsis?
Plant research can benefit from focusing on a model plant
- Other model organisms include: roundworm, yeast, mouse, fruit fly, zebrafish
- Model organisms are easy to do experiments on
  - Fast life cycle
  - Don't need much space
  - Easy to take care of
  - Lots of offspring (for genetics)
  - Can be genetically transformed
- Good model for the really interesting species
  - humans
  - CROP PLANTS
- Communities develop to study model organisms
- Many resources become available for model organisms
  - Lab protocols
  - Mutant maps
  - Stock centers
  - Genome sequences
  - ... and more!
What should be the model plant?

Have I been able to get useful information at TAIR? We hope so! But, if you have any trouble finding the information you want...