TreeGenes & Tripal treegenesdb.org Emily Grau

Slides:



Advertisements
Similar presentations
How to use GDR, the Genome Database for Rosaceae Sook Jung, Stephen Ficklin, Taein Lee, Chun-Huai Cheng, Anna Blenda, Jing Yu, Sushan Ru, Kate Evans, Cameron.
Advertisements

GDR, the Genome Database for Rosaceae, in Chado and Tripal Sook Jung, Stephen Ficklin, Taein Lee, Chun-Huai Cheng, Anna Blenda, Sushan Ru, Ping Zheng,
Integrating Genome and Transcriptome Resources into TreeGenes Jill Wegrzyn David Neale Doreen Main Keithanne Mockaitis.
The National Center for Biotechnology Information (NCBI) a primary resource for molecular biology information Database Resources.
Map Curation on GrainGenes Victoria Carollo, Gerard Lazo, David Matthews, Olin Anderson Biological Databases Curators Meeting October 2003.
Build VIVO in the Cloud NIH Workshop on Value Added Services for VIVO Brand Niemann Semantic Community March 25-26,
Abstract The goal of the Conifer Translational Genomics Network (CTGN) project is to provide tree breeders across the United States with new tools to enhance.
Ten years of GDR Current Resources and Functionality S Jung, T Lee, S Ficklin, CH Cheng, P Zheng, A Blenda, S Ru, K Evans, C Peace, N Oraguzie, AG Abbott,
New Data and Functionality of GDR, the Genome Database for Rosaceae Sook Jung, Taein Lee, Stephen Ficklin, Chun-Huai Cheng, Ping Zheng, Anna Blenda, Sushan.
Genome database & information system for Daphnia Don Gilbert, October 2002 Talk doc at
Introductory Overview
GenSAS: Genome Sequence Annotation Server, a Tool for Online Annotation and Curation Dorrie Main, Taein Lee, Ping Zheng, Sook Jung, Stephen P. Ficklin,
GMOD in the Cloud Genome Informatics November 3, 2011 Scott Cain GMOD Project Coordinator Ontario Institute for Cancer Research
Moving forward our shared data agenda: a view from the publishing industry ICSTI, March 2012.
Dorrie Main, Jing Yu, Sook Jung, Chun-Huai Cheng, Stephen Ficklin, Ping Zheng, Taein Lee, Richard Percy and Don Jones.
Building Database Resources For Translational Research in Rosaceae Sook Jung, Taein Lee, Stephen Ficklin, Chun-Huai Cheng, Anna Blenda, Sushan Ru, Ping.
Comparative Genomics Tools in GMOD GMOD.org Dave Clements 1, Sheldon McKay 2, Ken Youns-Clark 2, Ben Faga 3, Scott Cain 4, and the GMOD Consortium 1 National.
IPlant Collaborative Tools and Services Workshop iPlant Collaborative Tools and Services Workshop Collaborating with iPlant.
GENOME-CENTRIC DATABASES Daniel Svozil. NCBI Gene Search for DUT gene in human.
Jing Yu, Sook Jung, Chun-Huai Cheng, Stephen Ficklin, Ping Zheng, Taein Lee, Richard Percy, Don Jones, Dorrie Main.
IPlant Collaborative Tools and Services Workshop iPlant Collaborative Tools and Services Workshop Collaborating with iPlant.
The Future of the iPlant Cyberinfrastructure: Coming Attractions.
Diversity Bioinformatics Terry Casstevens Institute for Genomic Diversity, Cornell University GMOD Meeting at NESCent Durham, NC – June 29-30, 2006.
GDR in Drupal facilitating community building and efficient maintenance.
Jing Yu 1, Sook Jung 1, Chun-Huai Cheng 1, Stephen Ficklin 1, Taein Lee 1, Ping Zheng 1, Don Jones 2, Richard Percy 3, Dorrie Main 1 1. Washington State.
Introduction to the Gramene Genetic Diversity module 5/2010 Build #31.
The iPlant Collaborative Community Cyberinfrastructure for Life Science Tools and Services Workshop Discovery Environment Overview.
United Nations Economic Commission for Europe Statistical Division The Importance of Databases in the Dissemination Process Steven Vale, UNECE.
Gramene Objectives Provide researchers working on grasses and plants in general with a bird’s eye view of the grass genomes and their organization. Work.
Digesting the Genome Glut Promoting the Use and Extension of GMOD To Emerging Model Organisms David Clements 1 Brian Osborne 2 Hilmar Lapp 1 Xianhua Liu.
The iPlant Collaborative Community Cyberinfrastructure for Life Science Tools and Services Workshop Discovery Environment Overview.
Overview PlantCollections – Publish information about public garden collections – Using existing infrastructure Morphbank – Goals and capabilities of.
Updates to the Cool Season Food Legume Genome Database Dorrie Main, Chun-Huai Cheng, Rebecca McGee, Clarice Coyne, Stephen Ficklin, Taein Lee, Sook Jung,
Variation data in VectorBase NIH/NIAID VectorBase site visit March 2015.
Research Data Management At the Smithsonian Using Sidora CNI December 10, 2013.
Fire Emissions Network Sept. 4, 2002 A white paper for the development of a NSF Digital Government Program proposal Stefan Falke Washington University.
What do we already know ? The rice disease resistance gene Pi-ta Genetically mapped to chromosome 12 Rybka et al. (1997). It has also been sequenced Bryan.
Riccardi: DIALOGUE Workshop August 1, 2005 Supported by NSF BDI 1 Representing and Using Phylogenetic Characters in Morphbank Greg Riccardi, David Gaitros,
The role of the National Agricultural Library in arthropod genomics research - implementing and developing tools for genomic data management Monica Poelchau.
A Tripal based Arthropod genome portal The i5k A Tripal based Arthropod genome portal Christopher Childers USDA/ARS/NAL i5k.nal.usda.gov.
Welcome to the combined BLAST and Genome Browser Tutorial.
The Bovine Genome Database Abstract The Bovine Genome Database (BGD, facilitates the integration of bovine genomic data. BGD is.
Progress on TripalBIMS Breeding Information Management System in Tripal Sook Jung, Taein Lee, Chun-Huai Chen, Jing Yu, Ksenija Gasic, Todd Campbell, Kate.
Jing Yu 1, Sook Jung 1, Chun-Huai Cheng 1, Stephen Ficklin 1, Taein Lee 1, Ping Zheng 1, Don Jones 2, Richard Percy 3, Dorrie Main 1 1. Washington State.
The i5k – enabling genomic data access, visualization and curation for the i5k community Monica Poelchau and the i5k group.
Software & Technologies: an overview
5/12/2018 Genome Database for Rosaceae New Data and New Functionality
Behavior and Phenotype in GMOD Natural Diversity in GMOD
Customized cloud platform for computing on your terms !
BIBFLOW Project Update
Practices of Science Data Sharing Platform for Bioinformation in SCBIT
UNC Digital Library Project
CottonGen: An Up-to-Date Resource Enabling Genetics, Genomics and Breeding Research for Crop Improvement Plant and Animal Genome Conference XXV Jing Yu1,
The Cool Season Food Legume Database: An Integrated Resource for Basic, Translational and Applied Research Dorrie Main, Chun-Huai Cheng, Stephen Ficklin,
SRA Submission Pipeline
the Genome Database for Rosaceae: New Data and Functionality
Department of Genetics • Stanford University School of Medicine
Functional Annotation of the Horse Genome
TAMU Bovine QTL db and viewer
for the Cotton Community
Updates and Future Direction
Genome Database for Rosaceae:
Storing and Accessing G-OnRamp’s Assembly Hubs outside of Galaxy
Yating Liu July 2018 G-OnRamp workshop
How to Effectively Search and Download Data in CottonGen
CottonGen: Enabling Cotton Research through Big-Data Analysis and Integration Jing Yu, Sook Jung, Chun-Huai Cheng, Taein Lee, Katheryn Buble, Ping Zheng,
New Data and Functionality in NRSP10 Databases
2016 Beltwide Cotton Conference
Germplasm Overview Page Trait Descriptor Standard Images
For more information contact:
Presentation transcript:

TreeGenes & Tripal treegenesdb.org Emily Grau -Total time alloted: 15 minutes (10-12 minutes talking time? Also the last presentation, so time might be squeezed from other talks) -How much background detail to give? Outline What TG is: History Userbase? Data we hold: # of species—many! Data types Data sources: Primary databases Users! Tools we offer GMOD Custom development: DiversiTree and CartograTree Transition to Chado & Tripal: Challenges: 25 years of data in custom schema  chado Advantages: Future stuff: Tripal Gateway will allow us to expand resources we offer to users! Connect HWG, GDR, TG Workflows + HPC resources (will we tie this in to CartograTree, give users greater control, offer multistep workflows?) New user submission module (!!!!!), plan to develop module for others to use(?) treegenesdb.org Emily Grau Department of Ecology & Evolutionary Biology University of Connecticut, Storrs CT

TreeGenes Database: History treegenesdb.org Began as the Dendrome project (USDA funded initiative) in 1993 to hold forest tree genetic maps and associated markers One of the 1st UDSA funded databases on the internet Schema has been changing and evolving ever since Start of the history slide Number of users Organizations Countries represented Awstats or Google History Overview yr started comparative maps expanded to seq (sanger) -> next gen -> association (environment)

TreeGenes Database: History treegenesdb.org Began to hold forest tree genetic maps and associated markers Expanded to other data types Sequence Reseqeuncing, Large-Scale Genotyping, Transcriptomics/Expression Full Genome Sequences Analysis and Visualization Tools Ability for users to mine the data Resources for the user community Literature, Colleagues Start of the history slide Number of users Organizations Countries represented Awstats or Google History Overview yr started comparative maps expanded to seq (sanger) -> next gen -> association (environment)

TreeGenes Database: Users treegenesdb.org 2,086 users from 862 organizations in 94 countries ***new chart for # unique visitors 10,000 Unique Web Visitors to TreeGenes Database per month, January-December 2015

TreeGenes Database: Species treegenesdb.org 1,774 species from 101 genera At least one genetic artifact from each species Conifers but is also inclusive of all forest trees Full genome sequence: 13 species Transcriptome/Expression resources: 3,920,817 sequences from 263 species 106 genetic maps from 35 species ** update #s here

TreeGenes Database: Data Sources treegenesdb.org Automated scripts Primary databases such as NCBI Appropriate data should be submitted to primary databases first User submissions For data and metadata not captured well by primary databases We don’t hold RAW data (but link to it), we hold EST etc Where do we get our data? NCBI resources User Submissions Including Journal Accession Number Examples of studies – Genotype, Phenotype, and Environment Plant Ontology (standardizing our data) What data we have & where it comes from Overview automated searchig & manual user submission

TreeGenes Database: Data Sources treegenesdb.org Automated scripts NCBI Transcripts, Protein, Unigene Databases Linked to literature records, etc. Incorporated into visualization tools Literature Web of Science, PubMed # papers Where do we get our data? NCBI resources TSA= transcriptome shotgun assembly User Submissions Including Journal Accession Number Examples of studies – Genotype, Phenotype, and Environment Plant Ontology (standardizing our data) What data we have & where it comes from Automated NCBI protein, tsa, est, unigene... Literature? WoS, pubmed

TreeGenes Database: Data Sources treegenesdb.org User submissions Internal projects or collaborations (day one) Submissions of data post-analysis at publication time Where do we get our data? NCBI resources TSA= transcriptome shotgun assembly User Submissions Including Journal Accession Number Examples of studies – Genotype, Phenotype, and Environment Plant Ontology (standardizing our data) What data we have & where it comes from Automated NCBI protein, tsa, est, unigene... Literature? WoS, pubmed

TreeGenes Database: Data Sources treegenesdb.org User submissions Submit genetic maps, association or population study data Population genetics including association studies (emph phenotype, genotype, environmental data (emph GPS coordinates)) -This type of information is not captured well by other databases -Collecting ****metadata****—how you did your analysis, what your results are NOT like dryad—we connect the information, giving it value Most submissions from journal requirement: Tree Genetics and Genomes

TreeGenes Database: Data Sources treegenesdb.org User submissions Genetic maps, association or population studies Population genetics including association studies (emph phenotype, genotype, environmental data (emph GPS coordinates)) -This type of information is not captured well by other databases -Collecting ****metadata****—how you did your analysis, what your results are NOT like dryad—we connect the information, giving it value

TreeGenes Database: Data Sources treegenesdb.org User submissions Obtain TGDR accession number! Genetic maps, association or population studies Population genetics including association studies (emph phenotype, genotype, environmental data (emph GPS coordinates)) -This type of information is not captured well by other databases -Collecting ****metadata****—how you did your analysis, what your results are NOT like dryad—we connect the information, giving it value

TreeGenes Database: Data Sources treegenesdb.org User submissions Genetic maps, association or population studies Will be converted to Tripal module and made available to the community Population genetics including association studies (emph phenotype, genotype, environmental data (emph GPS coordinates)) -This type of information is not captured well by other databases -Collecting ****metadata****—how you did your analysis, what your results are NOT like dryad—we connect the information, giving it value

TreeGenes Database: Data Access treegenesdb.org Tools Existing viewers Custom development Custom Interface Development: Combination of Existing viewers and custom development: GMOD project – CMAP interface (1) Gbrowse/Web Apollo (1) DiversiTree (2 slides) Genetic Stock Center (2 slides) (AdapTree and Collaborative) -> Genotype and Phenotype We contribute development towards association studies, particularly including environmental information

TreeGenes Database: Interfaces treegenesdb.org CMAP interface (1) : browser-based tool for comparing maps (genetic, sequence, etc) Search by species, publication set, feature (name/ID), feature type View map: individual features w/ locations, list of feature types

TreeGenes Database: Interfaces treegenesdb.org Bulk retrieval of resequencing data, genotypes, and phenotypes SSWAP—form of web service—idea that we can push and pull things between analytical tools and visualization tools (over the web instead of over the desktop) Type of web service we developed in collab with iplant Acknowledge damian at end –u of a or semantic options

TreeGenes Database: Interfaces treegenesdb.org Much of our data that comes in is geo referenced.. Talk about this in user sub slide Custom Interface Development: History of CartograTree – Tree Biology Working Group through iPlant CartograTree (3-4) Providing context to geo-referenced data Data from TreeGenes, WorldClim, Ameriflux, TRY-db

TreeGenes Database: Transition to Tripal treegenesdb.org Transition to Tripal and Chado Challenges Ontologies Standardizing for the first time 25 years of custom schemas! Expand on this?

TreeGenes Database: Transition to Tripal treegenesdb.org Transition to Tripal and Chado Advantages Ontologies Save time! Connect data with other databases (Tripal Gateway) Improve analytical capabilities (Tripal Gateway) Our contribution: analytical workflows Review & clean up database Expand on this?

TreeGenes Database: Transition to Tripal Outcomes treegenesdb.org Outcomes Expanded datasets: access Hardwood Genomics, GDR, TreeGenes from one location Expanded analytical, HPC resources Improve CartograTree capabilities New Tripal module: user submission Expand on this?

TreeGenes Database: Team treegenesdb.org tg-help@ucdavis.edu Project Leads Jill Wegrzyn University of Connecticut David Neale Development Team Steven A Demurjian Jr P0383 Hans Vasquez-Gross Lead Database Administrator Emily Grau P0322 Advising Damian Gessler Semantic Options Emphasize our role in DIBBs—association genetics workflow Along with transitioning underlying schema, we are developing association genetic workflow—work into ctree, expland HPC Mention integrating DiversiTree Start in Galaxy If using a galaxy hosted instance through us, it will be kind of like diversitree—but expanded!! @TreeGenes TreeGenes Database