TreeGenes & Tripal treegenesdb.org Emily Grau


1 TreeGenes & Tripal treegenesdb.org Emily Grau
- Total time allotted: 15 minutes (10-12 minutes of talking time? This is also the last presentation, so time might be squeezed by other talks)
- How much background detail to give?
Outline
  What TG is: history, user base?
  Data we hold: number of species (many!), data types
  Data sources: primary databases, users!
  Tools we offer: GMOD; custom development (DiversiTree and CartograTree)
  Transition to Chado & Tripal. Challenges: 25 years of data in a custom schema -> Chado. Advantages:
  Future stuff: Tripal Gateway will allow us to expand the resources we offer to users! Connect HWG, GDR, TG. Workflows + HPC resources (will we tie this in to CartograTree, give users greater control, offer multistep workflows?). New user submission module (!!!!!); plan to develop a module for others to use(?)
treegenesdb.org
Emily Grau
Department of Ecology & Evolutionary Biology, University of Connecticut, Storrs CT

2 TreeGenes Database: History
Began as the Dendrome project (a USDA-funded initiative) in 1993 to hold forest tree genetic maps and associated markers
One of the first USDA-funded databases on the internet
Schema has been changing and evolving ever since
- Start of the history slide: number of users, organizations, countries represented (AWStats or Google)
- History overview: year started, comparative maps, expanded to sequence (Sanger) -> next-gen -> association (environment)

3 TreeGenes Database: History
Began to hold forest tree genetic maps and associated markers
Expanded to other data types:
  Sequence
  Resequencing, large-scale genotyping, transcriptomics/expression
  Full genome sequences
Analysis and visualization tools
  Ability for users to mine the data
Resources for the user community
  Literature, colleagues

4 TreeGenes Database: Users
2,086 users from 862 organizations in 94 countries
- New chart needed for number of unique visitors
10,000 Unique Web Visitors to TreeGenes Database per month, January-December 2015

5 TreeGenes Database: Species
1,774 species from 101 genera
At least one genetic artifact from each species
Primarily conifers, but also inclusive of all forest trees
Full genome sequence: 13 species
Transcriptome/expression resources: 3,920,817 sequences from 263 species
106 genetic maps from 35 species
- Update numbers here

6 TreeGenes Database: Data Sources
Automated scripts
  Primary databases such as NCBI
  Appropriate data should be submitted to primary databases first
User submissions
  For data and metadata not captured well by primary databases
- We don't hold raw data (but we link to it); we hold ESTs, etc.
- Where do we get our data? NCBI resources; user submissions, including journal accession number
- Examples of studies: genotype, phenotype, and environment
- Plant Ontology (standardizing our data)
- What data we have and where it comes from: overview of automated searching and manual user submission

7 TreeGenes Database: Data Sources
Automated scripts
  NCBI: Transcripts, Protein, and UniGene databases
    Linked to literature records, etc.
    Incorporated into visualization tools
  Literature: Web of Science, PubMed
    Number of papers
- TSA = transcriptome shotgun assembly
- Automated: NCBI protein, TSA, EST, UniGene...
- Literature? WoS, PubMed
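
As a rough illustration of the kind of automated NCBI retrieval this slide mentions, the sketch below builds a query URL for NCBI's public E-utilities ESearch endpoint. It is not the TreeGenes pipeline, which the slides do not describe. The function name `build_esearch_url`, the example species, and the choice of databases are assumptions made purely for illustration.

```python
from urllib.parse import urlencode

# Base URL of NCBI's public E-utilities ESearch endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(db: str, organism: str, retmax: int = 20) -> str:
    """Build an ESearch URL that lists record IDs for one organism.

    Hypothetical helper for illustration; not part of TreeGenes.
    `db` is an Entrez database name such as "nuccore" or "protein".
    """
    params = {
        "db": db,
        "term": f"{organism}[Organism]",  # restrict the search to one species
        "retmax": retmax,                 # maximum number of IDs to return
        "retmode": "json",                # ask for a JSON response
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


if __name__ == "__main__":
    # Example species chosen for illustration only.
    for db in ("nuccore", "protein"):
        print(build_esearch_url(db, "Pinus taeda"))
```

A scheduled script could fetch each URL, compare the returned IDs with those already stored, and load only the new records. That loading step is left out here because it depends on the local schema.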

8 TreeGenes Database: Data Sources
User submissions
  Internal projects or collaborations (day one)
  Submissions of data post-analysis at publication time

9 TreeGenes Database: Data Sources
User submissions
  Submit genetic maps, association or population study data
- Population genetics, including association studies (emphasize phenotype, genotype, and environmental data; emphasize GPS coordinates)
- This type of information is not captured well by other databases
- Collecting metadata: how you did your analysis, what your results are
- Not like Dryad: we connect the information, giving it value
  Most submissions come from a journal requirement: Tree Genetics and Genomes

10 TreeGenes Database: Data Sources
User submissions
  Genetic maps, association or population studies

11 TreeGenes Database: Data Sources
User submissions
  Obtain TGDR accession number!
  Genetic maps, association or population studies

12 TreeGenes Database: Data Sources
User submissions
  Genetic maps, association or population studies
  Will be converted to a Tripal module and made available to the community

13 TreeGenes Database: Data Access
Tools
  Existing viewers
  Custom development
- Custom interface development: combination of existing viewers and custom development
- GMOD project: CMap interface (1), GBrowse/Web Apollo (1)
- DiversiTree (2 slides)
- Genetic Stock Center (2 slides) (AdapTree and Collaborative) -> genotype and phenotype
- We contribute development towards association studies, particularly those including environmental information

14 TreeGenes Database: Interfaces
CMap interface (1): browser-based tool for comparing maps (genetic, sequence, etc.)
Search by species, publication set, feature (name/ID), or feature type
View map: individual features with locations, list of feature types

15 TreeGenes Database: Interfaces
Bulk retrieval of resequencing data, genotypes, and phenotypes
- SSWAP is a form of web service: the idea is that we can push and pull things between analytical tools and visualization tools (over the web instead of on the desktop)
- A type of web service we developed in collaboration with iPlant
- Acknowledge Damian at the end (U of A or Semantic Options)

16 TreeGenes Database: Interfaces
- Much of the data that comes in is geo-referenced. Talk about this in the user submission slide
- Custom interface development: history of CartograTree (Tree Biology Working Group through iPlant)
- CartograTree (3-4)
Providing context to geo-referenced data
Data from TreeGenes, WorldClim, AmeriFlux, TRY-db

17 TreeGenes Database: Transition to Tripal
Transition to Tripal and Chado
Challenges
  Ontologies
  Standardizing for the first time
  25 years of custom schemas!
- Expand on this?

18 TreeGenes Database: Transition to Tripal
Transition to Tripal and Chado
Advantages
  Ontologies
  Save time!
  Connect data with other databases (Tripal Gateway)
  Improve analytical capabilities (Tripal Gateway)
  Our contribution: analytical workflows
  Review & clean up database
- Expand on this?

19 TreeGenes Database: Transition to Tripal Outcomes
Outcomes
  Expanded datasets: access Hardwood Genomics, GDR, and TreeGenes from one location
  Expanded analytical and HPC resources
  Improved CartograTree capabilities
  New Tripal module: user submission
- Expand on this?

20 TreeGenes Database: Team
Project Leads
Jill Wegrzyn, University of Connecticut
David Neale
Development Team
Steven A Demurjian Jr P0383
Hans Vasquez-Gross
Lead Database Administrator
Emily Grau P0322
Advising
Damian Gessler, Semantic Options
- Emphasize our role in DIBBs: association genetics workflow
- Along with transitioning the underlying schema, we are developing an association genetics workflow: work it into CartograTree, expand HPC
- Mention integrating DiversiTree
- Start in Galaxy. If using a Galaxy instance hosted through us, it will be kind of like DiversiTree, but expanded!
@TreeGenes
TreeGenes Database

