The role of the National Agricultural Library in arthropod genomics research - implementing and developing tools for genomic data management Monica Poelchau.

Presentation transcript:

The role of the National Agricultural Library in arthropod genomics research - implementing and developing tools for genomic data management Monica Poelchau and the i5k team

Outline
– What roadblocks are there for data management of genome sequencing projects?
– What tools has the NAL developed to help overcome these roadblocks?
– Why is the NAL uniquely positioned to help with ARS genomic data management?
– What are our future plans?

Data management of genome sequencing projects
How do you handle your data during research, and after research has been completed?
Scale. High-throughput sequencing produces vast amounts of sequence data. The sheer scale creates data management challenges in:
– computational infrastructure;
– computational skills/basic UNIX literacy;
– basic domain knowledge in molecular biology and genomic data manipulation.

Data management of genome sequencing projects
How do you handle your data during research, and after research has been completed?
Reusability. Data re-use: genome sequencing projects often aim to produce reference datasets for use across the scientific community.¹
Experience. More and more, groups with little experience with genomic data are sequencing genomes and encountering a steep learning curve.
¹ cf. National Research Council. A New Biology for the 21st Century. Washington, DC: The National Academies Press, 2009.

Genome project workflow
Initial planning
DNA/RNA sample generation
Genome sequencing and assembly (usually outsourced to a sequencing center)
Deposit raw reads and assembly at NCBI
Post-processing:
– Annotation: ascribe function to elements of the assembled dataset;
– Quality assessment of assembly and annotations;
– Host assembly and annotations in a public DB for curation and public access;
– Manual curation of annotations;
– Generate an ‘official gene set’ (OGS).
Custom analyses
Deposit OGS at NCBI
Publish results in Nature
Repeat
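The workflow lists quality assessment of the assembly as a post-processing step but does not say which metrics are used. Purely as an illustration, one widely used contiguity metric is N50: the length L such that contigs (or scaffolds) of length ≥ L together cover at least half of the total assembly length. The minimal Python sketch below assumes a plain list of contig lengths as input; the function name `n50` and the toy lengths are hypothetical and not part of any i5k or NAL tool.

```python
def n50(contig_lengths):
    """Return the N50 of a list of contig/scaffold lengths (hypothetical helper).

    N50 is the length L such that contigs of length >= L together
    cover at least half of the total assembly length.
    """
    if not contig_lengths:
        raise ValueError("need at least one contig length")
    total = sum(contig_lengths)
    running = 0
    # Walk from the longest contig down, accumulating covered length.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        # Compare 2 * running with total to avoid float division.
        if 2 * running >= total:
            return length
    # Not reached for non-negative lengths: the last contig brings running to total.
    raise AssertionError("unreachable for non-negative lengths")


if __name__ == "__main__":
    # Hypothetical toy assembly of five contigs (total 300 bp).
    lengths = [100, 80, 60, 40, 20]
    print(n50(lengths))  # 80: 100 + 80 = 180 bp already covers half of 300 bp
```

N50 only describes contiguity; assessing annotation quality or gene-space completeness would need other measures, which the slides do not specify.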

The i5k
Our goal: to help any i5k genome project with data management during the post-annotation process.
We enable data access, visualization, and curation for any ‘orphan’ i5k species.

Submission
– ‘Frozen’ genome assembly
– Automated annotations
– Ancillary datafiles (e.g. RNA-Seq alignments)
Organism page, BLAST, genome browser, Web Apollo, bulk downloads
Official gene set v1.0
Tutorials, individual help and consultations, tool development
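The slide does not name a file format for submitted automated annotations or the official gene set; GFF3 is a common exchange format for gene models, so it is assumed here for illustration only. A minimal Python sketch that tallies features by type (GFF3 column 3), for example to count gene models in a submission; the function name `count_feature_types` and the example records are hypothetical.

```python
from collections import Counter


def count_feature_types(gff3_lines):
    """Count GFF3 features by their type (column 3), e.g. 'gene' or 'mRNA'."""
    counts = Counter()
    for raw in gff3_lines:
        line = raw.rstrip("\r\n")
        # GFF3 may end with an embedded FASTA section; stop reading features there.
        if line.startswith("##FASTA"):
            break
        # Skip blank lines, '##' directives and '#' comments.
        if not line.strip() or line.startswith("#"):
            continue
        columns = line.split("\t")
        # Every GFF3 feature line has exactly nine tab-separated columns.
        if len(columns) != 9:
            raise ValueError(f"expected 9 tab-separated columns, got {len(columns)}: {line!r}")
        counts[columns[2]] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical toy annotation: two gene models, one with an mRNA and two exons.
    example = [
        "##gff-version 3",
        "scaffold1\texample\tgene\t100\t900\t.\t+\t.\tID=gene1",
        "scaffold1\texample\tmRNA\t100\t900\t.\t+\t.\tID=mrna1;Parent=gene1",
        "scaffold1\texample\texon\t100\t300\t.\t+\t.\tParent=mrna1",
        "scaffold1\texample\texon\t500\t900\t.\t+\t.\tParent=mrna1",
        "scaffold2\texample\tgene\t50\t400\t.\t-\t.\tID=gene2",
    ]
    counts = count_feature_types(example)
    print(counts["gene"])  # 2
```

Rejecting lines without nine columns is a deliberate choice: a malformed line usually signals a broken export, and failing loudly is safer than silently undercounting.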

i5k in numbers
– 33 species available, all with ‘pre-release’ annotations
– 230+ registered manual annotators
– 1/5 of annotators work on multiple species
– >5,800 gene models manually curated

Species by group: 4 Arachnids, 2 Copepods, 2 Palaeoptera, 4 Hymenoptera, 4 Coleoptera, 1 Trichoptera, 9 Diptera, 7 Hemiptera
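The per-group counts on this slide can be cross-checked against the “33 species available” figure in “i5k in numbers”. A small Python sketch of that check; the dictionary simply transcribes the slide, and the name `SPECIES_BY_GROUP` is illustrative.

```python
# Species counts per group, transcribed from the slide.
SPECIES_BY_GROUP = {
    "Arachnids": 4,
    "Copepods": 2,
    "Palaeoptera": 2,
    "Hymenoptera": 4,
    "Coleoptera": 4,
    "Trichoptera": 1,
    "Diptera": 9,
    "Hemiptera": 7,
}


def total_species(counts):
    """Sum species counts across all groups."""
    return sum(counts.values())


if __name__ == "__main__":
    print(total_species(SPECIES_BY_GROUP))  # 33, matching "33 species available"
```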

Why the NAL?
Our mission. The NAL strives to collect, manage, disseminate, and preserve needed agricultural information for research scientists and the general public.
Permanence. The NAL can operate beyond regular grant cycles.
Staff. Bioinformatics specialists, information scientists, interns, web developers, IT staff, and a metadata librarian.

Genome project workflow
Initial planning
DNA/RNA sample generation
Genome sequencing and assembly (usually outsourced to a sequencing center)
Deposit raw reads and assembly at NCBI
Post-processing:
– Annotation: ascribe function to elements of the assembled dataset;
– Quality assessment of assembly and annotations;
– Host assembly and annotations in a public DB for curation and public access;
– Manual curation of annotations;
– Generate an ‘official gene set’ (OGS).
Custom analyses
Deposit OGS at NCBI
Publish results in peer-reviewed journals
Repeat

A potential model for a future i5k Workspace
Diagram components: i5k, NAL, genome assembly, NCBI, iPlant/ARS HPC, specialized analyses, data archival, quality assessment.
i5k Workspace functions:
1) Visualize and access genomic data;
2) Easily access other i5k genomes;
3) Improve and update gene predictions dynamically;
4) Develop an ‘OGS’ for each community;
5) Quickly proceed to analyzing data;
6) Easier submission to NCBI.

The role of the NAL in AGR: we need your feedback!
Contact us! Tell us your opinion in concurrent sessions 2a and 2b (Wednesday and Thursday).
View our web resources:
Take our survey:

Credits and acknowledgements
The NAL team:
– Chris Childers
– Vijaya Tsavatapalli
– Gary Moore
– Susan McCarthy
– Chien-Yueh Lee, Han Lin, Jun-Wei Lin
i5k advisory committee:
– Jay Evans
– Don Gourley
– Kevin Hackett
– Simon Liu
– Ursula Pieper
The i5k coordinating committee
The i5k pilot project (in particular Stephen Richards and Dan Hughes)
Wayne Hunter
iPlant
All of our users and data contributors!
CONTACT US: