GIAB: Genome reference material development resources for clinical sequencing Chunlin Xiao 1, Justin Zook 2, Shane Trask 1, Melissa Landrum 1, Marc Salit.

Chunlin Xiao 1, Justin Zook 2, Shane Trask 1, Melissa Landrum 1, Marc Salit 2, Stephen Sherry 1, and the Genome-in-a-Bottle Consortium
1 NIH/NLM/NCBI, 45 Center Drive, Bethesda, MD
2 NIST, 100 Bureau Dr, Gaithersburg, MD

Data Visualization
The current consensus SNP call set for NA12878, generated by NIST, can be visualized through the NCBI Get-RM browser. Variant call sets for the same individual, generated by clinical laboratories using various technologies, can be uploaded as separate tracks for side-by-side comparison. The browser also lets you upload your own data for display in the Sequence Viewer alongside the NCBI-provided tracks.

References
Zook, J. et al. (2014) Integrating human sequence data sets provides a resource of benchmark SNP and indel genotype calls. Nature Biotechnology 32, 246–251
The 1000 Genomes Project Consortium (2012) An integrated map of genetic variation from 1,092 human genomes. Nature 491, 56–65
Durbin, R.M. et al. (2010) A map of human genome variation from population-scale sequencing. Nature 467, 1061–

Abstract
Reference materials play an important role in validating the performance of sequencing platforms and in enabling the regulation of clinical applications. The Genome-in-a-Bottle (GIAB) project is a collaboration between NIST, FDA, NCBI, academic sequencing groups, sequencing technology developers, and clinical laboratories. Its goal is to develop analytical-grade reference genome materials and accompanying performance metrics to support the development of regulations and professional standards for clinical sequencing. NCBI serves as the Data Coordination Center (DCC) and as the repository for the raw sequencing reads, mapped alignments, genotypes, and other details for each sample, all hosted on a dedicated FTP site (ftp://ftp-trace.ncbi.nih.gov/giab/ftp). Here we describe the processes of data generation and data submission, and how the community can access the data.
We are also developing a genome browser for data visualization. The GIAB consortium plans to release data to the public on a regular basis.

Data Submission and Accessioning
NCBI serves as the Data Coordination Center (DCC) and as the repository for the raw sequence, genotypes, and other details for each sample from the Genome-in-a-Bottle project. The project currently focuses on one sample, NA12878, the daughter of NA12891 and NA. We have created a drop-box for each of the submitters (NIST, COMPLETE, GARVAN, ILLUMINA, INOVA, RTG, NCI, and NOVARTIS) so that they can upload their data to NCBI. Collaborators can submit raw sequence reads in FASTQ format, read alignments in BAM format, genotype data in VCF format, or analysis tools. The submitted data are then accessioned and archived at NCBI.

Data Distribution
All submitted GIAB data are made available by the DCC to the research community on a dedicated FTP site and an Aspera server. Users can download FASTQ, BAM, and VCF files from the FTP site (ftp://ftp-trace.ncbi.nih.gov/giab/ftp). The structure of the GIAB FTP site closely follows that of the 1000 Genomes FTP site:
- Primary sequence data are organized by sample name under the "/data" directory.
- Official genotype data are released under the "/release" directory.
- Intermediate data and method development data are organized under the "/technical" directory.
For each release, we create a sequence.index file that tracks all the FASTQ files along with their metadata, and an alignment.index file that lists all the BAM alignment files used to generate the variant calls.

To facilitate cloud-based data analysis, the whole GIAB data set has been mirrored to the Amazon cloud. Users with AWS access can reach the GIAB data through Amazon Simple Storage Service (S3); the bucket name is s3://giab/.
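
The index files described under Data Distribution lend themselves to simple scripted lookups. The poster does not specify their column layout, so the following is only a minimal sketch under stated assumptions: it assumes a tab-delimited file with a header row (optionally prefixed with "#"). The function name `read_index`, the column names FASTQ, MD5, and SAMPLE, and the example rows are all hypothetical and chosen purely for illustration.

```python
import csv
import io


def read_index(text):
    """Parse a tab-delimited index file with a header row into a list of dicts.

    Assumption: the first non-empty line is a header, possibly starting with '#'.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        return []
    lines[0] = lines[0].lstrip("#")  # tolerate a '#'-prefixed header line
    reader = csv.DictReader(io.StringIO("\n".join(lines)), delimiter="\t")
    return list(reader)


# Hypothetical index content; the column names and values are invented.
example = (
    "#FASTQ\tMD5\tSAMPLE\n"
    "data/NA12878/run1_1.fastq.gz\t0123abcd\tNA12878\n"
    "data/NA12878/run1_2.fastq.gz\t4567ef01\tNA12878\n"
)

rows = read_index(example)

# Group FASTQ paths by sample name.
fastqs_by_sample = {}
for row in rows:
    fastqs_by_sample.setdefault(row["SAMPLE"], []).append(row["FASTQ"])
```

With a real index file, the column names would need to be replaced by whatever header the file actually carries.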
(a) Layout of GIAB data at the NCBI FTP site
(b) Layout of GIAB data at Amazon S3
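
To make the two layouts concrete, the sketch below composes FTP and S3 locations from the three top-level directories named on the poster. The FTP root and the bucket name come from the poster. The helper name `giab_locations`, the category labels, and the assumption that the S3 mirror keeps the same relative paths as the FTP site are illustrative and should be checked against the actual listings.

```python
from posixpath import join

FTP_ROOT = "ftp://ftp-trace.ncbi.nih.gov/giab/ftp"  # FTP site named on the poster
S3_ROOT = "s3://giab"                               # S3 bucket named on the poster

# Top-level directories described on the poster; the dictionary keys are
# hypothetical labels introduced here for illustration.
AREAS = {
    "primary": "data",            # primary sequence data, organized by sample name
    "official": "release",        # official genotype data
    "intermediate": "technical",  # intermediate and method development data
}


def giab_locations(kind, *parts):
    """Return (ftp_url, s3_url) for one category of GIAB data.

    Assumption: the S3 mirror uses the same relative paths as the FTP site.
    """
    if kind not in AREAS:
        raise ValueError(f"unknown category: {kind!r}")
    rel = join(AREAS[kind], *parts)
    return f"{FTP_ROOT}/{rel}", f"{S3_ROOT}/{rel}"


# Example: where primary data for sample NA12878 would sit under this assumption.
ftp_url, s3_url = giab_locations("primary", "NA12878")
```

Keeping the directory names in one table means a change in the real layout only needs to be corrected in one place.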