HCA Data Access Oct 3rd 2019.

Overview
- Overview of the HCA data life cycle
- Accessing data through the HCA DCP DSS API
  - The Python dcp-cli
  - The Bioconductor HCABrowser package
- Accessing data through the digested HCA DCP (Azul)
  - The Data Explorer UI
  - The Bioconductor HCAExplorer package
- Accessing data through the Matrix Service

An overview of the HCA data life cycle
- Research data: Labs submit their single-cell data and associated metadata.
- Data curator: Submitters work with a data curator to upload the data and metadata and to ensure they are well formatted and conform to file format standards. Metadata is also validated against the HCA metadata standards; errors are identified and corrected before re-uploading.
- Storage: Validated raw data and metadata files are submitted to the Data Storage System. Storage is provided in both Amazon Web Services (AWS) and Google Cloud Platform (GCP) environments, and data can be accessed from either.
- Pipelines: Data processing pipelines approved by the HCA Analysis Working Group process raw data from some single-cell assays, producing matrices and QC metrics files. These pipelines identify genes, quantify transcripts, and assess data quality. Like raw data, processed data are put in the Data Storage System for access by the community.
Source: https://staging.data.humancellatlas.org/guides/data-lifecycle#introduction

The HCA Data Coordination Platform Data Storage System (HCA DCP DSS) API
- https://dss.data.humancellatlas.org/
- A data storage system designed to host large datasets on Amazon S3 and Google Cloud Storage.
- Provides an API to interact with the data: https://dss.data.humancellatlas.org/v1/swagger.json
- Metadata is defined by a schema: https://schema.humancellatlas.org/a
- Certain activities require authorization.
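A minimal Python sketch of how a client might address the DSS API, building request URLs without sending anything. The endpoint paths (`bundles`, `files`, `search`), the `replica` and `version` query parameters, and the replica names `aws` and `gcp` are assumptions; the swagger.json above is the authoritative definition.

```python
from urllib.parse import urlencode

# Base URL of the DSS API (from the slide); the endpoint layout below is assumed.
DSS_API = "https://dss.data.humancellatlas.org/v1"

# Data are replicated in AWS and GCP; these replica names are assumptions.
REPLICAS = ("aws", "gcp")


def dss_url(resource, uuid=None, replica="aws", version=None):
    """Build a DSS request URL such as /v1/bundles/<uuid>?replica=aws (hypothetical helper)."""
    if replica not in REPLICAS:
        raise ValueError(f"unknown replica: {replica!r}")
    path = f"{DSS_API}/{resource}"
    if uuid is not None:
        path += f"/{uuid}"
    params = {"replica": replica}
    if version is not None:
        params["version"] = version  # omitting it is assumed to mean "latest version"
    return f"{path}?{urlencode(params)}"


print(dss_url("bundles", "ffffba2d-30da-4593-9008-8b3528ee94f1",
              version="2019-08-01T200147.309074Z"))
```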

What’s on the HCA DCP
- The DCP contains user-submitted data. There are four kinds of objects to act on: bundles, files, collections, and subscriptions.
- The core unit in the DCP is a bundle. Bundles are what users upload and primarily download.
  - A bundle is identified by a uuid (string) and a version (string), written together as <uuid>.<version>, e.g. ffffba2d-30da-4593-9008-8b3528ee94f1.2019-08-01T200147.309074Z
  - It contains information relevant to a single experiment: metadata (schema data) and experimental data (BAM, FASTA, etc.).
- Bundles contain files. A file is identified by a name (string), a uuid (string), and a version (string), e.g. the name cell_suspension.json and the uuid ba96ea2d-c7e2-4c47-9561-418a849f93d0.
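The bundle identifier in the example above is the uuid and version joined by a dot. Because the version string itself contains a dot, a parser must split only on the first one. A minimal sketch (the `BundleID` and `parse_fqid` names are hypothetical):

```python
from typing import NamedTuple
from uuid import UUID


class BundleID(NamedTuple):
    uuid: str
    version: str


def parse_fqid(fqid: str) -> BundleID:
    """Split '<uuid>.<version>' into its parts.

    The version (e.g. 2019-08-01T200147.309074Z) contains a '.', so only the
    first '.' separates the uuid from the version.
    """
    bundle_uuid, sep, version = fqid.partition(".")
    if not sep or not version:
        raise ValueError(f"expected '<uuid>.<version>', got {fqid!r}")
    UUID(bundle_uuid)  # raises ValueError if the uuid part is not a valid UUID
    return BundleID(bundle_uuid, version)


print(parse_fqid("ffffba2d-30da-4593-9008-8b3528ee94f1.2019-08-01T200147.309074Z"))
```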

What’s on the HCA DCP (cont.)
- Collections are links to files, bundles, and other collections. A collection has:
  - CollectionItems, each identified by a type (file, bundle, or collection), a uuid, and a version
  - A description (string)
  - Details: supplementary JSON information (JSON)
  - A name identifying the collection (string)
- Subscriptions support webhooks for events such as bundle creation, deletion, and update.
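The collection structure described above can be modeled as plain data classes. This is a sketch only: the class names and the `contents` field name are hypothetical and may not match the DSS JSON schema.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List

ITEM_TYPES = ("file", "bundle", "collection")


@dataclass
class CollectionItem:
    type: str     # "file", "bundle", or "collection"
    uuid: str
    version: str


@dataclass
class Collection:
    name: str                                    # name identifying the collection
    description: str = ""
    details: dict = field(default_factory=dict)  # supplementary JSON information
    contents: List[CollectionItem] = field(default_factory=list)  # field name assumed

    def add(self, item_type, uuid, version):
        """Link a file, bundle, or another collection into this collection."""
        if item_type not in ITEM_TYPES:
            raise ValueError(f"unsupported item type: {item_type!r}")
        self.contents.append(CollectionItem(item_type, uuid, version))

    def to_json(self):
        # asdict() recurses into the nested CollectionItem objects.
        return json.dumps(asdict(self), sort_keys=True)


brain = Collection(name="brain-bundles", description="Bundles of interest")
brain.add("bundle", "ffffba2d-30da-4593-9008-8b3528ee94f1", "2019-08-01T200147.309074Z")
print(brain.to_json())
```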

The dcp-cli
- A Python library and command-line interface used to interact with the HCA DCP DSS API.
- Currently the primary way of interacting with the API.
- These tools expose the full range of the HCA's functionality, but they are difficult to use and require knowledge of the underlying metadata schema to navigate.

Example: the dcp-cli
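The dcp-cli exposes the DSS search endpoint, which accepts an Elasticsearch query over bundle metadata. The sketch below only builds the query and a command line; the `hca dss post-search` subcommand with `--replica` and `--es-query` flags, and the metadata field path, are assumptions to check against the dcp-cli documentation and the metadata schema. The project name is hypothetical.

```python
import json
import shlex

# Hypothetical metadata field path; the real path depends on the metadata schema version.
PROJECT_FIELD = "files.project_json.project_core.project_short_name"


def project_query(short_name):
    """Elasticsearch query body matching bundles from one project."""
    return {"query": {"bool": {"must": [{"match": {PROJECT_FIELD: short_name}}]}}}


def post_search_command(es_query, replica="aws"):
    """Render a shell-safe dcp-cli search command (subcommand and flags assumed)."""
    args = ["hca", "dss", "post-search", "--replica", replica,
            "--es-query", json.dumps(es_query)]
    return " ".join(shlex.quote(a) for a in args)


print(post_search_command(project_query("example-project")))
```

Quoting each argument with `shlex.quote` keeps the JSON intact when the command is pasted into a POSIX shell.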

HCABrowser
- A Bioconductor package used to interact with the HCA DCP DSS API.
- Meant to mirror the functionality of the dcp-cli.
- Uses `rapiclient` to access the API.
- Improvements are planned for the Bioconductor 3.10 release.

Example: HCABrowser

The Azul Backend (digested HCA)
- A digested version of the HCA.
- Responds to updates in the HCA using subscriptions.
- Provides a simplified API.
- Makes it easy to glean helpful summary information, e.g. there are 4 projects in which the brain is an organ being studied.
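Because Azul keeps a digested index, a question such as "which projects study the brain?" becomes a single filtered query instead of a walk over raw bundle metadata. A sketch under stated assumptions: the host is a placeholder, the `/index/projects` path and the `{"organ": {"is": [...]}}` filter shape are assumed rather than confirmed, and the project records are invented for illustration.

```python
import json
from urllib.parse import urlencode

# Placeholder host; the "/index/projects" path and the filter format are assumptions.
AZUL_PROJECTS = "https://azul.example.org/index/projects"


def organ_filter(*organs):
    """Facet filter selecting projects that study any of the given organs."""
    return {"organ": {"is": list(organs)}}


def projects_url(filters, size=25):
    """Encode the filters as a JSON query parameter."""
    return f"{AZUL_PROJECTS}?{urlencode({'filters': json.dumps(filters), 'size': size})}"


def count_projects_with_organ(projects, organ):
    """Count project summaries listing `organ` (hypothetical record shape)."""
    return sum(1 for p in projects if organ in p.get("organs", []))


# Hypothetical summaries standing in for a digested index response.
projects = [
    {"project": "A", "organs": ["brain"]},
    {"project": "B", "organs": ["brain", "blood"]},
    {"project": "C", "organs": ["kidney"]},
]
print(projects_url(organ_filter("brain")))
print(count_projects_with_organ(projects, "brain"))  # 2
```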

The Data Explorer
- https://data.humancellatlas.org/explore/projects
- Provides a user-friendly web interface to:
  - Construct queries
  - Closely examine project information
  - Download data, either as a direct expression matrix download or as a file manifest that can then be fed to the Python HCA dcp-cli

Example: the Data Explorer
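The file manifest downloaded from the Data Explorer is a tab-separated table with one row per file. A small sketch that reduces such a manifest to the distinct bundles it references, ready to hand to the dcp-cli; the column names `bundle_uuid` and `bundle_version` and the sample rows are hypothetical.

```python
import csv
import io

# Hypothetical two-bundle manifest; the column names are assumptions.
SAMPLE_MANIFEST = (
    "bundle_uuid\tbundle_version\tfile_name\n"
    "ffffba2d-30da-4593-9008-8b3528ee94f1\t2019-08-01T200147.309074Z\tcell_suspension.json\n"
    "ffffba2d-30da-4593-9008-8b3528ee94f1\t2019-08-01T200147.309074Z\tproject_0.json\n"
    "00000000-0000-0000-0000-000000000000\t2019-01-01T000000.000000Z\treads_R1.fastq.gz\n"
)


def bundle_fqids(manifest_text):
    """Return unique '<uuid>.<version>' bundle ids in manifest order."""
    rows = csv.DictReader(io.StringIO(manifest_text), delimiter="\t")
    # dict.fromkeys drops duplicates while keeping first-seen order.
    return list(dict.fromkeys(f"{r['bundle_uuid']}.{r['bundle_version']}" for r in rows))


print(bundle_fqids(SAMPLE_MANIFEST))
```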

HCAExplorer
- A Bioconductor package used to interact with the Azul backend.
- Meant to mirror the functionality of the Data Explorer.
- Provides programmatic access, plus GUI access using Shiny (still planned).
- The package is to be added in the Bioconductor 3.10 release.

Example: HCAExplorer

The Matrix Service and the HCAMatrixBrowser
- The HCAMatrixBrowser is a package meant to interact with the Matrix Service API.
Marcel?
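The slide does not show the Matrix Service API itself. As a hypothetical sketch only, a client could request a matrix by sending a JSON body that lists bundle identifiers and an output format; the `bundle_fqids` and `format` keys and the format names are assumptions, not a confirmed request schema.

```python
import json

# Output formats assumed to be offered by the Matrix Service.
MATRIX_FORMATS = ("loom", "csv", "mtx")


def matrix_request(bundle_fqids, fmt="loom"):
    """Build a hypothetical JSON body requesting an expression matrix for some bundles."""
    if fmt not in MATRIX_FORMATS:
        raise ValueError(f"unsupported format: {fmt!r}")
    fqids = list(bundle_fqids)
    if not fqids:
        raise ValueError("at least one bundle is required")
    return json.dumps({"bundle_fqids": fqids, "format": fmt})


print(matrix_request(["ffffba2d-30da-4593-9008-8b3528ee94f1.2019-08-01T200147.309074Z"]))
```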

Questions?