A Guide to Shift’s Open Data ecosystem & Data workflow


NNIP Idea Showcase - October 17, 2018
--0:10-- My showcase topic is …

FRAMING
1. Data Challenges
2. Open Data Ecosystem
3. Data Workflow
--0:22-- I will start from the challenges we face, then present the Open Data Ecosystem as the solution, and last cover the Data Workflow for the ETL process specifically.

Core Data Challenges
Data Sharing and Collaboration: exchanging data as flat files is not efficient or trackable; a shapefile is more than one file; the data must support the backend of a public-facing application.
Secure Data Storage: data providers have specific usage restrictions; sensitive data can be exposed accidentally.
ETL Process: cleaning data from American FactFinder; repetitive wrangling for data from the same source.
--1:22--

Open Data Ecosystem (Shift Research Lab)
Data storage; database for Shift’s open data tools; partner portal.
--1:47-- To address those challenges, Shift Research Lab built a robust cloud infrastructure hosted by Amazon Web Services. The Open Data Ecosystem serves three main functions: secure storage for our data holdings; a centralized database holding curated data to feed our open data tools; and a portal for partners to access our data resources.

Open Data Ecosystem: Partner Portal
Data Workflow; Authentication Gateway.
--2:47-- This page shows the finished architecture of the Open Data Ecosystem. To optimize the way we share data, we built partner portals (shown as blue arrows) that allow users to access our data or collaborate with us.
1. For closely collaborating research partners, we provide an Authentication Gateway through which users connect to a virtual desktop. Within the WorkSpace, users can create customizable personal workspaces, where they can use software products like Tableau, QGIS and RStudio to analyze and visualize our data.
2. The Data Workflow allows users to contribute to the inventory. Currently the Workflow sits in the private cloud environment, which limits accessibility for external users. We are building an enhanced Data Workflow user interface to enable the multi-user use case.
3. We are also building a RESTful API service for technical users: they send a request to the backend and the system returns structured data. This process can be streamlined to support third-party applications or data visualization dashboards.

Open Data Ecosystem: Storage
Infrastructure: Virtual Private Cloud
3 data-related utilities:
Simple Storage Service (S3): supports the data workflow
Relational Database Service: centralized database
WorkSpaces: virtual desktop providing a research environment
--3:07-- Since ……… We deployed a set of infrastructure, including a virtual private cloud that encompasses all data holdings. Inside it, three data-related resources are deployed. Shift curates and integrates extensive data from national as well as local sources, potentially containing sensitive information.

Open Data Ecosystem: Database
PostgreSQL Database: open source; GIS analysis
Access control by:
database: connect, create
schema: use, create
table: select, update, insert, delete…
--3:37-- PostgreSQL is a popular open source database engine that supports GIS analysis. We use this database for internal research, and it also feeds public-facing applications. We set up a nested database structure to categorize datasets based on our organizational needs. Each object (database, schema, table) can be assigned privileges for specific roles individually.
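The three access-control levels on this slide can be sketched as a small Python helper that builds PostgreSQL GRANT statements. This is an illustration only, not Shift's actual setup: the role and object names are hypothetical, the privilege lists contain only the privileges named on the slide, and the slide's "use" is written as USAGE, which is the PostgreSQL keyword for schema access.

```python
# Sketch only: builds GRANT statements as strings; nothing is executed.
# Privileges per level are limited to the ones named on the slide.
PRIVILEGES = {
    "DATABASE": ("CONNECT", "CREATE"),
    "SCHEMA": ("USAGE", "CREATE"),
    "TABLE": ("SELECT", "UPDATE", "INSERT", "DELETE"),
}


def quote_ident(name):
    """Double-quote an SQL identifier, escaping embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def grant_statement(level, object_name, role, privileges):
    """Return one GRANT statement for a database, schema, or table."""
    level = level.upper()
    if level not in PRIVILEGES:
        raise ValueError(f"unknown object level: {level}")
    requested = [p.upper() for p in privileges]
    unknown = [p for p in requested if p not in PRIVILEGES[level]]
    if unknown:
        raise ValueError(f"{unknown} not valid for {level} in this sketch")
    # A dotted name such as schema.table is quoted part by part.
    target = ".".join(quote_ident(part) for part in object_name.split("."))
    return (
        f"GRANT {', '.join(requested)} ON {level} {target} "
        f"TO {quote_ident(role)};"
    )
```

For example, `grant_statement("table", "housing.evictions", "partner_readonly", ["select"])` would give a hypothetical read-only partner role access to one table without touching its privileges on the enclosing schema or database.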

Data Workflow: Extract Transform Load
Extract data from: the Census API; .csv format for tabular data; .geojson format for geometry data
Transform: capture machine metadata (timestamps); enforce field types and field names as specified in the config file
Load into database: database and schema as specified in the config file
How to access this tool: command line utility; partners granted access through keys
--4:07-- For populating the database, we built a Data Workflow to streamline the ETL process …
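The Transform step described on this slide can be sketched in Python as follows. The sketch assumes a tabular .csv input and a config that maps each source column to a target field name and type. The config layout, the column names, and the `loaded_at` field are hypothetical, since the slides do not show the real config file.

```python
import csv
import io
from datetime import datetime, timezone

# Hypothetical config: source column -> (target field name, type caster).
CONFIG = {
    "GEO_ID": ("geoid", str),
    "Total Population": ("total_population", int),
    "Median Income": ("median_income", float),
}


def transform(csv_text, config, now=None):
    """Rename and type-cast configured columns; stamp each row with a load time."""
    # One timestamp per batch, recorded as machine metadata.
    loaded_at = (now or datetime.now(timezone.utc)).isoformat()
    rows = []
    for raw in csv.DictReader(io.StringIO(csv_text)):
        row = {}
        for source, (target, cast) in config.items():
            # A missing column raises KeyError; a bad value raises ValueError.
            row[target] = cast(raw[source])
        row["loaded_at"] = loaded_at
        rows.append(row)
    return rows
```

Columns not listed in the config are dropped, so the loaded table only ever contains the fields the config declares. Failing loudly on a missing column or an uncastable value is a design choice of this sketch, not something the slides specify.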

Data Workflow: Tech Approach
Command line interface; S3; serverless stack
How to access this tool: command line utility; partners granted access through keys
--4:55-- Currently this Workflow is triggered from GitHub for Census datasets, and by drag and drop into an S3 bucket for .csv and geometry files, which is not flexible. We are trying to build a user interface as an entry point; the initial approach is a command line interface. We envision our data partners using it to pump in a raw file, with the choice of a transformation prototype and a destination database, as you can see… The module in the middle performs the ETL process; S3 and Lambda are a popular stack for building serverless microservices.
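The command line entry point described here could look roughly like the following Python sketch, built on the standard `argparse` module. The program name, the option names, and the default schema are all hypothetical. The sketch only parses the partner's choices into a job description; the upload to S3 and the serverless ETL module are left out.

```python
import argparse


def build_parser():
    """Build the argument parser for a hypothetical data-workflow command."""
    parser = argparse.ArgumentParser(
        prog="data-workflow",
        description="Submit a raw file to the ETL workflow (illustrative sketch).",
    )
    parser.add_argument("raw_file", help="path to the raw .csv or .geojson file")
    parser.add_argument(
        "--prototype", required=True,
        help="name of the transformation prototype to apply",
    )
    parser.add_argument(
        "--database", required=True, help="destination database",
    )
    parser.add_argument(
        "--schema", default="public",
        help="destination schema (default: public)",
    )
    return parser


def describe_job(argv):
    """Parse command line arguments into a plain job description."""
    args = build_parser().parse_args(argv)
    return {
        "raw_file": args.raw_file,
        "prototype": args.prototype,
        "database": args.database,
        "schema": args.schema,
    }
```

A partner would then run something like `data-workflow tracts.csv --prototype census_tract --database research`, where the file, prototype, and database names are made up for illustration.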

Stay in touch
Nikki Zhu, nzhu@garycommunity.org, 303-454-3755
Jennifer Newcomer, jnewcomer@garycommunity.org, 303-454-3776
Follow us. For more information, visit SHIFTRESEARCHLAB.ORG