Open Systems Technologies Data Analyst Internship: AWS Recommendation System Masters Project

Joe McCartney
Data Science and Analytics, Grand Valley State University, Allendale, Michigan 49401

Introduction

Open Systems Technologies (OST) is an integrated, cross-functional business technology firm bringing together strategy & insights, digital experiences, connected products, data center transformation and enterprise managed services for clients to optimize and grow their businesses. I had the opportunity to be a part of the connected products team as a data analytics intern. My main project was designing and building a data pipeline for a recommendation system for Herman Miller using Amazon Web Services (AWS).

Pipeline Steps

1) AWS RDS
An AWS RDS Microsoft SQL Server instance holds all of the data. The database contains 20 tables, 14 of which are currently used in modeling. The data is pulled through routine extracts rather than a live connection.

2) AWS Glue
The Glue step has four parts:
- A Glue Connection allows Glue to access the data in RDS.
- A scheduled Glue Crawler checks for any changes to the database's schema.
- The Glue Job transfers the data from RDS to S3 as CSV files.
- A scheduled Glue Trigger routinely launches the Glue Job.
This ETL process is needed to change the format and structure of the data for appropriate use in SageMaker.

4) AWS SageMaker
A Jupyter notebook creates a model that is hosted on an endpoint. The model is currently built with XGBoost. The resulting model can easily be referenced to make new predictions. SageMaker simplifies machine learning by being highly customizable and by connecting to the rest of the AWS platform.

6) Website
A user can input what kind of product they are looking for and the main drivers behind their choice. The system then takes those inputs and returns a Herman Miller product based on the model.
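As an illustration of how the website inputs in step 6 might reach the SageMaker endpoint in step 4, here is a minimal sketch of turning a user's choices into a request payload. It assumes the endpoint accepts a single header-less CSV row of numeric features, which is a common setup for XGBoost models but is not confirmed by the poster. The feature vocabularies (PRODUCT_TYPES, DRIVERS) and the function name encode_request are hypothetical; the real model inputs are not listed.

```python
# Hypothetical feature vocabularies; the poster does not list the real model inputs.
PRODUCT_TYPES = ["chair", "desk", "storage"]
DRIVERS = ["comfort", "price", "design"]


def encode_request(product_type, drivers):
    """One-hot encode the user's choices into one header-less CSV row."""
    if product_type not in PRODUCT_TYPES:
        raise ValueError(f"unknown product type: {product_type!r}")
    unknown = set(drivers) - set(DRIVERS)
    if unknown:
        raise ValueError(f"unknown drivers: {sorted(unknown)}")
    # One column per product type, then one column per driver.
    row = [int(product_type == p) for p in PRODUCT_TYPES]
    row += [int(d in drivers) for d in DRIVERS]
    return ",".join(str(value) for value in row)


payload = encode_request("chair", ["comfort", "design"])
print(payload)  # 1,0,0,1,0,1
```

A website backend could then pass this string as the Body of a boto3 sagemaker-runtime invoke_endpoint call with ContentType "text/csv"; the endpoint name is not given in the poster, so it is left out here.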
Pipeline Overview

1) Data is stored in an RDS MS SQL database.
2) Glue functions transfer data from RDS to S3.
3) An S3 bucket holds data from RDS for SageMaker.
4) SageMaker uses machine learning to create a model, and the result is hosted on an endpoint.
5) Lambda and CloudFormation are used to recreate the pipeline in other AWS environments.
6) The website uses the endpoint to make recommendations to users.

3) AWS S3
The bucket contains a separate folder and CSV file for each of the tables from RDS. The S3 bucket is needed because it's the only way SageMaker will ingest data. It acts as intermediate storage for the process. A batch process is used so that the pipeline works with a known quantity of data and does not interrupt users.

5) AWS Lambda & CloudFormation
Lambda and CloudFormation generate a majority of the pipeline in any AWS environment. It takes less than a minute to generate all Glue and S3 functions and portions of SageMaker. The code is all dynamic, so setting values takes little time. This will allow for quick creation of other recommendation projects.

What's Next
- New machine learning algorithm
- ETL improvements
- Retraining the model
- Scheduling with CloudWatch or Step Functions
- Allow model to include user feedback
- CloudFormation improvements

Acknowledgments
I thank all of the members of the Connected Products team at OST for their help and guidance.

Figure 1. The overall pipeline of the AWS project.
Figure 2. An example of the algorithm making a recommendation based on certain values.
Figure 3. CloudFormation allows for the duplication of AWS processes in the Herman Miller AWS environment based on what was made in OST's environment.
Figure 4. The website interface that utilizes the endpoint to recommend products to users.
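To make steps 2 and 3 of the pipeline concrete (the Glue Job writing each RDS table to S3 as CSV, with one folder and one file per table), the following is a local, pure-Python sketch of that layout. It is not the Glue Job itself, which would run inside AWS Glue. The function name table_to_csv_object, the "extracts" prefix, and the sample table, columns and rows are all hypothetical.

```python
import csv
import io


def table_to_csv_object(table, columns, rows, prefix="extracts"):
    """Return (s3_key, csv_text) for one table, using one folder per table."""
    # Assumed key layout: <prefix>/<table>/<table>.csv
    key = f"{prefix}/{table}/{table}.csv"
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(columns)  # header row
    writer.writerows(rows)    # one CSV line per database row
    return key, buf.getvalue()


# Hypothetical table contents, for illustration only.
key, body = table_to_csv_object(
    "products",
    ["product_id", "name"],
    [(1, "Chair A"), (2, "Desk B")],
)
print(key)   # extracts/products/products.csv
print(body)
```

Keeping each table under its own folder means a downstream consumer such as a SageMaker notebook can read one table at a time by prefix, which fits the batch-extract approach described in step 3.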