Bridge the Gap Between Statistician and Data Analysis Professionals


Bridge the Gap Between Statistician and Data Analysis Professionals Ming Li @ Amazon

Education: BS in Statistics, MS in Statistics, PhD in Statistics

Career titles, with tasks and requirements:

Business Analyst: business insight, dashboards, weekly reports, and ad-hoc analysis. Intermediate SQL, simple statistical models, little machine learning, expert in dashboards.
Business Intelligence Engineer (BIE): a good balance of business analyst and data engineer. Strong SQL and dashboards, simple to intermediate statistical models, maybe some machine learning models.
Data Engineer: builds data pipelines in the production environment. Strong SQL, strong knowledge of production systems and databases, no modeling requirement.
Data Scientist: similar to BIE, but with stronger modeling in statistics and machine learning. Usually leverages available methods to solve business problems.
Research Scientist: very strong modeling background in statistics and machine learning, with little requirement for production-level programming.
Applied Scientist: more specialized modeling skills (such as NLP or image recognition) and/or production-level programming proficiency.

Given the differences between statisticians and data scientists, what types of data science career tracks may be available? There are quite a few career paths for statisticians with different backgrounds. At large tech companies the career tracks have recently become more specific, and I will refer to all of them broadly as data science professions. There are three dimensions of expertise: (1) business knowledge, (2) modeling knowledge with R/Python, and (3) production system knowledge (e.g. Java and SQL). Each career path has its own promotion cycle.

Education: BS in Statistics, MS in Statistics, PhD in Statistics

Skill gaps by career title:

Business Analyst: SQL, dashboards (such as Tableau), business problem understanding
Business Intelligence Engineer: SQL, dashboards, production environment
Data Engineer: SQL, production environment, big data platform, database architecture
Data Scientist: SQL, big data platform, Python, dashboards
Research Scientist: SQL, big data platform, Python, deep learning
Applied Scientist: programming, SQL, big data platform, Python, deep learning
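SQL appears in every row of the skill-gap list, so a small illustration may help readers coming from a pure modeling background. The following is a minimal sketch using Python's standard-library `sqlite3` module with an in-memory database. The `orders` table, its columns, and the sample rows are hypothetical and are not taken from the presentation.

```python
import sqlite3

# Hypothetical in-memory table; names and values are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders (region, amount) VALUES (?, ?)",
    [("east", 10.0), ("east", 30.0), ("west", 5.0)],
)


def revenue_by_region(connection):
    """Return {region: total amount} computed by a GROUP BY aggregate."""
    rows = connection.execute(
        "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
    ).fetchall()
    return dict(rows)


totals = revenue_by_region(conn)
conn.close()
print(totals)
```

The aggregation runs inside the database rather than in Python, which is the usual pattern when the data is too large to pull into memory first.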

Generalist:
Business problem formulation skills
Data preprocessing skills
Statistical and machine learning methods
Deep learning applications knowledge
Model building process experience

Specialist:
Natural language understanding
Image and video analysis
Voice recognition
Language translation
AI chat-bot

Recent trends in cloud-based solutions drastically lower the barrier for statisticians in this area.

Big data infrastructure (Spark)
Database and data retrieval (SQL)
Production environment
Programming skill (Python)
Unstructured data (image, voice, text…)

Leadership
Business understanding
Communication skills
Collaboration & teamwork
Strategic planning & execution

If you are good at all three areas, you become a "full-stack" data scientist! The best of the best!

Statistician:
Relatively focused on modeling (i.e. science)
Brings data to models
Usually isolated from the production system
Data is relatively small and clean, in text file formats
Usually structured data

Data Scientist:
Mainly focused on the business problem and result (i.e. engineering)
Brings models to data
Usually embedded in the production system
Needs to work with messy, large amounts of data in various formats
Both structured and unstructured data

Using the analogy of science and engineering, we can see some of the major differences between statisticians and data scientists. (1) Statisticians are relatively focused on modeling, while data scientists are mainly focused on business problems and results. (2) Statisticians usually bring data to models (i.e. there is a good model, let us find some data and an application), while data scientists usually bring models to data (i.e. we have a business problem, let's try a few models to see which one helps). (3) Statisticians are usually isolated from the production system (i.e. the entire software structure) and focus on structured data given in text format, while data scientists have to be part of the production system to leverage data stored in various formats, both structured and unstructured. (4) Statisticians tend to work with small data (i.e. data that fits into a laptop's memory), while data scientists have to deal with amounts of data far beyond a laptop's hard disk.

Data Science Project Cycle

Business problem and value
Resources, milestones and timeline
Data, Information, Knowledge, Insight, Decision & Action

Project Cycle

Business problem definition and understanding
Quantifying business value and defining key metrics
Computation resources assessment
Key milestones and timeline
Data security, privacy and legal review
Data science formulation
Data quality and availability
Data preprocessing
Feature engineering
Model exploration and development
Model training, validation, testing
Model selection
A/B testing in the production system
Model deployment in the production environment
Exception management
Performance monitoring
Model tuning and re-training
Model update and add-on
Model failure and retirement

As you can see, the data science project cycle is complicated. The end-to-end cycle contains many steps, and modeling is just a small part of it. For people in any of the data science professions, some knowledge of or experience with the end-to-end cycle will greatly help their career development. Of course, each specific career path has its own areas of expertise to be mastered.
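The "model training, validation, testing" step in the cycle above relies on holding data out before any model is fitted. The following is a minimal sketch in plain Python. The function name `train_val_test_split`, the 60/20/20 proportions, and the fixed seed are illustrative assumptions and do not come from the presentation.

```python
import random


def train_val_test_split(rows, val_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle rows once, then cut them into train, validation and test parts."""
    rows = list(rows)
    # A seeded generator keeps the split reproducible across runs.
    random.Random(seed).shuffle(rows)
    n_test = int(len(rows) * test_frac)
    n_val = int(len(rows) * val_frac)
    test = rows[:n_test]
    val = rows[n_test:n_test + n_val]
    train = rows[n_test + n_val:]
    return train, val, test


# Hypothetical data set of 100 row identifiers.
train, val, test = train_val_test_split(range(100))
print(len(train), len(val), len(test))
```

The validation part is used for model selection and tuning, while the test part is kept aside for a final, unbiased performance estimate.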

Agile-Style Project Management: Cross-Team Collaboration

Business teams: operation team, business analyst team, insight and reporting team
Technology team: database and data warehouse team, data engineering team, infrastructure team, core machine learning team, software development team, visualization dashboard team, production implementation
Project management team, program management team, product management team
Senior leadership team, leaders across organizations

During the end-to-end data science project cycle, a data science professional needs to interact with different teams. First, to define the data science project, business teams get involved, such as the operation team and its own analyst team. To interact with and establish pipelines to the archived historical data and the available real-time online data, the database, data warehouse and data engineering teams are needed. To ensure scalability, the infrastructure team and core machine learning team are needed. Finally, to implement the model in production, the software development and visualization dashboard teams are needed. A specific project has a project manager, and one project may be part of a larger program. Once it becomes a product, it has to be managed by the product management team. During the project there will occasionally be interaction with senior leadership and leaders across organizations. Data scientists definitely work in a collaborative environment.

Communication: Speaking the Same Language

Interact with multiple teams across the entire project cycle
Use easy-to-understand language that everyone understands
Be clear on deliverables, timeline and resource allocation
The technical modeling part requires communication skills too
Backgrounds differ: statistician, operations researcher, economist, computer scientist, market researcher, …

Need to be familiar with different terminology, for example:
Label = Target = Outcome = Class = Response = Dependent Variable
Features = Attributes = Independent Variables = Predictors = Covariates
Dimensionality = Number of features
Weights = Parameters
Learning = Fitting
Generalization = Applying to the population or test data
Sensitivity = Recall = Hit rate = True positive rate
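To make the last equivalence concrete, sensitivity, recall, hit rate and true positive rate all name the same quantity: true positives divided by the sum of true positives and false negatives. The following is a minimal sketch. The function name `recall` and the sample labels and predictions are illustrative assumptions.

```python
def recall(labels, predictions, positive=1):
    """Recall = sensitivity = hit rate = true positive rate = TP / (TP + FN)."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == positive and p == positive)
    fn = sum(1 for y, p in pairs if y == positive and p != positive)
    if tp + fn == 0:
        raise ValueError("recall is undefined without any positive labels")
    return tp / (tp + fn)


# Hypothetical labels (targets) and model predictions.
labels = [1, 1, 1, 0, 0, 1]
predictions = [1, 0, 1, 0, 1, 1]
score = recall(labels, predictions)
print(score)
```

Here three of the four positive labels are predicted correctly, so the value is 0.75 under any of the four names.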

Summary

Career choices for data analysis professionals are broad.

The gaps between these potential career paths and a statistician's training are:
SQL
Dashboards (such as Tableau)
Big data platform
Production environment
Python
Deep learning
Programming (i.e. writing production-level code using Java)

Project cycles are complicated.
Communication skills are important for success.

Thank you!