Published by Fabienne Pagé. Modified over 5 years ago.
1
Bridge the Gap Between Statistician and Data Analysis Professionals
Ming, Amazon
2
Education, Career Title, Tasks and Requirements
Business Analyst: Business insight, dashboards, weekly reports, ad-hoc analysis. Intermediate SQL, simple statistical models, a little machine learning, expert in dashboards.
Business Intelligence Engineer (BIE): A good balance of business analyst and data engineer. Strong SQL and dashboards, simple to intermediate statistical models, maybe some machine learning models.
Data Engineer: Builds data pipelines in the production environment. Strong SQL, strong knowledge of production systems and databases, no modeling requirement.
Data Scientist: Similar to BIE, but with stronger modeling in statistics and machine learning; usually leverages available methods to solve business problems.
Research Scientist: Very strong modeling background in statistics and machine learning; not much requirement for production-level programming.
Applied Scientist: More specialized modeling skills (such as NLP, image recognition) and/or production-level programming proficiency.
Education: BS in Statistics, MS in Statistics, PhD in Statistics.
Given the differences between statisticians and data scientists, what data science career tracks may be available? There are quite a few career paths for statisticians with different backgrounds. At large tech companies, the career tracks have recently become more specific, and I will refer to all of them broadly as data science professions. There are three dimensions of expertise: (1) business knowledge, (2) modeling knowledge with R/Python, and (3) production system knowledge (e.g., Java and SQL). Each career path has its own promotion cycle.
3
Education, Career Title, Skill Gaps
Business Analyst: SQL, dashboards (such as Tableau), business problem understanding
Business Intelligence Engineer: SQL, dashboards, production environment
Data Engineer: SQL, production environment, big data platform, database architecture
Data Scientist: SQL, big data platform, Python, dashboards
Research Scientist: SQL, big data platform, Python, deep learning
Applied Scientist: programming, SQL, big data platform, Python, deep learning
Education: BS in Statistics, MS in Statistics, PhD in Statistics
4
Modeling, generalist: business problem formulation skills, data preprocessing skills, statistical and machine learning methods, deep learning application knowledge, model building process experience
Modeling, specialist: natural language understanding, image and video analysis, voice recognition, language translation, AI chatbots. Recent trends in cloud-based solutions have drastically lowered the barrier for statisticians in this area.
Production system: big data infrastructure (Spark), database and data retrieval (SQL), production environment, programming skills (Python), unstructured data (image, voice, text…)
Business and leadership: leadership, business understanding, communication skills, collaboration and teamwork, strategic planning and execution
If you are good at all three areas, you become a "full-stack" data scientist, the best of the best!
5
Statistician vs. Data Scientist
Statistician: relatively focused on modeling (i.e., science); brings data to the model; usually isolated from the production system; data is relatively small and clean, in text file formats; usually structured data.
Data Scientist: mainly focused on business problems and results (i.e., engineering); brings models to the data; usually embedded in the production system; needs to work with large amounts of messy data in various formats; both structured and unstructured data.
Using the analogy of science versus engineering, we can see some of the major differences between statisticians and data scientists. (1) Statisticians are relatively focused on modeling, while data scientists are mainly focused on business problems and results. (2) Statisticians usually bring data to the model (i.e., there is a good model; let us find some data and an application), while data scientists usually bring models to the data (i.e., we have a business problem; let's try a few models and see which one helps). (3) Statisticians are usually isolated from the production system (i.e., the entire software architecture) and focus on structured data delivered in text format, while data scientists have to be part of the production system to leverage data stored in various formats, both structured and unstructured. (4) Statisticians tend to work with small data (i.e., data that fits into a laptop's memory), while data scientists have to deal with amounts of data far beyond a laptop's hard disk.
6
Data Science Project Cycle
Data → Information → Knowledge → Insight → Decision & Action
Business problem and value
Resources, milestones, and timeline
7
Project Cycle
Business problem definition and understanding
Quantifying business value and defining key metrics
Computation resources assessment
Key milestones and timeline
Data security, privacy, and legal review
Data science formulation
Data quality and availability
Data preprocessing
Feature engineering
Model exploration and development
Model training, validation, and testing
Model selection
A/B testing in the production system
Model deployment in the production environment
Exception management
Performance monitoring
Model tuning and retraining
Model updates and add-ons
Model failure and retirement
As you can see, the data science project cycle is complicated. The end-to-end cycle contains many steps, and modeling is just a small part of it. For any data science professional, some knowledge of or experience with the end-to-end cycle will greatly help career development. Of course, each career path also has its own areas of expertise to master.
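Of the steps above, A/B testing in the production system is where a statistician's training transfers most directly. Below is a minimal sketch of a two-sided, two-proportion z-test comparing the conversion rate of a control model (arm A) with a candidate model (arm B). The function name and the traffic counts are hypothetical, and the sketch ignores practical choices such as sample-size planning and stopping rules.

```python
import math
from typing import Tuple


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int) -> Tuple[float, float]:
    """Two-sided z-test for a difference in conversion rates between arms A and B.

    Uses the pooled proportion under H0: p_a == p_b and returns (z, p_value).
    A positive z means arm B converts at a higher rate than arm A.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Standard normal CDF via the error function: Phi(x) = 0.5 * (1 + erf(x / sqrt(2)))
    phi = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))
    p_value = 2 * (1 - phi)
    return z, p_value


if __name__ == "__main__":
    # Hypothetical traffic split: 10,000 users per arm.
    z, p = two_proportion_z_test(conv_a=500, n_a=10_000, conv_b=580, n_b=10_000)
    print(f"z = {z:.3f}, p = {p:.4f}")
```

The statistics are textbook material; what the production setting adds is the surrounding work listed in the cycle: routing live traffic to each arm, logging outcomes reliably, and monitoring the winning model after deployment.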
8
Cross-Team Collaboration
Business teams: operation team, business analyst team, insight and reporting team
Technology teams: database and data warehouse team, data engineering team, infrastructure team, core machine learning team, software development team, visualization dashboard team
Production implementation
Project management team, program management team, product management team
Senior leadership team, leaders across organizations
Agile-style project management
During the end-to-end data science project cycle, a data science professional needs to interact with many different teams. First, to define the data science project, business teams get involved, such as the operation team and its own analyst team. To connect to and build pipelines for archived historical data and available real-time online data, the database, data warehouse, and data engineering teams are involved. To ensure scalability, the infrastructure team and the core machine learning team are needed. Finally, to implement the model in production, the software development and visualization dashboard teams are needed. A specific project has a project manager, and one project may be part of a larger program. Once it becomes a product, it has to be managed by the product management team. During the project there will also occasionally be interaction with senior leadership and with leaders across organizations. Data scientists definitely work in a collaborative environment.
9
Communication: Speaking the Same Language
Interact with multiple teams across the entire project cycle
Use easy-to-understand language that everyone understands
Be clear on deliverables, timeline, and resource allocation
The technical modeling part requires communication skills too: statisticians, operations researchers, economists, computer scientists, market researchers, …
Need to be familiar with different terminology, for example:
Label = Target = Outcome = Class = Response = Dependent variable
Features = Attributes = Independent variables = Predictors = Covariates
Dimensionality = Number of features
Weights = Parameters
Learning = Fitting
Generalization = Applying to the population or test data
Sensitivity = Recall = Hit rate = True positive rate
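The last equivalence is easiest to pin down through the formula all of those names share: sensitivity = recall = hit rate = true positive rate = TP / (TP + FN), the fraction of actual positives that the model labels positive. Below is a minimal sketch, assuming binary labels coded as 1 (positive) and 0 (negative); the function name is illustrative.

```python
from typing import Sequence


def true_positive_rate(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Sensitivity / recall / hit rate / true positive rate: TP / (TP + FN).

    y_true holds the label (a.k.a. target, outcome, response); y_pred holds the
    model's predicted class. Both are assumed to be coded 1 = positive, 0 = negative.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)  # positives caught
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # positives missed
    if tp + fn == 0:
        raise ValueError("no actual positives; the rate is undefined")
    return tp / (tp + fn)


if __name__ == "__main__":
    labels = [1, 1, 1, 1, 0, 0, 0, 1]
    predictions = [1, 0, 1, 1, 0, 1, 0, 0]
    print(true_positive_rate(labels, predictions))  # 3 of 5 actual positives caught -> 0.6
```

In scikit-learn the same quantity is computed by `sklearn.metrics.recall_score`, so a statistician reporting "sensitivity" and a machine learning engineer reporting "recall" are describing the same number.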
10
Summary
11
Pretty broad career choices for data analysis professionals
Gaps between these potential career paths and a statistician's training: SQL, dashboards (such as Tableau), big data platforms, production environment, Python, deep learning, programming (i.e., writing production-level code in Java)
Project cycles are complicated
Communication skills are important for success
12
Thank you!