1
Data Science 101 To Production
11/29/2018 2:52 PM Data Science 101 To Production David Crook @Data4Bots DaCrook.com © 2014 Microsoft Corporation. All rights reserved. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.
2
Announcements Ascend+ Data-FL Slack Channel
3
Goals of Presentation What is Data Science Data Science + App Dev
Building for Ambiguity Azure Architecture Data Visualization Algorithm Development
4
David Crook Microsoft Developer Evangelist Focused on Data Science
Previously from MCS Proto-typer, Hacker, Inventor, Entrepreneur, General Geek Brazilian Jiu Jitsu @Data4Bots
5
What is it?
6
What is Data Science? Descriptive Predictive Prescriptive Actuated
It's really just a new catch-all phrase for analytics of every kind. I like to capture it in four categories, based on what you are actually trying to do.
7
Descriptive
8
Predictive
9
Prescriptive
10
Actuated
11
Data Science in the Wild
12
Data Science Technologies
R F# C#
13
Depends on Target and Scenario
Server? Client? Embedded? Business Decisions, Robotics, Modern Apps? How much, How Fast? Are there stages? Legacy Products? Platforms? Legal Issues? Understand your customer…
14
Common Scenarios Green Field Intelligent Solutions
BI w/Simple Data Sources BI w/Complex Data Sources Targets Mobile Phones Micro Services Embedded
15
Languages I Use
Initial: Microsoft R Open. Prod Services: F# or Python. Integrate: C#. Devices: C++. Deployment: On Prem or Azure.
16
Platform Technologies I Use
Azure, Azure Machine Learning, Linux VMs, Data Lake/Blob, Data Platform, App Services, etc., SQL, Power BI. I use Azure very heavily. It's really a game changer for me in this field.
17
Architectures
18
Building for the Unknown
Interfaces Cloud Message Ingestion Analytics Delivery Application Device
19
Ok, that’s the theory… https://github.com/drcrook1/BioInformatics
Under Development Btw, Project Ascend Accelerated Advanced Analytics for Data Stream Applications
20
Isn’t that just normal App Dev?
Most of it, yeah, it is… But let's think about this. Value to Business (ask audience). Not Value to Business. Proposal: steal good ideas and use them everywhere we can…
Value is the intelligence, the management of information. It is Information Technology. It is Information Science. It is information that creates value. It is acting and deciding upon this intelligence which is important. Everything else is just requirements associated with delivering this value. Understanding this is what allows us to focus on how to quickly and easily deliver value to the business. Stealing good ideas: steal interfaces from App Dev and use them in Analytics. Steal Micro-Services and use them in Analytics. Basically, let's just steal everything from App Dev and Dev Ops and start applying it to Analytics and Data Science.
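One way to picture "stealing interfaces" for analytics is the sketch below. It is only an illustration: the `Analytics` protocol, both stage classes, `run_pipeline`, and the message fields `id` and `value` are hypothetical names that do not come from the talk or its repository. The point is that the analytics stage sits behind an interface, so a placeholder rule can be swapped for a trained model without touching ingestion or delivery.

```python
from typing import Iterable, List, Protocol


class Analytics(Protocol):
    """Hypothetical interface: an analytics stage turns one message into one result."""

    def analyze(self, message: dict) -> dict: ...


class ThresholdAnalytics:
    """Placeholder rule, usable before any model has been trained."""

    def __init__(self, limit: float) -> None:
        self.limit = limit

    def analyze(self, message: dict) -> dict:
        return {"id": message["id"], "alert": message["value"] > self.limit}


class LinearModelAnalytics:
    """Later replacement: a trained linear score behind the same interface."""

    def __init__(self, weight: float, bias: float, cutoff: float) -> None:
        self.weight = weight
        self.bias = bias
        self.cutoff = cutoff

    def analyze(self, message: dict) -> dict:
        score = self.weight * message["value"] + self.bias
        return {"id": message["id"], "alert": score > self.cutoff}


def run_pipeline(messages: Iterable[dict], stage: Analytics) -> List[dict]:
    """Ingested messages go through whichever analytics stage is injected."""
    return [stage.analyze(m) for m in messages]
```

Because `run_pipeline` only depends on the `analyze` method, either class can be passed in, which is the same dependency-injection habit App Dev already relies on.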
21
https://github.com/drcrook1/BioInformatics
22
Data Science Foundations
23
Data Science Needs…Data
Multiple Data Sources. Need Data Prepped for: Visualization, Machine Learning.
24
Great Data for Visualization
Noise Removed. Completely 2 Dimensional. All Values present for Display in every row (total, for example). This makes integrating Data into Power BI, Tableau, Anything painless. This is obviously not how the data is stored in the transactional DB.
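As a rough illustration of "completely 2 dimensional, all values present in every row", the sketch below flattens nested order records into plain rows and repeats the order total on each row. The record layout (`order_id`, `customer`, `items`, `qty`, `price`) and the function name `flatten_orders` are assumptions made up for this example, not a schema from the talk.

```python
def flatten_orders(orders):
    """Flatten nested orders into 2-D rows, repeating the order total on every row."""
    rows = []
    for order in orders:
        # Compute the total once per order so every row can carry it.
        total = sum(item["qty"] * item["price"] for item in order["items"])
        for item in order["items"]:
            rows.append({
                "order_id": order["order_id"],
                "customer": order["customer"],
                "product": item["product"],
                "qty": item["qty"],
                "price": item["price"],
                "order_total": total,
            })
    return rows
```

Each returned row is self-sufficient, so a visualization tool needs no joins or aggregation to display it.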
25
Algorithm Development
Price = 100 * sqft + 5 * yard + 100 * modernflag. Simple Linear Model. 100, 5, 100 are trained weights. Easily Representable as matrix calculations. Typically Long Train, Fast Execution.
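To show why execution is fast once the weights exist, here is a minimal sketch of the model as a matrix-vector product. It assumes the three weights listed on the slide (100, 5, 100) pair one-to-one with the features sqft, yard, and modernflag; the function name `predict` and the sample houses are made up for illustration.

```python
# Weights taken from the slide, assumed to be ordered as sqft, yard, modernflag.
WEIGHTS = [100.0, 5.0, 100.0]


def predict(feature_matrix, weights=WEIGHTS):
    """Multiply a matrix of feature rows by the weight vector, one price per row."""
    return [sum(x * w for x, w in zip(row, weights)) for row in feature_matrix]


houses = [
    [1000, 200, 1],  # 1000 sqft, 200 yard, modern
    [1500, 0, 0],    # 1500 sqft, no yard, not modern
]
prices = predict(houses)
```

Scoring is a handful of multiplications and additions per row; the expensive part is finding the weights in the first place.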
26
Algorithm Development Cont.
Bias vs Variance. Feature Selection & Engineering. Hyper Parameters. High Bias on the Left, High Variance on the right. Solving Variance or Bias.
Variance: More Training Data, Increasing Regularization, Decreasing Features, Decreasing Polynomials.
Bias: Reducing Regularization, Increasing Features, Increasing Polynomials, Not more data (necessarily).
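The remedies above can be turned into a small diagnostic helper. This is only a sketch: the function name `diagnose` and the thresholds `acceptable_error` and `max_gap` are hypothetical values chosen for illustration, and real projects would set them per problem. The logic is the usual rule of thumb: high training error suggests bias, while low training error with a much higher validation error suggests variance.

```python
def diagnose(train_error, validation_error, acceptable_error=0.10, max_gap=0.05):
    """Return a (label, remedies) pair based on training and validation error."""
    if train_error > acceptable_error:
        # The model cannot even fit the training data: underfitting.
        return "high bias", [
            "reduce regularization",
            "increase features",
            "increase polynomials",
        ]
    if validation_error - train_error > max_gap:
        # The model fits training data but does not generalize: overfitting.
        return "high variance", [
            "more training data",
            "increase regularization",
            "decrease features",
            "decrease polynomials",
        ]
    return "balanced", []
```

For example, `diagnose(0.02, 0.25)` reports high variance, where gathering more data is on the list, while `diagnose(0.30, 0.32)` reports high bias, where more data is not expected to help much.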
27
Algorithm Development Take Away
Long, Complex Train. Fast Execution. Linear Algebra.
28
Architecture Revisited
29
Focus on Visualization & Machine Learning
30
Tools to Make this Easy
Data Visualization: Power BI Embedded; Python + Ubuntu + Docker; Data Factory (bigger jobs).
Machine Learning w/Cloud Execution: Azure Machine Learning; Train; Cloud Hosted Micro-Service Execution Engine. Produced with the click of a button.
Machine Learning w/Client Execution: CNTK or TensorFlow; Train Local or Cloud; Produce Model and reconstruct on client. This can get complicated.
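"Produce Model and reconstruct on client" can be pictured with a deliberately simple sketch. It does not use CNTK or TensorFlow, which have their own model formats; instead it assumes a plain linear model whose weights are shipped as JSON. The function names `export_model` and `load_model` and the payload layout are hypothetical.

```python
import json


def export_model(feature_names, weights, bias=0.0):
    """Training side: serialize a linear model's parameters to a JSON string."""
    return json.dumps({
        "features": list(feature_names),
        "weights": list(weights),
        "bias": bias,
    })


def load_model(payload):
    """Client side: rebuild a predict function from the JSON payload."""
    spec = json.loads(payload)
    names, weights, bias = spec["features"], spec["weights"], spec["bias"]

    def predict(record):
        # Look features up by name so the client does not depend on column order.
        return bias + sum(w * record[name] for name, w in zip(names, weights))

    return predict
```

A client would call `load_model(payload)` once and then score records locally. With deep networks the payload must also describe the architecture, which is part of why this gets complicated.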
31
Summary
What is Data Science. Bio-Informatics. Ascend+. Architecture & Ambiguity. Prep for Visualization. Algorithm Development Overview. Data Science in Production. Ensure you ask folks for Ascend+ leads.
32
Questions
33
@Data4Bots www.DaCrook.com