Published by Nóra Törökné. Modified over 5 years ago.
1
Building a High Performing Data Science Organisation
Welcome. Amazing to be here. Great community in Dublin. < About Me
2
John Hoegger, Principal Data Scientist Manager, Microsoft 365
Me: Seattle, 20 years, England. DS for M365. What is M365? < Topics
3
What we will cover: Data Science at Scale; Data is Messy; Prioritizing for Impact
Today: share how we organize, how we deal with data, and how we handle having more than we can do. < Numbers
4
Figures on slide: 155 million, 32 million, 500 million. Labels on slide: Office 365 Consumer active users; Office 365 Commercial active users; Word for Android downloads (released June 2015).
What is scale? Big numbers. < Org Scale
5
16,000-Person Engineering Organization
Relative size. Company: field, engineering, marketing. < Journey
6
1988 – 1998: First 10 years. Office = Clients. No Data.
Timeline (1985 to 2020). 80s, 90s: single application, NO DATA. < 90s, 2000s
7
1998 – 2008: Second decade. Office = Client/Server. Basic Crash and Usage Data. App & Device Centric.
Timeline (1985 to 2020). 90s, 2000s: client/server, crash and usage data, app and device. < 2010
8
2008 – 2018: Third decade. Office = Cloud. Rich Telemetry Signals. User Centric. Office as a Service.
Timeline (1985 to 2020). 2010s: cloud, services, subscription, user centric, sign-in, mobile. < Today
9
Today: Microsoft 365 Suite, Skype for Business
Today: collections of apps on PC, Mac, Web, Android, iOS. < Centralize
10
To centralize or decentralize - that is the question
Domain, partnership. Silos, duplication. 80% same, 80% different. Hub and spoke. Now we measure usage, retention. < Today
11
Data is Messy
Messy. It is not miles per gallon from an R package. Microsoft - clean, perfect data. < Today
12
Data is Guilty Until Proven Innocent
Quality has to be measured.
Guilty: do not trust it. You have to measure quality: completeness, correctness.
As a central team, we go back to the raw streams.
Certified datasets: before onboarding a new app or signal, validate. This involves complex joins, filling in gaps, and inferring columns.
Cross-validate. Make slices for the stories. Validate Client with SharePoint data; triangulate the data.
Score data providers from 1 to 5. They felt they should be a 3; you are a 1 until you prove otherwise.
You have to keep measuring and validating. It is not a one-time thing.
Example: we stopped getting user IDs for some users in Europe. The experimentation assignment logic would assign anyone without an ID to control, so we saw 55% control and 45% treatment. As the users were in Europe, we also saw differences in browser types, OS, etc. < Assumptions
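The 55% / 45% split in the example is what experimentation practitioners often call a sample ratio mismatch. Here is a minimal sketch of one way to flag it automatically, assuming a 50/50 intended split and using a chi-square goodness-of-fit test with one degree of freedom. The function name `srm_p_value`, the counts, and the alert threshold are hypothetical and do not come from the talk.

```python
import math


def srm_p_value(control: int, treatment: int, expected_control_share: float = 0.5) -> float:
    """P-value of a chi-square goodness-of-fit test (1 degree of freedom)."""
    total = control + treatment
    expected_control = total * expected_control_share
    expected_treatment = total - expected_control
    chi2 = ((control - expected_control) ** 2 / expected_control
            + (treatment - expected_treatment) ** 2 / expected_treatment)
    # For 1 degree of freedom, the chi-square survival function is erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


# Hypothetical counts that reproduce the 55% / 45% split from the example.
p = srm_p_value(control=5_500, treatment=4_500)
if p < 0.001:  # the threshold is an assumption; pick one that suits your alerting
    print(f"Possible sample ratio mismatch (p = {p:.2e}); check assignment and missing IDs")
```

A check like this only says that the split is unlikely under the intended ratio. Finding the cause, such as the missing user IDs described above, still requires slicing the data by region, browser, or OS.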
13
Always Make Two Assumptions
1) Assume the data is bad. 2) Assume your query is wrong. Guilty: do not trust it. IsSubscription. 30 billion people. < Impact
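The notes mention a count of 30 billion people, presumably a query result that could not be right. A minimal sketch of a sanity check that would catch that kind of result before it reaches a report is shown here. The function name `check_people_count` and the ceiling value are assumptions for illustration, not something described in the talk.

```python
WORLD_POPULATION_CEILING = 8_000_000_000  # rough ceiling, an assumption for this sketch


def check_people_count(count: int, ceiling: int = WORLD_POPULATION_CEILING) -> int:
    """Return count unchanged, or raise if it cannot be a real number of people."""
    if count < 0:
        raise ValueError(f"negative people count: {count}")
    if count > ceiling:
        raise ValueError(
            f"{count:,} people exceeds the ceiling of {ceiling:,}; "
            "suspect duplicated rows from a join or a wrong filter"
        )
    return count


# A plausible count passes through; 30 billion would raise ValueError.
users = check_people_count(155_000_000)
```

The same pattern extends to tighter, domain-specific bounds, for example checking a product's user count against the count from an independent data source.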
14
Prioritize for Impact: funnels allow us to determine opportunity
15
“Our mission is to empower every person and every organization on the planet to achieve more.”
16
Reactive or Proactive: projects with the biggest impact are often the ones that no one asked for. Be proactive. Let data lead the way. Question the impact: 0.5% vs. 5%. Satisfy curiosity. < Today
17
Business Impact: are we focusing on the areas with the greatest impact?
End Users >100,000,000; Partners & Admins >1,000,000; Leadership Team <10,000; Product Teams <100
18
How can we make it happen?
Are we utilizing the capabilities of the data science team?
Descriptive: What happened?
Diagnostic: Why it happened?
Predictive: What will happen?
Prescriptive: How can we make it happen?
19
How can we make it happen?
Focusing on Impact
1) Are we utilizing the capabilities of the data science team? Descriptive: What happened? Diagnostic: Why it happened? Predictive: What will happen? Prescriptive: How can we make it happen?
2) Are we focusing on the areas with the greatest impact? Business Impact: End Users >100,000,000; Partners & Admins >1,000,000; Leadership Team <10,000; Product Teams <100
20
Biggest Opportunity = Greatest Impact
Lifecycle Funnels: Awareness, Acquisition, Onboarding, Engagement, Retention. Explains the ‘what’. Explains the ‘why’.
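One way to read "Biggest Opportunity = Greatest Impact" is to look for the funnel step that loses the most users. The sketch below takes that reading, treating the largest absolute drop between consecutive stages as the biggest opportunity. That definition, the function name `biggest_drop`, and all counts are assumptions for illustration. Only the stage names come from the slide.

```python
from typing import List, Tuple


def biggest_drop(funnel: List[Tuple[str, int]]) -> Tuple[str, str, int]:
    """Return (from_stage, to_stage, users_lost) for the largest absolute drop."""
    if len(funnel) < 2:
        raise ValueError("need at least two stages")
    drops = [
        (funnel[i][0], funnel[i + 1][0], funnel[i][1] - funnel[i + 1][1])
        for i in range(len(funnel) - 1)
    ]
    return max(drops, key=lambda d: d[2])


# Stage names come from the slide; the counts are made up for illustration.
funnel = [
    ("Awareness", 1_000),
    ("Acquisition", 400),
    ("Onboarding", 300),
    ("Engagement", 250),
    ("Retention", 100),
]
print(biggest_drop(funnel))  # ('Awareness', 'Acquisition', 600)
```

Ranking by relative drop (users lost divided by users entering the step) can give a different answer, so the choice of metric is itself a prioritization decision.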
21
In Closing… Data Science at Scale; Data is Messy; Prioritizing for Impact
Final points. Additional: ExP. Hiring: difference between good and great. Spark. Connecting the data to the problem. Understanding the reasons for the analysis. Questions that make experts question their beliefs.
22
Thank you. John Hoegger
23
© 2014 Microsoft Corporation. All rights reserved. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.