Building a High Performing Data Science Organisation Welcome Amazing to be here Great Community in Dublin < About Me
John Hoegger Principal Data Scientist Manager Microsoft 365 Me Seattle, 20 years, England DS for M365 – What is M365? < Topics
What we will cover Data Science at Scale Data is Messy Prioritizing for Impact Today Share How we organize How we deal with data More than we can do < Numbers
155 32 500 Office 365 Consumer Office 365 Commercial Word for Android (Released June 2015) MILLION MILLION MILLION ACTIVE USERS ACTIVE USERS DOWNLOADS What is scale Big numbers < Org Scale
16,000 Person Engineering Organization Relative size Company – Field, engineering, Marketing < Journey
1988 – 1998 First 10 years Office = Clients No Data (Timeline: 1985 to 2020) 80s, 90s Single Application NO DATA < 90s, 2000s
1998 – 2008 Second decade Office = Client/Server Basic Crash and Usage Data App & Device Centric (Timeline: 1985 to 2020) 90s, 2000s Client Server Crash and Usage data App and Device < 2010
2008 – 2018 Third decade Office = Cloud Rich Telemetry Signals User Centric Office as a Service (Timeline: 1985 to 2020) 2010s Cloud, Services Subscription User Centric Sign-in Mobile < Today
Today Microsoft 365 Suite Skype for Business Today Collection of apps PC, Mac, Web, Android, iOS < Centralize
To centralize or decentralize - that is the question Domain, partnership Silos, duplication 80% same, 80% different Hub and Spoke Now we measure usage, retention < Today
Data is Messy Messy It is not miles per gallon from an R package Microsoft - Clean, perfect data < Today
Data is Guilty Until Proven Innocent Quality has to be measured Guilty: do not trust it. You have to measure quality: completeness, correctness. As a central team, we go back to the raw streams. Certified datasets: before onboarding a new app or signal, validate. Complex joins, filling in gaps, inferring columns. Cross validate. Make slices for the stories. Validate Client with SharePoint data; triangulate the data. Score data providers from 1 to 5: they felt they should be a 3, but you are a 1 until you prove otherwise. You have to keep measuring and validating. It is not a one-time thing. Example: we stopped getting user IDs for some users in Europe. The experimentation assignment logic would assign anyone without an ID to control, so we saw 55% control and 45% treatment. As the users were in Europe, we also saw differences in browser types, OS, etc. < Assumptions
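The 55% control / 45% treatment split in the example is what experimenters commonly call a sample ratio mismatch. A minimal sketch of one way to flag it, assuming an intended 50/50 split and a chi-square goodness-of-fit test with one degree of freedom. The function name, the counts, and the 0.001 alert threshold are hypothetical, not taken from the talk.

```python
import math


def srm_p_value(control: int, treatment: int,
                expected_control_share: float = 0.5) -> float:
    """P-value of a chi-square goodness-of-fit test on the two arm sizes."""
    total = control + treatment
    expected_control = total * expected_control_share
    expected_treatment = total - expected_control
    chi2 = ((control - expected_control) ** 2 / expected_control
            + (treatment - expected_treatment) ** 2 / expected_treatment)
    # For one degree of freedom, the chi-square survival function
    # reduces to erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


# Hypothetical counts that reproduce a 55% / 45% split.
p_skewed = srm_p_value(control=55_000, treatment=45_000)
# Hypothetical counts for a healthy experiment.
p_healthy = srm_p_value(control=50_050, treatment=49_950)

ALERT_THRESHOLD = 0.001  # assumed alerting threshold
print("skewed split flagged:", p_skewed < ALERT_THRESHOLD)
print("healthy split flagged:", p_healthy < ALERT_THRESHOLD)
```

A check like this only says that assignment is off. Finding the cause (here, missing user IDs) still means going back to the raw streams.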
Always Make Two Assumptions Assume the data is bad Assume your query is wrong Guilty Do not trust it IsSubscription 30 billion people < Impact
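One way to act on both assumptions is to run cheap sanity checks before believing a result. A minimal sketch, assuming rows are plain dictionaries. The table, the column names, and the population bound are hypothetical illustrations; a count like "30 billion people" would fail the first check immediately.

```python
from collections import Counter

# Rough, assumed upper bound on the number of people on the planet.
WORLD_POPULATION_UPPER_BOUND = 8_000_000_000


def is_plausible_count(value: int,
                       upper_bound: int = WORLD_POPULATION_UPPER_BOUND) -> bool:
    """A people count must be non-negative and cannot exceed the bound."""
    return 0 <= value <= upper_bound


def duplicate_keys(rows: list, key: str) -> dict:
    """Return keys that appear more than once; these inflate joins and counts."""
    counts = Counter(row[key] for row in rows)
    return {k: n for k, n in counts.items() if n > 1}


# Hypothetical subscription rows where one user appears twice.
subscriptions = [
    {"user_id": "u1", "is_subscription": True},
    {"user_id": "u2", "is_subscription": False},
    {"user_id": "u1", "is_subscription": True},
]

print(is_plausible_count(30_000_000_000))        # False: the query is suspect
print(duplicate_keys(subscriptions, "user_id"))  # {'u1': 2}
```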
Prioritize for Impact Funnels allow us to determine opportunity
“Our mission is to empower every person and every organization on the planet to achieve more.”
Reactive or Proactive Projects with the biggest impact are often the ones that no one asked for Be Proactive Let data lead the way Question the impact 0.5% vs. 5% Satisfy Curiosity < Today
Business Impact Are we focusing on the areas with the greatest impact? End Users >100,000,000 Partners & Admins >1,000,000 Leadership Team <10,000 Product Teams <100
How can we make it happen? Are we utilizing the capabilities of the data science team? Descriptive: What happened? Diagnostic: Why did it happen? Predictive: What will happen? Prescriptive: How can we make it happen?
How can we make it happen? Focusing on Impact 1) Are we utilizing the capabilities of the data science team? Descriptive: What happened? Diagnostic: Why did it happen? Predictive: What will happen? Prescriptive: How can we make it happen? 2) Are we focusing on the areas with the greatest impact? Business Impact: End Users >100,000,000; Partners & Admins >1,000,000; Leadership Team <10,000; Product Teams <100
Biggest Opportunity = Greatest Impact Lifecycle Funnels: Awareness, Acquisition, Onboarding, Engagement, Retention Explains the ‘what’ Explains the ‘why’
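A minimal sketch of how a lifecycle funnel can point to the biggest opportunity, assuming "opportunity" is measured as the number of users lost between adjacent stages. The stage names come from the slide; the counts and the function name are made up for illustration.

```python
STAGES = ["Awareness", "Acquisition", "Onboarding", "Engagement", "Retention"]


def stage_dropoffs(counts: dict) -> list:
    """For each adjacent stage pair, return (from, to, users_lost, loss_rate)."""
    results = []
    for upstream, downstream in zip(STAGES, STAGES[1:]):
        lost = counts[upstream] - counts[downstream]
        rate = lost / counts[upstream]
        results.append((upstream, downstream, lost, rate))
    return results


# Hypothetical user counts per stage, not real product numbers.
example_counts = {
    "Awareness": 1_000_000,
    "Acquisition": 400_000,
    "Onboarding": 300_000,
    "Engagement": 150_000,
    "Retention": 120_000,
}

dropoffs = stage_dropoffs(example_counts)
# The transition that loses the most users is the largest opportunity
# under this assumed definition.
biggest = max(dropoffs, key=lambda d: d[2])

for upstream, downstream, lost, rate in dropoffs:
    print(f"{upstream} -> {downstream}: lost {lost:,} ({rate:.0%})")
print("Biggest opportunity:", biggest[0], "->", biggest[1])
```

Ranking by loss rate instead of absolute users lost can pick a different stage, so the choice of measure is itself a prioritization decision.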
In Closing… Data Science at Scale Data is Messy Prioritizing for Impact Final points Additional ExP Hiring Difference between good and great Spark Connecting the data to the problem Understanding the reasons for the analysis Questions that make experts question their beliefs
Thank you John Hoegger http://www.linkedin.com/in/johnhoegger/
9/6/2019 12:19 AM © 2014 Microsoft Corporation. All rights reserved. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.