Machine Learning with Databricks SQL Saturday Madison 5/21/2019 © 2018 TALAVANT. All Rights Reserved.
Introductions
Daniel Woods
Daniel.woods@talavant.com
- Senior BI Consultant @ Talavant
- Working with BI since 2015; started as a BI Developer
- Transitioned to consulting in 2018
- Interested in the marriage between BI and Mathematics
Agenda
- Discuss Machine Learning and its uses within a business
- Overview of Databricks and its role in Machine Learning
- Demo Overview
- Demo
- Debrief
- Wrap-up
Machine Learning
- Machine Learning (ML) is an application of artificial intelligence (AI) that gives systems the ability to learn and improve from experience
- There are four main types of Machine Learning: supervised, unsupervised, semi-supervised, and reinforcement
- In business, ML can be used to identify and act on trends in data that lead to better outcomes
- For example, ML can be used to predict customer churn
Where Databricks comes in
- Databricks is an online, collaborative notebook environment that runs on Apache Spark
- Although it is an independent product, Microsoft Azure incorporates it into its solution space, making it easily accessible from Azure’s other cloud services
- Databricks can run Python, SQL, R, or Scala, so it offers great flexibility for Data Science and Engineering
Why Databricks?
- Databricks distributes many tasks across worker nodes, which can help speed up long-running processes
- This environment is well suited to Machine Learning programs, since we’re often working with larger data sets and running programs that require a good amount of compute power and memory
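As a rough single-machine analogy for spreading work across worker nodes, the sketch below splits a data set into partitions, computes a partial result for each partition in parallel, and then combines the partial results. It uses Python threads as stand-ins for workers; the function names are hypothetical and this is not how Spark or Databricks is called in practice, where the platform handles partitioning and scheduling itself.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence, Tuple


def split_into_partitions(data: Sequence[float], num_partitions: int) -> List[Sequence[float]]:
    """Split data into at most num_partitions contiguous chunks."""
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    if not data:
        raise ValueError("data must not be empty")
    size = -(-len(data) // num_partitions)  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def partial_sum_and_count(partition: Sequence[float]) -> Tuple[float, int]:
    """Work done independently on one partition (the 'worker' step)."""
    return sum(partition), len(partition)


def distributed_mean(data: Sequence[float], num_workers: int = 4) -> float:
    """Compute a mean by combining per-partition partial results."""
    partitions = split_into_partitions(data, num_workers)
    # Threads stand in for worker nodes in this illustration.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partials = list(pool.map(partial_sum_and_count, partitions))
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count


if __name__ == "__main__":
    print(distributed_mean(list(range(1, 101)), num_workers=4))  # 50.5
```

The point of the pattern is that each partition can be processed without knowing about the others, so adding workers shortens the wall-clock time of the per-partition step.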
Demo Overview
The Dataset
- Telecom Customer Churn data
- Gathered from Kaggle
- Keep an eye on the dataset properties
Developing a Churn Prediction Model
- Mount data from Azure Blob Storage
- Exploratory Data Analysis
- Reduce the Feature Set
- Boolean Encoding
- One-Hot Encoding
- Vector Assembly
- Model Development
- Save the Model
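The three feature-preparation steps listed above (Boolean encoding, one-hot encoding, vector assembly) can be illustrated with a minimal plain-Python sketch. It shows the idea only and is not the demo notebook's code. The column names (`Partner`, `Contract`, `MonthlyCharges`) and the contract categories are assumptions made for illustration. Spark ML provides transformers such as `StringIndexer`, `OneHotEncoder`, and `VectorAssembler` for these steps at scale.

```python
from typing import Dict, List, Sequence, Union

# Hypothetical category list for a hypothetical "Contract" column.
CONTRACT_TYPES = ["Month-to-month", "One year", "Two year"]


def boolean_encode(value: str) -> int:
    """Map a Yes/No string to 1/0 (case-insensitive)."""
    normalized = value.strip().lower()
    if normalized not in {"yes", "no"}:
        raise ValueError(f"expected 'Yes' or 'No', got {value!r}")
    return 1 if normalized == "yes" else 0


def one_hot_encode(value: str, categories: Sequence[str]) -> List[int]:
    """Return a 0/1 list with a single 1 at the position of value."""
    if value not in categories:
        raise ValueError(f"unknown category {value!r}")
    return [1 if value == category else 0 for category in categories]


def assemble_vector(row: Dict[str, Union[str, float]]) -> List[float]:
    """Concatenate encoded and numeric columns into one feature vector."""
    features: List[float] = [float(boolean_encode(str(row["Partner"])))]
    features += [float(x) for x in one_hot_encode(str(row["Contract"]), CONTRACT_TYPES)]
    features.append(float(row["MonthlyCharges"]))
    return features


if __name__ == "__main__":
    example_row = {"Partner": "Yes", "Contract": "One year", "MonthlyCharges": 70.5}
    print(assemble_vector(example_row))  # [1.0, 0.0, 1.0, 0.0, 70.5]
```

The assembled list is the single numeric input a model expects per customer: one slot for the Boolean column, one slot per contract category, and one slot for the numeric column.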
Demo
Machine Learning and Databricks
- Even with a small data set, we can still develop a model that predicts customer churn
- However, the F1-score of the GBT model (as well as the others) is not ideal, which points back to the quality of the data set
- When looking to implement machine learning, the most important part is ensuring that your company is prepared for the journey
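For reference, the F1-score is the harmonic mean of precision and recall. The sketch below computes it from scratch on made-up labels (not the demo's results) to show how a model can look acceptable on accuracy while scoring poorly on F1 when it misses many actual churners. The function names and the sample labels are hypothetical.

```python
from typing import Sequence


def f1_score(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """F1 = 2 * precision * recall / (precision + recall) for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true label."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


if __name__ == "__main__":
    # Made-up labels: 1 = churned, 0 = stayed.
    actual = [1, 0, 0, 1, 0, 0, 1, 0]
    predicted = [1, 0, 0, 0, 0, 1, 0, 0]
    print(accuracy(actual, predicted))            # 0.625
    print(round(f1_score(actual, predicted), 3))  # 0.4
```

Here the model finds only one of three churners, so recall is 1/3 and precision is 1/2, giving an F1-score of 0.4 even though five of eight predictions are correct.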
Thank you!
Daniel Woods
Email: Daniel.woods@talavant.com
LinkedIn: https://www.linkedin.com/in/daniel-woods-9520a0a6
- If you are able to, please complete Talavant’s survey at https://www.talavant.com/sqlsat/
- Those who complete the survey will be entered in a drawing for an Amazon Gift Card