Published by Patience Randall. Modified over 9 years ago.
1
Scaling the Data Scientist
Dr. Ira Cohen, Chief Data Scientist, HP Software
2
Data Science Office @ HPSW
HP-Software and Data Science
– HP-Software products collect huge amounts of IT data: system monitoring, events, defects, incidents, logs, changes, configuration, test data, requirements, security events, network data, app monitoring
– Customers want us to transform the data into actionable information
“Big Data & Predictive Analytics: The Future of IT Management” (Mike Gualtieri, Forrester)
3
Need Expertise
– Expertise in machine learning
– Expertise in the product domain
– Infrastructure: data platforms, development tools
4
A tale of two worlds
Data scientists
– Few
– Limited domain knowledge
– Tools: R, Matlab, Mahout, Knime, Weka, SAS, …
Developers/SMEs
– Plentiful
– Limited data science knowledge
– Tools: IDEs, Excel
5
Our solution
Developer → Data analytics specialist
6
How?
– Training
– Mentoring
– Community
– Data infrastructure
– New dev tool
7
Training: Practical Machine Learning
– 4-day training
– Commitment to complete a first project
8
Practical Machine Learning: Ohad Assulin, Efrat Egozi Levi, Ira Cohen
– Automatic Event Prioritization: Anat Levinger & Roy Wallerstein
– Automatic Vulnerability Categorization: Barak Raz & Ben Feher
– Classifying Security Events: Yoni Roit & Omer Weissman
– Early detection of anomalous behavior in IT systems: Yonatan Ben Simhon & Yaneeve Shekel
– Cloud Delivery Optimization (CDO): Ran, Levi
– URL to Action Classification: Boaz Shor & Eyal Kenigsberg
– Predictive Analytics in Release Management: Sigalit Sade
– Sales Pipeline Early Warning: Gabriel, Alvarado
9
Pushing My Buttons
Gil Zieder, Ofer Eliassaf, Boris Kozorovitzky
10
The process @ work
Problem definition → Data → Attribute construction → Processing (normalization, attribute selection, filtering) → Learning (supervised classification, minimize false negatives) → Testing
– Problem definition: as the Pusher or DevOps engineer of a project, you would like to know whether a given change set is safe to push into the production branch.
– Data: 9 open source projects, 8806 individual commits. Each commit is labeled “good” or “bad” by running the tests after it: “good” if the tests pass, “bad” if they fail.
– Attribute construction: 80 attributes per commit, based on source control, previous commits, and code complexity, e.g., average change frequency, previous commit state, cyclomatic complexity.
– Attribute selection: rank-based
– Classification algorithms: K-NN, SVM, decision tree, random forest, …
– Testing: 87% accuracy with K-NN
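The slide names the pipeline stages but not how they were implemented. As a rough sketch only, the pure-Python code below walks a handful of made-up commits through those stages: labels from test outcomes, min-max normalization fitted on the training commits, rank-based attribute selection, and K-NN classification with an adjustable threshold aimed at fewer false negatives. The ranking score (gap between class means), the `bad_threshold` parameter, the attribute names, and every function name are assumptions for illustration, not the method used in the presentation. The toy numbers are not the 9-project dataset and say nothing about the reported 87% accuracy.

```python
import math

# Illustrative sketch of the commit-risk pipeline; names and scoring choices are assumptions.

def label_commit(tests_passed):
    # "good" commit (tests pass) -> 0, "bad" commit (tests fail) -> 1
    return 0 if tests_passed else 1

def fit_min_max(rows):
    # Per-attribute minimum and maximum, taken from the training rows only
    cols = list(zip(*rows))
    return [min(c) for c in cols], [max(c) for c in cols]

def apply_min_max(rows, lows, highs):
    # Scale each attribute to [0, 1] with the training ranges; constant attributes map to 0
    return [
        [(x - lo) / (hi - lo) if hi > lo else 0.0 for x, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]

def rank_attributes(rows, labels):
    # Assumed ranking score: absolute gap between the attribute's mean over "bad"
    # commits and its mean over "good" commits (both classes must be present)
    bad = [r for r, y in zip(rows, labels) if y == 1]
    good = [r for r, y in zip(rows, labels) if y == 0]
    scores = [
        abs(sum(r[j] for r in bad) / len(bad) - sum(r[j] for r in good) / len(good))
        for j in range(len(rows[0]))
    ]
    # Attribute indices, best-scoring first
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)

def select_attributes(rows, attrs):
    # Keep only the chosen attribute columns, in the given order
    return [[row[j] for j in attrs] for row in rows]

def knn_predict(train_x, train_y, x, k=3, bad_threshold=0.5):
    # Predict "bad" (1) when at least bad_threshold of the k nearest training commits
    # (Euclidean distance) are bad; a lower threshold flags more commits,
    # trading some accuracy for fewer false negatives
    nearest = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], x))[:k]
    bad_fraction = sum(train_y[i] for i in nearest) / k
    return 1 if bad_fraction >= bad_threshold else 0

def evaluate(y_true, y_pred):
    # Accuracy, plus the count of bad commits predicted good (false negatives)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    false_negatives = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return accuracy, false_negatives

# Made-up toy commits, NOT the study's data.
# Hypothetical attributes: [change_frequency, previous_commit_bad, cyclomatic_complexity]
train_rows = [[1, 0, 5], [2, 0, 4], [8, 1, 20], [9, 1, 25], [1, 0, 6], [7, 1, 18]]
train_y = [label_commit(p) for p in [True, True, False, False, True, False]]
test_rows = [[2, 0, 5], [8, 1, 22]]
test_y = [label_commit(p) for p in [True, False]]

lows, highs = fit_min_max(train_rows)
train_n = apply_min_max(train_rows, lows, highs)
test_n = apply_min_max(test_rows, lows, highs)

ranking = rank_attributes(train_n, train_y)
top = ranking[:2]  # keep the two highest-ranked attributes
train_x = select_attributes(train_n, top)
test_x = select_attributes(test_n, top)

preds = [knn_predict(train_x, train_y, x, k=3) for x in test_x]
accuracy, false_negatives = evaluate(test_y, preds)
print(ranking, preds, accuracy, false_negatives)
```

Setting `bad_threshold` below 0.5 makes the classifier flag a commit as risky even when only a minority of its neighbours failed their tests, which is one simple way to push toward fewer false negatives at some cost in accuracy.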
11
Analytic specialist program: Results
– More than 70 developers trained (before: 4)
– More than 30 new capabilities since April 2013 (before: 1)
– 1 data scientist per 10 new capabilities (before: 1:1)
– Development time reduced by 70% (before: 12 months)
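Reading “reduced by 70%” as relative to the 12-month baseline, the implied development time works out to roughly 3.6 months. This is an inference from the two figures on the slide, not a number the slide states:

```latex
t_{\text{after}} = t_{\text{before}} \times (1 - 0.70) = 12~\text{months} \times 0.30 = 3.6~\text{months}
```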
12
Can we do better?
Yes. From months to days!
How?
– Create a simple tool for analytic specialists
– Automate the data scientist as much as possible
13
Project Titan
14
Titan: Demo
15
Scaling the data scientist
Analytic specialists
– Develop using standard machine learning
– Use a simplified tool
Data scientist
– Provides expert advice
– Develops new types of machine learning solutions