Introduction to Data Science: Lecture 1


1 Introduction to Data Science: Lecture 1
March 15, 2017, Dr. Amitai Armon

2 Administrative Details
Course lecturer: Prof. Tova Milo
Course teaching assistant: Slava Novgorodov
Grade structure: 30% exercises, 70% final exam
Course website:

3 Course Topics
This course will provide a practical introduction to machine learning and big data.
Main topics of the classes:
- Introduction to machine learning
- Data understanding and data preparation
- Feature selection and model evaluation
- Supervised modeling
- Unsupervised modeling
- Deep learning
- Introduction to big data
- Spark
- NoSQL databases
- Spark Streaming

4 Exercises
There will be four exercises during the course; the last one will be bigger.
Exercises will be in Python.
Submission is in pairs.
See the course website:

5 Administrative Details
Questions?

6 Intel Advanced Analytics: A Little About Us
Our mission: use data science to upgrade Intel’s operations, and help Intel win the data-science market.
Operational excellence: design, manufacturing, marketing & sales
Technology breakthrough: deep learning products, health wearables platform

7 Intel Advanced Analytics: A Little About Us
Our mission: use data science to upgrade Intel’s operations, and help Intel win the data-science market.
Contributions to the data-science community: Data Science Summit conferences in San Francisco and Jerusalem, industry collaborations, helping Intel VC investments, academic collaborations

8 Machine Learning is Everywhere…
Handwriting recognition, speech recognition, automatic translation, credit-card fraud detection, image classification, social network analysis (community detection), movie / product / article recommendations, autonomous cars, …

9 Winning in Jeopardy

10 Winning Against Go Champion

11 Answering Visual Questions
Kan et al., 2015

12 Dialogue (“Turing Test”)
Google chatbot, 2015

13 What is Machine Learning?
Wikipedia: Machine learning, a branch of artificial intelligence, concerns the construction and study of systems that can learn from data.
Tom Mitchell (1998): A computer program is said to learn from experience E with respect to some task T and some performance measure P if its performance on T, as measured by P, improves with experience E.

14 “Child Learning”
| Action | Reaction | Lesson |
|---|---|---|
| Touching a hot stove | Aching hand | Do not touch again |
| Playing with toys | Fun | Continue playing |
| Running into the road | Screaming parent | Don’t run into roads |
| Running in the house | | Run in the house |
| Eating chocolate | | Search for chocolate |
| Eating too much chocolate | Stomach ache | Don’t eat too much |
| Saying “bla bla” | No reaction | Try variations |
| Saying “daddy” | Overexcited parents | Do that again |

15 Learning from Examples
What is “Dangerous”?

16 Learning from Examples
So are these items dangerous or not? It’s important to have enough diverse examples, not all of the same type.

17 Typical Machine Learning Tasks
No two machine learning tasks are identical, but there are still common prototypes:
- Supervised learning: learning from labeled examples (for which the answer is known)
- Unsupervised learning: learning from unlabeled examples (for which the answer is unknown)
- Semi-supervised learning: learning from both labeled and unlabeled examples
- Active learning: learning while interactively querying for the labels of examples
- Reinforcement learning: learning by trial and feedback, like the “child learning” example

18 Typical Machine Learning Tasks
Supervised learning: estimate an unknown result, given explicit values of some explaining variables (“features”). The estimate is based on a set of observations for which both the result and the explaining variables are known (the “training set”). This may be prediction (“it’s difficult to give forecasts, especially about the future”) or estimation.

19 Typical Machine Learning Tasks
Supervised learning, example 1: What will be the annual spend of my clients?
The unknown result: the annual spend (this is a prediction).
Explaining variables (“features”): the client’s details (e.g., domain, size, purchase history).
Training set: the annual spend in past years, paired with the client data that was available at the beginning of each of those years.
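Regression in its simplest form fits a line to a training set of (feature, result) pairs and then uses the line to predict the result for a new client. The sketch below uses ordinary least squares with a single feature. The feature choice (last year's spend) and the numbers are hypothetical stand-ins for the client details listed on the slide.

```python
def fit_simple_linear_regression(xs, ys):
    """Ordinary least squares for y ≈ intercept + slope * x (one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the intercept makes the line pass through the means
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical training set: each client's spend in one year (feature)
# and in the following year (result), in thousands of dollars
prior_year_spend = [10.0, 20.0, 30.0, 40.0]
following_year_spend = [12.0, 22.0, 32.0, 42.0]

intercept, slope = fit_simple_linear_regression(prior_year_spend, following_year_spend)
print(intercept + slope * 50.0)  # → 52.0, the forecast for a client who spent 50k this year
```

Real client data would use several features and would not lie exactly on a line. The training-then-predict structure stays the same.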

20 Typical Machine Learning Tasks
Supervised learning, example 2: What activity is a Parkinson’s patient currently performing?
The unknown result: the activity (this is not a prediction; the fact exists, we simply don’t know it).
Explaining variables: various features extracted from sensor data on the patient’s body (accelerometers, gyroscope, compass).
Training set: features and the corresponding activity labels (we must have a labeled training set).

21 Typical Machine Learning Tasks
Supervised learning considers two main tasks:
- Regression: the unknown result is a numerical value (e.g., the annual spend).
- Classification: the unknown result is a class (category), e.g., the activity.
Regression and classification have different objective measures, and often different algorithms.

22 Typical Machine Learning Tasks
Unsupervised learning: given explicit values of some variables (a pre-defined set), extract interesting patterns that appear in the data, or provide an insightful representation of the data’s inherent distribution.

23 Typical Machine Learning Tasks
Unsupervised learning example: market segmentation
Input data: client information
Objective: identify what types of clients there are
This objective is known as ‘clustering’ or ‘cluster analysis’.
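The slide does not name a particular clustering algorithm. As one common choice, here is a minimal sketch of k-means (Lloyd's algorithm) under stated assumptions:

- the number of segments k is known in advance;
- the first k points serve as initial centroids (practical implementations usually use smarter seeding such as k-means++);
- the client features and values are hypothetical.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, k, max_iterations=100):
    """Plain k-means (Lloyd's algorithm). Returns (centroids, cluster label per point)."""
    # Simplistic initialisation: the first k points become the initial centroids
    centroids = [list(p) for p in points[:k]]
    for _ in range(max_iterations):
        # Assignment step: each point joins the cluster of its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: euclidean(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members (keep it if the cluster is empty)
        new_centroids = [
            [sum(coords) / len(members) for coords in zip(*members)] if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged: the centroids stopped moving
            break
        centroids = new_centroids
    labels = [min(range(k), key=lambda i: euclidean(p, centroids[i])) for p in points]
    return centroids, labels


# Hypothetical client data: (annual spend in $k, orders per year)
clients = [(1.0, 2.0), (1.5, 1.8), (1.2, 2.2), (8.0, 8.0), (8.5, 7.5), (7.8, 8.2)]
centroids, labels = kmeans(clients, k=2)
print(labels)  # → [0, 0, 0, 1, 1, 1]
```

No labels are given to the algorithm. The two "client types" emerge only from how the points group together.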

24 Typical Machine Learning Tasks
Reinforcement learning is learning how to best react to situations through trial and error. In some sense, reinforcement learning is the first way of learning we think of (compare the “child learning” example). Example: TD-Gammon
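TD-Gammon learned backgammon with temporal-difference learning, specifically TD(λ) combined with a neural network. That is far beyond a slide-sized example. As a hedged illustration of the trial-and-feedback idea, the sketch below applies the simplest TD method, TD(0), to a hypothetical toy problem: a five-state random walk. It estimates the value of each state, meaning the probability of eventually ending on the right. The step size `alpha`, the episode count and the seed are arbitrary choices.

```python
import random


def td0_random_walk(episodes=20000, alpha=0.01, seed=0):
    """Estimate state values of a 5-state random walk with TD(0).

    States 1..5 are non-terminal; 0 and 6 are terminal. Each step moves left or
    right with equal probability; reaching state 6 gives reward 1, everything
    else gives reward 0. No discounting.
    """
    rng = random.Random(seed)
    values = [0.0] + [0.5] * 5 + [0.0]  # terminal states keep value 0
    for _ in range(episodes):
        state = 3  # every episode starts in the middle
        while state not in (0, 6):
            next_state = state + rng.choice((-1, 1))  # trial: take a random step
            reward = 1.0 if next_state == 6 else 0.0  # feedback from the environment
            # TD(0) update: nudge V(state) toward reward + V(next_state)
            values[state] += alpha * (reward + values[next_state] - values[state])
            state = next_state
    return values[1:6]


estimates = td0_random_walk()
print([round(v, 2) for v in estimates])  # true values are 1/6, 2/6, 3/6, 4/6, 5/6
```

This sketch only evaluates a fixed (random) behavior. A full reinforcement learner like TD-Gammon also uses the learned values to choose better actions.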

25 A Few Supervised Learning Approaches

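26 Supervised Learning
$$
\begin{array}{ccccccc|c}
X_1 & X_2 & X_3 & \cdots & X_{n-2} & X_{n-1} & X_n & Y \\
\hline
x_{1,1} & x_{2,1} & x_{3,1} & \cdots & x_{n-2,1} & x_{n-1,1} & x_{n,1} & y_1 \\
\vdots & & & & & & & \vdots \\
x_{1,m-1} & x_{2,m-1} & x_{3,m-1} & \cdots & x_{n-2,m-1} & x_{n-1,m-1} & x_{n,m-1} & y_{m-1} \\
x_{1,m} & x_{2,m} & x_{3,m} & \cdots & x_{n-2,m} & x_{n-1,m} & x_{n,m} & y_m
\end{array}
$$
Uses a set of labeled examples with known answers (the “training set”). Success is evaluated on a separate set of examples (the “test set”). Various success criteria may be considered: for classification, accuracy, recall, precision, …; for regression, MSE, RMSE, …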
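The success criteria named on this slide are short formulas over the test set. The sketch below computes them from lists of actual and predicted values. Two things are assumptions for illustration: the test-set results are hypothetical, and precision/recall return 0.0 when undefined (a convention, not a standard).

```python
import math


def classification_metrics(actual, predicted, positive):
    """Accuracy, precision and recall, treating `positive` as the class of interest."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)  # true positives
    fp = sum(1 for a, p in pairs if a != positive and p == positive)  # false positives
    fn = sum(1 for a, p in pairs if a == positive and p != positive)  # false negatives
    accuracy = sum(1 for a, p in pairs if a == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0  # fraction of predicted positives that are truly positive
    recall = tp / (tp + fn) if tp + fn else 0.0     # fraction of actual positives that were found
    return accuracy, precision, recall


def regression_metrics(actual, predicted):
    """Mean squared error and its square root (RMSE, in the units of the target)."""
    mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    return mse, math.sqrt(mse)


# Hypothetical test-set results
print(classification_metrics(["spam", "spam", "ham", "ham", "spam"],
                             ["spam", "ham", "ham", "spam", "spam"], positive="spam"))
# accuracy 0.6, precision 2/3, recall 2/3
print(regression_metrics([3.0, 5.0], [2.0, 7.0]))  # MSE 2.5, RMSE ≈ 1.58
```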

27 Lazy Learner: k-Nearest Neighbors
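Example: identifying spam emails. [Figure: emails plotted by length vs. number of new recipients, with k = 3]
Open questions: What should k be? Which distance measure should be used? What is the computational cost?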
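k-NN is called "lazy" because it does no work at training time. Every prediction scans the stored examples, which is why computation is one of the open questions. A minimal sketch follows, with some choices made for illustration: Euclidean distance, majority vote, k = 3, and hypothetical email data.

```python
import math
from collections import Counter


def knn_classify(training_set, query, k=3):
    """Label `query` by majority vote among its k nearest training examples.

    training_set: list of (feature_tuple, label) pairs; Euclidean distance.
    """
    def distance(item):
        features, _ = item
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(features, query)))

    nearest = sorted(training_set, key=distance)[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical emails: ((length in words, number of new recipients), label).
# Note: length has a much larger range than new recipients, so it dominates the
# unscaled Euclidean distance; in practice features are usually rescaled first.
emails = [
    ((120, 0), "ham"), ((300, 1), "ham"), ((250, 0), "ham"),
    ((40, 8), "spam"), ((60, 10), "spam"), ((35, 7), "spam"),
]
print(knn_classify(emails, (50, 9)))   # → spam
print(knn_classify(emails, (200, 1)))  # → ham
```

Changing `k` or the `distance` function changes the answer. This is exactly why the slide lists both as design questions.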

28 Linear Classifiers
How would you classify this data? [Figure: data points on axes X1, X2]

29 Linear Classifiers
How would you classify this data? [Figure: data points on axes X1, X2]

30 Linear Classifiers
Any of these would be fine… but which is best? [Figure: several candidate separating lines on axes X1, X2]

31 Maximum Margin
Define the margin of a linear classifier as the width by which the boundary could be increased before hitting a data point. [Figure: emails plotted by length vs. number of new recipients]

32 Maximum Margin
The maximum margin linear classifier is the linear classifier with the maximum margin. It is found by the SVM (Support Vector Machine) algorithm. [Figure: emails plotted by length vs. number of new recipients]
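The margin can be measured directly for any candidate linear boundary w·x + b = 0. The sketch below computes the distance from the boundary to the closest training point, which is how far the boundary can be thickened on each side before touching a point. It then compares two hand-picked boundaries on hypothetical data. It does not implement SVM training: an SVM searches over all (w, b) for the one that maximizes this quantity.

```python
import math


def geometric_margin(w, b, points, labels):
    """Distance from the boundary w·x + b = 0 to the closest point.

    Labels are +1/-1; the result is negative if some point is on the wrong side.
    """
    norm_w = math.sqrt(sum(wi * wi for wi in w))
    return min(
        y * (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm_w
        for x, y in zip(points, labels)
    )


# Hypothetical, linearly separable data (two features)
points = [(1, 1), (2, 1), (4, 4), (5, 4)]
labels = [-1, -1, 1, 1]

margin_a = geometric_margin((1, 1), -5, points, labels)  # boundary x1 + x2 = 5
margin_b = geometric_margin((1, 0), -3, points, labels)  # boundary x1 = 3
print(margin_a, margin_b)  # boundary A leaves more room on both sides than B
```

Both boundaries separate the data perfectly. The margin is what breaks the tie between them.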

33 Decision Tree: a flow-chart-like tree structure
Each internal node denotes a test on one of the features, each branch represents an outcome of the test, and each leaf node represents a class label.
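The node/branch/leaf structure maps directly onto nested data. The sketch below covers only how a tree classifies an example: start at the root, follow the outcome of each test, and stop at a leaf. The tree itself, its features, and its thresholds are hypothetical and hand-written. Learning a tree from a training set (e.g., with ID3 or CART) is not shown.

```python
# Hypothetical tree for the spam example. An internal node tests one feature
# against a threshold; a leaf is just a class label (a string).
spam_tree = {
    "feature": "new_recipients", "threshold": 5,
    "left": {  # taken when new_recipients <= 5
        "feature": "length", "threshold": 20,
        "left": "spam",  # taken when length <= 20 (very short email)
        "right": "ham",
    },
    "right": "spam",  # taken when new_recipients > 5
}


def classify(node, example):
    """Follow test outcomes from the root until a leaf (class label) is reached."""
    while isinstance(node, dict):
        outcome = "left" if example[node["feature"]] <= node["threshold"] else "right"
        node = node[outcome]
    return node


print(classify(spam_tree, {"length": 200, "new_recipients": 1}))  # → ham
print(classify(spam_tree, {"length": 10, "new_recipients": 0}))   # → spam
print(classify(spam_tree, {"length": 300, "new_recipients": 9}))  # → spam
```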

34 Deep Neural Networks (Bengio, 2009)

35 Block Diagram of a Supervised Learning System
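[Figure: the learning algorithm searches the hypothesis space using the training set and outputs a hypothesis h; h is then tested on a separate test set, counting the cases where h(x) ≠ c_t(x), which gives the estimated generalization error ε_g(h).]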
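The "testing" box in the diagram reduces to one ratio: the fraction of test examples on which the hypothesis h disagrees with the true label c_t(x). A minimal sketch follows. The threshold hypothesis and the test examples are hypothetical.

```python
def estimated_generalization_error(h, test_set):
    """Fraction of test examples (x, true_label) where h(x) differs from the true label c_t(x)."""
    mistakes = sum(1 for x, true_label in test_set if h(x) != true_label)
    return mistakes / len(test_set)


def threshold_hypothesis(x):
    """Hypothetical learned hypothesis h: a simple threshold on one number."""
    return "positive" if x >= 0.5 else "negative"


# Hypothetical held-out test set, never used during training
test_set = [(0.9, "positive"), (0.2, "negative"), (0.6, "negative"), (0.4, "negative")]
print(estimated_generalization_error(threshold_hypothesis, test_set))  # → 0.25
```

The estimate is only trustworthy if the test set was kept out of training. Otherwise it measures memorization rather than generalization.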

36 Evaluating What’s Been Learned
1. Test set
2. Cross-validation
[Figure: confusion matrix for classes Red and Blue; rows show the actual class, columns show the class the example was classified as]
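Both evaluation tools on this slide are easy to sketch:

- **k-fold cross-validation** rotates which slice of the data serves as the test set, so every example is tested exactly once.
- **A confusion matrix** counts (actual, classified-as) pairs.

In the sketch below, the contiguous, unshuffled folds are a simplification, and the Red/Blue predictions are hypothetical.

```python
from collections import Counter


def k_fold_splits(n, k):
    """Yield (train_indices, test_indices) for k-fold cross-validation over n examples.

    Folds are contiguous here; real data is usually shuffled before splitting.
    """
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n))
        yield train_idx, test_idx
        start += size


def confusion_matrix(actual, predicted, classes):
    """Rows: actual class; columns: predicted ("classified as") class."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in classes] for a in classes]


for train_idx, test_idx in k_fold_splits(10, 3):
    print(len(train_idx), test_idx)  # each example lands in exactly one test fold

# Hypothetical predictions for a Red/Blue classifier
actual = ["Red", "Red", "Blue", "Blue", "Blue"]
predicted = ["Red", "Blue", "Blue", "Blue", "Red"]
print(confusion_matrix(actual, predicted, ["Red", "Blue"]))  # → [[1, 1], [1, 2]]
```

In a full cross-validation run, a model would be trained on `train_idx` and scored on `test_idx` for each fold. The k scores would then be averaged.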

37 Regression Learning Example

38 Overfitting and Underfitting
Overfitting: the model learns the training set too well; it overfits the training set so closely that it cannot generalize to new instances.
Underfitting: the model is too simple, so both training and test errors are large.
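The sketch below contrasts three models on the same hypothetical noisy data, where the true relation is y = 2x:

- **Constant prediction** (too simple, so it should underfit).
- **Straight line** (a good fit).
- **Interpolating polynomial** through every training point (it memorizes the noise).

The polynomial reaches zero training error. Its test error should typically come out worse than the line's, while the constant model does poorly on both.

```python
def mse(model, xs, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def fit_constant(xs, ys):
    """Underfitting extreme: ignore x and always predict the training mean."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_line(xs, ys):
    """Least-squares straight line."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return lambda x: my + slope * (x - mx)


def fit_interpolating_polynomial(xs, ys):
    """Overfitting extreme: the degree-(n-1) Lagrange polynomial through every training point."""
    def predict(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return predict


# Hypothetical data: the true relation is y = 2x; training labels carry fixed noise
train_x = [0, 1, 2, 3, 4, 5, 6, 7]
noise = [0.3, -0.4, 0.2, -0.1, 0.5, -0.3, 0.1, -0.2]
train_y = [2 * x + e for x, e in zip(train_x, noise)]
test_x = [0.5, 1.5, 2.5, 3.5, 4.5, 5.5, 6.5]
test_y = [2 * x for x in test_x]

for name, fit in [("constant", fit_constant), ("line", fit_line),
                  ("interpolating polynomial", fit_interpolating_polynomial)]:
    model = fit(train_x, train_y)
    print(f"{name}: train MSE {mse(model, train_x, train_y):.3f}, "
          f"test MSE {mse(model, test_x, test_y):.3f}")
```

Comparing the train and test columns side by side is the practical way to tell the two failure modes apart:

- **Overfitting:** low training error with high test error.
- **Underfitting:** both errors high.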

39 CRISP-DM Methodology
CRISP-DM stands for Cross-Industry Standard Process for Data Mining.
It was conceived by SPSS, Teradata, Daimler, NCR and OHRA.
IBM is the primary corporation that embraced it, incorporating it into its SPSS Modeler product.
CRISP-DM defines a methodology for ML/DM (machine learning / data mining) projects.

40 CRISP-DM Methodology
CRISP-DM breaks the process of data mining into six major phases:
1. Business understanding
2. Data understanding
3. Data preparation
4. Modeling
5. Evaluation
6. Deployment
The sequence of the phases is not strict; moving back and forth between phases may be required.

41 Summary
We briefly discussed today:
- What machine learning is
- Typical machine learning tasks
- Supervised learning: learning means generalization
- Overfitting and underfitting
- Simple learning paradigms
- Training vs. testing
- Classification and regression
- CRISP-DM

42 Introduction to Data Science Questions?

43 Thank you!

