1
Introduction to Data Science: Lecture 1
March 15, 2017
Dr. Amitai Armon
2
Administrative Details
Course lecturer: Prof. Tova Milo
Course teaching assistant: Slava Novgorodov
Grade structure: 30% Exercises, 70% Final Exam
Course website:
3
Course Topics
This course will provide a practical introduction to machine learning and big data.
Main topics of the classes:
  Introduction to Machine Learning
  Data Understanding and Data Preparation
  Feature Selection and Model Evaluation
  Supervised Modeling
  Unsupervised Modeling
  Deep Learning
  Introduction to Big Data
  Spark
  NoSQL databases
  Spark Streaming
4
Exercises
There will be four exercises during the course
The last exercise will be bigger
Exercises will be in Python
Submission is in pairs
See the course website:
5
Administrative Details
Questions?
6
Intel Advanced Analytics: A Little About Us
OUR MISSION
  Use data science for upgrading Intel’s operations
  Help Intel win the data-science market
Operational Excellence: Design, Manufacturing, Marketing & Sales
Technology Breakthrough: Deep Learning Products, Health wearables platform
7
Intel Advanced Analytics: A Little About Us
OUR MISSION
  Use data science for upgrading Intel’s operations
  Help Intel win the data-science market
CONTRIBUTIONS TO THE DATA-SCIENCE COMMUNITY
  Data Science Summit conferences in San Francisco and Jerusalem
  Industry collaborations
  Helping Intel VC Investments
  Academy collaborations
8
Machine Learning is Everywhere…
Handwriting Recognition
Speech Recognition
Automatic Translation
Credit-card Fraud Detection
Image Classification
Social Network Analysis (community detection)
Movie / product / article recommendations
Autonomous cars
…
9
Winning in Jeopardy
10
Winning Against Go Champion
11
Answering Visual Questions
Kan et al., 2015
12
Dialogue (“Turing Test”)
Google chatbot, 2015
13
What is Machine Learning?
Wikipedia: Machine Learning, a branch of artificial intelligence, concerns the construction and study of systems that can learn from data.
Tom Mitchell (1998): A computer program is said to learn from experience, with respect to some task and some performance measure, if its performance, as measured by the performance measure, improves with experience.
14
“Child Learning”
Action                    | Reaction            | Lesson
Touching hot stove        | Aching hand         | Do not touch again
Playing with toys         | Fun                 | Continue playing
Running into the road     | Screaming parent    | Don’t run into roads
Running in the house      |                     | Run in the house
Eating chocolate          |                     | Search for chocolate
Eating too much chocolate | Stomach ache        | Don’t eat too much
Saying bla bla            | No reaction         | Try variations
Saying daddy              | Overexcited parents | Do that again
15
Learning from Examples
What is “Dangerous”?
16
Learning from Examples
So are these items dangerous or not?
It’s important to have enough diverse examples, not all of the same type.
17
Typical Machine Learning Tasks
No two machine learning tasks are identical, but there are still common prototypes:
Supervised Learning
  Learning from labeled examples (for which the answer is known)
Unsupervised Learning
  Learning from unlabeled examples (for which the answer is unknown)
Semi-supervised Learning
  Learning from both labeled and unlabeled examples
Active Learning
  Learning while interactively querying for labels of examples
Reinforcement Learning
  Learning by trial and feedback, like the “child learning” example
18
Typical Machine Learning Tasks
Supervised Learning
Estimate an unknown result, given explicit values of some explaining variables (“features”).
Estimate it based on a set of observations for which both the result and the explaining variables are known (“training set”).
This may be prediction (“it’s difficult to give forecasts, especially about the future”) or estimation.
19
Typical Machine Learning Tasks
Supervised Learning
Example 1: What will be the annual spend of my clients?
The unknown result: the annual spend (this is a prediction)
Explaining variables (“features”): client’s details (e.g., domain, size, purchase history)
Training set: the annual spend in past years, paired with the client data that was available at the beginning of each of those years
20
Typical Machine Learning Tasks
Supervised Learning
Example 2: What is the activity currently performed by a Parkinson’s patient?
The unknown result: the activity (this is not a prediction; the fact exists, we simply don’t know it)
Explaining variables: various features extracted from sensor data on the patient’s body (accelerometers, gyro, compass)
Training set: features and the corresponding activity labeling (we must have a labeled training set)
21
Typical Machine Learning Tasks
Supervised Learning
Two main tasks are considered in Supervised Learning:
  Regression: the unknown result is a numerical value (e.g., annual spend)
  Classification: the unknown result is a class relation (e.g., the activity)
Regression and classification have different objective measures, and often different algorithms.
22
Typical Machine Learning Tasks
Unsupervised Learning
Given explicit values of some variables (a pre-defined set), extract interesting patterns that appear in the data, or provide an insightful representation of the data’s inherent distribution.
23
Typical Machine Learning Tasks
Unsupervised Learning
Example: Market Segmentation
Input data: clients’ information
Objective: identify what types of clients there are
This objective is known as “Clustering” or “Cluster Analysis”
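One common way to look for such client types is k-means clustering. The sketch below is only an illustration and is not taken from the slides: the client features (annual spend, orders per year), the function name `k_means`, the starting centroids, and the choice of two clusters are all assumptions made for the example.

```python
import math

def k_means(points, centroids, iterations=10):
    """Minimal k-means: alternate between assigning points and moving centroids."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        updated = []
        for i, cluster in enumerate(clusters):
            if cluster:
                updated.append(tuple(sum(v) / len(cluster) for v in zip(*cluster)))
            else:
                updated.append(centroids[i])  # an empty cluster keeps its centroid
        centroids = updated
    # Final cluster index for every point.
    labels = [min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
              for p in points]
    return centroids, labels

# Hypothetical clients: (annual spend in $1000s, orders per year).
clients = [(1.0, 2.0), (1.5, 1.0), (2.0, 2.5),
           (20.0, 30.0), (22.0, 28.0), (21.0, 33.0)]
centroids, labels = k_means(clients, centroids=[clients[0], clients[3]])
print(labels)     # cluster index per client
print(centroids)  # one "typical client" per segment
```

No labels are supplied anywhere: the two segments emerge from the distances alone, which is what makes this unsupervised. In practice the number of clusters and the scaling of the features are choices that need care.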
24
Typical Machine Learning Tasks
Reinforcement Learning
Reinforcement Learning is learning how to best react to situations through trial and error.
In some sense reinforcement learning is the first way of learning we think of.
Example: TD-Gammon
25
A Few Supervised Learning Approaches
26
Supervised Learning
X1      | X2      | X3      | … | Xn-2      | Xn-1      | Xn      | Y
x1,1    | x2,1    | x3,1    | … | xn-2,1    | xn-1,1    | xn,1    | y1
…
x1,m-1  | x2,m-1  | x3,m-1  | … | xn-2,m-1  | xn-1,m-1  | xn,m-1  | ym-1
x1,m    | x2,m    | x3,m    | … | xn-2,m    | xn-1,m    | xn,m    | ym
Uses a set of labeled examples with known answers (“training set”).
Success is evaluated on a separate set of examples (“test set”).
Various success criteria may be considered:
  For classification: Accuracy, Recall, Precision, …
  For regression: MSE, RMSE, …
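The success criteria named above can be computed directly from the true and predicted values on a test set. The following is a minimal sketch using the standard definitions; the function names and the small example data are assumptions made for illustration.

```python
import math

def classification_metrics(actual, predicted, positive):
    """Accuracy, precision and recall, treating `positive` as the class of interest."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)  # true positives
    fp = sum(a != positive and p == positive for a, p in pairs)  # false positives
    fn = sum(a == positive and p != positive for a, p in pairs)  # false negatives
    accuracy = sum(a == p for a, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall

def regression_metrics(actual, predicted):
    """Mean squared error and its square root."""
    mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    return mse, math.sqrt(mse)

# Hypothetical test-set labels and model outputs.
actual_labels = ["spam", "spam", "ham", "ham", "spam", "ham"]
predicted_labels = ["spam", "ham", "ham", "spam", "spam", "ham"]
print(classification_metrics(actual_labels, predicted_labels, positive="spam"))

# Hypothetical annual-spend values and predictions.
print(regression_metrics([10, 20, 30], [12, 18, 33]))
```

Precision answers "of the examples flagged as positive, how many really were?", while recall answers "of the truly positive examples, how many were found?". RMSE is in the same units as the predicted quantity, which often makes it easier to interpret than MSE.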
27
Lazy Learner: k-Nearest Neighbors
Identifying spam emails
What should be k?
Which distance measure should be used?
Computation
[Figure: k-nearest-neighbors example with K=3; axes: Email Length, New Recipients]
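A k-nearest-neighbors classifier is "lazy" because it has no training step: it stores the labeled examples and does all its work at query time. The sketch below assumes Euclidean distance and a majority vote; the example emails and the function name `knn_classify` are hypothetical.

```python
import math
from collections import Counter

def knn_classify(train, query, k=3):
    """Label `query` by majority vote among its k nearest training examples.

    `train` is a list of (features, label) pairs.
    """
    # Sort all stored examples by Euclidean distance to the query and keep k.
    neighbors = sorted(train, key=lambda ex: math.dist(ex[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical emails: (email length in words, number of new recipients).
train = [((50, 0), "ham"), ((60, 1), "ham"), ((80, 0), "ham"),
         ((20, 8), "spam"), ((15, 10), "spam"), ((30, 9), "spam")]

print(knn_classify(train, (25, 7), k=3))
print(knn_classify(train, (70, 1), k=3))
```

The open questions on the slide show up directly in the code: `k` is a free parameter, `math.dist` could be swapped for another distance measure (the two features here are on very different scales, so unscaled Euclidean distance is dominated by email length), and every query scans the whole training set, which is where the computation cost comes from.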
28
Linear Classifiers
How would you classify this data?
[Figure: axes X1 and X2]
30
Linear Classifiers
Any of these would be fine, but which is best?
[Figure: axes X1 and X2]
31
Maximum Margin
Define the margin of a linear classifier as the width that the boundary could be increased by before hitting a data point.
[Figure: axes Email Length and New Recipients]
32
Maximum Margin
The maximum margin linear classifier is the linear classifier with the maximum margin.
This is found by the SVM algorithm (Support Vector Machine).
[Figure: axes Email Length and New Recipients]
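To make the margin definition concrete, the sketch below computes the margin of a given linear boundary w·x + b = 0 as twice the distance from the boundary to the closest data point. It does not run the SVM algorithm; it only compares two hand-picked candidate boundaries. The data points and both candidates are assumptions for illustration, and both candidates are assumed to separate the two classes correctly.

```python
import math

def margin(w, b, points):
    """Width the boundary w.x + b = 0 can grow to (symmetrically) before touching a point."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    # Distance from a point x to the boundary is |w.x + b| / ||w||.
    closest = min(abs(sum(wi * xi for wi, xi in zip(w, x)) + b) for x in points) / norm
    return 2 * closest

# Hypothetical data: two points per class, all on the diagonal.
points = [(3, 3), (4, 4), (1, 1), (0, 0)]

diagonal_cut = margin(w=(1, 1), b=-4, points=points)  # boundary x1 + x2 = 4
vertical_cut = margin(w=(1, 0), b=-2, points=points)  # boundary x1 = 2

print(diagonal_cut, vertical_cut)
```

Both boundaries classify these points correctly, but the diagonal one leaves more room on each side, so a maximum-margin method would prefer it. An SVM searches over all separating boundaries for the one that maximizes this quantity rather than comparing a fixed list.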
33
Decision Tree
A flow-chart-like tree structure
Internal node denotes a test on one of the features
Branch represents an outcome of the test
Leaf nodes represent class labels
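The structure described above maps naturally onto nested dictionaries. The sketch below shows only how a finished tree classifies an example, not how a tree is learned from data; the tree itself, its thresholds, and the feature names are hypothetical.

```python
# Internal nodes test one feature against a threshold; leaves carry a class label.
tree = {
    "feature": "new_recipients", "threshold": 5,
    "at_most": {"label": "ham"},
    "greater": {
        "feature": "email_length", "threshold": 40,
        "at_most": {"label": "spam"},
        "greater": {"label": "ham"},
    },
}

def classify(node, example):
    """Follow the branches from the root until a leaf is reached."""
    while "label" not in node:
        # Each branch corresponds to one outcome of the node's test.
        outcome = "at_most" if example[node["feature"]] <= node["threshold"] else "greater"
        node = node[outcome]
    return node["label"]

print(classify(tree, {"new_recipients": 8, "email_length": 20}))
print(classify(tree, {"new_recipients": 1, "email_length": 20}))
```

Because each prediction is a short chain of readable tests, a decision tree can be inspected and explained in a way that many other models cannot.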
34
DEEP NEURAL NETWORKS Bengio, 2009
35
Block Diagram of a Supervised Learning System
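[Diagram: the Training Set and the Hypothesis Space feed the Learning Algorithm, which outputs a hypothesis h. Testing compares h(x) with ct(x) on the Test Set, giving the estimated error εg(h).]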
36
Evaluating What’s Been Learned
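1. Test set
2. Cross Validation
Confusion Matrix
[Figure: confusion matrix with rows "Actual" and columns "Classified As" for the classes Red and Blue; surviving cell values: 1 7 5]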
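Both evaluation tools can be sketched in a few lines. The code below is a minimal illustration under stated assumptions: the fold assignment is a simple interleaving without shuffling, and the Red/Blue labels are made-up values, not the numbers from the slide's confusion matrix.

```python
from collections import Counter

def k_fold_splits(n_examples, k):
    """Yield (train_indices, test_indices) so that each example is tested exactly once."""
    indices = list(range(n_examples))
    for fold in range(k):
        test = indices[fold::k]  # every k-th example, starting at `fold`
        held_out = set(test)
        train = [i for i in indices if i not in held_out]
        yield train, test

def confusion_matrix(actual, predicted):
    """Count how often each (actual class, classified-as class) pair occurs."""
    return Counter(zip(actual, predicted))

for train_idx, test_idx in k_fold_splits(6, k=3):
    print("train on", train_idx, "test on", test_idx)

# Hypothetical labels and classifier outputs.
actual = ["Red", "Red", "Red", "Blue", "Blue", "Blue"]
predicted = ["Red", "Blue", "Red", "Blue", "Blue", "Red"]
cm = confusion_matrix(actual, predicted)
for a in ("Red", "Blue"):
    print(a, [cm[(a, p)] for p in ("Red", "Blue")])
```

In cross validation the model is trained and scored once per fold and the scores are averaged, so every example contributes to testing without ever being tested by a model that saw it during training. Real data is usually shuffled before splitting.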
37
Regression Learning Example
38
Overfitting and Underfitting
Overfitting: the model learns the training set too well; it overfits the training set such that it cannot generalize to new instances.
Underfitting: the model is too simple; both training and test errors are large.
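The two failure modes can be seen by comparing training and test error for models of different flexibility. The sketch below uses assumptions throughout: synthetic data drawn from a noisy line, a constant model as the "too simple" case, a least-squares line as a reasonable fit, and a memorizer (it returns the answer of the nearest training example) as the "learns the training set too well" case.

```python
import random

def mse(model, data):
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)

def fit_constant(train):
    """Too simple: always predict the mean of the training answers."""
    mean_y = sum(y for _, y in train) / len(train)
    return lambda x: mean_y

def fit_line(train):
    """Least-squares straight line."""
    n = len(train)
    mean_x = sum(x for x, _ in train) / n
    mean_y = sum(y for _, y in train) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in train)
             / sum((x - mean_x) ** 2 for x, _ in train))
    return lambda x: mean_y + slope * (x - mean_x)

def fit_memorizer(train):
    """Too flexible: repeat the answer of the closest training example, noise included."""
    return lambda x: min(train, key=lambda ex: abs(ex[0] - x))[1]

rng = random.Random(0)

def sample(n):
    """Synthetic data: y = 2x + 1 plus Gaussian noise (an assumption for this demo)."""
    data = []
    for _ in range(n):
        x = rng.uniform(0, 10)
        data.append((x, 2 * x + 1 + rng.gauss(0, 2)))
    return data

train, test = sample(30), sample(30)
for name, fit in [("constant", fit_constant), ("line", fit_line), ("memorizer", fit_memorizer)]:
    model = fit(train)
    print(name, "train MSE:", round(mse(model, train), 2),
          "test MSE:", round(mse(model, test), 2))
```

The memorizer reaches zero training error by construction, yet that says nothing about new instances, which is why success has to be measured on a separate test set. The constant model shows the opposite problem: its error is large even on the data it was fitted to.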
39
CRISP-DM Methodology
CRISP-DM stands for Cross Industry Standard Process for Data Mining
Conceived by SPSS, Teradata, Daimler, NCR and OHRA
IBM is the primary corporation that embraced it and incorporated it in its SPSS Modeler product
CRISP-DM defines a methodology for ML/DM projects
40
CRISP-DM Methodology
CRISP-DM breaks the process of data mining into six major phases:
  Business Understanding
  Data Understanding
  Data Preparation
  Modeling
  Evaluation
  Deployment
The sequence of the phases is not strict, and moving back and forth between different phases may be required.
41
Summary
We briefly discussed today:
What is Machine Learning
Typical Machine Learning tasks
Supervised Learning:
  Learning means Generalization
  Overfitting and Underfitting
  Simple learning paradigms
  Training vs. Testing
  Classification and Regression
CRISP-DM
42
Introduction to Data Science Questions?
43
Thank you!