Summary Tel Aviv University 2017/2018 Slava Novgorodov


Intro to Data Science: Summary

Today’s lesson
Introduction to Data Science:
- Recall of course topics
- Exam structure
- Sample questions

Course Topics
Machine Learning:
- Intro to ML
- Data understanding and preparation
- Feature selection, model evaluation
- Supervised/Unsupervised learning
Big Data:
- Intro to Big Data architectures
- MapReduce
- Basic SQL and SQL over MapReduce
- Hadoop, HDFS
- Spark

Where we are
[Process diagram: Business Understanding, Data Preparation, Modeling, Evaluation, Deployment]

Handling missing data: removing it
Ignore the feature
- Pro: Simple, typically not biased
- Con: May be a very useful feature
Ignore the sample
- Pro: Simple, all features are kept
- Con: Removed samples may be biased
- Con: Data may become small
Intel – Advanced Analytics
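The two removal strategies above can be sketched in a few lines of plain Python. The table, its column names, and the helper names `drop_feature` and `drop_incomplete_samples` are all made up for illustration and are not taken from the slides.

```python
# Toy table; None marks a missing value. All values are hypothetical.
rows = [
    {"country": "USA", "reliability": 5, "price": 20000},
    {"country": "Japan", "reliability": None, "price": 18000},
    {"country": "USA", "reliability": 3, "price": None},
    {"country": "Korea", "reliability": 1, "price": 15000},
]


def drop_feature(rows, feature):
    """Ignore the feature: remove one column from every sample."""
    return [{k: v for k, v in row.items() if k != feature} for row in rows]


def drop_incomplete_samples(rows):
    """Ignore the sample: keep only rows that have no missing value."""
    return [row for row in rows if all(v is not None for v in row.values())]


without_reliability = drop_feature(rows, "reliability")
complete_rows = drop_incomplete_samples(rows)

# All samples survive when a feature is dropped; only complete ones
# survive when incomplete samples are dropped.
print(len(without_reliability), len(complete_rows))
```

In this toy table, dropping samples halves the data, which is the "data may become small" drawback in miniature.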

Data imputation
Estimate the missing values
Simple data imputation: mean, median, mode
- Mean (Reliability): (5+5+2+1+3+3+1+3+3)/9 = 26/9 ≈ 2.89
- Median (Reliability): the sorted values are 1 1 2 3 3 3 3 5 5, so the median is 3
- Mode (Country): USA = 6, Japan = 3, Korea = 1, so the mode is USA
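The arithmetic on this slide can be checked with Python's standard `statistics` module. The numbers are the ones from the slide; the `impute` helper and the use of `None` for a missing value are assumptions made for this sketch.

```python
from statistics import mean, median, mode

# Observed (non-missing) values from the slide.
reliability = [5, 5, 2, 1, 3, 3, 1, 3, 3]
countries = ["USA"] * 6 + ["Japan"] * 3 + ["Korea"]

fill_mean = mean(reliability)      # 26 / 9
fill_median = median(reliability)  # middle of the sorted values
fill_mode = mode(countries)        # most frequent category


def impute(values, fill):
    """Replace every missing entry (None) with the chosen fill value."""
    return [fill if v is None else v for v in values]


print(round(fill_mean, 2), fill_median, fill_mode)
print(impute([5, None, 1], fill_median))
```

Mean and median suit numeric features such as Reliability, while the mode is the usual choice for categorical features such as Country.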

Algorithms we touched in-depth
- K-Means
- kNN
- Naïve Bayes
- Decision Trees
- Regressions
- SVM

Decision Trees

Bayesian view in a (very small) nutshell
- We see evidence X, such as the CPU test results
- We have prior probabilities for having a bad CPU, e.g.: P(C=good) = 0.99; P(C=bad) = 1 - 0.99 = 0.01
- We obtain the likelihood: the probability of the evidence given each class, e.g.: P(X | C=good) = 0.17
- We compute posterior probabilities: the probability of each class after seeing the evidence, e.g. P(C=good | X)
Bayes rule:
$$P(C \mid X) = \frac{p(X \mid C)\, P(C)}{p(X)}, \quad \text{where } p(X) = \sum_{c} P(C = c)\, p(X \mid C = c)$$
that is, posterior = likelihood × prior / evidence.
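A minimal sketch of the posterior computation follows. The priors and P(X | C=good) = 0.17 come from the slide; the slide does not give P(X | C=bad), so the value 0.9 below is a made-up assumption used only to make the example complete.

```python
# Priors from the slide.
prior = {"good": 0.99, "bad": 0.01}

# Likelihood of the evidence X under each class.
# 0.17 is from the slide; 0.9 is a hypothetical value, not from the slide.
likelihood = {"good": 0.17, "bad": 0.9}

# Evidence: p(X) = sum over classes of P(C=c) * p(X | C=c).
evidence = sum(prior[c] * likelihood[c] for c in prior)

# Posterior: P(C=c | X) = p(X | C=c) * P(C=c) / p(X).
posterior = {c: likelihood[c] * prior[c] / evidence for c in prior}

for c, p in posterior.items():
    print(f"P(C={c} | X) = {p:.4f}")
```

Under this assumed likelihood the posterior for a bad CPU rises from the 1% prior to roughly 5%, yet "good" remains the more probable class because the prior is so lopsided.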

K-Means – Recall from Recitation 2
- Used for clustering of unlabeled data
- Example: Image compression
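As a reminder of how the algorithm works, here is a small pure-Python sketch of the standard K-Means loop (assign each point to its nearest centroid, then recompute the centroids). It is not the recitation's code: the function name `kmeans`, the toy points, and the choice to initialise with the first k points (instead of the usual random initialisation) are simplifications assumed here to keep the run deterministic.

```python
def kmeans(points, k, max_iterations=100):
    """Cluster tuples of numbers into k groups with the standard K-Means loop."""
    # Simplified, deterministic initialisation: take the first k points.
    centroids = list(points[:k])
    clusters = [[] for _ in range(k)]
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            distances = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = []
        for cluster, old in zip(clusters, centroids):
            if cluster:
                new_centroids.append(tuple(sum(d) / len(cluster) for d in zip(*cluster)))
            else:
                new_centroids.append(old)  # keep the centroid of an empty cluster
        if new_centroids == centroids:
            break  # assignments no longer change
        centroids = new_centroids
    return centroids, clusters


points = [(1, 1), (1, 2), (2, 1), (9, 9), (9, 10), (10, 9)]
centroids, clusters = kmeans(points, k=2)
print(centroids)
print(clusters)
```

For image compression the points would be pixel colours, and each pixel would be replaced by the centroid of its cluster, so the image needs only k distinct colours.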

Learning systems
Recall the 11 matchsticks problem we discussed in class in Recitation #3

Big Data
- MapReduce principles, Hadoop, HDFS
- SQL over MapReduce
- General questions solved with MapReduce
- Spark and differences from Hadoop
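The MapReduce principle can be recalled with the classic word-count example, simulated here in a single Python process. This is only a sketch of the map, shuffle, and reduce phases; the function names are hypothetical and no Hadoop or Spark API is involved.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values of one key into a single result."""
    return key, sum(values)


def word_count(lines):
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = word_count(["big data big ideas", "data science"])
print(counts)
```

In a real cluster the map and reduce calls run in parallel on different machines and the framework performs the shuffle; the same three-phase shape underlies SQL aggregation (GROUP BY) over MapReduce.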

Exam Structure
Two parts worth equal points: ML and Big Data
- ML: 8-10 closed/short open questions
- Big Data: 4-5 open questions
Sample questions: in class…

Questions?