Machine Learning with Spark MLlib

Slides:

Advertisements

Similar presentations

Feature Selection as Relevant Information Encoding Naftali Tishby School of Computer Science and Engineering The Hebrew University, Jerusalem, Israel NIPS.

Advertisements

Data Mining Feature Selection. Data reduction: Obtain a reduced representation of the data set that is much smaller in volume but yet produces the same.

Unsupervised Learning

CPSC 502, Lecture 15Slide 1 Introduction to Artificial Intelligence (AI) Computer Science cpsc502, Lecture 15 Nov, 1, 2011 Slide credit: C. Conati, S.

Molecular Biomedical Informatics 分子生醫資訊實驗室 Machine Learning and Bioinformatics 機器學習與生物資訊學 Machine Learning & Bioinformatics 1.

Data Mining Methodology 1. Why have a Methodology  Don’t want to learn things that aren’t true May not represent any underlying reality ○ Spurious correlation.

An Overview of Machine Learning

Supervised learning Given training examples of inputs and corresponding outputs, produce the “correct” outputs for new inputs Two main scenarios: –Classification:

CIS 678 Artificial Intelligence problems deduction, reasoning knowledge representation planning learning natural language processing motion and manipulation.

ML ALGORITHMS. Algorithm Types Classification (supervised) Given -> A set of classified examples “instances” Produce -> A way of classifying new examples.

Introduction to machine learning

Overview DM for Business Intelligence.

Kansas State University Department of Computing and Information Sciences CIS 830: Advanced Topics in Artificial Intelligence From Data Mining To Knowledge.

Data Mining Chun-Hung Chou

Machine Learning1 Machine Learning: Summary Greg Grudic CSCI-4830.

Bayesian networks Classification, segmentation, time series prediction and more. Website: Twitter:

Data Mining Chapter 1 Introduction -- Basic Data Mining Tasks -- Related Concepts -- Data Mining Techniques.

Basic Data Mining Technique

Some working definitions…. ‘Data Mining’ and ‘Knowledge Discovery in Databases’ (KDD) are used interchangeably Data mining = –the discovery of interesting,

Applying Neural Networks Michael J. Watts

Data Mining BY JEMINI ISLAM. Data Mining Outline: What is data mining? Why use data mining? How does data mining work The process of data mining Tools.

Predicting Voice Elicited Emotions

Objectives: Terminology Components The Design Cycle Resources: DHS Slides – Chapter 1 Glossary Java Applet URL:.../publications/courses/ece_8443/lectures/current/lecture_02.ppt.../publications/courses/ece_8443/lectures/current/lecture_02.ppt.

Data Mining and Decision Support

1 Classification: predicts categorical class labels (discrete or nominal) classifies data (constructs a model) based on the training set and the values.

SUPERVISED AND UNSUPERVISED LEARNING Presentation by Ege Saygıner CENG 784.

Data Summit 2016 H104: Building Hadoop Applications Abhik Roy Database Technologies - Experian LinkedIn Profile:

Data Mining: Concepts and Techniques1 Prediction Prediction vs. classification Classification predicts categorical class label Prediction predicts continuous-valued.

Combining Models Foundations of Algorithms and Machine Learning (CS60020), IIT KGP, 2017: Indrajit Bhattacharya.

Big data classification using neural network

Data Mining, Machine Learning, Data Analysis, etc. scikit-learn

Data Mining ICCM

SNS COLLEGE OF TECHNOLOGY

Who am I? Work in Probabilistic Machine Learning Like to teach 

Machine Learning for Computer Security

Data-intensive Computing Algorithms: Classification

Chapter 7. Classification and Prediction

Deep Feedforward Networks

Predicting Azure Consumption using Ensemble Learning

Applying Neural Networks

Introduction to Machine Learning and Tree Based Methods

DATA MINING © Prentice Hall.

Chapter 6 Classification and Prediction

Supervised Time Series Pattern Discovery through Local Importance

Estimating Link Signatures with Machine Learning Algorithms

Vincent Granville, Ph.D. Co-Founder, DSC

Classification and Prediction

Introduction to Azure Machine Learning Studio

Machine Learning & Data Science

What is Pattern Recognition?

Machine Learning Week 1.

Intro to Machine Learning

Interval Estimation.

A Unifying View on Instance Selection

An Improved Neural Network Algorithm for Classifying the Transmission Line Faults Slavko Vasilic Dr Mladen Kezunovic Texas A&M University.

Classification and Prediction

CSCI N317 Computation for Scientific Applications Unit Weka

Department of Electrical Engineering

MIS2502: Data Analytics Clustering and Segmentation

MIS2502: Data Analytics Clustering and Segmentation

Data Mining, Machine Learning, Data Analysis, etc. scikit-learn

Intro to Machine Learning

Data Mining, Machine Learning, Data Analysis, etc. scikit-learn

Introduction to Sensor Interpretation

©Jiawei Han and Micheline Kamber

Introduction to Sensor Interpretation

Machine Learning for Space Systems: Are We Ready?

Machine Learning in Business John C. Hull

Advisor: Dr.vahidipour Zahra salimian Shaghayegh jalali Dec 2017

Presentation transcript:

Machine Learning with Spark MLlib Manuel Martín Márquez Antonio Romero Marin Joeri Hermans Hadoop Tutorials

Machine Learning (ML) ML is a branch of artificial intelligence: Uses computing based systems to make sense out of data Extracting patterns, fitting data to functions, classifying data, etc ML systems can learn and improve With historical data, time and experience Bridges theoretical computer science and real noise data.

ML in real-life

Supervised and Unsupervised Learning There are not predefined and known set of outcomes Look for hidden patterns and relations in the data A typical example: Clustering ML methods fall into two learning types Unsupervised Suppose you want to segment your customers into general categories of people with similar buying patterns.

Supervised and Unsupervised Learning For every example in the data there is always a predefined outcome Models the relations between a set of descriptive features and a target (Fits data to a function) 2 groups of problems: Classification Regression More formally fits data to a function or a function approximation

Supervised Learning Classification Regression Predicts which class a given sample of data (sample of descriptive features) is part of (discrete value). Regression Predicts continuous values. More formally fits data to a function or a function approximation

Machine Learning as a Process - Define measurable and quantifiable goals - Use this stage to learn about the problem Define Objectives Data Preparation Model Building Model Evaluation Model Deployment - Normalization - Transformation - Missing Values - Outliers Study models accuracy - Work better than the naïve approach or previous system Do the results make sense in the context of the problem - Data Splitting Features Engineering Estimating Performance Evaluation and Model Selection More formally fits data to a function or a function Adding Roles

ML as a Process: Data Preparation Needed for several reasons Some Models have strict data requirements Scale of the data, data point intervals, etc Some characteristics of the data may impact dramatically on the model performance Time on data preparation should not be underestimated Missing Values Error Values Different Scales Dimensionality Types Problems Many others Raw Data Scaling Centering Skewness Outliers Errors Data Transformation Modeling phase Data Ready Add Examples

ML as a Process: Feature engineering Determine the predictors (features) to be used is one of the most critical questions Some times we need to add predictors Reduce Number: Fewer predictors more interpretable model and less costly Most of the models are affected by high dimensionality, specially for non-informative predictors Binning predictors Wrappers Multiple models adding and removing parameter Algorithms that use models as input and performance as output Genetics Algorithms Filters Evaluate the relevance of the predictor Based normally on correlations Random Forest (tree based) MARS and LASSO internally perform predictor selection Add Examples

ML as a Process: Model Building Data Splitting Allocate data to different tasks model training performance evaluation Define Training, Validation and Test sets Feature Selection (Review the decision made previously) Estimating Performance Visualization of results – discovery interesting areas of the problem space Statistics and performance measures Evaluation and Model selection The ‘no free lunch’ theorem no a priory assumptions can be made Avoid use of favorite models if NEEDED there is no one single model that will works better than any other a priory