Introduction to Machine Learning


Introduction to Machine Learning

What is Machine Learning? Machine learning: using algorithms to build models and generate predictions from data. Contrast this with what we've learned in stats classes, where the user specifies the model up front.

Why Use It?
When accurate prediction matters more than causal inference
To select variables when there are many possibilities
To learn about the structure of the data and new variables
As a robustness check

The ML Family Machine learning is a field with many different methods. We'll explore a few that have clear applications in the social sciences.

Choose your own adventure!

We'll look at:
Cross-validation and parallelization (not strictly ML, just useful)
CART
Random forests
Honorable mentions: lasso, KNN, and neural networks

Parallelization You can split repetitive processes into batches and parcel them out to your computer's cores. This cuts down on run time and makes efficient use of computing power. There are many options for doing this in R; some packages make it really easy.
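The labs do this in R, but the pattern is easy to sketch with Python's standard library; `one_replicate` below is a hypothetical stand-in for whatever repetitive task you are batching:

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
import random

def one_replicate(seed):
    # stand-in for one repetitive task, e.g. a single bootstrap replicate
    rng = random.Random(seed)
    return sum(rng.random() for _ in range(1000))

def run_batches(n_tasks, executor_cls=ThreadPoolExecutor):
    # parcel the tasks out to a pool of workers; for CPU-bound work,
    # pass ProcessPoolExecutor so the batches run on separate cores
    with executor_cls() as pool:
        return list(pool.map(one_replicate, range(n_tasks)))
```

Because each task gets its own seed, the parallel run returns exactly what a serial loop would, just faster.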

Cross-validation Partitioning and/or resampling your data so that you build a model with one subset and evaluate it with another. The point: it limits overfitting.

K-fold cross-validation We will use 10-fold cross-validation: we randomly split the data into 10 subsets, train the model on 9 of them, and evaluate its predictive strength on the 10th.
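The procedure can be sketched in plain Python; `fit` and `score` here are hypothetical placeholders for a real model and evaluation metric:

```python
import random

def kfold_indices(n, k=10, seed=0):
    # randomly split the row indices into k (roughly) equal folds
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(fit, score, data, k=10):
    # train on k-1 folds, evaluate on the held-out fold, repeat k times
    folds = kfold_indices(len(data), k)
    scores = []
    for i, held_out in enumerate(folds):
        train = [data[j] for f in folds[:i] + folds[i + 1:] for j in f]
        test = [data[j] for j in held_out]
        model = fit(train)
        scores.append(score(model, test))
    return sum(scores) / k  # average performance across the k folds
```

Every observation lands in the test set exactly once, so the averaged score uses all of the data without ever evaluating a model on rows it was trained on.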

Lab Section 1: Setting things up

Classifiers CART, KNN, and random forests are all classifiers. They "learn" patterns from existing data, create rules or boundaries, and then predict which group a given data point belongs to. They are all supervised learning methods: you tell the algorithm what you want and give it labeled examples.

CART: Decision Trees Classification And Regression Trees partition the data into increasingly small segments in order to make a prediction, finding the optimal split at each step. CART is the basis for random forests.
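A minimal sketch of what "finding an optimal split" means, using Gini impurity on a single numeric variable (a standard CART criterion for classification):

```python
def gini(labels):
    # Gini impurity: how mixed a group of class labels is (0 = pure)
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(xs, ys):
    # scan candidate thresholds on one variable; keep the one whose
    # weighted child impurity is lowest -- the "optimal split"
    best = (None, float("inf"))
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        if not left or not right:
            continue
        w = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if w < best[1]:
            best = (t, w)
    return best
```

A full tree applies `best_split` recursively to each resulting segment, over every variable, until the segments are small or pure.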

Example CART

Lab Section 2: Make a CART

Random Forests A supervised ensemble method that relies on bootstrap aggregating, or "bagging." Tree bagging: learn a decision tree on each of many random samples of the training data. Random forests add a further step by randomizing the variables evaluated at each node.

Random Forests A random forest classifier grows a forest of classification trees. Because the classifier randomly samples variables at the nodes of each tree, the trees are largely uncorrelated. The classifier then combines their predictions. Note: random forests can also be used for regression.

Random Forests At each node in each tree, the classifier finds the split that best separates the remaining data into homogeneous groups. A split can be a threshold on a number, a linear combination, or a classification. This recursive partitioning process generates the classification rules.
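The training loop can be sketched as follows. `fit_tree` is a hypothetical tree learner, and as a simplification the variable subset is drawn once per tree; a true random forest redraws it at every node:

```python
import random

def grow_forest(fit_tree, data, n_trees=100, n_features=10, mtry=3, seed=0):
    # each tree gets a bootstrap sample of the rows (bagging) and a
    # random subset of the variables (simplified: drawn once per tree)
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        sample = [rng.choice(data) for _ in data]  # rows, with replacement
        feats = rng.sample(range(n_features), mtry)
        forest.append(fit_tree(sample, feats))
    return forest

def forest_predict(forest, x):
    # combine the trees' predictions by majority vote
    votes = [tree(x) for tree in forest]
    return max(set(votes), key=votes.count)
```

The randomness in both rows and variables is what decorrelates the trees, so their averaged vote is more stable than any single tree.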

Lab Section 3: Random Forests and Post-Estimation

Advanced Topics
Unsupervised learning and neural nets
Feature selection
Doing this in Python
Causal inference with machine learning

Resources
Muchlinski, David, David Siroky, Jingrui He, and Matthew Kocher. "Comparing Random Forest with Logistic Regression for Predicting Class-Imbalanced Civil War Onset Data." Political Analysis 24, no. 1 (2016): 87-103. (excellent and helpful replication files)
Breiman, Leo. "Random Forests." Machine Learning 45, no. 1 (2001): 5-32.
Breiman, Leo. "Statistical Modeling: The Two Cultures (with Comments and a Rejoinder by the Author)." Statistical Science 16, no. 3 (2001): 199-231.
Grimmer, Justin. "We Are All Social Scientists Now: How Big Data, Machine Learning, and Causal Inference Work Together." PS: Political Science & Politics 48, no. 1 (2015): 80-83.
Conway, Drew, and John Myles White. Machine Learning for Hackers. O'Reilly Media, 2012.
Adele Cutler helped develop random forests. Unfortunately, the articles above are single-authored.

Free Online Classes and Tutorials
DataCamp Machine Learning for Beginners: https://www.datacamp.com/courses/machine-learning-toolbox
Udacity machine learning class: https://www.udacity.com/course/intro-to-machine-learning--ud120
kNN example in R: https://www.r-bloggers.com/using-knn-classifier-to-predict-whether-the-price-of-stock-will-increase/