Scikit-learn: Machine learning in Python

Slides:

Advertisements

Similar presentations

Scaling Multivariate Statistics to Massive Data Algorithmic problems and approaches Alexander Gray Georgia Institute of Technology

Advertisements

Integrated Instance- and Class- based Generative Modeling for Text Classification Antti PuurulaUniversity of Waikato Sung-Hyon MyaengKAIST 5/12/2013 Australasian.

scikit-learn Machine Learning in Python Vandana Bachani

An Overview of Machine Learning

Supervised Learning Recap

« هو اللطیف » By : Atefe Malek. khatabi Spring 90.

COMP 328: Final Review Spring 2010 Nevin L. Zhang Department of Computer Science & Engineering The Hong Kong University of Science & Technology

Lecture 17: Supervised Learning Recap Machine Learning April 6, 2010.

UCI KDD Archive University of California at Irvine –

Supervised and Unsupervised learning and application to Neuroscience Cours CA6b-4.

Machine Learning in R and its use in the statistical offices

Sparse vs. Ensemble Approaches to Supervised Learning

Optimization of Signal Significance by Bagging Decision Trees Ilya Narsky, Caltech presented by Harrison Prosper.

Lazy Learning k-Nearest Neighbour Motivation: availability of large amounts of processing power improves our ability to tune k-NN classifiers.

Sparse vs. Ensemble Approaches to Supervised Learning

Ensemble Learning (2), Tree and Forest

1 Comparison of Discrimination Methods for the Classification of Tumors Using Gene Expression Data Presented by: Tun-Hsiang Yang.

STUDENTLIFE PREDICTIVE MODELING Hongyu Chen Jing Li Mubing Li CS69/169 Mobile Health March 2015.

CSC 478 Programming Data Mining Applications Course Summary Bamshad Mobasher DePaul University Bamshad Mobasher DePaul University.

Midterm Review. 1-Intro Data Mining vs. Statistics –Predictive v. experimental; hypotheses vs data-driven Different types of data Data Mining pitfalls.

B.Ramamurthy. Data Analytics (Data Science) EDA Data Intuition/ understand ing Big-data analytics StatsAlgs Discoveries / intelligence Statistical Inference.

Data mining and machine learning A brief introduction.

Machine Learning1 Machine Learning: Summary Greg Grudic CSCI-4830.

Machine Learning Lecture 11 Summary G53MLE | Machine Learning | Dr Guoping Qiu1.

Wei Zhang Akshat Surve Xiaoli Fern Thomas Dietterich.

LOGO Ensemble Learning Lecturer: Dr. Bo Yuan

Presented by: Mingyuan Zhou Duke University, ECE June 17, 2011

BAGGING ALGORITHM, ONLINE BOOSTING AND VISION Se – Hoon Park.

Machine Learning Documentation Initiative Workshop on the Modernisation of Statistical Production Topic iii) Innovation in technology and methods driving.

TEXT ANALYTICS - LABS Maha Althobaiti Udo Kruschwitz Massimo Poesio.

Classification Derek Hoiem CS 598, Spring 2009 Jan 27, 2009.

Overview of the final test for CSC Overview PART A: 7 easy questions –You should answer 5 of them. If you answer more we will select 5 at random.

Data analysis tools Subrata Mitra and Jason Rahman.

Logistic Regression & Elastic Net

Locally Linear Support Vector Machines Ľubor Ladický Philip H.S. Torr.

Scikit-Learn Intro to Data Science Presented by: Vishnu Karnam A

CSC 478 Programming Data Mining Applications Course Summary Bamshad Mobasher DePaul University Bamshad Mobasher DePaul University.

Advanced Artificial Intelligence Lecture 8: Advance machine learning.

SUPERVISED AND UNSUPERVISED LEARNING Presentation by Ege Saygıner CENG 784.

1 Kernel Machines A relatively new learning methodology (1992) derived from statistical learning theory. Became famous when it gave accuracy comparable.

Does one size really fit all? Evaluating classifiers in a Bag-of-Visual-Words classification Christian Hentschel, Harald Sack Hasso Plattner Institute.

Machine Learning Supervised Learning Classification and Regression K-Nearest Neighbor Classification Fisher’s Criteria & Linear Discriminant Analysis Perceptron:

Experience Report: System Log Analysis for Anomaly Detection

Data Mining, Machine Learning, Data Analysis, etc. scikit-learn

Introduction to Machine Learning

Machine Learning Models

ECE 471/571 - Lecture 19 Review 02/24/17.

Action-Grounded Push Affordance Bootstrapping of Unknown Objects

Oral Presentation Applied Machine Learning Course YOUR NAME

Data Mining 101 with Scikit-Learn

The Elements of Statistical Learning

Introduction to Data Science Lecture 7 Machine Learning Overview

Mixture of SVMs for Face Class Modeling

Basic machine learning background with Python scikit-learn

Machine Learning Charan Puvvala.

Waikato Environment for Knowledge Analysis

Machine Learning Week 1.

Classifying enterprises by economic activity

CMPT 733, SPRING 2016 Jiannan Wang

ECE 471/571 – Review 1.

An update on scikit-learn

Data Mining, Machine Learning, Data Analysis, etc. scikit-learn

Data Mining, Machine Learning, Data Analysis, etc. scikit-learn

Dr. Sampath Jayarathna Cal Poly Pomona

Machine Learning – a Probabilistic Perspective

Derek Hoiem CS 598, Spring 2009 Jan 27, 2009

The Student’s Guide to Apache Spark

FOUNDATIONS OF BUSINESS ANALYTICS Introduction to Machine Learning

What is Artificial Intelligence?

Machine Learning for Cyber

Presentation transcript:

Scikit-learn: Machine learning in Python Brian Holt

Project goals and vision Machine learning for applications Ease of use Light and easy to install package A general-purpose high level language: Python High standards State-of-the-art algorithms High quality bindings: performance and fine control Open Source BSD license Community driven

API and design 1 Design principles Code sample Minimise number of object interfaces Build abstractions for recurrant usecases Simplicity, Simplicity, Simplicity (no framework, no pipelines, no dataset objects) Code sample >>> from sklearn import svm >>> classifier = svm.SVC() >>> classifier.fit(X_train, y_train) >>> y_pred = clf.predict(X_test)

API and design 2 All objects Classification, regression, clustering estimator.fit(X_train, y_train) Classification, regression, clustering y_pred = estimator.predict(X_test) Filters, dimension reduction, latent variables X_new = estimator.transform(X_test) Predictive models, density estimation test_score = estimator.score(X_test) One day: On-line learning estimator.refit(X_train, y_train)

Main features and algorithms Supervised learning (classification and regression) SVM - high quality libsvm bindings Generalised linear models Least squares, ridge regression, Orthogonal matching pursuit... Nearest Neighbours KDTree, BallTree Gaussian Processes Decision tree (CART) Coming soon: Ensembles (boosting, bagging)

Main features and algorithms cont. Unsupervised learning Gaussian mixture models Manifold learning Clustering Model selection Cross-Validation Grid Search

Demos SVM classifier on iris dataset Cross validation PCA Kmeans Compare ML algorithms

Conclusion Scikit-learn Well-suited for applications Machine learning without learning the machinery Simplicity and readability Optimise only the critical parts Code review Unit testing Coding standards Well-suited for applications Used for large datasets Building blocks for application-specific algorithms