Scikit-learn: Machine learning in Python


Scikit-learn: Machine learning in Python Brian Holt

Project goals and vision
Machine learning for applications
  Ease of use
  Light and easy to install package
  A general-purpose high-level language: Python
High standards
  State-of-the-art algorithms
  High quality bindings: performance and fine control
Open Source
  BSD license
  Community driven

API and design 1
Design principles
  Minimise number of object interfaces
  Build abstractions for recurrent use cases
  Simplicity, Simplicity, Simplicity (no framework, no pipelines, no dataset objects)
Code sample
  >>> from sklearn import svm
  >>> classifier = svm.SVC()
  >>> classifier.fit(X_train, y_train)
  >>> y_pred = classifier.predict(X_test)
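The code sample above leaves X_train, y_train and X_test undefined. A minimal runnable sketch follows, assuming the bundled iris dataset and a 75/25 split; both are illustrative choices, not taken from the slides. It also assumes a recent scikit-learn release, since the sklearn.model_selection module is newer than this talk.

```python
from sklearn import datasets, svm
from sklearn.model_selection import train_test_split

# Assumption: iris stands in for the unspecified training data on the slide.
X, y = datasets.load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Same three steps as the slide: construct, fit, predict.
classifier = svm.SVC()
classifier.fit(X_train, y_train)
y_pred = classifier.predict(X_test)

# Fraction of test samples whose predicted label matches the true label.
accuracy = (y_pred == y_test).mean()
print("test accuracy:", accuracy)
```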

API and design 2
All objects
  estimator.fit(X_train, y_train)
Classification, regression, clustering
  y_pred = estimator.predict(X_test)
Filters, dimension reduction, latent variables
  X_new = estimator.transform(X_test)
Predictive models, density estimation
  test_score = estimator.score(X_test)
One day: On-line learning
  estimator.refit(X_train, y_train)
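To make the convention above concrete, here is a sketch of a toy estimator written in plain Python. The class NearestMeanClassifier and its data are hypothetical and not part of scikit-learn; the point is only that any object exposing fit, predict and score can be used the same way. Note that for supervised models the score method is assumed here to take the true labels as a second argument.

```python
class NearestMeanClassifier:
    """Hypothetical toy estimator following the fit/predict/score convention."""

    def fit(self, X, y):
        # Accumulate per-class feature sums and counts, then store class means.
        sums, counts = {}, {}
        for row, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(row))
            for i, value in enumerate(row):
                acc[i] += value
            counts[label] = counts.get(label, 0) + 1
        self.means_ = {
            label: [s / counts[label] for s in acc] for label, acc in sums.items()
        }
        return self  # returning self allows chaining: Estimator().fit(...)

    def predict(self, X):
        # Assign each row to the class whose mean is closest (squared distance).
        def sq_dist(a, b):
            return sum((p - q) ** 2 for p, q in zip(a, b))

        return [
            min(self.means_, key=lambda label: sq_dist(row, self.means_[label]))
            for row in X
        ]

    def score(self, X, y):
        # Mean accuracy of the predictions against the true labels.
        y_pred = self.predict(X)
        return sum(p == t for p, t in zip(y_pred, y)) / len(y)


# Made-up data for illustration only.
X_train = [[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.9, 1.1]]
y_train = ["a", "a", "b", "b"]
X_test = [[0.1, 0.0], [1.0, 0.9]]
y_test = ["a", "b"]

estimator = NearestMeanClassifier().fit(X_train, y_train)
y_pred = estimator.predict(X_test)
test_score = estimator.score(X_test, y_test)
print(y_pred, test_score)
```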

Main features and algorithms
Supervised learning (classification and regression)
  SVM: high quality libsvm bindings
  Generalised linear models: least squares, ridge regression, orthogonal matching pursuit...
  Nearest Neighbours: KDTree, BallTree
  Gaussian Processes
  Decision tree (CART)
  Coming soon: Ensembles (boosting, bagging)
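As an illustration of the generalised linear models listed above, the sketch below fits least squares, ridge regression and orthogonal matching pursuit to the same data. The synthetic dataset, the alpha value and the choice of two non-zero coefficients are assumptions made for the example, not settings from the talk.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, OrthogonalMatchingPursuit, Ridge

# Synthetic regression problem: only features 1 and 4 influence the target.
rng = np.random.RandomState(0)
X = rng.randn(100, 10)
true_coef = np.zeros(10)
true_coef[[1, 4]] = [3.0, -2.0]
y = X @ true_coef + 0.01 * rng.randn(100)

# All three models share the same fit interface.
models = {
    "least squares": LinearRegression(),
    "ridge": Ridge(alpha=1.0),
    "omp": OrthogonalMatchingPursuit(n_nonzero_coefs=2),
}
fitted = {name: model.fit(X, y) for name, model in models.items()}

for name, model in fitted.items():
    print(name, np.round(model.coef_, 2))
```

Orthogonal matching pursuit is the only one of the three that returns a sparse coefficient vector; ridge shrinks coefficients toward zero without setting them exactly to zero.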

Main features and algorithms cont.
Unsupervised learning
  Gaussian mixture models
  Manifold learning
  Clustering
Model selection
  Cross-Validation
  Grid Search
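The two model selection tools named above can be sketched as follows, assuming a recent scikit-learn release where they live in sklearn.model_selection. The iris data, the 5-fold setting and the parameter grid are illustrative assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Cross-validation: one accuracy score per fold for a fixed model.
scores = cross_val_score(SVC(), X, y, cv=5)
print("fold accuracies:", scores)

# Grid search: cross-validate every parameter combination and keep the best.
param_grid = {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}
grid = GridSearchCV(SVC(), param_grid, cv=5)
grid.fit(X, y)
print("best parameters:", grid.best_params_)
print("best mean accuracy:", grid.best_score_)
```

After fitting, the grid search object behaves like any other estimator, so grid.predict can be called directly on new data.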

Demos
  SVM classifier on iris dataset
  Cross validation
  PCA
  Kmeans
  Compare ML algorithms
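The demos themselves are not included in the transcript. As a rough idea of what the PCA and k-means items might look like, here is a sketch on the iris dataset; the number of components, the number of clusters and the random seed are assumptions.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)

# Dimension reduction: project the four iris features onto two components.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)
print("explained variance ratio:", pca.explained_variance_ratio_)

# Clustering: group the projected points into three clusters.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X_2d)
print("cluster sizes:", [int((labels == k).sum()) for k in range(3)])
```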

Conclusion
Scikit-learn
  Machine learning without learning the machinery
  Simplicity and readability
    Optimise only the critical parts
  Code review, unit testing, coding standards
  Well-suited for applications
    Used for large datasets
    Building blocks for application-specific algorithms