Building Machine Learning System with Python

Slides:



Advertisements
Similar presentations
Adjusting Active Basis Model by Regularized Logistic Regression
Advertisements

Clustering High Dimensional Data Using SVM
S UPPORT V ECTOR M ACHINES Jianping Fan Dept of Computer Science UNC-Charlotte.
AI Practice 05 / 07 Sang-Woo Lee. 1.Usage of SVM and Decision Tree in Weka 2.Amplification about Final Project Spec 3.SVM – State of the Art in Classification.
Particle swarm optimization for parameter determination and feature selection of support vector machines Shih-Wei Lin, Kuo-Ching Ying, Shih-Chieh Chen,
Groundwater 3D Geological Modeling: Solving as Classification Problem with Support Vector Machine A. Smirnoff, E. Boisvert, S. J.Paradis Earth Sciences.
Support Vector Machines
Sarah Reonomy OSCON 2014 ANALYZING DATA WITH PYTHON.
Introduction to WEKA Aaron 2/13/2009. Contents Introduction to weka Download and install weka Basic use of weka Weka API Survey.
Machine Learning Usman Roshan Dept. of Computer Science NJIT.
An Introduction to Support Vector Machines Martin Law.
Efficient Model Selection for Support Vector Machines
Practical Machine Learning Pipelines with MLlib Joseph K. Bradley March 18, 2015 Spark Summit East 2015.
Text Classification using SVM- light DSSI 2008 Jing Jiang.
An Example of Course Project Face Identification.
Part II Support Vector Machine Algorithms. Outline  Some variants of SVM  Relevant algorithms  Usage of the algorithms.
Support Vector Machines Mei-Chen Yeh 04/20/2010. The Classification Problem Label instances, usually represented by feature vectors, into one of the predefined.
GA-Based Feature Selection and Parameter Optimization for Support Vector Machine Cheng-Lung Huang, Chieh-Jen Wang Expert Systems with Applications, Volume.
Kernel Methods A B M Shawkat Ali 1 2 Data Mining ¤ DM or KDD (Knowledge Discovery in Databases) Extracting previously unknown, valid, and actionable.
An Introduction to Support Vector Machines (M. Law)
Protein Fold Recognition as a Data Mining Coursework Project Badri Adhikari Department of Computer Science University of Missouri-Columbia.
TEXT ANALYTICS - LABS Maha Althobaiti Udo Kruschwitz Massimo Poesio.
컴퓨터 과학부 김명재.  Introduction  Data Preprocessing  Model Selection  Experiments.
Data analysis tools Subrata Mitra and Jason Rahman.
Weka. Weka A Java-based machine vlearning tool Implements numerous classifiers and other ML algorithms Uses a common.
COMP 4332 Tutorial 1 Feb 16 WANG YUE Tutorial Overview & Learning Python.
SVMs in a Nutshell.
Support Vector Machines Optimization objective Machine Learning.
A distributed PSO – SVM hybrid system with feature selection and parameter optimization Cheng-Lung Huang & Jian-Fan Dun Soft Computing 2008.
PREDICTING SONG HOTNESS
BSP: An iterated local search heuristic for the hyperplane with minimum number of misclassifications Usman Roshan.
Machine Learning Usman Roshan Dept. of Computer Science NJIT.
Support Vector Machines: Brief Overview
Data Mining, Machine Learning, Data Analysis, etc. scikit-learn
Python for data analysis Prakhar Amlathe Utah State University
PREDICT 422: Practical Machine Learning
Machine Learning: Decision Trees in AIMA and WEKA
Zhenshan, Wen SVM Implementation Zhenshan, Wen
Machine Learning Logistic Regression
Erich Smith Coleman Platt
Applications of Deep Learning and how to get started with implementation of deep learning Presentation By : Manaswi Advisor : Dr.Chinmay.
Data Mining Tools some examples.
Customer Satisfaction Based on Voice
Basic machine learning background with Python scikit-learn
Schizophrenia Classification Using
Prepared by Kimberly Sayre and Jinbo Bi
Machine Learning Logistic Regression
CMPT 733, SPRING 2016 Jiannan Wang
Brief Intro to Python for Statistics
Machine Learning with Weka
Introduction to Deep Learning with Keras
AHED Automatic Human Emotion Detection
CAR EVALUATION SIYANG CHEN ECE 539 | Dec
Option One Install Python via installing Anaconda:
Shih-Wei Lin, Kuo-Ching Ying, Shih-Chieh Chen, Zne-Jung Lee
Classification Boundaries
Data Mining, Machine Learning, Data Analysis, etc. scikit-learn
Data Mining, Machine Learning, Data Analysis, etc. scikit-learn
Artificial Intelligence Club
CSE 802. Prepared by Martin Law
Python for Data Analysis
MAS 622J Course Project Classification of Affective States - GP Semi-Supervised Learning, SVM and kNN Hyungil Ahn
The Student’s Guide to Apache Spark
Machine Learning: Decision Trees in AIMA and WEKA
Elena Mikhalkova, Nadezhda Ganzherli, Yuri Karyakin, Dmitriy Grigoryev
Analysis on Accelerated Learning Cohorts
Manisha Panta, Avdesh Mishra, Md Tamjidul Hoque, Joel Atallah
Igor Stančin, Alan Jović to: {igor.stancin,
Machine Learning for Cyber
Machine Learning for Cyber
Presentation transcript:

Building Machine Learning System with Python 5, Sept, 2016

“Life is short you need Python” -- Bruce Eckel

NumPy, SciPy, and Matplotlib NumPy and SciPy: Highly optimized storage and operation for multidimensional arrays, which are the basis data structure of most state-of-the-art algorithms. Matplotlib: One of the most convenient library to plot high-quality graph using Python.

Scikit-learn Scikit-learn is a marvelous machine learning toolkit in Python. http://scikit-learn.org/stable/install.html Getting started with Scikit-learn: http://scikit-learn.org/stable/tutorial/basic/tutorial.html

Installing Python Install Python, NumPy, SciPy, Scikit-learning, and etc. step by step. https://www.python.org/downloads/ Install Python distribution, such as Anaconda. https://www.continuum.io/downloads For those who favors an IDE, PyCharm is a powerful IDE for Python and scientific development. Recommended P.S. For consistency, we use Python 2.7.

References Python programming: Python official tutorial: https://docs.python.org/2/tutorial/index.html Stackoverflow! Machine Learning: Building Machine Learning Systems with Python. Willi Richert, Luis Pedro Coelho: library.ust.hk Cross Validated!

The first machine learning application

Dataset UC Irvine (UCI) Machine Learning Repository Iris dataset: Classify the flowers’ species using the following features. Sepal length. Sepal width. Petal length. Petal width.

Data Visualization

Data Preprocessing

Learning Logistic Regression Model Training Accuracy: 0.726667

Cross Validation Training Testing Train-test split 5-Fold Cross Validation Training Testing

0 Fold Train Accuracy:0.716667, Test Accuracy:0.733333 1 Fold Train Accuracy:0.766667, Test Accuracy:0.633333 2 Fold Train Accuracy:0.783333, Test Accuracy:0.566667 3 Fold Train Accuracy:0.691667, Test Accuracy:0.866667 4 Fold Train Accuracy:0.741667,

From Logistic Regression to Support Vector Machine 0 Fold Train Accuracy:0.975000, Test Accuracy:0.966667 1 Fold Train Accuracy:0.966667, 2 Fold Train Accuracy:0.966667, 3 Fold Train Accuracy:0.983333, Test Accuracy:0.933333 4 Fold Train Accuracy:0.966667, Test Accuracy:1.000000 P.S.: from sklearn.svm import SVC

Parameter Tuning NEVER USE YOUR TEST DATA FOR TUNING sklearn.svm.SVC C: Penalty parameter. Kernel: rbf, poly, linear. Degree: for polynomial kernel. Gamma: for rbf kernel. Etc. LogisticRegression C: Penalty parameter. penalty: l1 or l2. fit_intercept: solve: lbfgs, lblinear, netwon-cg. Etc. NEVER USE YOUR TEST DATA FOR TUNING

Resources LIBSVM and LIBLINEAR SVMlight Vowpal Wabbit Chih-Jen Lin, National Taiwan University. Simple and easy-to-use support vector machines tool. Hsu, C.W., Chang, C.C. and Lin, C.J., 2003. A practical guide to support vector. classification. https://www.csie.ntu.edu.tw/~cjlin/papers/guide/guide.pdf SVMlight Thorsten Joachims, Cornell University. An implementation of Support Vector Machines (SVMs) in C. Vowpal Wabbit Microsoft Research and (previously) Yahoo! Research Fast and scalable tool for learning linear model. Mahout on Hadoop. MlLib on Spark. Petuum.

Play with more UCI datasets. archive.ics.uci.edu/ml Play with Tensorflow playground. playground.tensorflow.org

Thank you!