Prepared by Kimberly Sayre and Jinbo Bi


Why Python? Programming languages like R and Python really shine when you are working with large amounts of data, because mature libraries handle the heavy computation efficiently. R has more to offer in terms of data analysis and algorithm selection, but some say it has a steeper learning curve. Python is a popular scientific language and a “rising star” for machine learning, with plenty of easy-to-use tools and frameworks and an active community. Image taken from: http://machinelearningmastery.com/best-programming-language-for-machine-learning/

What is Scikit? Scikit-learn is a framework of simple and efficient tools for data mining and data analysis. It is accessible to everybody and reusable in various contexts. It is built on NumPy, SciPy, and matplotlib (Python libraries), and it is open source and commercially usable. [http://scikit-learn.org/] The scikit-learn website is excellent and has extensive documentation, including many full-length example scripts that are easy to understand even if you are not familiar with Python. The library also covers a wide range of algorithms: classification, regression, clustering, dimensionality reduction, model selection, and preprocessing.

Installing Scikit-learn: At the time these slides were prepared, scikit-learn required Python (>= 2.6 or >= 3.3), NumPy (>= 1.6.1), and SciPy (>= 0.9). The scikit-learn website has very good installation instructions with several options; check [http://scikit-learn.org/stable/install.html] to find the one that works best for you. For this presentation, we’ll be using an interactive environment called IPython, which is included, along with scikit-learn, in a third-party distribution called Anaconda [https://www.continuum.io/downloads]. Anaconda ships with a recent version of scikit-learn, in addition to a large set of scientific Python libraries, for Windows, Mac OS X, and Linux. *NOTE* You can download Anaconda with either Python 2 or Python 3. The two have some syntactic differences. This presentation uses Python 2.
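Once the installation finishes, a quick check like the one below can confirm that the three libraries import correctly and show which versions you have. This is only a sketch, and it assumes Python 3 (the `print` calls use the function form); the `versions` dictionary is just an illustrative name.

```python
# Import the three libraries named in the requirements above.
import numpy
import scipy
import sklearn

# Collect each library's version string so they can be printed together.
versions = {
    "numpy": numpy.__version__,
    "scipy": scipy.__version__,
    "scikit-learn": sklearn.__version__,
}

for name, version in sorted(versions.items()):
    print(name, version)
```

If any of the imports fails, the installation page linked above is the place to start troubleshooting.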

IPython is a powerful interactive shell; its notebook interface is now developed as Jupyter. It supports interactive data visualization, it is very flexible, and it is easy to use for analytics and high-performance computing. IPython notebook is included in the Anaconda package and runs in your browser. *NOTE* A command prompt will open when you start IPython notebook. Keep it open: it runs the local notebook server that your browser connects to while you’re using IPython.

IPython Sample (Good): Each block is known as a cell, in which you can write one or more lines of Python code. Click Run or press Ctrl+Enter to execute the cell; any output is displayed below it. A grey border around the cell means you are in “command mode”, where pressing a or b creates a new cell above or below the current one and the arrow keys move between cells. Press Enter to return to “edit mode” (shown by a green border around the cell) and type code within the cell; press Esc to go back to “command mode”. You can save your notebook and come back to it later, and you can also download other notebooks from places like GitHub.

IPython Sample (Bad): *NOTE* Indenting is VERY important in Python, because indentation is how Python decides which lines belong to a block. IPython notebook is nice enough to highlight code in red if you don’t indent properly.
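To make the indentation point concrete, here is a small sketch that is not taken from the slides. The function `describe` and the string `bad_source` are hypothetical names used only for illustration. The first part shows correctly indented code; the second part passes a badly indented snippet to the built-in `compile` so the error can be shown without breaking the cell.

```python
def describe(n):
    # Everything indented under "def" is the function body.
    if n % 2 == 0:
        # This line is indented under "if", so it runs only for even n.
        return "even"
    return "odd"

print(describe(4))
print(describe(3))

# The body of this "if" is not indented, so Python rejects the source.
bad_source = "if True:\nprint('not indented')\n"
try:
    compile(bad_source, "<example>", "exec")
    indentation_ok = True
except IndentationError as err:
    indentation_ok = False
    print("IndentationError:", err.msg)
```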

Machine Learning Example (Iris Dataset): Let’s take a look at a machine learning example with scikit-learn using a dataset that ships with the library, the iris dataset. Within each iris species, the length and width of the petals and sepals (the leaf-like parts that enclose the petals) are strongly related. The dataset includes petal length, petal width, sepal length, sepal width, and the species name. Because the species can be predicted from these attributes, the dataset makes an easy supervised learning task. http://scikit-learn.org/stable/auto_examples/datasets/plot_iris_dataset.html

Loading Data: With scikit-learn, you import individual modules rather than the entire package. Here we’re importing the iris dataset and then viewing the data.

Here we’re looking at the different attributes of the iris dataset
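The screenshots from these slides are not part of the transcript, so the following is a minimal sketch of what loading and inspecting the dataset might look like. It assumes Python 3 and uses `load_iris` from `sklearn.datasets`; the choice to print only the first five rows is ours.

```python
from sklearn.datasets import load_iris

# load_iris returns a container object holding the data and its metadata.
iris = load_iris()

# Names of the four measured attributes (the columns of the data).
print(iris.feature_names)

# Names of the three species (the classes to predict).
print(iris.target_names)

# First five rows of measurements and their integer-encoded species.
print(iris.data[:5])
print(iris.target[:5])
```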

Data Requirements with Scikit
Features and response are separate objects
Features and response should be numeric
Features and response should be NumPy arrays
Features and response should have specific shapes
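As a rough illustration of these four requirements, the sketch below checks each one against the iris dataset. The names `X`, `y`, and `checks` are our own, and reading "specific shapes" as a 2-D feature array with one row per sample plus a 1-D response of matching length is our assumption about what the slide means.

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()
X = iris.data    # features: one row per flower, one column per attribute
y = iris.target  # response: one integer label per flower

checks = {
    # Features and response are separate objects.
    "separate objects": X is not y,
    # Both hold numeric values.
    "numeric": bool(np.issubdtype(X.dtype, np.number)
                    and np.issubdtype(y.dtype, np.number)),
    # Both are NumPy arrays.
    "numpy arrays": isinstance(X, np.ndarray) and isinstance(y, np.ndarray),
    # X is 2-D, y is 1-D, and they have the same number of samples.
    "shapes agree": X.ndim == 2 and y.ndim == 1 and X.shape[0] == y.shape[0],
}

print(X.shape, y.shape)
for name, passed in checks.items():
    print(name, passed)
```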

Here we’re creating our feature matrix and response vector, importing the k-nearest neighbors (KNN) classifier class, and then instantiating a KNN object.
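A minimal sketch of that step follows. The value `n_neighbors=5` is an assumption on our part (it is also scikit-learn's default); the transcript does not say which value the original slide used.

```python
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()

# Feature matrix (150 x 4) and response vector (150 labels).
X = iris.data
y = iris.target

# Instantiate the classifier. n_neighbors=5 is an assumed setting,
# not one confirmed by the slides.
knn = KNeighborsClassifier(n_neighbors=5)

# Inspect the settings the object was created with.
print(knn.get_params()["n_neighbors"])
```

Instantiating the object only stores the settings; no learning happens until the model is fitted.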

Fitting our model with the iris data. Feel free to play around and explore! There are plenty of different aspects you can tune and try. If you’d like to see the script in its entirety, check out this link [http://scikit-learn.org/stable/auto_examples/neighbors/plot_classification.html]
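Fitting and predicting might look like the sketch below. It is not the script from the slide: `n_neighbors=5` and the measurements in `new_flower` are illustrative assumptions, with the measurements chosen to resemble a setosa flower.

```python
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X, y = iris.data, iris.target

# Assumed setting; try other values of n_neighbors to see the effect.
knn = KNeighborsClassifier(n_neighbors=5)

# Fit the model: KNN stores the training data for later lookups.
knn.fit(X, y)

# Illustrative measurements in the dataset's column order:
# sepal length, sepal width, petal length, petal width (cm).
new_flower = [[5.1, 3.5, 1.4, 0.2]]

# predict returns an array of integer labels, one per input row.
predicted = knn.predict(new_flower)
print(iris.target_names[predicted[0]])

# Accuracy on the training data itself (an optimistic estimate).
training_accuracy = knn.score(X, y)
print(training_accuracy)
```

Scoring on the same data used for fitting overstates how well the model generalizes; holding out a test set is one of the things worth exploring.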

Plotting with Scikit: The scikit-learn website has really good examples. If you want to look into plotting with KNN (or plotting with scikit-learn in general), check out their examples page, which provides sample plots and completed scripts with descriptive comments. The link on the previous slide gives an example of this.

Uploading Your Own Data: You can use your own datasets with scikit-learn as well. Let’s say we have the CSV file shown above (yes, this is a CSV file of the iris data). *NOTE* All values are numeric, including the species column.

You can easily import your own data, but you need to make sure it follows the data requirements listed earlier (separate, numeric NumPy arrays with the right shapes), as the script above does. From there you can do as you like with your data.
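Since the original script is not included in the transcript, here is one possible sketch of the idea. It assumes a headerless CSV whose first four columns are the measurements and whose last column is a numeric species code. The text in `csv_text` is a small illustrative sample standing in for a file on disk; with a real file you would pass its path to `np.loadtxt` instead. Using `n_neighbors=1` is also our choice, made because the sample has only six rows.

```python
import io

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Illustrative stand-in for the contents of a CSV file (no header row).
# Columns: sepal length, sepal width, petal length, petal width, species code.
csv_text = """5.1,3.5,1.4,0.2,0
4.9,3.0,1.4,0.2,0
7.0,3.2,4.7,1.4,1
6.4,3.2,4.5,1.5,1
6.3,3.3,6.0,2.5,2
5.8,2.7,5.1,1.9,2
"""

# Parse the comma-separated values into a 2-D NumPy array.
data = np.loadtxt(io.StringIO(csv_text), delimiter=",")

# Split into a feature matrix and an integer response vector.
X = data[:, :4]
y = data[:, 4].astype(int)

# With so few rows, look only at the single nearest neighbor.
knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(X, y)

# Predict the species code for one illustrative set of measurements.
prediction = knn.predict([[5.0, 3.4, 1.5, 0.2]])
print(prediction)
```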

Resources
IPython/Scikit Tutorial (an amazing tutorial by a guy who reminds me of Sheldon from Big Bang Theory): https://www.youtube.com/watch?v=IsXXlYVBt1M
Python Tutorial: https://www.dataquest.io/
Anaconda: https://www.continuum.io/downloads
Scikit-learn website: http://scikit-learn.org/stable/