Prepared by Kimberly Sayre and Jinbo Bi



Why Python? Programming languages like R and Python really shine when it comes to large amounts of data. They handle such data quickly and efficiently. R has more to offer in terms of data analysis and algorithm selection, but some say it has a steeper learning curve. Python is a popular scientific language and a “rising star” for machine learning. It has an active community and plenty of easy-to-use tools and frameworks. Image taken from: http://machinelearningmastery.com/best-programming-language-for-machine-learning/

What is Scikit? Scikit-learn is a framework of simple and efficient tools for data mining and data analysis. It is accessible to everybody and reusable in various contexts. It is built on NumPy, SciPy, and matplotlib (Python libraries), and it is open source and commercially usable. [http://scikit-learn.org/] The scikit-learn website is AMAZING and has TONS of documentation. There are many examples of full-length scripts that are easy to understand (even if you are not familiar with Python). It also has a very large library of algorithms for classification, regression, clustering, dimensionality reduction, model selection, and preprocessing.

Installing Scikit Learn. Scikit-learn requires Python (>= 2.6 or >= 3.3), NumPy (>= 1.6.1), and SciPy (>= 0.9). The Scikit website has very good installation instructions with different options. Check out the website [http://scikit-learn.org/stable/install.html] to find the option that works best for you. For this PowerPoint, we’ll be using an interactive environment called IPython, which is included, along with Scikit learn, in a third-party distribution called Anaconda [https://www.continuum.io/downloads]. Anaconda ships with a recent version of scikit-learn, in addition to a large set of scientific Python libraries, for Windows, Mac OS X and Linux. *NOTE* You can download Anaconda with either Python 2 or Python 3. The two versions have some syntactic differences. This PowerPoint uses Python 2.
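Once the installation has finished, a quick way to confirm that the required libraries can be imported is to print their versions. This is only a minimal sketch: it assumes NumPy, SciPy, and scikit-learn are installed in the Python environment you are running, and the version numbers printed will depend on your installation.

```python
# Import the three libraries that the installation step is meant to provide.
import numpy
import scipy
import sklearn  # scikit-learn is imported under the name "sklearn"

# Print each library's name and installed version.
for module in (numpy, scipy, sklearn):
    print("%s %s" % (module.__name__, module.__version__))
```

If any of the imports fails with an ImportError, that library is not installed in the current environment.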

IPython (its notebook interface is also known as Jupyter) is a powerful interactive shell. It has support for interactive data visualization and it is very flexible. It is easy to use for analytics and high performance computing. IPython notebook is included in the Anaconda package and it runs in your browser. *NOTE* A command prompt window will open when you start IPython notebook. Keep it open: it runs the local server that your browser talks to while you’re using IPython.

IPython Sample: Good. Each block is known as a cell, in which you can write one or more lines of Python code. You can then click run or hit ctrl+enter to execute the cell. If there is output, it will be displayed below that cell. If there is a grey border around the cell, then you are in “command mode”. This allows you to create new cells above or below the current cell by hitting a (above) or b (below). You can navigate between cells using the arrow keys. Hit enter to return to “edit mode” (shown by a green border around the cell), and now you can type code within the cell. You go back into “command mode” by hitting esc. You can save your notebook and come back to it at a later time. You can also download other notebooks from places like GitHub.

IPython Sample: Bad. *NOTE* Indenting is VERY important in Python. IPython notebook is nice enough to highlight code in red if you don’t indent properly.
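To illustrate why indentation matters, here is a small sketch that is not taken from the original slides. It runs a correctly indented loop, then asks Python to compile the same loop with the body left unindented. The variable names (total, bad_source, indentation_ok) are made up for this illustration.

```python
# Correctly indented: the loop body sits one level deeper than the "for" line.
total = 0
for n in [1, 2, 3]:
    total += n  # runs once per item in the list

# The same loop with an unindented body. It is kept in a string so that
# this cell still runs; compile() only checks whether the code is valid.
bad_source = "for n in [1, 2, 3]:\ntotal += n\n"
try:
    compile(bad_source, "<example>", "exec")
    indentation_ok = True
except IndentationError as err:
    indentation_ok = False
    print("IndentationError: %s" % err.msg)
```

Python rejects the second version before running any of it, which is the kind of mistake the notebook highlights in red.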

Machine Learning Example: Iris Dataset. Let’s take a look at a machine learning example with Scikit using an available dataset. We’re going to use the iris dataset. In each iris species, there is a strong relationship between the length and width of the petals and of the sepals (the leaf-like parts beneath the petals). The dataset includes petal length, petal width, sepal length, sepal width and the species name. Because these attributes can be used to predict the species of an iris, the dataset makes for an easy supervised learning task. http://scikit-learn.org/stable/auto_examples/datasets/plot_iris_dataset.html

Loading Data. With Scikit, you import individual modules rather than the entire package. Here we’re importing the iris dataset and then viewing the data.
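The original slide showed this step as a screenshot, which is not part of the transcript. A minimal sketch of what loading the data might look like, using scikit-learn's load_iris function, is given here; printing only the first five rows is a choice made for this example.

```python
# Import only the loader function from the datasets module,
# rather than importing all of scikit-learn.
from sklearn.datasets import load_iris

# load_iris() returns a container object holding the data and its description.
iris = load_iris()

# Each row is one flower; each column is one measurement.
print(iris.data[:5])
```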

Here we’re looking at the different attributes of the iris dataset.
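As a rough sketch of this step (the slide's own screenshot is not available), the attributes of the object returned by load_iris can be inspected like this:

```python
from sklearn.datasets import load_iris

iris = load_iris()

# Names of the four measurement columns.
print(iris.feature_names)

# Names of the three species.
print(iris.target_names)

# The species of each flower, encoded as an integer index into target_names.
print(iris.target[:10])

# Number of rows (flowers) and columns (measurements).
print(iris.data.shape)
```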

Data Requirements with Scikit: (1) features and response are separate objects; (2) features and response should be numeric; (3) features and response should be NumPy arrays; (4) features and response should have specific shapes.
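One way to see these requirements in practice is to check them against the iris data that ships with scikit-learn. The sketch below is illustrative only; the variable names (is_array, is_numeric, shapes_match) are made up for this example, and the shape check assumes the usual layout of a two-dimensional feature array with one row per sample and a one-dimensional response of the same length.

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()

# Features and response are stored as two separate objects.
X = iris.data
y = iris.target

# Both should be NumPy arrays.
is_array = isinstance(X, np.ndarray) and isinstance(y, np.ndarray)

# Both should hold numbers (the species are encoded as integers).
is_numeric = bool(np.issubdtype(X.dtype, np.number)
                  and np.issubdtype(y.dtype, np.number))

# X is 2-D (samples x features), y is 1-D with one entry per sample.
shapes_match = X.ndim == 2 and y.ndim == 1 and X.shape[0] == y.shape[0]

print(is_array)
print(is_numeric)
print(shapes_match)
```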

Here we’re creating our feature matrix and response vector, then loading the KNN class and instantiating a KNN object.
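The screenshot for this slide is not in the transcript, so the following is only a sketch of the step it describes. It uses scikit-learn's KNeighborsClassifier class; the choice of n_neighbors=1 is an assumption made for this example, not necessarily the value used in the original slides.

```python
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier  # the KNN class

iris = load_iris()

# Feature matrix (one row per flower) and response vector (one label per flower).
X = iris.data
y = iris.target

# Instantiate a KNN object. n_neighbors is how many nearby flowers get a vote;
# the value 1 is picked here purely for illustration.
knn = KNeighborsClassifier(n_neighbors=1)
print(knn)
```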

Fitting our model with the iris data. Feel free to play around and explore! There are plenty of different aspects you can tune and try. If you’d like to see the script in its entirety, check out this link [http://scikit-learn.org/stable/auto_examples/neighbors/plot_classification.html]
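A minimal sketch of fitting and using the model might look like the following. The two flowers in new_flowers are hypothetical measurements invented for this example, and n_neighbors=5 is just one of the settings you can tune.

```python
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X, y = iris.data, iris.target

# Create the model and fit it to the iris data.
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X, y)

# Hypothetical measurements, in the same column order as the training data:
# sepal length, sepal width, petal length, petal width.
new_flowers = [[5.0, 3.4, 1.5, 0.2], [6.5, 3.0, 5.5, 2.0]]

# Predict a species index for each new flower, then look up the names.
predicted = knn.predict(new_flowers)
print(iris.target_names[predicted])

# Fraction of the training flowers that the model labels correctly.
print(knn.score(X, y))
```

Try changing n_neighbors and rerunning the cell to see how the predictions and the score respond.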

Plotting with Scikit. Scikit’s website has really good examples. If you want to look into plotting with KNN (or plotting with Scikit in general), check out their page, which provides sample plots and complete scripts with descriptive comments. The link on the previous slide gives an example of this.

Uploading Your Own Data. You’re able to use your own datasets with Scikit as well. Let’s say we have the CSV file shown above (yes, this is a CSV file for the iris data). *NOTE* All values are numeric.

You can easily import your own data, but you need to make sure you follow the requirements from the “Data Requirements with Scikit” slide, which is done in the script above. From there you can do as you like with your data.
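The script from the slide is not included in the transcript. As a hedged sketch of the same idea, the example below reads an all-numeric CSV with NumPy and splits it into a feature matrix and a response vector. The CSV contents, the column names, and the use of an in-memory string instead of a file on disk are all assumptions made so the example is self-contained; with a real file you would pass its path to np.loadtxt instead.

```python
import io
import numpy as np

# Hypothetical CSV contents: four measurements per flower, plus the species
# already encoded as a number (0, 1, or 2) so that every value is numeric.
csv_text = u"""sepal_length,sepal_width,petal_length,petal_width,species
5.1,3.5,1.4,0.2,0
4.9,3.0,1.4,0.2,0
7.0,3.2,4.7,1.4,1
6.4,3.2,4.5,1.5,1
6.3,3.3,6.0,2.5,2
5.8,2.7,5.1,1.9,2
"""

# Read the numeric rows, skipping the header line.
data = np.loadtxt(io.StringIO(csv_text), delimiter=",", skiprows=1)

# Separate objects: the first four columns are features, the last is the response.
X = data[:, :4]
y = data[:, 4].astype(int)

print(X.shape)
print(y.shape)
```

X and y are now separate, numeric NumPy arrays, so they can be passed to a Scikit model in the same way as the built-in iris data.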

Resources
IPython/Scikit tutorial (an amazing tutorial by a guy who reminds me of Sheldon from Big Bang Theory): https://www.youtube.com/watch?v=IsXXlYVBt1M
Python tutorial: https://www.dataquest.io/
Anaconda: https://www.continuum.io/downloads
Scikit-learn website: http://scikit-learn.org/stable/