COMP 4332 Tutorial 1 Feb 16 WANG YUE Tutorial Overview & Learning Python.



Project-oriented tutorials
Project and assignments count for 80% of your grade.
You will write code in a few languages/tools.
More importantly, you will do experiments!
Very different from COMP4331.
Light on concepts/math.
Heavy hands-on course.

A data mining project requires
1. Exploring the data and preprocessing it.
2. Trying algorithms (SVM, Logistic Regression, Decision Trees, Dimensionality Reduction, etc.) and varying the parameters of each algorithm. Labor intensive! Sometimes frustrating.
3. Summarizing findings, designing new methods, and going back to step 2. You will repeatedly return to step 1 to reprocess the data so it can be fed into different tools. The creative part!

1. Explore data/look at the data
Visualization:
  1D data summary: mean, variance, median, skewness; density estimation (pdf), cdf; outliers, etc.
  2D data summary: scatter plot, QQ-plot, correlation scores, etc.
  High-dimensional data summary: dimensionality reduction and plot to 2D or 3D.
Store data and extract the wanted part:
  Organized: SQL-like queries...
  Quick and dirty: write a script for each operation...
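The 1D and 2D summaries listed above can be sketched with the standard library alone. This is a minimal illustration, not the course's reference code: the function names `summarize_1d` and `pearson` and the sample values are made up for this example, and in practice numpy/scipy offer equivalent routines.

```python
import math
import statistics


def summarize_1d(values):
    """Return mean, population variance, median and skewness of a list of numbers."""
    n = len(values)
    mean = statistics.mean(values)
    variance = statistics.pvariance(values, mu=mean)
    std = math.sqrt(variance)
    # Skewness is the third standardized moment; it is 0 for symmetric data.
    if std > 0:
        skewness = sum(((v - mean) / std) ** 3 for v in values) / n
    else:
        skewness = 0.0
    return {
        "mean": mean,
        "variance": variance,
        "median": statistics.median(values),
        "skewness": skewness,
    }


def pearson(xs, ys):
    """Return the Pearson correlation score of two equally long lists."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Made-up sample: one large value pulls the mean above the median.
sample = [1.0, 2.0, 3.0, 4.0, 10.0]
summary = summarize_1d(sample)
for name in sorted(summary):
    print(name, summary[name])
```

A mean well above the median together with a positive skewness is a quick hint that the data has a long right tail or an outlier worth inspecting.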

2. Run experiments using tools
Most of the time, tools are available: Weka, libsvm, etc.
Sometimes, you need to implement a variant of an existing algorithm:
  A different decision tree
  A classifier that handles unbalanced data
Run the methods, vary the parameters, and plot the results and trends.
Good news :)
Numerical code is generally hard to write correctly (hard to DEBUG!). You will do this in this course!
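Running a method while varying a parameter can be sketched as a small sweep loop. Everything here is an assumption made for illustration: the toy data, the tiny k-nearest-neighbour classifier standing in for a real tool such as Weka or libsvm, and the names `knn_predict`, `accuracy` and `sweep`.

```python
from collections import Counter

# Toy 1D data set, made up for this example: (feature, label) pairs.
TRAIN = [(1.0, "a"), (1.2, "a"), (1.9, "a"), (3.0, "b"), (3.3, "b"), (4.1, "b")]
TEST = [(0.8, "a"), (2.0, "a"), (2.9, "b"), (3.8, "b")]


def knn_predict(x, train, k):
    """Predict the majority label among the k training points nearest to x."""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(train, test, k):
    """Fraction of test points whose predicted label matches the true label."""
    correct = sum(knn_predict(x, train, k) == y for x, y in test)
    return correct / len(test)


def sweep(train, test, ks):
    """Run the same experiment once per parameter value and collect the scores."""
    return {k: accuracy(train, test, k) for k in ks}


results = sweep(TRAIN, TEST, ks=[1, 3, 5])
for k in sorted(results):
    print("k=%d accuracy=%.2f" % (k, results[k]))
```

Keeping the results in a dictionary keyed by the parameter value makes it straightforward to write them to a file or plot the trend afterwards.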

3. Summarize findings and design new methods
After each iteration of steps 1 and 2, you know more about the data. You may have new ideas and go back to steps 1 and 2.
But before that, first document your findings.

A cloud of tools...
Data preprocessing: Python, Java/C++, SQL, Excel, text editors...
Visualization: Excel, Matlab, R, matplotlib
SVM: libsvm, svmlight, liblinear packages
Logistic regression: liblinear
Decision Trees & tree ensemble: Weka, FEST
Matrix factorization: libfm, GraphLab

Teaching all of them is impossible
You have to take time to read the manuals of these tools, and sometimes their source code!
Throughout this course, we will use Python to illustrate:
  Data preprocessing (mostly string processing)
  Algorithm implementation (numpy/scipy)
  Automatically performing experiments
  Simple plotting (matplotlib)
Sometimes, we use R’s plotting packages (core, ggplot2) if matplotlib does not fit the requirement.
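As a rough idea of what string-based preprocessing looks like, the sketch below turns raw text lines into a label plus word counts. The input format (a label, a tab, then free text), the sample lines and the name `parse_line` are all assumptions made up for this example.

```python
import re
from collections import Counter

# Hypothetical raw input: "<label><TAB><free text>" per line.
RAW_LINES = [
    "spam\tWIN a FREE prize, win now!!!",
    "ham\tAre we meeting at 3pm today?",
]

# A token is a run of lowercase letters or digits; punctuation is dropped.
TOKEN_RE = re.compile(r"[a-z0-9]+")


def parse_line(line):
    """Split one raw line into (label, word counts)."""
    label, text = line.rstrip("\n").split("\t", 1)
    tokens = TOKEN_RE.findall(text.lower())
    return label, Counter(tokens)


records = [parse_line(line) for line in RAW_LINES]
for label, counts in records:
    print(label, sorted(counts.items()))
```

The word counts can then be written out in whatever input format a downstream tool expects.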

Why Python
Easy to learn and easy to use.
A good tool for us to illustrate the three steps of doing a data mining project.
A concise and powerful language.
A glue language: it easily integrates components written in other languages.
Widely used in IT industries. See: Organizations using Python.
We will use the latest Python version in this course (Python 3.4).

Set up Python Scientific Environment
Anaconda Scientific Python Distribution
  It includes over 195 of the most popular Python packages for science, math, engineering, and data analysis (numpy, scipy, sklearn, matplotlib).
  Cross-platform.
  No need to install scientific packages one by one.
The default IDE is weak. Recommended IDEs:
  Sublime Text (recommended)
  PyCharm (recommended)
  Eclipse + pydev (cross-platform)
  Or simply the Notepad++ editor with syntax highlighting (Windows only)
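After installing, one quick way to confirm that the scientific packages named above are importable is sketched here. The helper name `check_environment` is made up for this example; it only reports whether each package can be found and does not verify versions.

```python
import importlib.util
import sys

# Import names of the packages mentioned on this slide.
PACKAGES = ["numpy", "scipy", "sklearn", "matplotlib"]


def check_environment(packages=PACKAGES):
    """Map each package name to True if Python can locate it, else False."""
    return {name: importlib.util.find_spec(name) is not None for name in packages}


print("Python version:", sys.version.split()[0])
status = check_environment()
for name in PACKAGES:
    print(name, "OK" if status[name] else "MISSING")
```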

Learn Python
The official Python tutorial. Written for experienced programmers. Read it twice and try every code snippet in the tutorial.
Code Like a Pythonista: Idiomatic Python
Python Howto: sort, logging, functional programming, etc.
MIT 6.00 course material.
Liang Huang’s Python Short Course.
numpy examples and scipy tutorial.
Best place to ask a Python-related question: it is better to send your Python question to Stack Overflow than to our mailing list.

Learn Python (Books)
A Byte of Python
Learning Python
Python Cookbook
Moving from Python2 to Python3

Play with Python data structures
basic types: bool, integer, float, complex
tuple: (x, y, ...)
list: [x, y, ...]
string: 'hello', "world"
dictionary: { x: a, y: b, ... }
set: set([a, b, c, d])
iterable/sequence: a unified view of the data structures
  tuple/list/dictionary/set/string are all iterable.
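The types listed above can be tried directly in the interpreter. The sketch below uses made-up values; the helper `as_list` is a hypothetical name that only demonstrates that one loop works over every container.

```python
# Basic types.
flag = True          # bool
count = 42           # integer
ratio = 3.14         # float
z = 1 + 2j           # complex

# Containers.
point = (1, 2)                       # tuple: fixed once created
items = [3, 1, 2]                    # list: can grow and change
greeting = 'hello' + " " + "world"   # string: either quote style works
ages = {"alice": 30, "bob": 25}      # dictionary: key -> value
unique = set([1, 2, 2, 3])           # set: duplicates are dropped

items.append(4)
ages["carol"] = 41


def as_list(iterable):
    """Collect the elements of any iterable with a single for loop."""
    return [element for element in iterable]


# The same code handles every container; iterating a dictionary yields its keys.
lengths = {}
for container in (point, items, greeting, ages, unique):
    lengths[type(container).__name__] = len(as_list(container))
print(lengths)
```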

Learning By Doing
1. Go through basic Python data structures and their operations.
2. Show Python’s functions and control structures (if-then-else/for/while).
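A minimal sketch of step 2, with one small function per control structure. The function names and inputs are made up for illustration.

```python
def classify(n):
    """if / elif / else: pick one branch based on a condition."""
    if n < 0:
        return "negative"
    elif n == 0:
        return "zero"
    else:
        return "positive"


def sum_of_squares(numbers):
    """for: visit every element of an iterable once."""
    total = 0
    for x in numbers:
        total += x * x
    return total


def count_halvings(n):
    """while: repeat until the condition becomes false."""
    steps = 0
    while n > 1:
        n //= 2  # integer division
        steps += 1
    return steps


print(classify(-3), sum_of_squares([1, 2, 3]), count_halvings(8))
```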