Python for Data Analysis


Python for Data Analysis
IS&T Research Computing Services
Katia Oleinik, Shuai Wei

Tutorial Content
- Overview of Python libraries for data scientists
- Reading data; selecting and filtering the data
- Data manipulation, sorting, grouping, rearranging
- Plotting the data
- Descriptive statistics
- Inferential statistics

Python Libraries for Data Science
Many popular Python toolboxes/libraries:
- NumPy
- SciPy
- Pandas
- SciKit-Learn
Visualization libraries:
- matplotlib
- Seaborn
and many more …
All these libraries are installed on the SCC.

Python Libraries for Data Science: NumPy
- introduces objects for multidimensional arrays and matrices, as well as functions for easily performing advanced mathematical and statistical operations on those objects
- provides vectorization of mathematical operations on arrays and matrices, which significantly improves performance
- many other Python libraries are built on NumPy
Link: http://www.numpy.org/
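A minimal sketch of what vectorization looks like in practice. The array values are made up for illustration and do not come from the tutorial notebook.

```python
import numpy as np

# A small 2-D array (illustrative values only).
a = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])

# Vectorized arithmetic: applied to every element, no explicit Python loop.
scaled = a * 2 + 1

# Statistical reductions over the whole array or along one axis.
total_mean = a.mean()
column_means = a.mean(axis=0)

print(scaled)
print(total_mean, column_means)
```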

Python Libraries for Data Science: SciPy
- collection of algorithms for linear algebra, differential equations, numerical integration, optimization, statistics and more
- part of the SciPy Stack
- built on NumPy
Link: https://www.scipy.org/scipylib/
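As a rough illustration, the sketch below touches three of the areas listed: numerical integration, optimization and linear algebra. The functions and matrices are toy examples chosen here, not taken from the tutorial.

```python
import numpy as np
from scipy import integrate, linalg, optimize

# Numerical integration: integral of x^2 over [0, 1] (exact value is 1/3).
area, abs_error = integrate.quad(lambda x: x**2, 0.0, 1.0)

# Optimization: minimize (x - 2)^2, whose minimum lies at x = 2.
result = optimize.minimize_scalar(lambda x: (x - 2.0) ** 2)

# Linear algebra: solve the system A x = b.
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([9.0, 8.0])
x = linalg.solve(A, b)

print(area, result.x, x)
```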

Python Libraries for Data Science: Pandas
- adds data structures (Series and DataFrame) and tools designed to work with table-like data, similar to data frames in R
- provides tools for data manipulation: reshaping, merging, sorting, slicing, aggregation etc.
- allows handling of missing data
Link: http://pandas.pydata.org/
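A short sketch of the operations named above (filtering, handling missing data, sorting, aggregation). The table and its column names are hypothetical and exist only for this example.

```python
import numpy as np
import pandas as pd

# Hypothetical table; one salary is missing on purpose.
df = pd.DataFrame({
    "name": ["Ann", "Bob", "Cid", "Dee"],
    "dept": ["A", "B", "A", "B"],
    "salary": [90.0, 80.0, np.nan, 130.0],
})

# Filtering: keep rows that satisfy a condition.
high = df[df["salary"] > 85]

# Missing data: replace NaN with the mean of the observed values.
filled = df["salary"].fillna(df["salary"].mean())

# Sorting: highest salary first (NaN rows go last by default).
ranked = df.sort_values("salary", ascending=False)

# Grouping and aggregation: mean salary per department (NaN is skipped).
by_dept = df.groupby("dept")["salary"].mean()

print(high, filled, ranked, by_dept, sep="\n\n")
```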

Python Libraries for Data Science: SciKit-Learn
- provides machine learning algorithms: classification, regression, clustering, model validation etc.
- built on NumPy, SciPy and matplotlib
Link: http://scikit-learn.org/
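One possible classification and model-validation workflow is sketched below. The choice of the bundled iris dataset and of logistic regression is an assumption made for this example, not something the slides prescribe.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split

# Small dataset bundled with scikit-learn (no download needed).
X, y = load_iris(return_X_y=True)

# Hold out a quarter of the rows for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Classification: fit on the training split, score on the held-out split.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
test_accuracy = model.score(X_test, y_test)

# Model validation: 5-fold cross-validation on the full dataset.
cv_scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)

print(test_accuracy, cv_scores.mean())
```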

Python Libraries for Data Science: matplotlib
- Python 2D plotting library which produces publication-quality figures in a variety of hardcopy formats
- a set of functionalities similar to those of MATLAB
- line plots, scatter plots, bar charts, histograms, pie charts etc.
- relatively low-level; some effort is needed to create advanced visualizations
Link: https://matplotlib.org/
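A minimal sketch of a line plot and a histogram saved to a file. The data are synthetic and the output file name is an arbitrary choice for this example.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is required
import matplotlib.pyplot as plt
import numpy as np

# Synthetic data for illustration.
x = np.linspace(0, 2 * np.pi, 100)
rng = np.random.default_rng(0)
samples = rng.normal(size=500)

# Two panels side by side: a line plot and a histogram.
fig, (ax_line, ax_hist) = plt.subplots(1, 2, figsize=(8, 3))

ax_line.plot(x, np.sin(x), label="sin(x)")
ax_line.set_xlabel("x")
ax_line.set_ylabel("y")
ax_line.legend()

ax_hist.hist(samples, bins=20)
ax_hist.set_xlabel("value")
ax_hist.set_ylabel("count")

# Each label and layout step is explicit, which is what "low-level" means here.
fig.tight_layout()
fig.savefig("example_plots.png", dpi=150)
```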

Python Libraries for Data Science: Seaborn
- based on matplotlib
- provides a high-level interface for drawing attractive statistical graphics
- similar (in style) to the popular ggplot2 library in R
Link: https://seaborn.pydata.org/
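To give a sense of the high-level interface, the sketch below draws a box plot per group in a single call. The DataFrame and its column names are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is required
import pandas as pd
import seaborn as sns

# Hypothetical measurements in two groups.
df = pd.DataFrame({
    "group": ["a", "a", "a", "b", "b", "b"],
    "value": [1.0, 2.0, 3.0, 2.0, 4.0, 6.0],
})

# One call produces a statistical graphic; axis labels come from the column names.
ax = sns.boxplot(data=df, x="group", y="value")
ax.set_title("Value by group")

# The result is an ordinary matplotlib Axes, so it can be saved the usual way.
ax.figure.savefig("seaborn_example.png")
```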

Download tutorial notebook
http://www.bu.edu/tech/support/research/training-consulting/live-tutorials/

Thank you for attending the tutorial!
Please fill out the evaluation form: http://scv.bu.edu/survey/tutorial_evaluation.html
Questions: help@scc.bu.edu