PYTHON Prof. Muhammad Saeed.

Slides:

Advertisements

Similar presentations

Introduction to Programming Lecture 39. Copy Constructor.

Advertisements

Excel Review EXCEL n Worksheet n Workbook n Cell n Cell Reference n Active Cell n Name Box n Toolbars n Entering Text n Autosum n Copying Cells n Range.

Pandas: Python Programming for Spreadsheets Pamela Wu Sept. 17 th 2015.

A Powerful Python Library for Data Analysis BY BADRI PRUDHVI BADRI PRUDHVI.

7. Lists 1 Let’s Learn Saenthong School, January – February 2016 Teacher: Aj. Andrew Davison, CoE, PSU Hat Yai Campus

Social Computing and Big Data Analytics 社群運算與大數據分析 SCBDA06 MIS MBA (M2226) (8628) Wed, 8,9, (15:10-17:00) (B309) Finance Big Data Analytics with.

CS2910 Week 6, Lab Today Dictionaries in Python SE-2811 Slide design: Dr. Mark L. Hornick Content: Dr. Hornick Errors: Dr. Yoder 1.

C Programming Lecture 15 Two Dimensional Arrays. Two-Dimensional Arrays b The C language allows arrays of any type, including arrays of arrays. With two.

PH2150 Scientific Computing Skills

Lesson 12: Data Analysis Class Participation: Class Chat:

Python for Data Analysis

Python for data analysis Prakhar Amlathe Utah State University

Two-Dimensional Arrays

Numpy (Numerical Python)

Two-dimensional arrays

Two Dimensional Array Mr. Jacobs.

Lists in Python Parallel lists.

External libraries A very complete list can be found at PyPi the Python Package Index: To install, use pip, which comes with.

SQL – Application Persistence Design Patterns

Data Tables, Indexes, pandas

Lesson 12: Data Analysis Topic: Data Analysis, Readings on website.

Python for Quant Finance

Network Visualization

Python Visualization Tools: Pandas, Seaborn, ggplot

PYTHON Prof. Muhammad Saeed.

Introduction to pandas

regex (Regular Expressions) Examples Logic in Python (and pandas)

PYTHON Prof. Muhammad Saeed.

Numerical Computing in Python

Data Science with Python Pandas

PH2150 Scientific Computing Skills

Lecture 6 2d Arrays Richard Gesick.

Numpy (Numerical Python)

Multidimensional Arrays

Numpy (Numerical Python)

Chapter 11 Data Structures.

Datatypes in Python (Part II)

Numpy and Pandas Dr Andy Evans

Python-NumPy Tutorial

PYTHON Prof. Muhammad Saeed.

PYTHON Prof. Muhammad Saeed.

Lecture 6: Data Quality and Pandas

Python for Data Analysis

PYTHON Graphs Prof. Muhammad Saeed.

PYTHON Prof. Muhammad Saeed.

Multidimensional array

Pandas John R. Woodward.

Slides based on Z. Ives’ at University of Pennsylvania and

Pandas Based on: Series: 1D labelled single-type arrays DataFrames: 2D labelled multi-type arrays Generally in 2D arrays, one can have the first dimension.

Scipy 'Ecosystem' containing a variety of scientific packages including iPython, numpy, matplotlib, and pandas. numpy is both a system for constructing.

Lecture 3b- Data Wrangling I

INC 161 , CPE 100 Computer Programming

Lecture 4- Data Wrangling

Dr. Sampath Jayarathna Cal Poly Pomona

Dr. Sampath Jayarathna Cal Poly Pomona

Lesson 12: Data Analysis Class Chat: Attendance: Participation

Dr. Sampath Jayarathna Old Dominion University

Dr. Sampath Jayarathna Old Dominion University

Data Wrangling with pandas

Collecting, Analyzing, and Visualizing Data with Python Part I

PYTHON PANDAS FUNCTION APPLICATIONS

regex (Regular Expressions) Examples Logic in Python (and pandas)

CSE 231 Lab 15.

INTRODUCING PYTHON PANDAS:-SERIES

By : Mrs Sangeeta M Chauhan , Gwalior

Presentation transcript:

PYTHON Prof. Muhammad Saeed

Python Dept. Of Comp. Sc. & IT, FUUAST Pandas Data Analysis Library Lecture 1 Prof. Muhammad Saeed Python Dept. Of Comp. Sc. & IT, FUUAST

Python Dept. Of Comp. Sc. & IT, FUUAST DataFrame & Series Python Dept. Of Comp. Sc. & IT, FUUAST

Python Dept. Of Comp. Sc. & IT, FUUAST DataFrame Python Dept. Of Comp. Sc. & IT, FUUAST

Python Dept. Of Comp. Sc. & IT, FUUAST DataFrame: Initialization ( Constructors ): import numpy as np import pandas as pd 1. ar=np.random.rand(30).reshape(5,6) ar=np.int32(ar*20) cols=['c','d','e','f','g','h'] ; rows=['1st','2nd','3rd','4th','5th'] df=pd.DataFrame(data=ar, index=rows, columns=cols) print(df) print(df.index) print(df.columns) print(df.values) print(df.shape) df = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6],[7, 8, 9]])) Python Dept. Of Comp. Sc. & IT, FUUAST

Python Dept. Of Comp. Sc. & IT, FUUAST DataFrame: Initialization ( Constructors ): import numpy as np import pandas as pd 2. Dictionary To DataFrame d = {‘col1' : [1, 2, 3, 4], ‘col2' : [5, 6, 7, 8], ’col3’ : [9,10,11,12]} df1=pd.DataFrame(d) df=pd.DataFrame(d, index=['a', 'b', 'c', 'd']) print(df1, ’\n\n’, df, ’\n’) data2 = [{'a': 1, 'b': 2}, {'a': 5, 'b': 10, 'c': 20}] df=pd.DataFrame(data2) print(df) df2=pd.DataFrame(data2, index=['first', 'second']) print(df2) 3. Tuples To DataFrame df3=pd.DataFrame.from_items([('A', [1, 2, 3]), ('B', [4, 5, 6])]) print(df3) Python Dept. Of Comp. Sc. & IT, FUUAST

Python Dept. Of Comp. Sc. & IT, FUUAST DataFrame: Initialization ( Constructors ): import numpy as np import pandas as pd data1 = np.array([[' ','Col1','Col2','Col3','Col4','Col5'], ['Row1',1,2,3,4,5], ['Row2',11,12,13,14,15], ['Row3',21,22,23,24,25]]) df=pd.DataFrame(data=data1[1:,1:], index=data1[1:,0], columns=data1[0,1:]) print(df) Python Dept. Of Comp. Sc. & IT, FUUAST

Python Dept. Of Comp. Sc. & IT, FUUAST DataFrame: Import & Export of Excel Data: import numpy as np import pandas as pd from pandas import ExcelWriter df1=pd.read_excel('workData.xls','Sheet1') df2=pd.read_excel('workData.xls','Sheet2') print('\nSheet1:\n',df1,'\nSheet2:\n',df2) print(df1['A']) df.to_excel(‘data1.xlsx', sheet_name='Sheet1') with ExcelWriter(‘data2.xlsx') as writer: df1.to_excel(writer, sheet_name='Sheet1') df2.to_excel(writer, sheet_name='Sheet2') Python Dept. Of Comp. Sc. & IT, FUUAST

Python Dept. Of Comp. Sc. & IT, FUUAST DataFrame: Column & Row Handling : df=pd.DataFrame.from_items([(‘one', [1, 2, 3]), (‘two', [4, 5, 6]), ( 'three‘,[7,8,9])]) print(df['one']) df['three'] = df['one'] * df['two'] print(df) df[‘four'] = df['one'] > 2 del df['two'] three = df.pop('three') df[‘New'] = ‘good‘ #[-1,-2,-3]; 2 ; ‘Panama’ Python Dept. Of Comp. Sc. & IT, FUUAST

Python Dept. Of Comp. Sc. & IT, FUUAST DataFrame: Column & Row Handling : df=pd.DataFrame.from_items([(‘one', [1, 2, 3,4]), (‘two', [4, 5, np.nan,7]), ( 'three‘,[7,8,9,10]),(‘four’,[11,12,13,14])]) df.loc[2,’three’] df[‘cut'] = df['one'][:2] print(df.loc[0]) print(df.loc[1:3]) df.loc[:,[‘one',‘three']]; #print(df.loc['2nd','e']) print(df.iloc[1,0:2]) df.iloc[[1,2,3],[0,1]] df.iat[0,1] = 0 df.at[1,‘two'] = 0 df.iat[1,1] df.loc[:,’two’]; df.loc[0,’three’] df1=df.T #Transpose df.dropna(how='any') # rows dropped containing nan value Python Dept. Of Comp. Sc. & IT, FUUAST

Python Dept. Of Comp. Sc. & IT, FUUAST DataFrame: Column & Row Handling : df=pd.DataFrame.from_items([(‘one', [1, 2, 3,4]), (‘two', [4, 5, 6,7]), ( 'three‘,[7,8,9,10])]) df1=df.drop(0,axis=0) df1=df.drop([0,1],axis=0) df1=df.drop(‘one’,axis=1) df1=df.drop([‘one’,’two’],axis=1) Python Dept. Of Comp. Sc. & IT, FUUAST

Python Dept. Of Comp. Sc. & IT, FUUAST DataFrame: Functions : df=pd.DataFrame.from_items([(‘one', [1, 2, 3,4]), (‘two', [4, 5, 6,7]), ( 'three‘,[7,8,9,10])]) General: print(df.count()) print(df.head(2)) print(df.tail(3)) print(df.describe()) Maths. Functions and Operators np.exp(df) np.sqrt(df) …………… df.apply(np.exp) df.apply(np.sqrt) df3=df1+df2 ; df3=df1*df2 df.loc[:,‘four’]=df.loc[:,‘one’]/df.loc[:,‘two’] Python Dept. Of Comp. Sc. & IT, FUUAST

Python Dept. Of Comp. Sc. & IT, FUUAST DataFrame: Functions : df=pd.DataFrame.from_items([(‘one', [1, 2, 3,4]), (‘two', [4, 5, 6,7]), ( 'three‘,[7,8,9,10]),(‘four’,[11,12,13,14])]) df.loc[:,‘four’]=df.loc[:,‘one’]/df.loc[:,‘two’] df.idxmin(axis=0) df.idxmax(axis=1) df[‘two'].idxmin() df1 = pd.DataFrame(np.random.randint(0,30, size=(5,10))) pieces = [df1.iloc[0:,:3], df1.iloc[0:,3:7], df1.iloc[0:,7:]] pd.concat(pieces) Python Dept. Of Comp. Sc. & IT, FUUAST

Python Dept. Of Comp. Sc. & IT, FUUAST DataFrame: Functions : s1 = pd.Series(np.random.randn(1000)) s2 = pd.Series(np.random.randn(1000)) df = pd.DataFrame(np.random.randn(1000, 5), columns=['a', 'b', 'c', 'd', 'e']) Stats: print(df.median()) print(df[df > df.median()]) df.mean() df.mean(1) df.apply(np.cumsum) df.apply(lambda x: x.max() - x.min()) s1.cov(s2) df.cov() s1.corr() df.corr() Python Dept. Of Comp. Sc. & IT, FUUAST

Python Dept. Of Comp. Sc. & IT, FUUAST DataFrame: Functions : Stats: Discretization and quantiling ar = np.random.randn(30) factor = pd.cut(ar, 4) pd.value_counts(factor) factor = pd.qcut(ar, [0, .25, .5, .75, 1]) Python Dept. Of Comp. Sc. & IT, FUUAST

Python Dept. Of Comp. Sc. & IT, FUUAST Series Python Dept. Of Comp. Sc. & IT, FUUAST

Python Dept. Of Comp. Sc. & IT, FUUAST Series Series is a one-dimensional labeled array capable of holding any data type (integers, strings, floating point numbers, python objects, etc.). import numpy as np import pandas as pd Construction: s = pd.Series([1,2,5,3,7,10], index=[‘a’, ‘b’, ‘c’, ‘d’, ‘e’, ’f’]) s = pd.Series([1,2,5,3,7,10], index=list(‘abcdef’)) s = pd.Series(np.random.randn(5)) d = {'a' : 2., 'b' : 4., 'c' : 6.} s = pd.Series(d) s = pd.Series(d, index=['b', 'c', 'd', 'a']) s = pd.Series(5., index=['a', 'b', 'c', 'd', 'e']) Python Dept. Of Comp. Sc. & IT, FUUAST

Python Dept. Of Comp. Sc. & IT, FUUAST Series s = pd.Series([1,2,4,8,16,32], index=[‘a’, ‘b’, ‘c’, ‘d’, ‘e’, ’f’]) Selection and Slicing: s[0] s['a'] s[:3] s[[4, 3, 1]] s[s > s.median()] Operations: s + s s[1:] + s[:-1] s * s s * 5 s['e'] = 0 np.exp(s) ‘b' in s ; ‘g’ in s s1 = pd.Series([1, 1, 3, 3, 3, 5, 5, 7, 7, 7]) s1.mode() Python Dept. Of Comp. Sc. & IT, FUUAST

Python Dept. Of Comp. Sc. & IT, FUUAST Series Mapping: s = pd.series(np.random.randint(0, 7, size=50)) s.value_counts() s = pd.Series(['six', 'seven', 'six', 'seven', 'six'], index=['a', 'b', 'c', 'd', 'e']) t = pd.Series({'six' : 6., 'seven' : 7.}) s.map(t) x = pd.Series([1,1.5,2.0,2.5], index=['one', 'two', 'three','four']) y = pd.Series(['foo', 'bar', 'baz'], index=[1,1.5,3]) z = {1: 'A', 2: 'B', 3: 'C'} x.map(y) x.map(z) Naming: s = pd.Series(np.random.randn(5), name=‘Series1') Python Dept. Of Comp. Sc. & IT, FUUAST

Python Dept. Of Comp. Sc. & IT, FUUAST Series DateTime: s = pd.Series(pd.date_range('20130101 09:10:12', periods=4)) s.dt.hour s.dt.year s.dt.strftime('%Y/%m/%d') s = pd.Series(pd.period_range('20130101', periods=4, freq='D')) s = pd.Series(pd.timedelta_range('1 day 00:00:05', periods=4, freq='s')) s.dt.days s.dt.seconds s.dt.components Python Dept. Of Comp. Sc. & IT, FUUAST

Python Dept. Of Comp. Sc. & IT, FUUAST Pandas END Python Dept. Of Comp. Sc. & IT, FUUAST