PYTHON Prof. Muhammad Saeed.

Slides:



Advertisements
Similar presentations
Introduction to Programming Lecture 39. Copy Constructor.
Advertisements

Excel Review EXCEL n Worksheet n Workbook n Cell n Cell Reference n Active Cell n Name Box n Toolbars n Entering Text n Autosum n Copying Cells n Range.
Pandas: Python Programming for Spreadsheets Pamela Wu Sept. 17 th 2015.
A Powerful Python Library for Data Analysis BY BADRI PRUDHVI BADRI PRUDHVI.
7. Lists 1 Let’s Learn Saenthong School, January – February 2016 Teacher: Aj. Andrew Davison, CoE, PSU Hat Yai Campus
Social Computing and Big Data Analytics 社群運算與大數據分析 SCBDA06 MIS MBA (M2226) (8628) Wed, 8,9, (15:10-17:00) (B309) Finance Big Data Analytics with.
CS2910 Week 6, Lab Today Dictionaries in Python SE-2811 Slide design: Dr. Mark L. Hornick Content: Dr. Hornick Errors: Dr. Yoder 1.
C Programming Lecture 15 Two Dimensional Arrays. Two-Dimensional Arrays b The C language allows arrays of any type, including arrays of arrays. With two.
PH2150 Scientific Computing Skills
Lesson 12: Data Analysis Class Participation: Class Chat:
Python for Data Analysis
Python for data analysis Prakhar Amlathe Utah State University
Two-Dimensional Arrays
Numpy (Numerical Python)
Two-dimensional arrays
Two Dimensional Array Mr. Jacobs.
Lists in Python Parallel lists.
External libraries A very complete list can be found at PyPi the Python Package Index: To install, use pip, which comes with.
SQL – Application Persistence Design Patterns
Data Tables, Indexes, pandas
Lesson 12: Data Analysis Topic: Data Analysis, Readings on website.
Python for Quant Finance
Network Visualization
Python Visualization Tools: Pandas, Seaborn, ggplot
PYTHON Prof. Muhammad Saeed.
Introduction to pandas
regex (Regular Expressions) Examples Logic in Python (and pandas)
PYTHON Prof. Muhammad Saeed.
Numerical Computing in Python
Data Science with Python Pandas
PH2150 Scientific Computing Skills
Lecture 6 2d Arrays Richard Gesick.
Numpy (Numerical Python)
Multidimensional Arrays
Numpy (Numerical Python)
Chapter 11 Data Structures.
Datatypes in Python (Part II)
1.
Numpy and Pandas Dr Andy Evans
Python-NumPy Tutorial
PYTHON Prof. Muhammad Saeed.
PYTHON Prof. Muhammad Saeed.
Lecture 6: Data Quality and Pandas
Python for Data Analysis
雲端計算.
PYTHON Graphs Prof. Muhammad Saeed.
PYTHON Prof. Muhammad Saeed.
Multidimensional array
Pandas John R. Woodward.
雲端計算.
Slides based on Z. Ives’ at University of Pennsylvania and
Pandas Based on: Series: 1D labelled single-type arrays DataFrames: 2D labelled multi-type arrays Generally in 2D arrays, one can have the first dimension.
Scipy 'Ecosystem' containing a variety of scientific packages including iPython, numpy, matplotlib, and pandas. numpy is both a system for constructing.
Lecture 3b- Data Wrangling I
INC 161 , CPE 100 Computer Programming
Lecture 4- Data Wrangling
Dr. Sampath Jayarathna Cal Poly Pomona
雲端計算.
Dr. Sampath Jayarathna Cal Poly Pomona
Lesson 12: Data Analysis Class Chat: Attendance: Participation
Dr. Sampath Jayarathna Old Dominion University
Dr. Sampath Jayarathna Old Dominion University
Data Wrangling with pandas
Collecting, Analyzing, and Visualizing Data with Python Part I
PYTHON PANDAS FUNCTION APPLICATIONS
regex (Regular Expressions) Examples Logic in Python (and pandas)
CSE 231 Lab 15.
DATAFRAME.
INTRODUCING PYTHON PANDAS:-SERIES
By : Mrs Sangeeta M Chauhan , Gwalior
Presentation transcript:

PYTHON Prof. Muhammad Saeed

Python Dept. Of Comp. Sc. & IT, FUUAST Pandas Data Analysis Library Lecture 1 Prof. Muhammad Saeed Python Dept. Of Comp. Sc. & IT, FUUAST

Python Dept. Of Comp. Sc. & IT, FUUAST DataFrame & Series Python Dept. Of Comp. Sc. & IT, FUUAST

Python Dept. Of Comp. Sc. & IT, FUUAST DataFrame Python Dept. Of Comp. Sc. & IT, FUUAST

Python Dept. Of Comp. Sc. & IT, FUUAST DataFrame: Initialization ( Constructors ): import numpy as np import pandas as pd 1. ar=np.random.rand(30).reshape(5,6) ar=np.int32(ar*20) cols=['c','d','e','f','g','h'] ; rows=['1st','2nd','3rd','4th','5th'] df=pd.DataFrame(data=ar, index=rows, columns=cols) print(df) print(df.index) print(df.columns) print(df.values) print(df.shape) df = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6],[7, 8, 9]])) Python Dept. Of Comp. Sc. & IT, FUUAST

Python Dept. Of Comp. Sc. & IT, FUUAST DataFrame: Initialization ( Constructors ): import numpy as np import pandas as pd 2. Dictionary To DataFrame d = {‘col1' : [1, 2, 3, 4], ‘col2' : [5, 6, 7, 8], ’col3’ : [9,10,11,12]} df1=pd.DataFrame(d) df=pd.DataFrame(d, index=['a', 'b', 'c', 'd']) print(df1, ’\n\n’, df, ’\n’) data2 = [{'a': 1, 'b': 2}, {'a': 5, 'b': 10, 'c': 20}] df=pd.DataFrame(data2) print(df) df2=pd.DataFrame(data2, index=['first', 'second']) print(df2) 3. Tuples To DataFrame df3=pd.DataFrame.from_items([('A', [1, 2, 3]), ('B', [4, 5, 6])]) print(df3) Python Dept. Of Comp. Sc. & IT, FUUAST

Python Dept. Of Comp. Sc. & IT, FUUAST DataFrame: Initialization ( Constructors ): import numpy as np import pandas as pd data1 = np.array([[' ','Col1','Col2','Col3','Col4','Col5'], ['Row1',1,2,3,4,5], ['Row2',11,12,13,14,15], ['Row3',21,22,23,24,25]]) df=pd.DataFrame(data=data1[1:,1:], index=data1[1:,0], columns=data1[0,1:]) print(df) Python Dept. Of Comp. Sc. & IT, FUUAST

Python Dept. Of Comp. Sc. & IT, FUUAST DataFrame: Import & Export of Excel Data: import numpy as np import pandas as pd from pandas import ExcelWriter df1=pd.read_excel('workData.xls','Sheet1') df2=pd.read_excel('workData.xls','Sheet2') print('\nSheet1:\n',df1,'\nSheet2:\n',df2) print(df1['A']) df.to_excel(‘data1.xlsx', sheet_name='Sheet1') with ExcelWriter(‘data2.xlsx') as writer: df1.to_excel(writer, sheet_name='Sheet1') df2.to_excel(writer, sheet_name='Sheet2') Python Dept. Of Comp. Sc. & IT, FUUAST

Python Dept. Of Comp. Sc. & IT, FUUAST DataFrame: Column & Row Handling : df=pd.DataFrame.from_items([(‘one', [1, 2, 3]), (‘two', [4, 5, 6]), ( 'three‘,[7,8,9])]) print(df['one']) df['three'] = df['one'] * df['two'] print(df) df[‘four'] = df['one'] > 2 del df['two'] three = df.pop('three') df[‘New'] = ‘good‘ #[-1,-2,-3]; 2 ; ‘Panama’ Python Dept. Of Comp. Sc. & IT, FUUAST

Python Dept. Of Comp. Sc. & IT, FUUAST DataFrame: Column & Row Handling : df=pd.DataFrame.from_items([(‘one', [1, 2, 3,4]), (‘two', [4, 5, np.nan,7]), ( 'three‘,[7,8,9,10]),(‘four’,[11,12,13,14])]) df.loc[2,’three’] df[‘cut'] = df['one'][:2] print(df.loc[0]) print(df.loc[1:3]) df.loc[:,[‘one',‘three']]; #print(df.loc['2nd','e']) print(df.iloc[1,0:2]) df.iloc[[1,2,3],[0,1]] df.iat[0,1] = 0 df.at[1,‘two'] = 0 df.iat[1,1] df.loc[:,’two’]; df.loc[0,’three’] df1=df.T #Transpose df.dropna(how='any') # rows dropped containing nan value Python Dept. Of Comp. Sc. & IT, FUUAST

Python Dept. Of Comp. Sc. & IT, FUUAST DataFrame: Column & Row Handling : df=pd.DataFrame.from_items([(‘one', [1, 2, 3,4]), (‘two', [4, 5, 6,7]), ( 'three‘,[7,8,9,10])]) df1=df.drop(0,axis=0) df1=df.drop([0,1],axis=0) df1=df.drop(‘one’,axis=1) df1=df.drop([‘one’,’two’],axis=1) Python Dept. Of Comp. Sc. & IT, FUUAST

Python Dept. Of Comp. Sc. & IT, FUUAST DataFrame: Functions : df=pd.DataFrame.from_items([(‘one', [1, 2, 3,4]), (‘two', [4, 5, 6,7]), ( 'three‘,[7,8,9,10])]) General: print(df.count()) print(df.head(2)) print(df.tail(3)) print(df.describe()) Maths. Functions and Operators np.exp(df) np.sqrt(df) …………… df.apply(np.exp) df.apply(np.sqrt) df3=df1+df2 ; df3=df1*df2 df.loc[:,‘four’]=df.loc[:,‘one’]/df.loc[:,‘two’] Python Dept. Of Comp. Sc. & IT, FUUAST

Python Dept. Of Comp. Sc. & IT, FUUAST DataFrame: Functions : df=pd.DataFrame.from_items([(‘one', [1, 2, 3,4]), (‘two', [4, 5, 6,7]), ( 'three‘,[7,8,9,10]),(‘four’,[11,12,13,14])]) df.loc[:,‘four’]=df.loc[:,‘one’]/df.loc[:,‘two’] df.idxmin(axis=0) df.idxmax(axis=1) df[‘two'].idxmin() df1 = pd.DataFrame(np.random.randint(0,30, size=(5,10))) pieces = [df1.iloc[0:,:3], df1.iloc[0:,3:7], df1.iloc[0:,7:]] pd.concat(pieces) Python Dept. Of Comp. Sc. & IT, FUUAST

Python Dept. Of Comp. Sc. & IT, FUUAST DataFrame: Functions : s1 = pd.Series(np.random.randn(1000)) s2 = pd.Series(np.random.randn(1000)) df = pd.DataFrame(np.random.randn(1000, 5), columns=['a', 'b', 'c', 'd', 'e']) Stats: print(df.median()) print(df[df > df.median()]) df.mean() df.mean(1) df.apply(np.cumsum) df.apply(lambda x: x.max() - x.min()) s1.cov(s2) df.cov() s1.corr() df.corr() Python Dept. Of Comp. Sc. & IT, FUUAST

Python Dept. Of Comp. Sc. & IT, FUUAST DataFrame: Functions : Stats: Discretization and quantiling ar = np.random.randn(30) factor = pd.cut(ar, 4) pd.value_counts(factor) factor = pd.qcut(ar, [0, .25, .5, .75, 1]) Python Dept. Of Comp. Sc. & IT, FUUAST

Python Dept. Of Comp. Sc. & IT, FUUAST Series Python Dept. Of Comp. Sc. & IT, FUUAST

Python Dept. Of Comp. Sc. & IT, FUUAST Series Series is a one-dimensional labeled array capable of holding any data type (integers, strings, floating point numbers, python objects, etc.). import numpy as np import pandas as pd Construction: s = pd.Series([1,2,5,3,7,10], index=[‘a’, ‘b’, ‘c’, ‘d’, ‘e’, ’f’]) s = pd.Series([1,2,5,3,7,10], index=list(‘abcdef’)) s = pd.Series(np.random.randn(5)) d = {'a' : 2., 'b' : 4., 'c' : 6.} s = pd.Series(d) s = pd.Series(d, index=['b', 'c', 'd', 'a']) s = pd.Series(5., index=['a', 'b', 'c', 'd', 'e']) Python Dept. Of Comp. Sc. & IT, FUUAST

Python Dept. Of Comp. Sc. & IT, FUUAST Series s = pd.Series([1,2,4,8,16,32], index=[‘a’, ‘b’, ‘c’, ‘d’, ‘e’, ’f’]) Selection and Slicing: s[0] s['a'] s[:3] s[[4, 3, 1]] s[s > s.median()] Operations: s + s s[1:] + s[:-1] s * s s * 5 s['e'] = 0 np.exp(s) ‘b' in s ; ‘g’ in s s1 = pd.Series([1, 1, 3, 3, 3, 5, 5, 7, 7, 7]) s1.mode() Python Dept. Of Comp. Sc. & IT, FUUAST

Python Dept. Of Comp. Sc. & IT, FUUAST Series Mapping: s = pd.series(np.random.randint(0, 7, size=50)) s.value_counts() s = pd.Series(['six', 'seven', 'six', 'seven', 'six'], index=['a', 'b', 'c', 'd', 'e']) t = pd.Series({'six' : 6., 'seven' : 7.}) s.map(t) x = pd.Series([1,1.5,2.0,2.5], index=['one', 'two', 'three','four']) y = pd.Series(['foo', 'bar', 'baz'], index=[1,1.5,3]) z = {1: 'A', 2: 'B', 3: 'C'} x.map(y) x.map(z) Naming: s = pd.Series(np.random.randn(5), name=‘Series1') Python Dept. Of Comp. Sc. & IT, FUUAST

Python Dept. Of Comp. Sc. & IT, FUUAST Series DateTime: s = pd.Series(pd.date_range('20130101 09:10:12', periods=4)) s.dt.hour s.dt.year s.dt.strftime('%Y/%m/%d') s = pd.Series(pd.period_range('20130101', periods=4, freq='D')) s = pd.Series(pd.timedelta_range('1 day 00:00:05', periods=4, freq='s')) s.dt.days s.dt.seconds s.dt.components Python Dept. Of Comp. Sc. & IT, FUUAST

Python Dept. Of Comp. Sc. & IT, FUUAST Pandas END Python Dept. Of Comp. Sc. & IT, FUUAST