By : Mrs Sangeeta M Chauhan , Gwalior

Python Pandas DataFrame

What is a DataFrame? A DataFrame is a two-dimensional (2D) data structure: data is arranged in tabular form, in rows and columns. In other words, a pandas DataFrame is similar to an Excel sheet. Let's understand it through an example.

Create DataFrame
A pandas DataFrame can be created using the following constructor:
pandas.DataFrame(data[, index, columns, dtype, copy])
The parameters of the constructor are as follows:

Sr.No  Parameter  Description
1      data       Takes various forms such as ndarray, Series, map, list, dict, constants, and also another DataFrame.
2      index      Row labels for the resulting frame. Optional; defaults to np.arange(n) if no index is passed.
3      columns    Column labels. Optional; defaults to np.arange(n) if no column labels are passed.
4      dtype      Data type of each column.
5      copy       Whether to copy data from the input. Default False (None in recent pandas versions, which copies dict input but not DataFrame or ndarray input).
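The constructor parameters listed above can be tried out together in one call. The following is only a small sketch; the array `marks` and the labels are made-up values for illustration, not data from these slides.

```python
import numpy as np
import pandas as pd

# Hypothetical data: two rows, two columns of marks.
marks = np.array([[78, 87], [67, 70]])

df = pd.DataFrame(
    data=marks,                      # the values
    index=["Student1", "Student2"],  # row labels
    columns=["Maths", "Physics"],    # column labels
    dtype="float64",                 # force every column to float
    copy=True,                       # work on a copy of the input array
)
print(df)

# Without index/columns, pandas numbers both axes 0..n-1.
default_df = pd.DataFrame(marks)
print(list(default_df.index), list(default_df.columns))
```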

A pandas DataFrame can be created using various inputs, such as:
1. Lists
2. Dictionary
3. Series
4. NumPy ndarrays
5. Another DataFrame

Creating an Empty DataFrame
>>> import pandas as pd
>>> df = pd.DataFrame()
>>> df
Empty DataFrame
Columns: []
Index: []

Create a DataFrame from Lists
Example 1 (Simple List)
>>> MyList = [10, 20, 30, 40]
>>> MyFrame = pd.DataFrame(MyList)
>>> MyFrame
    0
0  10
1  20
2  30
3  40

Example 2 (List of Lists)
>>> Friends = [['Shraddha', 'Doctor'], ['Shanti', 'Teacher'], ['Monica', 'Engineer']]
>>> MyFrame = pd.DataFrame(Friends, columns=['Name', 'Occupation'])
>>> MyFrame
       Name Occupation
0  Shraddha     Doctor
1    Shanti    Teacher
2    Monica   Engineer

Creation of a DataFrame from Dictionary of ndarrays / Lists
All the ndarrays must be of the same length. If an index is passed, its length must equal the length of the arrays. If no index is passed, the index defaults to range(n), where n is the array length.

Example 1 (without index)
>>> data = {'Name': ['Shraddha', 'Shanti', 'Monica', 'Yogita'],
...         'Age': [28, 34, 29, 39]}
>>> df = pd.DataFrame(data)
>>> df
       Name  Age
0  Shraddha   28
1    Shanti   34
2    Monica   29
3    Yogita   39

Example 2 (With Index)
>>> data = {'Name': ['Shraddha', 'Shanti', 'Monica', 'Yogita'],
...         'Age': [28, 34, 29, 39]}
>>> df = pd.DataFrame(data,
...                   index=['Friend1', 'Friend2', 'Relative1', 'Relative2'])
>>> df
               Name  Age
Friend1    Shraddha   28
Friend2      Shanti   34
Relative1    Monica   29
Relative2    Yogita   39

Create a DataFrame from List of Dictionaries
Here we pass a list of dictionaries to create a DataFrame. The dictionary keys are taken as column names by default.
Example 1:
>>> Mydict = [{'Won': 15, 'Loose': 2}, {'Won': 5, 'Loose': 10},
...           {'Won': 8, 'Loose': 9}, {'Won': 4}]
>>> df = pd.DataFrame(Mydict)
>>> df
   Loose  Won
0    2.0   15
1   10.0    5
2    9.0    8
3    NaN    4
Notice that the missing value is stored as NaN (Not a Number). Recent pandas versions keep the keys in insertion order, so the columns may appear as Won, Loose.

Example 2: Changing Index
>>> Mydict = [{'Won': 15, 'Loose': 2}, {'Won': 5, 'Loose': 10}, {'Won': 8, 'Loose': 9}]
>>> df = pd.DataFrame(Mydict, index=['India', 'Pakistan', 'Australia'])
>>> df
           Loose  Won
India          2   15
Pakistan      10    5
Australia      9    8

Example 3
We can also create a DataFrame by specifying a list of dictionaries, row indices, and column indices.
>>> L_dict = [{'Maths': 78, 'Chemistry': 78, 'Physics': 87},
...           {'Maths': 67, 'Chemistry': 70},
...           {'Physics': 77, 'Maths': 87}]
>>> df1 = pd.DataFrame(L_dict, index=['Student1', 'Student2', 'Student3'],
...                    columns=['Physics', 'Chemistry', 'Maths'])
>>> df1
          Physics  Chemistry  Maths
Student1     87.0       78.0     78
Student2      NaN       70.0     67
Student3     77.0        NaN     87

df2 is created with only 2 columns
>>> df2 = pd.DataFrame(L_dict, index=['Student1', 'Student2', 'Student3'],
...                    columns=['Chemistry', 'Maths'])
>>> df2
          Chemistry  Maths
Student1       78.0     78
Student2       70.0     67
Student3        NaN     87

Creating df3 by specifying 3 column names (new column English)
>>> df3 = pd.DataFrame(L_dict, index=['Student1', 'Student2', 'Student3'],
...                    columns=['English', 'Chemistry', 'Maths'])
>>> df3
          English  Chemistry  Maths
Student1      NaN       78.0     78
Student2      NaN       70.0     67
Student3      NaN        NaN     87

Addition of New Column & Row
Column Addition
>>> L_dict = [{'Maths': 78, 'Chemistry': 78, 'Physics': 87},
...           {'Maths': 67, 'Chemistry': 70},
...           {'Physics': 77, 'Maths': 87, 'Chemistry': 90}]
>>> df3 = pd.DataFrame(L_dict, index=['Student1', 'Student2', 'Student3'],
...                    columns=['English', 'Chemistry', 'Maths'])
>>> df3['Physics'] = [45, 56, 65]
>>> df3
          English  Chemistry  Maths  Physics
Student1      NaN         78     78       45
Student2      NaN         70     67       56
Student3      NaN         90     87       65
New column Physics added.

We can also update column data using the same method
>>> df3['English'] = [78, 98, 89]
>>> df3
          English  Chemistry  Maths  Physics
Student1       78         78     78       45
Student2       98         70     67       56
Student3       89         90     87       65
The values of column English (NaN) are replaced with the new values.

We can also add a new column using data stored in the existing frame
>>> df3['Total'] = df3.English + df3.Chemistry + df3.Maths + df3.Physics
>>> df3
          English  Chemistry  Maths  Physics  Total
Student1       78         78     78       45    279
Student2       98         70     67       56    291
Student3       89         90     87       65    331
Look, a new column Total has been added with the total of marks in the other subjects.

ASSIGNING AND COPYING OF DATAFRAME

With assignment, changes are reflected in both DataFrames.
With copy(), changes are reflected in the second DataFrame only.
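The difference between assigning and copying can be sketched as follows. The frame and the names `alias` and `clone` are illustrative assumptions, not the ones shown on the original slide.

```python
import pandas as pd

df1 = pd.DataFrame({"Name": ["Shraddha", "Shanti"], "Age": [28, 34]})

alias = df1          # plain assignment: both names refer to the same object
clone = df1.copy()   # copy(): an independent DataFrame (deep copy by default)

alias.loc[0, "Age"] = 30   # visible through df1 as well
clone.loc[1, "Age"] = 40   # stays inside clone only

print(df1)
print(clone)
```

After running this, df1 shows the age 30 in its first row (changed through `alias`), but its second row still holds 34.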

SELECTION AND INDEXING
Methods covered in this section are:
1. Selecting data by row numbers (.iloc)
2. Selecting data by label or by a conditional statement (.loc)
3. Selecting data at a particular row and column (.at)
loc gets rows (or columns) with particular labels from the index.
iloc gets rows (or columns) at particular positions in the index (so it only takes integers).

ACCESSING DIFFERENT ROWS/ COLUMNS

Use of iloc: here the end index is excluded.
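A short sketch of how the slicing rules differ between iloc and loc. The frame XI below is a hypothetical stand-in (the student names and marks are invented), since the original screenshots are not available.

```python
import pandas as pd

# Hypothetical marks table standing in for the frame XI used in these slides.
XI = pd.DataFrame(
    {"Name": ["Asha", "Ravi", "Meena", "Kabir"],
     "Physics": [87, 65, 77, 90],
     "Chemistry": [78, 70, 81, 66]},
    index=["stu1", "stu2", "stu3", "stu4"],
)

by_position = XI.iloc[0:2]          # positions 0 and 1; position 2 is excluded
by_label = XI.loc["stu1":"stu2"]    # label slices include the end label
cell_block = XI.iloc[1:3, 0:2]      # rows at positions 1-2, columns at positions 0-1
high_physics = XI.loc[XI["Physics"] > 80, ["Name", "Physics"]]  # conditional selection

print(by_position, by_label, cell_block, high_physics, sep="\n\n")
```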

Use of at with dataFrame
Access a single value for a row/column label pair.
<df>.at[rowname, colname]
<df>.loc[rowname].at[colname]
Examples:
XI.at['stu2', 'Name']
XI.at['stu2', 'Physics']
XI.loc['stu2'].at['Chemistry']
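The three calls above can be run against a small frame like the one below. The contents of XI are assumed for illustration; only the labels 'stu2', 'Name', 'Physics' and 'Chemistry' come from the slide.

```python
import pandas as pd

# Hypothetical contents for XI.
XI = pd.DataFrame(
    {"Name": ["Asha", "Ravi"], "Physics": [87, 65], "Chemistry": [78, 70]},
    index=["stu1", "stu2"],
)

name = XI.at["stu2", "Name"]             # single value by row and column label
chem = XI.loc["stu2"].at["Chemistry"]    # same idea, going through the row Series
XI.at["stu2", "Physics"] = 68            # .at can also set a single value

print(name, chem, XI.at["stu2", "Physics"])
```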

Row Addition
To add a row, assign a list of values to a new row label with .loc:
>>> df3.loc['Student4'] = [45, 67, 45]
>>> df3
(df3 now has a fourth row, Student4, holding 45, 67 and 45 under English, Chemistry and Maths)

Use of iloc()

Deletion of an Existing Column/Row from Data Frame

Row Deletion
>>> df3.drop('Student3')
(the result contains only the rows Student1 and Student2, with the columns English, Chemistry, Maths and Physics)
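Column deletion works the same way as row deletion. A minimal sketch, assuming a three-subject version of df3 built from the marks used earlier:

```python
import pandas as pd

df3 = pd.DataFrame(
    {"English": [78, 98, 89], "Chemistry": [78, 70, 90], "Maths": [78, 67, 87]},
    index=["Student1", "Student2", "Student3"],
)

without_row = df3.drop("Student3")            # drop a row by label (axis=0 is the default)
without_col = df3.drop(columns=["English"])   # drop a column by name

# drop() returns a new DataFrame; df3 itself is unchanged unless inplace=True is passed.
print(without_row)
print(without_col)
print(df3.shape)
```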

Descriptive Statistics with Pandas
Pandas also offers many useful statistical and aggregate functions. We are going to discuss the following ones.

DataFrame.min(axis=None, skipna=None, numeric_only=None)
Parameters:
axis : {0 or 'index', 1 or 'columns'}. The axis along which the minimum is computed.
skipna : Exclude NA/null values when computing the result.
numeric_only : Include only float, int and boolean columns. If None, will attempt to use everything, then use only numeric data. Not implemented for Series.

Use of min()

Calculating mode (the most frequent value)

Compare both Data & Result

Calculating median

count() counts the non-null elements in each column (by default)

sum() by column
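The functions named on the preceding slides can be compared side by side on one small frame. The marks below are invented for illustration; one value is left missing to show that NaN is skipped by default.

```python
import pandas as pd

# Hypothetical marks; None becomes NaN and is skipped by default.
XI = pd.DataFrame(
    {"Physics": [87, 65, 77, 65], "Chemistry": [78, 70, None, 66]},
    index=["stu1", "stu2", "stu3", "stu4"],
)

print(XI.min())          # smallest value in each column
print(XI.min(axis=1))    # smallest value in each row
print(XI.mode())         # most frequent value(s) per column, returned as a DataFrame
print(XI.median())       # middle value of each column
print(XI.count())        # non-null entries per column
print(XI.sum())          # column totals
```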

APPLYING FUNCTIONS ON SUBSET OF DATA FRAME
<DF>[[<col_name1>, <col_name2>, ...]].<function_name>()
Example: XI['Chemistry'].min()

APPLYING FUNCTIONS ON ROW OF DATA FRAME
<DF>.loc[<row_index>, ...].<function_name>()
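Both patterns, functions on selected columns and functions on selected rows, might look like this in practice. The frame XI and its values are assumptions made for the sketch.

```python
import pandas as pd

# Hypothetical contents for XI.
XI = pd.DataFrame(
    {"Name": ["Asha", "Ravi", "Meena"],
     "Physics": [87, 65, 77],
     "Chemistry": [78, 70, 81]},
    index=["stu1", "stu2", "stu3"],
)

subjects = ["Physics", "Chemistry"]

chem_min = XI["Chemistry"].min()                   # one column -> a single value
subject_max = XI[subjects].max()                   # several columns -> one value per column
stu2_total = XI.loc["stu2", subjects].sum()        # one row -> a single value
range_mean = XI.loc["stu1":"stu2", subjects].mean()  # a block of rows -> one value per column

print(chem_min, stu2_total)
print(subject_max)
print(range_mean)
```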

ADVANCED OPERATION ON DATAFRAME
1. PIVOTING: In pandas, the pivot table function takes a simple DataFrame as input and performs grouped operations that provide a multidimensional summary of the data.

pivot() function
pandas.pivot(data, index, columns, values)
The function produces a pivot table based on 3 columns of the DataFrame. It uses the unique values from index / columns and fills the table with values.
Parameters:
index : Labels to use to make the new frame's index
columns : Labels to use to make the new frame's columns
values : Values to use for populating the new frame's values
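A minimal sketch of pivot() on long-format data. The frame `result` is hypothetical; the column names follow the CSV used in these slides, but the rows are invented.

```python
import pandas as pd

# Hypothetical long-format data: one row per (class, subject) pair.
result = pd.DataFrame(
    {"Class": ["XI", "XI", "XII", "XII"],
     "Subject": ["Physics", "Maths", "Physics", "Maths"],
     "ResultPer": [92.0, 95.5, 98.0, 99.5]}
)

# Each unique Class becomes a row label, each unique Subject a column label.
wide = result.pivot(index="Class", columns="Subject", values="ResultPer")
print(wide)
```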

CSV file: resultAnalyiTr
S.no  Class  Teacher Name   Subject    ResultPer
1     XII    RK Saxena      Physics    98
2            Neelam Sharma  Chemistry  100
3            PP Singh       Maths      99.5
4            Ravita         CS

What if we have duplicate values in column and index?
There is a solution: we can use pivot_table() instead of pivot().
pandas.pivot_table(data, values=None, index=None, columns=None, aggfunc='mean', fill_value=None, margins=False, dropna=True)
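The following sketch shows why pivot_table() helps when an index/column pair repeats. The data is invented for illustration: the pair (XII, Physics) appears twice, so the two results are averaged.

```python
import pandas as pd

# Hypothetical data with a duplicate (Class, Subject) pair.
result = pd.DataFrame(
    {"Class": ["XII", "XII", "XII", "XI"],
     "Subject": ["Physics", "Physics", "Maths", "Physics"],
     "ResultPer": [98.0, 90.0, 99.5, 92.0]}
)

# pivot() would raise ValueError here because (XII, Physics) appears twice.
table = pd.pivot_table(
    result, values="ResultPer", index="Class", columns="Subject",
    aggfunc="mean",   # how duplicates are combined
    fill_value=0,     # value used where a combination has no data
)
print(table)
```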

Let's consider another table (with duplicate values)

Let's consider a modified CSV file

pivotTableEx2.py

SORTING
DataFrame.sort_values(by, axis=0, ascending=True, inplace=False, kind='quicksort', na_position='last')
Parameters:
by : name or list of names on which the data is sorted
ascending : default True (ascending); specify a list for multiple sort orders
axis : {0 or 'index', 1 or 'columns'}, default 0; sort index/rows versus columns
inplace : boolean, default False; if True, sort the DataFrame without creating a new instance
kind : {'quicksort', 'mergesort', 'heapsort'}, optional; this option is only applied when sorting on a single column or label
na_position : {'first', 'last'}, optional, default 'last'; 'first' puts NaNs at the beginning, 'last' puts NaNs at the end

Use of df.sort_values()
[Figure: DataFrame df sorted in ascending and in descending order]

Descending order with NaN values in first position

Sorting on Multiple Columns
print("Sorting on Multiple(2) Columns")
print(df.sort_values(by=['Class', 'Teacher Name']))
print("Sorting on Multiple(3) Columns")
print(df.sort_values(['Class', 'Teacher Name', 'Year']))
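The sorting options described above can be tried on a small frame. This is a sketch under assumptions: the teacher names come from the CSV shown earlier, but the pairing with classes and the missing result are invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical frame; one ResultPer is missing on purpose.
df = pd.DataFrame(
    {"Class": ["XII", "XI", "XII", "XI"],
     "Teacher Name": ["RK Saxena", "Neelam Sharma", "PP Singh", "Ravita"],
     "ResultPer": [98.0, 100.0, 99.5, np.nan]}
)

ascending = df.sort_values(by="ResultPer")   # NaN goes last by default
descending_nan_first = df.sort_values(
    by="ResultPer", ascending=False, na_position="first"
)
# One sort order per column: Class ascending, ResultPer descending.
multi = df.sort_values(by=["Class", "ResultPer"], ascending=[True, False])

print(ascending, descending_nan_first, multi, sep="\n\n")
```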

Aggregate Functions
Function  Description
count     Number of non-null observations
sum       Sum of values
mean      Mean of values
mad       Mean absolute deviation (removed in pandas 2.0)
median    Arithmetic median of values
min       Minimum
max       Maximum
mode      Mode
std       Unbiased standard deviation

<Df>.<agg_fun_name>()

FUNCTION APPLICATION
A function (user-defined or from a library) can be applied on a DataFrame in multiple ways:
1. On the whole DataFrame: pipe()
2. Row/column wise: apply()
3. On individual elements: applymap() (named map() from pandas 2.1 onward)

[Diagram: data input goes into the first pipe; the output of each pipe is the input to the next pipe]

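pipe(func_name, *args)
import numpy as np
import pandas as pd

d1 = {'Sal': [50000, 60000, 55000], 'bonus': [3000, 4000, 5000]}
df1 = pd.DataFrame(d1)
# Equivalent to np.divide(np.power(np.add(df1, 2), 2), 3)
print(df1.pipe(np.add, 2).pipe(np.power, 2).pipe(np.divide, 3))
# pipe() passes the frame as the first argument, so use the unbound methods
print(df1.pipe(pd.DataFrame.add, 2).pipe(pd.DataFrame.divide, 3))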
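The other two routes named under FUNCTION APPLICATION, apply() and element-wise application, can be sketched on the same salary data. The lambdas are illustrative choices, and the element-wise call falls back to applymap() on pandas releases that do not yet have DataFrame.map().

```python
import pandas as pd

df1 = pd.DataFrame({"Sal": [50000, 60000, 55000], "bonus": [3000, 4000, 5000]})

# apply() with the default axis=0 calls the function once per column.
col_range = df1.apply(lambda col: col.max() - col.min())

# apply() with axis=1 calls the function once per row.
row_totals = df1.apply(lambda row: row["Sal"] + row["bonus"], axis=1)

# Element-wise: DataFrame.map in pandas 2.1+, applymap in older releases.
elementwise = df1.map if hasattr(pd.DataFrame, "map") else df1.applymap
in_thousands = elementwise(lambda x: x / 1000)

print(col_range, row_totals, in_thousands, sep="\n\n")
```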

pandas.DataFrame.info() is used to get a concise summary of the DataFrame.
pandas.DataFrame.describe() generates descriptive statistics that summarize the central tendency, dispersion and shape of a dataset's distribution, excluding NaN values.
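Both functions can be tried on the small salary frame used with pipe(); this is only a quick sketch to show what each one returns.

```python
import pandas as pd

df1 = pd.DataFrame({"Sal": [50000, 60000, 55000], "bonus": [3000, 4000, 5000]})

df1.info()                 # prints index range, column names, non-null counts and dtypes
summary = df1.describe()   # count, mean, std, min, quartiles and max per numeric column
print(summary)
```

Note that info() prints its report directly, while describe() returns a DataFrame that can be indexed like any other.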

Grouping on DataFrame's Column
<DF>.groupby(by=None, axis=0)
Any groupby operation involves one of the following operations on the original object:
1. Splitting the object
2. Applying a function
3. Combining the results
In many situations, we split the data into sets and apply some functionality on each subset. In the apply step, we can perform the following operations:
Aggregation: computing a summary statistic
Transformation: performing some group-specific operation
Filtration: discarding data based on some condition

import pandas as pd
import numpy as np
df = pd.read_csv("resultAnalysis.csv")
print(df)
print("Grouping on Teacher Name")
grp = df.groupby(by='Teacher Name')
print(grp)

print(grp.groups)

for name, group in grp:
    print(name)
    print(group)

SELECT A GROUP
Using the get_group() method, we can select a single group.
print(grp.get_group('Namrata'))

Aggregation
print(grp['ResultPer'].agg(np.mean))

Applying Multiple Aggregation Functions at Once
print(grp['ResultPer'].agg([np.sum, np.mean, np.std]))

To see the size of each group, apply the size() function
print(grp.agg(np.size))
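Since resultAnalysis.csv is not included here, the grouping steps can be tried on a small stand-in frame. The rows below are invented; only the column names and the teacher name 'Namrata' come from the slides. The aggregations are passed as strings ('sum', 'mean', 'std'), which pandas accepts as an alternative to the NumPy functions.

```python
import pandas as pd

# Hypothetical stand-in for resultAnalysis.csv.
df = pd.DataFrame(
    {"Teacher Name": ["Namrata", "Namrata", "Ravita", "Ravita", "Ravita"],
     "Class": ["XI", "XII", "XI", "XII", "XII"],
     "ResultPer": [90.0, 96.0, 85.0, 95.0, 99.0]}
)

grp = df.groupby(by="Teacher Name")

one_group = grp.get_group("Namrata")                 # rows of a single group
stats = grp["ResultPer"].agg(["sum", "mean", "std"])  # several aggregates at once
sizes = grp.size()                                   # number of rows per group

print(one_group, stats, sizes, sep="\n\n")
```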

Grouping on multiple columns and aggregation
gp = df.groupby(by=['Teacher Name', 'Class', 'Year']).agg(np.mean)

transform(): It transforms the aggregate data by repeating the summary result for each row of the group, so the result has the same shape as the original data.

Output before/after transform
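The before/after difference can be reproduced with a short sketch. The data and the new column name `TeacherMean` are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical results, two rows per teacher.
df = pd.DataFrame(
    {"Teacher Name": ["Namrata", "Namrata", "Ravita", "Ravita"],
     "ResultPer": [90.0, 96.0, 85.0, 95.0]}
)

grp = df.groupby("Teacher Name")["ResultPer"]

aggregated = grp.mean()                     # one value per group
df["TeacherMean"] = grp.transform("mean")   # group value repeated on every row

print(aggregated)
print(df)
```

The aggregated result has two entries (one per teacher), while the transformed column has four, one for each original row.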

Reindexing and Altering Labels

rename(): It simply renames the index and/or column labels in a DataFrame.
Changing column labels

Changing Column label and row index
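Both cases, changing only column labels and changing column labels together with row labels, might look like this. The frame XI and the new labels are illustrative assumptions.

```python
import pandas as pd

# Hypothetical contents for XI.
XI = pd.DataFrame(
    {"Physics": [87, 65], "Chemistry": [78, 70]},
    index=["stu1", "stu2"],
)

# Pass a mapping of old label -> new label; unlisted labels are kept as they are.
renamed_cols = XI.rename(columns={"Physics": "Phy", "Chemistry": "Chem"})
renamed_both = XI.rename(index={"stu1": "Student1"}, columns={"Physics": "Phy"})

print(renamed_cols)
print(renamed_both)
```

rename() returns a new DataFrame, so XI keeps its original labels here.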

reindex(): It lets you specify a new order for the existing index and column labels, so that records are displayed in that order.
<DF>.reindex(index=None, columns=None, fill_value=nan)

reindex_like(): Used to create index/column labels based on another DataFrame object.
Observe that it contains the values of the common column 'Physics'.
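A short sketch of reindex() and reindex_like() together. The frames XI and XII and their values are invented for illustration; 'Physics' is the column the two frames share.

```python
import pandas as pd

# Hypothetical frames; they share only the column 'Physics'.
XI = pd.DataFrame(
    {"Physics": [87, 65], "Chemistry": [78, 70]},
    index=["stu1", "stu2"],
)
XII = pd.DataFrame({"Physics": [0, 0], "Maths": [0, 0]}, index=["stu1", "stu2"])

# New row/column order; the unknown label stu3 produces a row of NaN.
reordered = XI.reindex(index=["stu2", "stu1", "stu3"], columns=["Chemistry", "Physics"])

# fill_value replaces NaN for labels that did not exist before.
filled = XI.reindex(index=["stu1", "stu3"], fill_value=0)

# Take the labels of XII: Physics values are kept, Maths is all NaN.
like_xii = XI.reindex_like(XII)

print(reordered, filled, like_xii, sep="\n\n")
```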

Knowledge is of no value unless you put it into practice.
So, keep practicing. Thanks.
