PYTHON PANDAS FUNCTION APPLICATIONS

Slides:

Advertisements

Similar presentations

Copyright © 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 5 More SQL: Complex Queries, Triggers, Views, and Schema Modification.

Advertisements

Database Systems More SQL Database Design -- More SQL1.

1 CSCE 1030 Computer Science 1 Arrays Chapter 7 in Small Java.

Advance Model Builder Features. Advance Features Using Lists (also Batching) Iteration Feedback Model Only Tools Inline Variable Substitution Preconditions.

Python Lab Pandas Library - II Proteomics Informatics, Spring 2014 Week 8 18 th Mar, 2014

Lists in Python.

Pandas: Python Programming for Spreadsheets Pamela Wu Sept. 17 th 2015.

 2005 Pearson Education, Inc. All rights reserved. 1 Arrays.

Priya Ramaswami Janssen R&D US. Advantages of PROC REPORT -Very powerful -Perform lists, subsets, statistics, computations, formatting within one procedure.

 2008 Pearson Education, Inc. All rights reserved. 1 Arrays and Vectors.

25 Copyright © 2009, Oracle. All rights reserved. Showing Results with Pivot Tables.

More SQL: Complex Queries, Triggers, Views, and Schema Modification

REVIEW Linear Combinations Given vectors and given scalars

Chapter 11 - JavaScript: Arrays

PH2150 Scientific Computing Skills

Python for Data Analysis

Python for data analysis Prakhar Amlathe Utah State University

More SQL: Complex Queries,

Applied Business Forecasting and Regression Analysis

Manipulating MATLAB Matrices Chapter 4

Topics Dictionaries Sets Serializing Objects. Topics Dictionaries Sets Serializing Objects.

Numpy (Numerical Python)

Artificial Neural Networks

Computer Graphics CC416 Week 15 3D Graphics.

Two-Dimensional Arrays Lesson xx

Department of Computer Science,

Matrix 2015/11/18 Hongfei Yan zip(*a) is matrix transposition

Matrix 2016/11/30 Hongfei Yan zip(*a) is matrix transposition

External libraries A very complete list can be found at PyPi the Python Package Index: To install, use pip, which comes with.

Data Tables, Indexes, pandas

Python for Quant Finance

Lesson 1: Introduction to Trifacta Wrangler

Lesson 1: Introduction to Trifacta Wrangler

Python Visualization Tools: Pandas, Seaborn, ggplot

Fitting Curve Models to Edges

regex (Regular Expressions) Examples Logic in Python (and pandas)

PYTHON Prof. Muhammad Saeed.

JavaScript Charting Library

Data warehouse Design Using Oracle

REPORT QUERY BUILDER featuring: Report Query Viewer.

Topics Introduction to File Input and Output

Chapter 9: Records (structs)

Chapter 9: Records (structs)

Chapter 9: Records (structs)

Transforming Data (Python®)

More SQL: Complex Queries, Triggers, Views, and Schema Modification

Numpy (Numerical Python)

MSIS 655 Advanced Business Applications Programming

Exploring Microsoft® Office 2016 Series Editor Mary Anne Poatsy

Topics Dictionaries Sets Serializing Objects. Topics Dictionaries Sets Serializing Objects.

Lecture 6: Data Quality and Pandas

Producing Descriptive Statistics

Topics Introduction to Value-returning Functions: Generating Random Numbers Writing Your Own Value-Returning Functions The math Module Storing Functions.

Pandas John R. Woodward.

Slides based on Z. Ives’ at University of Pennsylvania and

Loops and Arrays in JavaScript

Lets Play with arrays Singh Tripty

Arrays Arrays A few types Structures of related data items

Dr. Sampath Jayarathna Cal Poly Pomona

Linear Algebra review (optional)

Dr. Sampath Jayarathna Old Dominion University

Dr. Sampath Jayarathna Old Dominion University

Topics Introduction to File Input and Output

Chapter 9: Records (structs)

regex (Regular Expressions) Examples Logic in Python (and pandas)

NULL SPACES, COLUMN SPACES, AND LINEAR TRANSFORMATIONS

Microsoft Excel 2007 – Level 2

By : Mrs Sangeeta M Chauhan , Gwalior

Presentation transcript:

PYTHON PANDAS FUNCTION APPLICATIONS To apply your own or another library’s functions to Pandas objects, you should be aware of the three important methods. The methods have been discussed below. The appropriate method to use depends on whether your function expects to operate on an entire DataFrame, row- or column-wise, or element wise. Table wise Function Application: pipe() Row or Column Wise Function Application: apply() Element wise Function Application: applymap()

Table-wise Function Application Custom operations can be performed by passing the function and the appropriate number of parameters as pipe arguments. Thus, operation is performed on the whole DataFrame.

Example, add a value 2 to all the elements in the DataFrame.

Row or Column Wise Function Application Arbitrary functions can be applied along the axes of a DataFrame or Panel using the apply() method, which, like the descriptive statistics methods, takes an optional axis argument. By default, the operation performs column wise, taking each column as an array-like.

Example import pandas as pd import numpy as np PData1 = {'A' : [10,20,30], 'B' : [12000,400,9800], 'C': [100,200,120] } df = pd.DataFrame(PData1) print(df) print("----------------------------") print(df.apply(lambda x: x.max() - x.min())) print("---Mean---------------------") print (df.apply(np.mean))

Element Wise Function Application Not all functions can be vectorized (neither the NumPy arrays which return another array nor any value), the methods applymap() on DataFrame and analogously map() on Series accept any Python function taking a single value and returning a single value.

Example import pandas as pd import numpy as np PData1 = {'A' : [10,20,30], 'B' : [12000,400,9800], 'C': [100,200,120] } df = pd.DataFrame(PData1) print(df) # My custom function print("----After use of map ----") print(df['A'].map(lambda x:x*100)) print("---------Mean-------------") print (df.apply(np.mean))

Example – applymap import pandas as pd import numpy as np PData1 = {'A' : [10,20,30], 'B' : [12000,400,9800], 'C': [100,200,120] } df = pd.DataFrame(PData1) print(df) # My custom function print("----After use of map ----") print(df['A'].map(lambda x:x*100)) print("----After use of applymap ----") #print(df['A'].applymap(lambda x:x*100)) # Error print(df.applymap(lambda x:x*100)) print("---------Mean-------------") print (df.apply(np.mean))

GROUP BY Any groupby operation involves one of the following operations on the original object. They are − Splitting the Object Applying a function Combining the results In many situations, we split the data into sets and we apply some functionality on each subset. In the apply functionality, we can perform the following operations − Aggregation − computing a summary statistic Transformation − perform some group-specific operation Filtration − discarding the data with some condition

Split Data into Groups Pandas object can be split into any of their objects. There are multiple ways to split an object like − obj.groupby('key') obj.groupby(['key1','key2']) obj.groupby(key,axis=1)

Example import pandas as pd PData = {'Name' : ['Arpit','Zayana','Kareem','Abhilash','Dhanya','Siju','Ankita','Divya','Nidhin','Hari'], 'Age' : [62,18,68,26,24,23,16,20,25,28], 'Department' : ['Surgery','ENT','Orthopedic','Surgery','ENT','Cardiology','ENT','Cardiology','Orhopedic','Surgery'], 'Fee':[300,250,450,300,350,800,100,500,700,450], 'Gender':['M','F','M','M','F','M','F','F','M','M'] } df = pd.DataFrame(PData) print(df) print("----groupby--------------") print(df.groupby ('Department'))

Example – View Groups import pandas as pd PData = {'Name' : ['Arpit','Zayana','Kareem','Abhilash','Dhanya','Siju','Ankita','Divya','Nidhin','Hari'], 'Age' : [62,18,68,26,24,23,16,20,25,28], 'Department' : ['Surgery','ENT','Orthopedic','Surgery','ENT','Cardiology','ENT','Cardiology','Orhopedic','Surgery'], 'Fee':[300,250,450,300,350,800,100,500,700,450], 'Gender':['M','F','M','M','F','M','F','F','M','M'] } df = pd.DataFrame(PData) print(df) print("----View Groups--------------") print(df.groupby ('Department').groups)

Example – Groupby with Multiple Columns import pandas as pd PData = {'Name' : ['Arpit','Zayana','Kareem','Abhilash','Dhanya','Siju','Ankita','Divya','Nidhin','Hari'], 'Age' : [62,18,68,26,24,23,16,20,25,28], 'Department' : ['Surgery','ENT','Orthopedic','Surgery','ENT','Cardiology','ENT','Cardiology','Orhopedic','Surgery'], 'Fee':[300,250,450,300,350,800,100,500,700,450], 'Gender':['M','F','M','M','F','M','F','F','M','M'] } df = pd.DataFrame(PData) print(df) print("----Groupby with Multiple Columns---------") print(df.groupby (['Department‘, ‘Gender’].groups)

Example – Iterate Through Groups import pandas as pd PData = {'Name' : ['Arpit','Zayana','Kareem','Abhilash','Dhanya','Siju','Ankita','Divya','Nidhin','Hari'], 'Age' : [62,18,68,26,24,23,16,20,25,28], 'Department' : ['Surgery','ENT','Orthopedic','Surgery','ENT','Cardiology','ENT','Cardiology','Orhopedic','Surgery'], 'Fee':[300,250,450,300,350,800,100,500,700,450], 'Gender':['M','F','M','M','F','M','F','F','M','M'] } df = pd.DataFrame(PData) print("---- Iterate through Groups------------") grouped = df.groupby ('Department') for name,group in grouped: print (name) print (group)

Example – Select a Single Group import pandas as pd PData = {'Name' : ['Arpit','Zayana','Kareem','Abhilash','Dhanya','Siju','Ankita','Divya','Nidhin','Hari'], 'Age' : [62,18,68,26,24,23,16,20,25,28], 'Department' : ['Surgery','ENT','Orthopedic','Surgery','ENT','Cardiology','ENT','Cardiology','Orhopedic','Surgery'], 'Fee':[300,250,450,300,350,800,100,500,700,450], 'Gender':['M','F','M','M','F','M','F','F','M','M'] } df = pd.DataFrame(PData) print("---- Select a Single Group------------") grouped = df.groupby ('Department') print (grouped.get_group('ENT'))

Aggregations An aggregated function returns a single aggregated value for each group. Once the group by object is created, several aggregation operations can be performed on the grouped data. An obvious one is aggregation via the aggregate or equivalent agg method. print("---- Aggregation (agg or aggregate ) ---") print("--Department wise sum of fee--") grouped = df.groupby ('Department') print (grouped['Fee'].agg(np.sum)) print("--Department wise Mean--") print (grouped['Fee'].aggregate(np.mean))

To find size of group grouped = df.groupby ('Department') print (grouped.aggregate(np.size)) OR print (grouped.agg(np.size)) Output

Applying Multiple Aggregation Functions grouped Series, you can also pass a list or dict of functions to do aggregation with, and generate DataFrame as output – print("-Multiple Aggregation Functions-") grouped = df.groupby ('Department') print (grouped['Fee'].agg([np.sum,np.max,np.min]))

Transform Transform is an operation used in conjunction with groupby (which is one of the most useful operations in pandas). We retain the same number of items as the original data set. That is the unique feature of using transform. print("-Transform-") df["Total Fee"] = df.groupby('Department')["Fee"].transform('sum') print(df)

Pandas Series.transform() function Call function (the passed function) on self producing a Series with transformed values and that has the same axis length as self.

Transform – Ex. ( Increase the fee of each by 1000) import pandas as pd PData = {'Name' : ['Arpit','Zayana','Kareem','Abhilash','Dhanya','Siju','Ankita','Divya','Nidhin','Hari'], 'Age' : [62,18,68,26,24,23,16,20,25,28], 'Department' : ['Surgery','ENT','Orthopedic','Surgery','ENT','Cardiology','ENT','Cardiology','Orhopedic','Surgery'], 'Fee':[300,250,450,300,350,800,100,500,700,450], 'Gender':['M','F','M','M','F','M','F','F','M','M'] } df = pd.DataFrame(PData) print(df) print("----------------Transform----------------") df["Fee"] = df['Fee'].transform(lambda x : x + 1000)