Introduction to pandas


Introduction to pandas Sahil Dua (@sahildua2305)

About the speaker
Sahil Dua (@sahildua2305)
Booking.com: Graduate Software Developer
Go-GitHub: Open Source Contributor
Linguist: Open Source Contributor
DuckDuckGo: Open Source Community Leader

Pandas But, why?

Pandas Data Structures
Series: 1-D labeled NumPy array, made up of an index and values.
DataFrame: 2-D table with row labels (index) and column labels (columns).

Creating Series
import pandas as pd
s1 = pd.Series([1, 2, 3, 4])
s2 = pd.Series([1, 2, 3, 4], index=['A', 'B', 'C', 'D'])
s1 holds the values 1, 2, 3, 4 under the default integer index 0 to 3; s2 holds the same values under the labels A, B, C, D.

Creating DataFrame
df = pd.DataFrame({'foo': ['x', 'y', 'z'], 'bar': [6, 10, None], 'baz': [True, True, False]})
Result:
   foo  bar    baz
0    x    6   True
1    y   10   True
2    z  NaN  False
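The NaN in the bar column comes from the None passed to the constructor. A minimal sketch, using the same data as the slide, that makes the resulting column types visible:

```python
import pandas as pd

# Same data as the slide; None in a numeric column is stored as NaN.
df = pd.DataFrame({
    'foo': ['x', 'y', 'z'],
    'bar': [6, 10, None],
    'baz': [True, True, False],
})

print(df)
# 'bar' is stored as float64, because NaN is a floating-point value.
print(df.dtypes)
# isna() marks the missing entries column by column.
print(df.isna())
```

Because bar is stored as floating point, pandas itself prints its values as 6.0 and 10.0.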

Column Selection
df['foo']
Selecting a single column returns a Series:
0    x
1    y
2    z

Column Selection
df[['foo', 'bar']]
Selecting a list of columns returns a DataFrame:
   foo  bar
0    x    6
1    y   10
2    z  NaN

Row Selection
df.loc[0]
Selecting a single row by label returns a Series indexed by the column names:
foo       x
bar       6
baz    True

Row Selection
df.loc[0:1]
Slicing with loc is label-based and includes both endpoints, so df.loc[0:1] returns rows 0 and 1 (df.loc[0:2] would return all three rows):
   foo  bar   baz
0    x    6  True
1    y   10  True
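Label-based slicing with loc includes the end label, while position-based slicing with iloc excludes the end position. A small sketch of the difference, rebuilding the slide's DataFrame so that it runs on its own:

```python
import pandas as pd

df = pd.DataFrame({
    'foo': ['x', 'y', 'z'],
    'bar': [6, 10, None],
    'baz': [True, True, False],
})

# loc slices by label and includes both endpoints: labels 0 and 1.
by_label = df.loc[0:1]

# iloc slices by position and excludes the end: positions 0 and 1.
by_position = df.iloc[0:2]

# With the default integer index, loc[0:2] covers labels 0, 1 and 2.
all_rows = df.loc[0:2]

print(by_label)
print(by_position)
print(len(all_rows))
```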

Conditional Filtering
df[ (df['baz']) ]
A boolean Series used as the selector keeps the rows where it is True:
   foo  bar   baz
0    x    6  True
1    y   10  True

Conditional Filtering
df[ (df['foo'] == 'x') | (df['foo'] == 'z') ]
Conditions are combined element-wise with | (or); each condition needs its own parentheses:
   foo  bar    baz
0    x    6   True
2    z  NaN  False
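When several equality tests on one column are joined with |, Series.isin expresses the same filter more compactly. A minimal sketch on the slide's DataFrame (the variable names mask_or and mask_isin are illustrative only):

```python
import pandas as pd

df = pd.DataFrame({
    'foo': ['x', 'y', 'z'],
    'bar': [6, 10, None],
    'baz': [True, True, False],
})

# Element-wise OR of two comparisons, as on the slide.
mask_or = (df['foo'] == 'x') | (df['foo'] == 'z')

# Equivalent membership test.
mask_isin = df['foo'].isin(['x', 'z'])

print(df[mask_or])
print(df[mask_isin])
```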

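Data Alignment
Arithmetic between two DataFrames aligns them on both row and column labels. In the slide's example, one DataFrame has columns a, b, c and rows A to D, and the other has columns a, b and rows A to E. The result has columns a, b, c and rows A to E; every position whose label exists in only one input (all of column c and all of row E) is NaN.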
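The slide shows no code for alignment, so the following is only a sketch under assumptions: the names left and right and all cell values are hypothetical, and addition is assumed as the operation. Only the shapes and labels follow the slide.

```python
import pandas as pd

# Hypothetical values; only the labels mirror the slide.
left = pd.DataFrame(
    {'a': [1, 2, 3, 4], 'b': [2, 3, 4, 5], 'c': [3, 4, 5, 6]},
    index=['A', 'B', 'C', 'D'],
)
right = pd.DataFrame(
    {'a': [1, 2, 3, 4, 5], 'b': [1, 2, 3, 4, 5]},
    index=['A', 'B', 'C', 'D', 'E'],
)

# Addition matches cells by row label and column label, not by position.
total = left + right

# Column 'c' and row 'E' exist in only one operand, so they are all NaN.
print(total)
```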

Handling Missing Values
new_df = df.dropna()
Here df has a fourth row in which every entry is missing:
   foo  bar    baz
0    x    6   True
1    y   10   True
2    z  NaN  False
3  NaN  NaN    NaN
Result:
   foo  bar   baz
0    x    6  True
1    y   10  True
By default, dropna drops all rows with any missing entry.

Handling Missing Values
new_df = df.dropna(how='all')
Result:
   foo  bar    baz
0    x    6   True
1    y   10   True
2    z  NaN  False
With how='all', dropna drops only the rows in which every entry is missing, so row 2 is kept and row 3 is dropped.

Handling Missing Values
new_df = df.fillna(0)
Every missing entry is replaced by 0:
   foo  bar    baz
0    x    6   True
1    y   10   True
2    z    0  False
3    0    0      0

Handling Missing Values
new_df = df.fillna(method='ffill')
Forward fill copies the last valid value in each column downward (recent pandas versions spell this df.ffill()):
   foo  bar    baz
0    x    6   True
1    y   10   True
2    z   10  False
3    z   10  False

Handling Missing Values
new_df = df.fillna(method='ffill', limit=1)
With limit=1, at most one consecutive missing value per column is filled, so bar in row 3 stays NaN:
   foo  bar    baz
0    x    6   True
1    y   10   True
2    z   10  False
3    z  NaN  False
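The missing-value calls above can be tried side by side. This is a sketch on a simplified two-column version of the slides' frame (baz is left out to keep the column types simple). It uses df.ffill(), which recent pandas versions prefer over fillna(method='ffill'), and fills only the numeric column with 0 by passing a dict to fillna.

```python
import numpy as np
import pandas as pd

# Simplified version of the slides' frame: row 3 is entirely missing.
df = pd.DataFrame({
    'foo': ['x', 'y', 'z', None],
    'bar': [6, 10, np.nan, np.nan],
})

dropped_any = df.dropna()            # drops rows 2 and 3
dropped_all = df.dropna(how='all')   # drops only row 3
zero_filled = df.fillna({'bar': 0})  # fills only the 'bar' column
forward = df.ffill()                 # carries the last valid value down
forward_one = df.ffill(limit=1)      # fills at most one gap in a row

print(dropped_any)
print(dropped_all)
print(zero_filled)
print(forward)
print(forward_one)
```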

Indexing
ix = df.index
In this part, df has four rows (foo = a, b, c, d; bar = 6, 10, -2, 1; plus a boolean column baz) under the default integer index 0 to 3. The slide notes that there are 9 subclasses of Index in total.

Indexing
df = df.set_index('foo')
The foo column becomes the row index (labels a, b, c, d), leaving bar and baz as the remaining columns.

Indexing
df.loc['a']
df.iloc[0]
loc selects by label and iloc by integer position; here both return the first row:
bar       6
baz    True
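A runnable sketch of label-based versus position-based row access after set_index. The baz values for rows b and d are not recoverable from the transcript, so the ones used here are assumptions.

```python
import pandas as pd

# baz values for 'b' and 'd' are assumed; the rest follows the slides.
df = pd.DataFrame({
    'foo': ['a', 'b', 'c', 'd'],
    'bar': [6, 10, -2, 1],
    'baz': [True, True, False, True],
}).set_index('foo')

row_by_label = df.loc['a']      # look up the row labelled 'a'
row_by_position = df.iloc[0]    # look up the first row, whatever its label

print(row_by_label)
print(row_by_position)
```

Once the index holds string labels, df.loc[0] no longer works, while df.iloc[0] still does.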

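Indexing
df = df.set_index([['one', 'one', 'two', 'two'], df.index])
Passing a list of arrays builds a hierarchical (multi-level) index: the outer level is one, one, two, two and the inner level is the existing foo index, so rows a and b fall under one, and rows c and d under two. set_index returns a new DataFrame, so the result is assigned back to df for the following slides.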

Indexing
one = df.loc['one']
Selecting an outer-level label returns the rows under it (a and b), indexed by the inner level foo.

Indexing
one = df.loc['one', 'a']
Supplying a label for each level selects a single row, returned as a Series:
bar       6
baz    True
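The hierarchical-index slides can be reproduced with a short sketch. The baz values for rows b and d are assumed, the names flat, outer, group_one and row are illustrative, and the outer labels are wrapped in pd.Index and the row key is written as an explicit tuple to keep both calls unambiguous.

```python
import pandas as pd

# baz values for 'b' and 'd' are assumed; the rest follows the slides.
flat = pd.DataFrame({
    'foo': ['a', 'b', 'c', 'd'],
    'bar': [6, 10, -2, 1],
    'baz': [True, True, False, True],
}).set_index('foo')

# Build a two-level index: outer labels plus the existing 'foo' index.
outer = pd.Index(['one', 'one', 'two', 'two'])
df = flat.set_index([outer, flat.index])

group_one = df.loc['one']     # all rows under the outer label 'one'
row = df.loc[('one', 'a')]    # one row, addressed by both levels

print(df)
print(group_one)
print(row)
```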

Transposing Data
new_df = df.T
Rows and columns are swapped: bar and baz become the row labels, and the two-level index (one/a, one/b, two/c, two/d) becomes the column labels. The bar row reads 6, 10, -2, 1.

Statistics df.describe() df.cov() df.corr() df.rank() df.cumsum()
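A brief sketch of these statistics methods on a small numeric DataFrame. The bar values come from the earlier slides, while the qux column is hypothetical and exists only so that covariance and correlation have two columns to compare.

```python
import pandas as pd

# 'qux' is a hypothetical second numeric column.
df = pd.DataFrame({
    'bar': [6, 10, -2, 1],
    'qux': [1.0, 2.5, 0.5, 4.0],
})

summary = df.describe()  # count, mean, std, min, quartiles, max per column
cov = df.cov()           # pairwise covariance matrix
corr = df.corr()         # pairwise correlation matrix
ranks = df.rank()        # rank of each value within its column
running = df.cumsum()    # cumulative sum down each column

print(summary)
print(cov)
print(corr)
print(ranks)
print(running)
```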

DEMO

Thank you!
LinkedIn, GitHub, Twitter: @sahildua2305
Website: http://sahildua.com