Data Wrangling with pandas

Slides:



Advertisements
Similar presentations
Slide 3Slide 3-Introduction Slide 4Slide 4-Home Page Slide 5Slide 5-Adding New Group Slide 6 - 7Slide Naming of groups and settings Slide 8 - 9Slide.
Advertisements

Ch2 Data Preprocessing part3 Dr. Bernard Chen Ph.D. University of Central Arkansas Fall 2009.
Section 6-3 Applications of Normal Distributions.
6-3 Applications of Normal Distributions This section presents methods for working with normal distributions that are not standard. That is, the mean is.
Data Preparation for Data Mining Prepared by: Yuenho Leung.
Autocorrelation and Linkage Cause Bias in Evaluation of Relational Learners David Jensen and Jennifer Neville.
 Cut and paste sometimes works  More likely want to go to temp sheet  Get it in any way you can  AND THEN clean it up.
Detecting univariate outliers Detecting multivariate outliers
 1.1: Introduction  1.2: Descriptions  1.2.1: White wine description  1.2.2: Brest Tissue description  1.3: Conclusion.
Managing Your Own Data (…if you have to) Kathryn A. Carson, Sc.M. Senior Research Associate Department of Epidemiology Johns Hopkins Bloomberg School of.
Pandas: Python Programming for Spreadsheets Pamela Wu Sept. 17 th 2015.
A Powerful Python Library for Data Analysis BY BADRI PRUDHVI BADRI PRUDHVI.
 SpS Auditing updated to record name rather than reference Staff ID  Staff accounts can be deleted even if SpS activity  Deleting Staff updated with.
6 th Annual Focus Users’ Conference 6 th Annual Focus Users’ Conference Import Testing Data Presented by: Adrian Ruiz Presented by: Adrian Ruiz.
SW388R6 Data Analysis and Computers I Slide 1 Percentiles and Standard Scores Sample Percentile Homework Problem Solving the Percentile Problem with SPSS.
Intro to Excel - Session 5.21 Tutorial 5 - Session 5.2 Working with Excel Lists.
MySQL Importing and creating a database. CSV (Comma Separated Values) file CSV = Comma Separated Values – they are simple text files containing data which.
Today – Friday, March 22, 2013  Introduction into scoring the SAT  Learning Target : Use Pythagorean Theorem, Special Right Triangles and Trigonometry.
Databases Computer Technology. First Record Last Record New Record Previous Record Current Record Next Record Working with Microsoft Access (Database)
Adding Reports to a Database. Why do we use Reports? Reports are well-designed printed pages that offer several advantages: Reports are well-designed.
GEOVISUALIZATION: VISUALIZE THAT ON A MAP Sarah G. Park April 14, 2016.
Introduction to GIS Programming Final Project Submitted by Todd Lenkin Geography 375 Spring of 2011 American River College.
SAS ® 101 Based on Learning SAS by Example: A Programmer’s Guide Chapters 5 & 6 By Ravi Mandal.
IFS180 Intro. to Data Management Chapter 10 - Unions.
Microarray data collection for different stresses
Lesson 12: Data Analysis Class Participation: Class Chat:
A Smart Tool to Predict Salary Trends of H1-B Holders
Name: Sushmita Laila Khan Affiliation: Georgia Southern University
CSE111 Introduction to Computer Applications
Information Systems and Databases
Product Training Program
How to Use Google Forms to Streamline Instruction
Microsoft Excel Basic Skills
Lesson 12: Data Analysis Topic: Data Analysis, Readings on website.
Lesson 1: Introduction to Trifacta Wrangler
Test Information Distribution Engine (TIDE)
REDCap Data Migration from CSV file
Elementary Statistics
Lesson 1: Introduction to Trifacta Wrangler
Informational PDF #5 How to Prepare Your Precoding File and Upload it into the Portal System.
IVend Retail 6.5 Dashboard Designer.
Introduction to pandas
Univariate Data Exploration
Data Migration to DOORS DNG Presented By Adam Hammett
Lesson 1: Introduction to Trifacta Wrangler
Lesson 1: Introduction to Trifacta Wrangler
Lesson 1 – Chapter 1B Chapter 1B – Terminology
Preparing your Data using Python
The t distribution and the independent sample t-test
Preparing your Data using Python
Lecture Slides Elementary Statistics Twelfth Edition
Classification & Prediction
Databases Computer Technology.
Pandas Dataframes: The Kevin Bacon of the data structures in data analytics James B. Davis, PhD Director, Master of Science in Analytics Louisiana State.
1.
Lecture 7: Simple Classifier (KNN)
F.S.A. Computer-Based Item Types
Lecture 6: Data Quality and Pandas
Databases Computer Technology.
Pandas John R. Woodward.
Data Management – Processing
From and Report.
Comparing Images Using Hausdorff Distance
Under “view my data”, scroll down to last option “access raw data”, and click on the icon to “download your data”
Lesson 12: Data Analysis Class Chat: Attendance: Participation
Collecting, Analyzing, and Visualizing Data with Python Part I
PYTHON PANDAS FUNCTION APPLICATIONS
Data transfer between files,sql databases and dataframes
Introduction to Computer Science
Mapping packages Unfortunately none come with Anaconda (only geoprocessing is which does lat/long to Cartesian conversions). matplotlib.
Presentation transcript:

Data Wrangling with pandas Introduction to Data Analysis with Python Data Wrangling with pandas Alireza Akhavan Pour PyLab.akhavanpour.ir

Why data analysis

Tom wants to sell his cars.

تخمین قیمت ماشین

برای پاسخ به سوال به داده احتیاج داریم 3,?,alfa-romero,gas,std,two,convertible,rwd,front,88.60,168.80,64.10,48.80,2548,dohc,four,130,mpfi,3.47,2.68,9.00,111,5000,21,27,13495 3,?,alfa-romero,gas,std,two,convertible,rwd,front,88.60,168.80,64.10,48.80,2548,dohc,four,130,mpfi,3.47,2.68,9.00,111,5000,21,27,16500 1,?,alfa-romero,gas,std,two,hatchback,rwd,front,94.50,171.20,65.50,52.40,2823,ohcv,six,152,mpfi,2.68,3.47,9.00,154,5000,19,26,16500 2,164,audi,gas,std,four,sedan,fwd,front,99.80,176.60,66.20,54.30,2337,ohc,four,109,mpfi,3.19,3.40,10.00,102,5500,24,30,13950 2,164,audi,gas,std,four,sedan,4wd,front,99.40,176.60,66.40,54.30,2824,ohc,five,136,mpfi,3.19,3.40,8.00,115,5500,18,22,17450 2,?,audi,gas,std,two,sedan,fwd,front,99.80,177.30,66.30,53.10,2507,ohc,five,136,mpfi,3.19,3.40,8.50,110,5500,19,25,15250 1,158,audi,gas,std,four,sedan,fwd,front,105.80,192.70,71.40,55.70,2844,ohc,five,136,mpfi,3.19,3.40,8.50,110,5500,19,25,17710 1,?,audi,gas,std,four,wagon,fwd,front,105.80,192.70,71.40,55.70,2954,ohc,five,136,mpfi,3.19,3.40,8.50,110,5500,19,25,18920 1,158,audi,gas,turbo,four,sedan,fwd,front,105.80,192.70,71.40,55.90,3086,ohc,five,131,mpfi,3.13,3.40,8.30,140,5500,17,20,23875 0,?,audi,gas,turbo,two,hatchback,4wd,front,99.50,178.20,67.90,52.00,3053,ohc,five,131,mpfi,3.13,3.40,7.00,160,5500,16,22,? 2,192,bmw,gas,std,two,sedan,rwd,front,101.20,176.80,64.80,54.30,2395,ohc,four,108,mpfi,3.50,2.80,8.80,101,5800,23,29,16430 0,192,bmw,gas,std,four,sedan,rwd,front,101.20,176.80,64.80,54.30,2395,ohc,four,108,mpfi,3.50,2.80,8.80,101,5800,23,29,16925 0,188,bmw,gas,std,two,sedan,rwd,front,101.20,176.80,64.80,54.30,2710,ohc,six,164,mpfi,3.31,3.19,9.00,121,4250,21,28,20970 0,188,bmw,gas,std,four,sedan,rwd,front,101.20,176.80,64.80,54.30,2765,ohc,six,164,mpfi,3.31,3.19,9.00,121,4250,21,28,21105 1,?,bmw,gas,std,four,sedan,rwd,front,103.50,189.00,66.90,55.70,3055,ohc,six,164,mpfi,3.31,3.19,9.00,121,4250,20,25,24565 0,?,bmw,gas,std,four,sedan,rwd,front,103.50,189.00,66.90,55.70,3230,ohc,six,209,mpfi,3.62,3.39,8.00,182,5400,16,22,30760 0,?,bmw,gas,std,two,sedan,rwd,front,103.50,193.80,67.90,53.70,3380,ohc,six,209,mpfi,3.62,3.39,8.00,182,5400,16,22,41315 https://archive.ics.uci.edu/ml/machine-learning-databases/autos/

توضیحات مجموع داده سطح ریسک بیمه یک ماشین. A value of +3 indicates that the auto is risky, -3 that it is probably pretty safe. میانگین تلفات متوسط تلفات در هر بیمه شده در سال است Target (Label) https://archive.ics.uci.edu/ml/machine-learning-databases/autos/imports-85.names

توضیحات مجموع داده

کتابخانه های محاسبات علمی در پایتون

مصور سازی داده

کتابخانه های الگوریتمی

Importing and Exporting Data in Python

Importing data

3,. ,alfa-romero,gas,std,two,convertible,rwd,front,88. 60,168. 80,64 3,?,alfa-romero,gas,std,two,convertible,rwd,front,88.60,168.80,64.10,48.80,2548,dohc,four,130,mpfi,3.47,2.68,9.00,111,5000,21,27,13495 3,?,alfa-romero,gas,std,two,convertible,rwd,front,88.60,168.80,64.10,48.80,2548,dohc,four,130,mpfi,3.47,2.68,9.00,111,5000,21,27,16500 1,?,alfa-romero,gas,std,two,hatchback,rwd,front,94.50,171.20,65.50,52.40,2823,ohcv,six,152,mpfi,2.68,3.47,9.00,154,5000,19,26,16500 2,164,audi,gas,std,four,sedan,fwd,front,99.80,176.60,66.20,54.30,2337,ohc,four,109,mpfi,3.19,3.40,10.00,102,5500,24,30,13950 2,164,audi,gas,std,four,sedan,4wd,front,99.40,176.60,66.40,54.30,2824,ohc,five,136,mpfi,3.19,3.40,8.00,115,5500,18,22,17450 2,?,audi,gas,std,two,sedan,fwd,front,99.80,177.30,66.30,53.10,2507,ohc,five,136,mpfi,3.19,3.40,8.50,110,5500,19,25,15250 1,158,audi,gas,std,four,sedan,fwd,front,105.80,192.70,71.40,55.70,2844,ohc,five,136,mpfi,3.19,3.40,8.50,110,5500,19,25,17710 1,?,audi,gas,std,four,wagon,fwd,front,105.80,192.70,71.40,55.70,2954,ohc,five,136,mpfi,3.19,3.40,8.50,110,5500,19,25,18920 1,158,audi,gas,turbo,four,sedan,fwd,front,105.80,192.70,71.40,55.90,3086,ohc,five,131,mpfi,3.13,3.40,8.30,140,5500,17,20,23875 0,?,audi,gas,turbo,two,hatchback,4wd,front,99.50,178.20,67.90,52.00,3053,ohc,five,131,mpfi,3.13,3.40,7.00,160,5500,16,22,? 2,192,bmw,gas,std,two,sedan,rwd,front,101.20,176.80,64.80,54.30,2395,ohc,four,108,mpfi,3.50,2.80,8.80,101,5800,23,29,16430 0,192,bmw,gas,std,four,sedan,rwd,front,101.20,176.80,64.80,54.30,2395,ohc,four,108,mpfi,3.50,2.80,8.80,101,5800,23,29,16925 0,188,bmw,gas,std,two,sedan,rwd,front,101.20,176.80,64.80,54.30,2710,ohc,six,164,mpfi,3.31,3.19,9.00,121,4250,21,28,20970 0,188,bmw,gas,std,four,sedan,rwd,front,101.20,176.80,64.80,54.30,2765,ohc,six,164,mpfi,3.31,3.19,9.00,121,4250,21,28,21105 1,?,bmw,gas,std,four,sedan,rwd,front,103.50,189.00,66.90,55.70,3055,ohc,six,164,mpfi,3.31,3.19,9.00,121,4250,20,25,24565 0,?,bmw,gas,std,four,sedan,rwd,front,103.50,189.00,66.90,55.70,3230,ohc,six,209,mpfi,3.62,3.39,8.00,182,5400,16,22,30760 0,?,bmw,gas,std,two,sedan,rwd,front,103.50,193.80,67.90,53.70,3380,ohc,six,209,mpfi,3.62,3.39,8.00,182,5400,16,22,41315 https://archive.ics.uci.edu/ml/machine-learning-databases/autos/imports-85.data

Importing and Exporting Data in Python url = “https://archive.ics.uci.edu/ml/machine-learning-databases/autos/imports-85.data” df = pd.read_csv(url )

Importing a CSV without a header

Print a dataframe in python header df.head() n = 5

Adding headers

Adding headers header

Exporting a pandas dataframe to CSV

Exporting to different formats in python

Basic insights of Dataset – Data Types

Basic insights of Dataset – Data Types در pandas با fd.dtypes نوع را بررسی میکنیم

dataframe.describe() By default, the dataframe.describe() function skips rows and columns that do not contain

dataframe.describe(include = "all")

Pre-processing data in Python

“data cleaning” or “data wrangling” Data Pre-processing Also known as: “data cleaning” or “data wrangling” It is the process of converting or mapping data from one “raw” form into another format to make it ready for further analysis.

Simple Dataframe Operations In Python, we usually perform operations along columns; each row of the column representsa sample, i.e., a different used car in the database.

Simple Dataframe Operations In Python, we usually perform operations along columns; each row of the column representsa sample, i.e., a different used car in the database.

Simple Dataframe Operations In Python, we usually perform operations along columns; each row of the column representsa sample, i.e., a different used car in the database.

Dealing with missing values in Python

Missing values

How to deal with missing data?

How to Drop missing values in python? dataframes. dropna()

دقت کنید... http://pandas.pydata.org/

How to replace missing values in python?

How to deal with missing data?

Data Formatting

Data formatting

Applying calculations to an entire column

Incorrect data types

Data types in python and pandas

Correcting data types

Data Normalization

Data Normalization

Data Normalization

Methods of normalizing data

Simple feature scaling in python

Min-Max in python

Z-score in python