Data Tables, Indexes, pandas

Slides:



Advertisements
Similar presentations
What will be covered Based on our older Excel for Data Management class but freshened up a bit since everything moved in – Importing and exporting,
Advertisements

Excel Pivot Tables Create Tables and Charts. Excel Add-Ins If Data, Pivot Tables is ghosted, Go to Tools, Add-In and select the Analysis ToolPak.
How to work with Pivot Tables Step by step instruction.
Chapter 2 Graphs, Charts, and Tables – Describing Your Data
Business Statistics: A Decision-Making Approach, 7e © 2008 Prentice-Hall, Inc. Chap 2-1 Business Statistics: A Decision-Making Approach 7 th Edition Chapter.
1 CA201 Word Application Working with Charts Week # 7 By Tariq Ibn Aziz Dammam Community college.
1 Statistical Analysis SC504/HS927 Spring Term 2008 Session 1: Week 16: 18 th January Getting to know your data.
Alok Srivastava Chapter 2 Describing Data: Graphs and Tables Basic Concepts Frequency Tables and Histograms Bar and Pie Charts Scatter Plots Time Series.
Spreadsheets. What are the parts Rows are numbered vertically Columns are lettered horizontally Where rows and columns intersect is called a cell A sheet.
Computer Science 101 Web Access to Databases SQL – Extended Form.
Ch2: Exploring Data: Charts 13 Sep 2011 BUSI275 Dr. Sean Ho HW1 due Thu 10pm Download and open “SportsShoes.xls”SportsShoes.xls.
Ashwani Roy Understanding Graphical Execution Plans Level 200.
Plotting Data Exploring Computer Science Lesson 5-7.
Data Analysis Lab 02 Using Crosstabs to compare percentages.
Math 15 Introduction to Scientific Data Analysis Lecture 3 Working With Charts and Graphics.
Plotting in Microsoft Excel. 1) Enter your data into the Excel spreadsheet in table format. Your data should have column headers, row headers and data.
A Powerful Python Library for Data Analysis BY BADRI PRUDHVI BADRI PRUDHVI.
XP. Objectives Sort data and filter data Summarize an Excel table Insert subtotals into a range of data Outline buttons to show or hide details Create.
Chapter 1: Exploring Data
Self-Discovery Concept: Discovering things about yourself can lead to personal success and satisfaction. Unit Essential Question How Can My Principles,
Overview Excel is a spreadsheet, a grid made from columns and rows. It is a software program that can make number manipulation easy and somewhat painless.
DATA MINING Pandas. Python Data Analysis Library A library for data analysis of (mostly) tabular data Gives capabilities similar to Excel and SQL but.
1 M04- Graphical Displays 2  Department of ISM, University of Alabama, 2003 Graphical Displays of Data.
R objects  All R entities exist as objects  They can all be operated on as data  We will cover:  Vectors  Factors  Lists  Data frames  Tables 
1 An Introduction – UCF, Methods in Ecology, Fall 2008 An Introduction By Danny K. Hunt & Eric D. Stolen Working In R (with speaker notes)
Data organization and Presentation. Data Organization Making it easy for comparison and analysis of data Arranging data in an orderly sequence or into.
Trace Tables In today’s lesson we will look at:
Lesson 12: Data Analysis Class Participation: Class Chat:
Python for Data Analysis
Python for data analysis Prakhar Amlathe Utah State University
AP CSP: Cleaning Data & Creating Summary Tables
Introduction to SPSS July 28, :00-4:00 pm 112A Stright Hall
MS EXCEL PART 4.
Information Systems and Databases
Computing and Data Analysis
Computing and Data Analysis
Graphing.
Basics of histograms and frequency tables
Lesson 13 - Cleaning Data Lesson 14 - Creating Summary Tables
Hello and Welcome! Introduction Syllabus MyStatLab demo
Exploring Computer Science Lesson 5-7
Data 100 Lecture 5: Data Cleaning & Exploratory Data Analysis
Lesson 12: Data Analysis Topic: Data Analysis, Readings on website.
Spreadsheets.
Advanced Excel Helen Mills OME-RESA.
Ms jorgensen Unit 1: Statistics and Graphical Representations
Chapter 2 Describing Data: Graphs and Tables
The Key to the Database Engine
Python Visualization Tools: Pandas, Seaborn, ggplot
Introduction to pandas
regex (Regular Expressions) Examples Logic in Python (and pandas)
Homework: Frequency & Histogram worksheet
Guide to Using Excel 2003 For Basic Statistical Applications
Recoding II: Numerical & Graphical Descriptives
CHAPTER 6: Two-Way Tables
1.
Python for Data Analysis
Lesson 13 - Cleaning Data Lesson 14 - Creating Summary Tables
商業智慧實務 Practices of Business Intelligence
HIMS 650 Homework set 5 Putting it all together
Displaying Data – Charts & Graphs
Lesson 12: Data Analysis Class Chat: Attendance: Participation
Python for Data Analysis
Trials & laws & Large Numbers (Large Number Theory)
PYTHON PANDAS FUNCTION APPLICATIONS
regex (Regular Expressions) Examples Logic in Python (and pandas)
CSE 231 Lab 15.
Biostatistics Lecture (5).
An Introduction to Data Science using Python
An Introduction to Data Science using Python
Presentation transcript:

Data Tables, Indexes, pandas Lecture 3 Data Tables, Indexes, pandas

Announcements HW1 out

Where we are

Data Science Lifecycle Ask question(s) Obtain data Understand the data Understand the world

Data Science Lifecycle Ask question(s) Obtain data Understand the data Understand the world Your brain The Internet pandas and EDA Inference and prediction

Today: pandas http://pandas.pydata.org/

All data in a column must be the same kind… The DataFrame The dataframe object represents a table with column names, rows, and indices All data in a column must be the same kind… Index Column Column Names Age Sex State … 1 2

How this lecture will work Using the dataset of baby names, we will... Ask questions Break down each question into steps Learn the pandas knowledge needed for each step

What you will learn Data manipulation in pandas Sorting, filtering, grouping, pivot tables Data visualization in pandas and seaborn Bar charts, histograms, scatter plots Prior knowledge of all concepts assumed! ~3 weeks of Data 8 in 1.5 hours Practical, not conceptual

You won’t remember everything, but...

Getting the data (Demo)

What was the most popular name in CA last year? Question 1: What was the most popular name in CA last year? (2-min discussion)

Always have high-level steps Read in the data for CA Keep only year 2016 Sort rows by count Table.read_table Table.where Table.sort

In pandas Read in the data for CA Keep only year 2016 Sort rows by count pd.read_csv Slicing df.sort_values (Demo)

Recap pd.read_csv(...) => DataFrame DataFrame is like the Data 8 Table Series is like a NumPy array Slice DFs by label or by position df.loc and df.iloc DF index is a label for each row, used for slicing df.sort_values(...) like Table.sort

What were the most popular names in each state for each year? Question 2: What were the most popular names in each state for each year? (2-min discussion)

Break it down Put all DFs together Group by state and year pd.concat df.groupby (Demo)

Recap zipfile Work with compressed archives efficiently in-memory df.groupby(...).agg(...) Groups one or more columns, applying aggregate function on each group df.groupby(...).sum() # or .max(), etc. Shorthand for df.groupby(...).agg(np.sum)

When do I need to group? Do I need to count the times each value appears? Do I need to aggregate values together? Am I looping through a column’s unique values?

Can I deduce gender from the last letter of a person’s name? Question 3: Can I deduce gender from the last letter of a person’s name?

Survey Question Which last letter is most indicative of a person’s birth sex? bit.ly/ds100-sp18-c7a g m t z e This is a trick question!

Break it down Compute last letter of each name Group by last letter Visualize distribution series.str df.groupby df.plot (Demo)

Recap series.str To use string methods Use series.apply when you need flexibility df.pivot_table(...) Computes a pivot table df.plot To use plotting methods

When do I need to pivot? Am I grouping by two columns... And do I want the resulting table to be easier to read? Or, am I using pandas plotting on the groups?

Seaborn http://seaborn.pydata.org/index.html

sns.pairplot(df, hue="species") Seaborn Statistical data visualization Has common plots with some bonus features And some fancier plots too Works well with pandas DataFrames sns.pairplot(df, hue="species")

How to Seaborn DataFrame should ideally be in long-form (not grouped) Most Seaborn methods work like this: sns.barplot(x=..., y=..., hue=..., data=df) (Demo)

Recap Pandas for tabular data manipulation Slicing for row/column selection Group with df.groupby Pivot with df.pivot_table Join with pd.merge (covered in lab next week) df.plot for basic plots Seaborn for statistical plots Reference the docs for available methods

Use the docs! And Google.