Preparing your Data using Python


Preparing your Data using Python Samuel G. Mori, CISA Managing Partner, Analytics & Advisory Services Spyrion LLC April 12, 2018

Background Samuel G. Mori, CISA, Six Sigma Green Belt Managing Partner, Analytics and Advisory Services Software Quality Assurance, Internal/External Audit, Business Intelligence and Reporting, Advisory Services (GRC and Analytics) Subject matter expertise within commercial, manufacturing, healthcare, biomedical and entertainment sectors B.S. Cognitive Science – Human Computer Interaction (UC San Diego) M.S. Accountancy – Accounting Information Systems (San Diego State) M.S. Data Science – Analytics & Modeling (Northwestern)

Agenda Learning Objectives Why should I prepare my data? What types of data might I encounter? How can Python help me?

Learning Objectives Understand the importance of preparing your data for analysis Understand different types of data formats you may encounter Understand what Python is and why you should use it Understand strategies and techniques for importing, preparing, and saving your data using Python

Why should I prepare my data? Garbage in, garbage out Reduce errors Remove duplicate records Fix missing values Correct out-of-range values Fix formatting (e.g., date, text, number)
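The clean-up steps on this slide can be sketched with pandas as follows. This is a minimal illustration, not the presenter's code: the column names, the 0 to 120 age range, and the choice to fill missing scores with the median are all assumptions made for the example.

```python
import pandas as pd

# Hypothetical records that show the problems listed on the slide.
raw = pd.DataFrame({
    "id": [1, 2, 2, 3],
    "age": [34, 51, 51, 212],            # 212 is outside a plausible range
    "visit_date": ["2018-01-05", "2018-02-10", "2018-02-10", "2018-03-15"],
    "score": [4.0, None, None, 5.0],     # missing values
})

clean = raw.drop_duplicates().copy()                           # remove duplicate records
clean["score"] = clean["score"].fillna(clean["score"].median())  # fix missing values (assumed rule)
clean = clean[clean["age"].between(0, 120)].copy()             # keep only in-range ages (assumed range)
clean["visit_date"] = pd.to_datetime(clean["visit_date"])      # store dates as real dates, not text

print(clean)
```

Which rule to apply for missing or out-of-range values depends on the analysis; the point is that each item on the slide maps to one short pandas step.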

Experience Check How many people have experience with Python? What types of data formats do you use in your organizations? CSV, Excel, PDF, JSON, XML, SQL databases, etc. What types of tools do you use? Excel, ACL, IDEA, SQL Server, Python, R, SAS, Cognos, etc.

What types of data formats might I encounter? Comma Separated Value (CSV) Excel JavaScript Object Notation (JSON) Structured Query Language (SQL) And more! Python can help with these!
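As a rough sketch of how the JSON and SQL formats listed here can be loaded, the example below reads a small JSON string and an in-memory SQLite table into dataframes. The data, table name, and column names are invented for illustration; CSV and Excel are covered in the case study.

```python
import io
import sqlite3

import pandas as pd

# JSON: a hypothetical list of review records.
json_text = '[{"name": "Cafe A", "stars": 4}, {"name": "Cafe B", "stars": 5}]'
reviews = pd.read_json(io.StringIO(json_text))

# SQL: a hypothetical customers table in a throwaway in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, city TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "San Diego"), (2, "Chicago")],
)
customers = pd.read_sql_query("SELECT id, city FROM customers", conn)
conn.close()

print(reviews)
print(customers)
```

With a real file you would pass the file path to `pd.read_json`, and with a real database you would open a connection to that database instead of `:memory:`.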

CSV Example SFO Airport Survey Results

Excel Example SFO Airport Survey Results

JSON Examples Trip Advisor JSON file Yelp JSON file

SQL Example Sample Customer Data

What is Python? Definition Object-oriented, high-level programming language Used as a scripting or glue language to connect existing components together Simple, easy-to-learn syntax that emphasizes readability Supports modules and packages The Python interpreter and the extensive standard library are FREE!

What is Python? (cont.) Key Python Package: Pandas Open source library that allows you to work with CSV, Excel, JSON, and SQL database files, pull them into tables (called dataframes), and perform various data analysis techniques.

Coding Basics Some basic Python syntax to keep in mind: Declaring a variable (the variable name always goes to the left of the equals sign) File names are text strings (use either "double" or 'single' straight quotes, matched at both ends)
dataframe = pd.read_excel('file_name.xlsx', 'sheet_name')
Or
file_name = 'file_name.xlsx'
sheet_name = 'sheet_name'
dataframe = pd.read_excel(file_name, sheet_name)

Coding Basics (cont.) Some basic Python syntax to keep in mind: Using library packages
import pandas as pd  # load the pandas library and create the short alias 'pd'
dataframe = pd.read_excel('file_name.xlsx', 'sheet_name')
Or
import pandas  # load the pandas library under its full name
dataframe = pandas.read_excel('file_name.xlsx', 'sheet_name')

Case Study: SFO Airport Customer Survey Data – Excel & CSV files

Importing the Data How do I import an Excel file?
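The slide's screenshot is not part of this transcript, so the following is only a sketch of one way to import a workbook. The file name `sfo_survey.xlsx` and the sheet name `Sheet1` are placeholders, and `pd.read_excel` needs an Excel reader package such as openpyxl to be installed.

```python
from pathlib import Path

import pandas as pd

excel_path = Path("sfo_survey.xlsx")   # hypothetical file name
sheet = "Sheet1"                       # hypothetical sheet name

survey = None
if excel_path.exists():
    # Read one worksheet into a dataframe and preview the first rows.
    survey = pd.read_excel(excel_path, sheet_name=sheet)
    print(survey.head())
else:
    print(f"{excel_path} not found; point excel_path at your own workbook")
```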

Data Characteristics What columns do we have?
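One way to list the columns is the dataframe's `columns` attribute. The small table below is a stand-in with made-up column names, not the actual survey layout.

```python
import pandas as pd

# Hypothetical stand-in for the survey table.
survey = pd.DataFrame({
    "respondent_id": [1, 2, 3],
    "terminal": [1, 3, 2],
    "overall_rating": [4, 5, 3],
})

column_names = survey.columns.tolist()   # plain Python list of column labels
print(column_names)
```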

Data Characteristics What if I just want a subset of these columns?
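Selecting a subset of columns can be done by passing a list of column names inside the square brackets. The names used here are illustrative assumptions.

```python
import pandas as pd

# Hypothetical stand-in for the survey table.
survey = pd.DataFrame({
    "respondent_id": [1, 2, 3],
    "terminal": [1, 3, 2],
    "overall_rating": [4, 5, 3],
})

wanted = ["respondent_id", "overall_rating"]   # columns to keep
subset = survey[wanted]                        # note the list inside the brackets

print(subset)
```

The original `survey` dataframe is left unchanged; `subset` holds only the requested columns.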

Data Characteristics What columns do I have and what are their data types?
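Column names together with their data types are available through `dtypes`, and `info()` prints a fuller summary. The table is again a made-up example.

```python
import pandas as pd

# Hypothetical stand-in for the survey table.
survey = pd.DataFrame({
    "respondent_id": [1, 2],
    "airline": ["UA", "DL"],
    "wait_minutes": [12.5, 30.0],
})

column_types = survey.dtypes   # one data type per column
print(column_types)

survey.info()                  # also shows row count and non-null counts
```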

Data Characteristics How many columns and records do I have? Can I do a count of different values within a column?
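Both questions can be answered in a line each: `shape` gives (rows, columns) and `value_counts()` tallies the distinct values in a column. The data below is invented for the example.

```python
import pandas as pd

# Hypothetical stand-in for the survey table.
survey = pd.DataFrame({
    "respondent_id": [1, 2, 3, 4],
    "terminal": [1, 3, 3, 2],
})

n_rows, n_cols = survey.shape                   # (records, columns)
print(f"{n_rows} records, {n_cols} columns")

terminal_counts = survey["terminal"].value_counts()   # how often each value occurs
print(terminal_counts)
```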

Modifying Data Values Let's look at the data dictionary How do I replace values to make them meaningful?
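A data dictionary can be written as a Python dict and applied with `map`, which is one possible way to turn codes into labels. The codes and labels below are hypothetical, not taken from the SFO data dictionary.

```python
import pandas as pd

# Hypothetical coded column.
survey = pd.DataFrame({"rating_code": [5, 3, 1, 5]})

# Hypothetical data dictionary: code -> meaningful label.
rating_labels = {
    1: "Unacceptable",
    3: "Average",
    5: "Outstanding",
}

# map() looks each code up in the dictionary; codes not listed become NaN.
survey["rating_label"] = survey["rating_code"].map(rating_labels)

print(survey)
```

Keeping the coded column next to the labeled one makes it easy to check the mapping before dropping the original.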

Saving to Excel How do I save this new file? What does my file look like?
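A sketch of saving a dataframe to Excel and reading it back to check the result. The data, file name, and sheet name are placeholders. `to_excel` relies on an Excel writer package; this example assumes openpyxl and skips the save if it is not installed.

```python
import importlib.util
import os
import tempfile

import pandas as pd

# Hypothetical cleaned data.
survey = pd.DataFrame({
    "respondent_id": [1, 2],
    "rating_label": ["Outstanding", "Average"],
})

out_path = os.path.join(tempfile.mkdtemp(), "survey_clean.xlsx")   # placeholder location

saved = None
if importlib.util.find_spec("openpyxl") is not None:
    # index=False leaves out the dataframe's row numbers.
    survey.to_excel(out_path, sheet_name="Cleaned", index=False)
    saved = pd.read_excel(out_path, sheet_name="Cleaned")           # read back to inspect
    print(saved)
else:
    print("openpyxl is not installed, so the Excel file was not written")
```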

Importing the Data How do I import a CSV file? What is NaN?
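NaN ("not a number") is the marker pandas uses for a missing value, for example an empty cell in a CSV file. The sketch below reads a small CSV from a string so it runs without a file; with a real file you would pass its path to `pd.read_csv`. The columns are made up.

```python
import io

import pandas as pd

# Hypothetical CSV content with two empty cells.
csv_text = (
    "respondent_id,terminal,wait_minutes\n"
    "1,1,12.5\n"
    "2,,30\n"
    "3,2,\n"
)

survey = pd.read_csv(io.StringIO(csv_text))
print(survey)                     # the empty cells appear as NaN

missing_per_column = survey.isna().sum()   # count of NaN values in each column
print(missing_per_column)
```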

Fixing Error Values How do I fix NaN values?
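One common way to handle NaN values is `fillna`, which replaces them with a value you choose. The replacement values below (0 and "No response") are assumptions for illustration; the right choice depends on what a blank means in your data.

```python
import pandas as pd

# Hypothetical data with missing entries.
survey = pd.DataFrame({
    "terminal": [1.0, None, 2.0],
    "comments": ["Clean", None, None],
})

# Give each column its own replacement for missing values.
filled = survey.fillna({"terminal": 0, "comments": "No response"})
print(filled)

# Alternative: drop any row that still contains a missing value.
complete_rows = survey.dropna()
print(complete_rows)
```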

Adding Custom Columns What if I want to add the Year in a column?
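A new column is created by assigning to a column name that does not exist yet. The transcript does not show how the year was added in the talk, so two possibilities are sketched here: a constant value, and a year extracted from a (hypothetical) date column.

```python
import pandas as pd

# Hypothetical data with a date stored as text.
survey = pd.DataFrame({
    "respondent_id": [1, 2],
    "survey_date": ["2015-05-02", "2015-05-09"],
})

# Option 1: every record gets the same year.
survey["SurveyYear"] = 2015

# Option 2: convert the text to real dates, then pull out the year.
survey["survey_date"] = pd.to_datetime(survey["survey_date"])
survey["Year"] = survey["survey_date"].dt.year

print(survey)
```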

Identifying Value Ranges How do I look at the data value ranges for multiple columns?
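To check value ranges across several columns at once, you can select the columns and ask for the minimum and maximum, or use `describe()` for a fuller summary. The rating columns and values are invented; a 0 or 6 on a 1 to 5 scale would stand out immediately.

```python
import pandas as pd

# Hypothetical rating columns.
survey = pd.DataFrame({
    "overall_rating": [4, 5, 3, 0],
    "cleanliness_rating": [5, 4, 6, 2],
})

rating_columns = ["overall_rating", "cleanliness_rating"]

ranges = survey[rating_columns].agg(["min", "max"])   # one row for min, one for max
print(ranges)

summary = survey[rating_columns].describe()           # count, mean, std, quartiles, min, max
print(summary)
```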

Saving to CSV How do I save this new file? What does my file look like?
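Saving to CSV uses `to_csv`. The sketch writes a small made-up dataframe to a temporary folder, prints the raw file, and reads it back; the file name is a placeholder.

```python
import os
import tempfile

import pandas as pd

# Hypothetical cleaned data.
survey = pd.DataFrame({
    "respondent_id": [1, 2],
    "rating_label": ["Outstanding", "Average"],
})

out_path = os.path.join(tempfile.mkdtemp(), "survey_clean.csv")   # placeholder location

# index=False leaves out the dataframe's row numbers.
survey.to_csv(out_path, index=False)

# Look at the raw file contents.
with open(out_path) as f:
    print(f.read())

# Read it back to confirm the round trip.
reloaded = pd.read_csv(out_path)
print(reloaded)
```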

Appendix

Additional Information Python Development Environments Enthought Canopy https://www.enthought.com/product/canopy/ Anaconda/Spyder https://www.anaconda.com/download/ Python Libraries Pandas http://pandas.pydata.org/

Questions?