Data Science with Python
University of Cincinnati
Philip Bohun
Day 1
- Introduction
- Overview of Python
- Data Munging/Wrangling
- Data Visualization
- Regression
Introduction
Goals:
- Set up and run a Python environment
- Understand the strengths and weaknesses of Python
- Do basic data wrangling and manipulation with Python
- Use packages for data analysis
Data Science
- Is it A or B? → classification
- Is this weird? → anomaly detection
- How much/many? → regression
- How is this organized? → clustering
- What should I do next? → reinforcement learning
Python Overview
- Set up the environment: a Python interpreter and a text editor
- Python is an interpreted language
- Whitespace is significant: indentation defines code blocks
[in-class exercise]
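To make the whitespace point concrete, here is a minimal sketch. The function name `describe` is made up for illustration; the point is only that indentation decides which statements belong to which block.

```python
# Indentation (leading whitespace) defines which statements belong to a block.
def describe(n):
    if n % 2 == 0:
        label = "even"   # runs only when the condition is true
    else:
        label = "odd"    # runs only when the condition is false
    return label         # back at function level: always runs


print(describe(4))
print(describe(7))
```

Changing the indentation of the `return` line would change the meaning of the program or raise an `IndentationError`, which is why a consistent editor setup matters.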
Variables, Functions, Modules
- Variable: named object that can change value
- Literal: an unchangeable string or number
- Function: reusable block of code
- Module: reusable group of functions and variables
[in-class exercise]
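The four terms above can be seen together in a few lines. This is a small sketch; `radius` and `circle_area` are illustrative names, while `math` is a module from the standard library.

```python
import math  # module: a reusable group of functions and variables

radius = 2.0         # variable bound to the literal 2.0
radius = radius + 1  # the same name can be rebound to a new value


def circle_area(r):
    """Function: a reusable block of code."""
    return math.pi * r ** 2


area = circle_area(radius)
print(f"radius={radius}, area={area:.2f}")
```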
Control Flow
- Control flow allows us to make decisions in code
- Types of control flow:
  - Sequence
  - Selection
  - Iteration
[in-class exercise]
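A short sketch showing all three kinds of control flow in one function. The function name, the passing threshold, and the sample scores are assumptions made for illustration.

```python
def classify_scores(scores, passing=60):
    # Sequence: statements run one after another, top to bottom.
    passed = []
    failed = []
    for s in scores:        # Iteration: repeat the body for each score.
        if s >= passing:    # Selection: choose one branch or the other.
            passed.append(s)
        else:
            failed.append(s)
    return passed, failed


passed, failed = classify_scores([55, 72, 90, 41])
print("passed:", passed)
print("failed:", failed)
```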
File I/O
- Files can be opened and closed
- Files can be text or binary
[in-class exercise]
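A minimal sketch of both file modes, assuming a temporary directory and made-up file names. The `with` statement closes each file automatically when the block ends.

```python
import os
import tempfile

# Hypothetical file names inside a throwaway directory.
workdir = tempfile.mkdtemp()
text_path = os.path.join(workdir, "notes.txt")
bin_path = os.path.join(workdir, "data.bin")

# Text mode: strings in, strings out.
with open(text_path, "w", encoding="utf-8") as f:
    f.write("first line\nsecond line\n")

with open(text_path, "r", encoding="utf-8") as f:
    lines = [line.rstrip("\n") for line in f]

# Binary mode: bytes in, bytes out.
with open(bin_path, "wb") as f:
    f.write(bytes([0, 1, 2, 255]))

with open(bin_path, "rb") as f:
    raw = f.read()

print(lines)
print(list(raw))
print("closed:", f.closed)
```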
Simple Database Interaction
- Python has a built-in library (sqlite3) to create and interact with SQLite databases
- Great for small and local projects
[in-class exercise]
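As a sketch of the built-in `sqlite3` module, the example below creates an in-memory database, inserts a few rows, and queries them. The table name, columns, and rows are made up for illustration.

```python
import sqlite3

# ":memory:" creates a temporary database that disappears when closed.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (name TEXT, score REAL)")

# "?" placeholders let sqlite3 handle quoting of the values safely.
conn.executemany(
    "INSERT INTO students (name, score) VALUES (?, ?)",
    [("Ada", 91.5), ("Grace", 88.0), ("Linus", 79.0)],
)
conn.commit()

rows = conn.execute(
    "SELECT name, score FROM students WHERE score > ? ORDER BY score DESC",
    (80,),
).fetchall()
print(rows)

conn.close()
```

Passing a file path instead of `":memory:"` would store the database on disk, which suits small local projects.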
Data Types and Data Structures
Major data types in Python are:
- Numeric
- Sequences
- Sets
- Mappings
[in-class exercise]
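One example of each category, using only built-in types. The variable names and values are illustrative.

```python
# Numeric
count = 3            # int
ratio = 0.75         # float

# Sequences: ordered, indexable
scores = [88, 92, 88]   # list (mutable)
point = (3, 4)          # tuple (immutable)
name = "Python"         # str (immutable sequence of characters)

# Set: unordered, no duplicates
unique_scores = set(scores)

# Mapping: keys to values
ages = {"Ada": 36, "Grace": 45}

scores.append(75)    # lists can grow in place
ages["Linus"] = 28   # dicts can gain new keys

print(count * ratio, scores, point[0], name[0])
print(unique_scores, ages)
```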
Data Munging/Wrangling
- Dates and times are complicated
- Dealing with empty values (null, NaN, etc.)
- Handling strings and unstructured data
[in-class exercise]
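The three topics above can be sketched with the standard library alone. The rows, field names, and the helper `parse_temp` are made up for illustration; a real project would more likely use a data analysis package for this.

```python
import math
from datetime import datetime

# Made-up raw records with messy strings, a blank value, and a NaN.
raw_rows = [
    {"date": "2017-03-01", "city": "  cincinnati ", "temp": "41.5"},
    {"date": "2017-03-02", "city": "CINCINNATI", "temp": ""},
    {"date": "2017-03-03", "city": "Cincinnati", "temp": "NaN"},
]


def parse_temp(text):
    """Return a float, or None when the value is blank or NaN."""
    text = text.strip()
    if not text:
        return None
    value = float(text)
    return None if math.isnan(value) else value


clean_rows = []
for row in raw_rows:
    clean_rows.append({
        # Dates: parse the text into a real date object.
        "date": datetime.strptime(row["date"], "%Y-%m-%d").date(),
        # Strings: trim whitespace and normalize capitalization.
        "city": row["city"].strip().title(),
        # Empty values: map blanks and NaN to None.
        "temp": parse_temp(row["temp"]),
    })

known_temps = [r["temp"] for r in clean_rows if r["temp"] is not None]
print(clean_rows)
print(known_temps)
```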
Data Visualization
- Visualizing data is important both for exploratory data analysis (EDA) and for communicating results
- There are tools specifically designed for data visualization
- We will cover the basics
[in-class exercise]
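Dedicated plotting libraries are the usual tool here. As a dependency-free sketch of the idea, the example below draws a text bar chart of category counts; the function `text_bar_chart` and the grade data are made up for illustration.

```python
from collections import Counter

grades = ["A", "B", "B", "C", "A", "B", "D", "C", "B"]  # made-up sample data


def text_bar_chart(values):
    """Return one line per category, with a bar of '#' characters."""
    counts = Counter(values)
    lines = []
    for label in sorted(counts):
        bar = "#" * counts[label]
        lines.append(f"{label} | {bar} ({counts[label]})")
    return "\n".join(lines)


chart = text_bar_chart(grades)
print(chart)
```

Even this crude picture shows at a glance which category dominates, which is the point of visualizing during EDA.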
Regression
- Regression answers the question "how much?" or "how many?"
- There are many types of regression
- Let's look at some simple regressions
[in-class exercise]
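One simple case is fitting a straight line by ordinary least squares. The sketch below computes the slope and intercept by hand; the function `fit_line` and the study-hours data are made up, and a library routine would normally replace this in practice.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up data that lies exactly on the line score = 50 + 2 * hours.
hours = [1, 2, 3, 4, 5]
scores = [52, 54, 56, 58, 60]

slope, intercept = fit_line(hours, scores)
predicted = slope * 6 + intercept  # "how much" for 6 hours of study
print(slope, intercept, predicted)
```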
Day 2
- Program Design
- Modules
- Machine Learning
Program Design
- Problem decomposition is one of the most important parts of program design
- Rule of thumb: functions should be < 30 lines
- A function should do one thing well
[in-class exercise]
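A sketch of decomposition: the task "summarize scores from text lines" is split into small functions that each do one thing. All names and the sample input are assumptions made for illustration.

```python
def parse_line(line):
    """Turn 'name, score' into a (name, score) pair."""
    name, score = line.split(",")
    return name.strip(), float(score)


def load_records(lines):
    """Parse every non-blank line."""
    return [parse_line(line) for line in lines if line.strip()]


def average(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def summarize(records):
    """Build summary statistics from (name, score) pairs."""
    scores = [score for _, score in records]
    return {"count": len(records), "mean": average(scores), "max": max(scores)}


raw_lines = ["Ada, 90", "Grace, 80", "", "Linus, 70"]
summary = summarize(load_records(raw_lines))
print(summary)
```

Each function is short enough to test on its own, which is the practical payoff of the rule of thumb.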
Algorithm Basics
- Algorithms determine how much work your programs do
- If no library function does the job, always search for the best algorithm
- Don't start with code: solve the problem first, then write the code
[in-class example]
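To illustrate how the choice of algorithm determines the amount of work, the sketch below counts the steps taken by a linear search and by a binary search on the same sorted list. The functions are written out for illustration; in real code the standard library's `bisect` module covers binary search.

```python
def linear_search(items, target):
    """Check items one by one; return (index, steps taken)."""
    steps = 0
    for i, item in enumerate(items):
        steps += 1
        if item == target:
            return i, steps
    return -1, steps


def binary_search(sorted_items, target):
    """Halve the search range each step; return (index, steps taken)."""
    steps = 0
    lo, hi = 0, len(sorted_items) - 1
    while lo <= hi:
        steps += 1
        mid = (lo + hi) // 2
        if sorted_items[mid] == target:
            return mid, steps
        if sorted_items[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1, steps


data = list(range(1_000_000))  # already sorted
target = 999_999               # worst case for the linear search

lin_index, lin_steps = linear_search(data, target)
bin_index, bin_steps = binary_search(data, target)
print("linear steps:", lin_steps)
print("binary steps:", bin_steps)
```

Both searches find the same element, but the binary search needs at most about 20 steps for a million items, while the linear search may need a million.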
Modules
- Code that is reusable and covers a single topic should be gathered into a module
- Mixing concerns leads to unnecessary complexity, which makes software difficult to test and debug
[in-class exercise]
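A sketch of a single-topic module, assuming it is saved under the hypothetical file name `textstats.py`. Everything in it concerns simple text statistics and nothing else.

```python
"""textstats.py (hypothetical name): small helpers for text statistics."""


def word_count(text):
    """Number of whitespace-separated words."""
    return len(text.split())


def average_word_length(text):
    """Mean word length, or 0.0 for empty text."""
    words = text.split()
    if not words:
        return 0.0
    return sum(len(w) for w in words) / len(words)


if __name__ == "__main__":
    # Runs only when the file is executed directly, not when imported.
    sample = "modules keep related code together"
    print(word_count(sample), average_word_length(sample))
```

Another file in the same directory could then reuse it with `import textstats` and call `textstats.word_count(...)`.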
Machine Learning
- Machine learning solves optimization problems
- It is very useful for searching very large possibility spaces
- It is also useful when an approximate answer is good enough
[in-class lab]
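As one small instance of learning as optimization, the sketch below uses gradient descent to find the weight `w` that minimizes the mean squared error of the model `y = w * x`. The data, learning rate, and step count are assumptions chosen for illustration, and the result is an approximation rather than an exact solution.

```python
def mean_squared_error(w, xs, ys):
    """Average squared difference between predictions w * x and targets y."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def fit_weight(xs, ys, learning_rate=0.01, steps=500):
    """Minimize the mean squared error with plain gradient descent."""
    w = 0.0
    n = len(xs)
    for _ in range(steps):
        # Derivative of the mean squared error with respect to w.
        gradient = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        w -= learning_rate * gradient  # step downhill
    return w


# Made-up data generated from y = 3 * x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 6.0, 9.0, 12.0]

w = fit_weight(xs, ys)
print("learned weight:", round(w, 4))
print("error:", mean_squared_error(w, xs, ys))
```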
Machine Learning
[full project]