Programming with Data, Lab 7. Tuesday, 4 Dec. 2018. Stelios Sotiriadis, Prof. Alessandro Provetti
What will we learn today? A recap on gradient descent, SQLite, and Pandas.
Gradient descent. Let's review the code and run it together: Class6-grad_descent(mx+b).py
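As a reminder of the technique, here is a minimal sketch of batch gradient descent fitting y = m*x + b by minimising the mean squared error E = (1/n) Σ (y_i - (m*x_i + b))^2. Its gradients are dE/dm = -(2/n) Σ x_i (y_i - (m*x_i + b)) and dE/db = -(2/n) Σ (y_i - (m*x_i + b)). This is not the contents of Class6-grad_descent(mx+b).py. The function names, toy data, learning rate and iteration count are assumptions chosen so the example converges.

```python
# Hypothetical sketch of batch gradient descent for y = m*x + b (not the lab file itself).

def gradient_step(m, b, points, learning_rate):
    """One gradient-descent step on the mean squared error of the line y = m*x + b."""
    n = len(points)
    grad_m = 0.0
    grad_b = 0.0
    for x, y in points:
        error = y - (m * x + b)          # residual for this point
        grad_m += -2.0 / n * x * error   # contribution to dE/dm
        grad_b += -2.0 / n * error       # contribution to dE/db
    # Move against the gradient, scaled by the learning rate
    return m - learning_rate * grad_m, b - learning_rate * grad_b


def fit_line(points, learning_rate=0.01, iterations=5000):
    """Start from m = b = 0 and repeat gradient steps a fixed number of times."""
    m, b = 0.0, 0.0
    for _ in range(iterations):
        m, b = gradient_step(m, b, points, learning_rate)
    return m, b


if __name__ == "__main__":
    # Made-up toy data lying exactly on y = 2x + 1
    data = [(0, 1), (1, 3), (2, 5), (3, 7), (4, 9)]
    m, b = fit_line(data)
    print(f"m = {m:.3f}, b = {b:.3f}")  # should approach m = 2, b = 1
```

A learning rate that is too large makes the updates overshoot and diverge; one that is too small needs many more iterations.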
The SQLite 3 Module: a portable relational DBMS
Important data is: shared (centralized); frequently updated but relevant in the long term; mostly (80%, according to recent reviews) sitting inside RDBMSs. Data management needs a proper, application-independent design. Entity-Relationship (ER) diagrams and the Unified Modelling Language (UML) are visual languages for defining the structure of data.
Relational DBMSs: isolate you from the data, so you don't spoil it; maximise I/O performance (optimization happens inside the DBMS); take care of multiple access and authorization; take care of back-up and durability (with hardware support); provide a uniform interface: the SQL syntax. But you need monster software on a monster computer.
SQLite: language-specific drivers support embedding SQL in your programs; the data sits in a local file; ideal for local testing; ideal for presenting data in an easily accessible, standard format.
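To see that "the data sits in a local file", a minimal sketch can write a row in one session, close the connection, and read it back in a new session. The file name demo.db and the notes table are hypothetical; the file is placed in a fresh temporary directory so it does not clash with other files.

```python
import os
import sqlite3
import tempfile

# Hypothetical database file in a new temporary directory
db_path = os.path.join(tempfile.mkdtemp(), "demo.db")

# First session: create a table, insert a row, commit, and close
conn = sqlite3.connect(db_path)
conn.execute("CREATE TABLE notes (id INTEGER PRIMARY KEY, text TEXT)")
conn.execute("INSERT INTO notes (text) VALUES (?)", ("stored on disk",))
conn.commit()
conn.close()

# Second session: reopen the same file; the row is still there
conn = sqlite3.connect(db_path)
rows = conn.execute("SELECT id, text FROM notes").fetchall()
conn.close()
print(rows)  # [(1, 'stored on disk')]
```

For throwaway tests, sqlite3.connect(":memory:") gives a database that lives only in RAM and disappears when the connection closes.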
Examples of SQLite. Let's run it! Class7-sqlite-queries.py

```python
import sqlite3

# Create a connection to a database file called mydb (created if it does not exist)
conn = sqlite3.connect('mydb')
# Create a cursor (a way to run SQL queries)
cursor = conn.cursor()
# Example of an SQL statement (assuming a table called users already exists)
cursor.execute('''SELECT * FROM users''')
# Fetch the results and save them in all_rows (a list of tuples)
all_rows = cursor.fetchall()
# Access rows using a for loop: row[0] is the first column, row[1] the second, and so on
for row in all_rows:
    print(*row)  # prints every column of the row
# Close the connection when done
conn.close()
```
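The query above assumes a users table already exists. A minimal sketch of creating and filling one, assuming a hypothetical schema (id, name, age) and an in-memory database so nothing is written to disk, also shows executemany and "?" placeholders, which let sqlite3 quote values safely instead of pasting them into the SQL string.

```python
import sqlite3

# In-memory database keeps the sketch self-contained; use 'mydb' to write to a file instead
conn = sqlite3.connect(":memory:")
cursor = conn.cursor()

# Hypothetical schema for the users table
cursor.execute("""CREATE TABLE users (
                      id   INTEGER PRIMARY KEY,
                      name TEXT NOT NULL,
                      age  INTEGER)""")

# Insert several rows at once; each tuple fills the two '?' placeholders
people = [("Alice", 31), ("Bob", 25), ("Carol", 42)]
cursor.executemany("INSERT INTO users (name, age) VALUES (?, ?)", people)
conn.commit()

# Parameterised query: users older than a threshold, sorted by age
cursor.execute("SELECT name, age FROM users WHERE age > ? ORDER BY age", (30,))
older = cursor.fetchall()
print(older)  # [('Alice', 31), ('Carol', 42)]

conn.close()
```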
The Pandas Module
Basic idea: relational DBs might be seen as the computer version of paper ledgers and registries, and a spreadsheet as the computer version of a balance sheet. Spreadsheets have a proper naming mechanism for cells (A1, A2, B2, …) and have greatly extended balance sheets: now they contain lots of data, and a whole new class of "what if?" queries becomes available. However…
Python and spreadsheets… Pandas tries to replicate the positional organization of spreadsheets in Python iterables, adds support for data alignment, and offers other features.

```python
import pandas as pd

my_dict = {'A': [1, 2], 'B': ['John', 4]}
my_data_frame = pd.DataFrame(data=my_dict)
print(my_data_frame)
#    A     B
# 0  1  John
# 1  2     4
```
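A short sketch of the two ideas above, using the same frame: positional access (iloc), which plays roughly the role of spreadsheet cell references, versus label-based access (loc); and data alignment, where arithmetic between two Series matches values by index label rather than by position. The Series s1 and s2 are made-up examples.

```python
import pandas as pd

my_data_frame = pd.DataFrame(data={'A': [1, 2], 'B': ['John', 4]})

# Positional access: first row, second column
print(my_data_frame.iloc[0, 1])   # John
# Label-based access: row label 1, column label 'A'
print(my_data_frame.loc[1, 'A'])  # 2

# Data alignment: values are matched by index label, not by position
s1 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
s2 = pd.Series([10, 20, 30], index=['b', 'c', 'd'])
total = s1 + s2
# b -> 12.0, c -> 23.0; a and d have no partner label, so they become NaN
print(total)
```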
Data frames! They endow data with a tabular structure. They are often created by importing data, e.g. from a CSV file. They handle columns well, with type inference and more.
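A minimal sketch of creating a data frame by importing CSV data and letting pandas infer column types. The CSV content and column names are made up; io.StringIO stands in for a file on disk, since read_csv also accepts file-like objects.

```python
import io

import pandas as pd

# Made-up CSV content standing in for a real file
csv_text = """date,item,quantity,price
2018-12-01,apple,3,0.50
2018-12-02,pear,2,0.75
2018-12-03,plum,5,0.20
"""

# parse_dates asks pandas to turn the 'date' column into datetimes
df = pd.read_csv(io.StringIO(csv_text), parse_dates=['date'])

# pandas infers a type for each column (integers, floats, datetimes, text)
print(df.dtypes)

# Whole-column arithmetic creates a new column in one line
df['total'] = df['quantity'] * df['price']
print(df)
```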
Pandas: the DataFrame type has about 203 methods, and the top-level read_csv function alone has about 54 parameters, so there is hardly any need to develop ad hoc functions for our import tasks. Try the Pandas cookbook.
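A sketch of how a few read_csv parameters handle a messy file without any ad hoc parsing code. The file content below is made up (the prices are not real data) and uses semicolons, a title line above the header, and the word "missing" for absent values. The last two lines show how to check the method and parameter counts for your own installed pandas version, since they vary between releases.

```python
import inspect
import io

import pandas as pd

# Made-up messy file: title line, ';' separators, 'missing' for absent values
raw = """Price report (made-up data)
date;gold;oil;notes
2018-12-03;1230.5;53.9;ok
2018-12-04;missing;52.1;holiday
2018-12-05;1237.0;missing;ok
"""

df = pd.read_csv(
    io.StringIO(raw),
    sep=';',                          # field separator
    skiprows=1,                       # skip the title line before the header
    usecols=['date', 'gold', 'oil'],  # keep only the columns we need
    na_values=['missing'],            # treat 'missing' as NaN
    parse_dates=['date'],             # convert the date column to datetimes
    index_col='date',                 # use the dates as the row index
)
print(df)

# These counts depend on the installed pandas version
print(len(inspect.signature(pd.read_csv).parameters), "read_csv parameters")
print(len([n for n in dir(pd.DataFrame) if not n.startswith('_')]), "public DataFrame attributes")
```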
Jupyter Notebook: the Jupyter Notebook is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations and narrative text. In a command prompt, run the command "jupyter notebook". Then upload Class7-pandas-tutorial-1.ipynb, Class7-pandas-tutorial-2.ipynb and Class7-pandas-tutorial-3.ipynb. The notebooks use three datasets:
https://www.dcs.bbk.ac.uk/~stelios/pwd2018/datasets/Class7-gold_prices.csv
https://www.dcs.bbk.ac.uk/~stelios/pwd2018/datasets/Class7-oil_prices.csv
https://www.dcs.bbk.ac.uk/~stelios/pwd2018/datasets/Class7-gas_prices.csv
Let's run it! Let's use Jupyter notebooks! How to work with data in Pandas: run Class7-pandas-tutorial-1.ipynb. How to clean data using Pandas: run Class7-pandas-tutorial-2.ipynb. Combining different CSV files for visualizations: run Class7-pandas-tutorial-3.ipynb.
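A minimal sketch of the cleaning and combining steps, under stated assumptions: two tiny made-up tables stand in for the gold and oil CSV files, and the column names Date and Price are hypothetical (the real datasets may use different ones). Cleaning drops rows with a missing price and exact duplicate rows; combining is an inner join on the date, which keeps only dates present in both tables.

```python
import io

import pandas as pd

# Made-up stand-ins for the gold and oil price files (hypothetical columns)
gold_csv = """Date,Price
2018-12-03,1230.5
2018-12-04,
2018-12-05,1237.0
2018-12-05,1237.0
2018-12-06,1241.2
"""
oil_csv = """Date,Price
2018-12-04,52.1
2018-12-05,53.0
2018-12-06,51.5
2018-12-07,52.6
"""

gold = pd.read_csv(io.StringIO(gold_csv), parse_dates=['Date'])
oil = pd.read_csv(io.StringIO(oil_csv), parse_dates=['Date'])

# Cleaning: drop rows with no price, then drop exact duplicate rows
gold = gold.dropna(subset=['Price']).drop_duplicates()

# Combining: rename the price columns, then inner-join on Date
combined = pd.merge(gold.rename(columns={'Price': 'gold'}),
                    oil.rename(columns={'Price': 'oil'}),
                    on='Date', how='inner')
print(combined)

# In a notebook with matplotlib installed, one line plots both series:
# combined.plot(x='Date', y=['gold', 'oil'])
```

An inner join discards dates missing from either file; how='outer' would keep them and fill the gaps with NaN instead.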