Programming with Data Lab 7


1 Programming with Data Lab 7
Tuesday, 4 Dec. 2018
Stelios Sotiriadis
Prof. Alessandro Provetti

2 What will we learn today?
Recap on gradient descent
SQLite
Pandas

3 Gradient descent
Let's review the code and run it together: Class6-grad_descent(mx+b).py
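The course file itself is not reproduced in this transcript. As a reminder of the idea, here is a minimal sketch of gradient descent fitting a line y = mx + b by minimising the mean squared error. The data points, the learning rate, the step count and the function name are all assumptions chosen for illustration, not taken from Class6-grad_descent(mx+b).py.

```python
# Hypothetical data lying exactly on y = 2x + 1 (not from the course file)
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]


def gradient_descent(xs, ys, learning_rate=0.05, steps=2000):
    """Fit y = m*x + b by repeatedly stepping against the MSE gradient."""
    m, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Residuals of the current line on every data point
        residuals = [y - (m * x + b) for x, y in zip(xs, ys)]
        # Partial derivatives of the mean squared error w.r.t. m and b
        grad_m = (-2.0 / n) * sum(x * r for x, r in zip(xs, residuals))
        grad_b = (-2.0 / n) * sum(residuals)
        # Move a small step downhill
        m -= learning_rate * grad_m
        b -= learning_rate * grad_b
    return m, b


m, b = gradient_descent(xs, ys)
print(round(m, 3), round(b, 3))
```

With a learning rate that is too large the updates diverge, so the value is worth experimenting with when running the lab code.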

4 The SQLite 3 Module
Portable relational DBMS

5 Important data is …
Shared (centralized)
Frequently updated but long-term relevant
Mostly (80%, according to recent reviews) sitting inside RDBMS
Data management needs a proper, application-independent design
Entity-Relationship (ER) and Unified Modelling Language (UML) are visual languages for defining the structure of data

6 Relational DBMSs
isolate you from the data, so you don't spoil it
maximise I/O performance (optimization inside)
take care of multiple access and authorization
take care of back-up and durability (with hardware)
provide a uniform interface: the SQL syntax
you need a monster software on a monster computer

7 SQLite
language-specific drivers support SQL embedding
data sits in a local file
ideal for local testing
ideal for presenting data in an easily accessible standard format
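To illustrate the "data sits in a local file" point, the sketch below opens a database inside a temporary folder, writes one row, closes the connection, and reads the row back from the same file. The file name mydb.sqlite and the notes table are hypothetical names used only for this illustration.

```python
import os
import sqlite3
import tempfile

with tempfile.TemporaryDirectory() as folder:
    db_path = os.path.join(folder, "mydb.sqlite")  # hypothetical file name

    # Connecting creates the file if it does not exist yet
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE notes (body TEXT)")  # hypothetical table
    conn.execute("INSERT INTO notes (body) VALUES ('hello')")
    conn.commit()
    conn.close()

    # The whole database is now one ordinary file on disk
    file_exists = os.path.exists(db_path)

    # Reopening the file gives the stored data back
    conn = sqlite3.connect(db_path)
    rows = conn.execute("SELECT body FROM notes").fetchall()
    conn.close()

print(file_exists, rows)
```

Because the database is a single file, it can be copied or shared like any other document, which is what makes it convenient for local testing.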

8 Examples of SQLite
Let's run it! Class7-sqlite-queries.py

    import sqlite3

    # Create a connection to a database file called mydb
    conn = sqlite3.connect('mydb')

    # Create a cursor (a way to run SQL queries)
    cursor = conn.cursor()

    # Example of an SQL statement (assuming there is a table…)
    cursor.execute('''SELECT * FROM users''')

    # Fetch the results and save them in all_rows
    all_rows = cursor.fetchall()

    # Access rows using a for loop (row[0] is the first column's data)
    for row in all_rows:
        print(row[0], row[1], …)
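The slide's snippet assumes a users table already exists. A self-contained variant, sketched under the assumption of a three-column users table (the schema and the sample rows are invented here, not taken from Class7-sqlite-queries.py), could look like this:

```python
import sqlite3

# ':memory:' keeps the database in RAM, so no file is left behind
conn = sqlite3.connect(":memory:")
cursor = conn.cursor()

# Hypothetical schema: the slide only assumes that a 'users' table exists
cursor.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)"
)

# '?' placeholders let the driver insert the values safely
cursor.executemany(
    "INSERT INTO users (name, age) VALUES (?, ?)",
    [("Ada", 36), ("Alan", 41)],
)
conn.commit()

# Same pattern as on the slide: execute, fetch, loop over the rows
cursor.execute("SELECT id, name, age FROM users ORDER BY id")
all_rows = cursor.fetchall()
for row in all_rows:
    print(row[0], row[1], row[2])

conn.close()
```

Each fetched row is a plain tuple, so columns are accessed by position, as in the slide's loop.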

9 Pandas Modules

10 Basic idea
Relational DBs might be seen as the computer version of paper ledgers and registries
Spreadsheets might be seen as a computer version of a balance sheet
a proper naming mechanism: A1, A2, B2…
they greatly extended balance sheets. Now they contain lots of data.
a whole new class of what-if queries becomes available
however…

11 Python and spreadsheets…
try to replicate the positional organization of spreadsheets into Python iterables
support for data alignment
other features

    import pandas as pd

    my_dict = {'A': [1, 2], 'B': ['John', 4]}
    my_data_frame = pd.DataFrame(data=my_dict)
    print(my_data_frame)

       A     B
    0  1  John
    1  2     4
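The "support for data alignment" bullet can be illustrated with two Series whose labels only partly overlap. This is a small sketch with made-up values: pandas matches entries by label rather than by position, and labels missing on one side yield NaN.

```python
import pandas as pd

# Two hypothetical Series with partly overlapping labels
first = pd.Series([1, 2, 3], index=["a", "b", "c"])
second = pd.Series([10, 20], index=["b", "c"])

# Addition is aligned on the index labels, not on position
total = first + second
print(total)
```

Label "a" has no partner in the second Series, so its sum is missing, while "b" and "c" are added pairwise.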

12 The DataFrame!
Endows data with a tabular structure
Often created by importing data, e.g. from a CSV file
Handles columns well, type inference…
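As a sketch of importing CSV data with type inference, the example below reads CSV text from an in-memory buffer, so it runs without any file on disk. The column names and values are invented for illustration.

```python
import io

import pandas as pd

# Hypothetical CSV content; in the lab this would be a file on disk
csv_text = io.StringIO(
    "name,age,score\n"
    "Ann,31,7.5\n"
    "Bob,45,8.0\n"
)

df = pd.read_csv(csv_text)

# Each column gets its own inferred type: text, integer, float
print(df.dtypes)
print(df)
```

With a real file, the buffer would simply be replaced by the file's path.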

13 Pandas
the DataFrame type has about 203 methods
e.g. the read_csv function has about 54 parameters
hardly a need to develop ad hoc functions for our import tasks
Try the Pandas cookbook
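A few of the read_csv parameters already cover common import chores. The sketch below uses sep, na_values and parse_dates on invented data; the semicolon separator and the "missing" marker are assumptions chosen to show why no ad hoc parsing function is needed.

```python
import io

import pandas as pd

# Hypothetical semicolon-separated data with a custom missing-value marker
raw = io.StringIO(
    "date;city;temp\n"
    "2018-12-03;London;7.5\n"
    "2018-12-04;London;missing\n"
)

df = pd.read_csv(
    raw,
    sep=";",                 # field separator other than the default comma
    na_values=["missing"],   # extra strings to treat as missing values
    parse_dates=["date"],    # convert this column to datetimes
)

print(df.dtypes)
print(df)
```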

14 Jupyter notebook
The Jupyter Notebook is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations and narrative text.
In the command prompt run: jupyter notebook
Then upload:
Class7-pandas-tutorial-1.ipynb
Class7-pandas-tutorial-2.ipynb
Class7-pandas-tutorial-3.ipynb
Uses three datasets:

15 Let's run it! Let's use Jupyter notebooks!
How to work with data in Pandas: Run: Class7-pandas-tutorial-1.ipynb
How to clean data using Pandas: Run: Class7-pandas-tutorial-2.ipynb
Combine different csv files for visualizations: Run: Class7-pandas-tutorial-3.ipynb
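The notebooks themselves are not part of this transcript. As a rough sketch of what cleaning and combining CSV data can look like in Pandas, the example below stacks two invented tables, drops rows with missing values and removes duplicates. The data and the choice of concat, dropna and drop_duplicates are assumptions, not the contents of the tutorials.

```python
import io

import pandas as pd

# Two hypothetical CSV sources with the same columns
csv_a = io.StringIO("name,score\nAnn,10\nBob,\n")
csv_b = io.StringIO("name,score\nAnn,10\nCid,7\n")

df_a = pd.read_csv(csv_a)
df_b = pd.read_csv(csv_b)

# Combine: stack the rows of both tables into one DataFrame
combined = pd.concat([df_a, df_b], ignore_index=True)

# Clean: drop rows with missing values, then drop exact duplicate rows
clean = combined.dropna().drop_duplicates().reset_index(drop=True)
print(clean)
```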

