Introduction to Computing Using Python

Data Storage and Processing

How many of you have taken IT 240?

Databases and Structured Query Language
Python Database Programming

Data storage

[Figure: five Web pages (one.html, two.html, three.html, four.html, five.html), each labeled with word counts such as Beijing × 3, Paris × 5 and Chicago × 5.]

We wish to store data about Web pages in a way that Python programs can access conveniently. To do this, we will use a database.

Databases

A database consists of one or more tables.
Each table has a name and consists of rows (records) and columns (attributes).
Each attribute has a name and contains data of a specific type.

Hyperlinks
Url           Link
one.html      two.html
one.html      three.html
two.html      four.html
three.html    four.html
four.html     five.html
five.html     one.html
five.html     two.html
five.html     four.html

Keywords
Url           Word       Freq
one.html      Beijing    3
one.html      Paris      5
one.html      Chicago    5
two.html      Bogota     3
two.html      Beijing    2
two.html      Paris      1
three.html    Chicago    3
three.html    Beijing    6
four.html     Chicago    3
four.html     Paris      2
four.html     Nairobi    5
five.html     Nairobi    7
five.html     Bogota     2
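
To make "each attribute has a name and contains data of a specific type" concrete, here is a minimal sketch that creates the two tables in a temporary in-memory SQLite database and asks SQLite to describe the Keywords columns. The column types for Hyperlinks are an assumption, and PRAGMA table_info is specific to SQLite rather than part of standard SQL.

```python
import sqlite3

# An in-memory database keeps this sketch from creating any file.
con = sqlite3.connect(":memory:")
cur = con.cursor()

# Each table has a name; each column (attribute) has a name and a type.
# The text types for Hyperlinks are assumed for illustration.
cur.execute("CREATE TABLE Hyperlinks (Url text, Link text)")
cur.execute("CREATE TABLE Keywords (Url text, Word text, Freq int)")

# PRAGMA table_info is SQLite-specific; each row it returns describes one
# column as (position, name, declared type, ...).
keyword_columns = [(row[1], row[2])
                   for row in cur.execute("PRAGMA table_info(Keywords)")]
print(keyword_columns)  # [('Url', 'text'), ('Word', 'text'), ('Freq', 'int')]

con.close()
```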

Database files

Database files are not text files: you can’t read from or write to them directly.
Instead, communication is performed by commands written in a database language called Structured Query Language (SQL).

SQL SELECT FROM statement

The SQL statement SELECT is used to make queries into a database. The result is called a result table.

    SELECT Link FROM Hyperlinks

Result:
Link
two.html
three.html
four.html
four.html
five.html
one.html
two.html
four.html

A SELECT statement can name several columns:

    SELECT Url, Word FROM Keywords

Result:
Url           Word
one.html      Beijing
one.html      Paris
one.html      Chicago
two.html      Bogota
two.html      Beijing
two.html      Paris
three.html    Chicago
three.html    Beijing
four.html     Chicago
four.html     Paris
four.html     Nairobi
five.html     Nairobi
five.html     Bogota

SELECT statements can use *, a wildcard that selects every column:

    SELECT * FROM Hyperlinks

Result:
Url           Link
one.html      two.html
one.html      three.html
two.html      four.html
three.html    four.html
four.html     five.html
five.html     one.html
five.html     two.html
five.html     four.html

SQL DISTINCT keyword

The SQL keyword DISTINCT removes duplicate records from the result table.

    SELECT DISTINCT Link FROM Hyperlinks

Result:
Link
two.html
three.html
four.html
five.html
one.html
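
The effect of DISTINCT can be checked from Python. The sketch below is only an illustration: it loads the Hyperlinks rows into an in-memory SQLite database (the column types and the four.html to five.html row are assumptions, the latter inferred from the DISTINCT result listing five.html) and compares the two queries.

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE Hyperlinks (Url text, Link text)")  # column types assumed

# Rows of the Hyperlinks table. The four.html -> five.html row is an
# assumption, inferred from five.html appearing in the DISTINCT result.
rows = [
    ("one.html", "two.html"), ("one.html", "three.html"),
    ("two.html", "four.html"), ("three.html", "four.html"),
    ("four.html", "five.html"), ("five.html", "one.html"),
    ("five.html", "two.html"), ("five.html", "four.html"),
]
cur.executemany("INSERT INTO Hyperlinks VALUES (?, ?)", rows)

all_links = cur.execute("SELECT Link FROM Hyperlinks").fetchall()
distinct_links = cur.execute("SELECT DISTINCT Link FROM Hyperlinks").fetchall()

# Each record is a 1-tuple such as ('two.html',).
print(len(all_links), len(distinct_links))  # 8 5
```

Without DISTINCT, four.html and two.html are reported once per record in which they appear; with DISTINCT, each link appears once.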

SQL WHERE clause

The SQL clause WHERE is used to select only those records that satisfy a condition.

“In which pages does word X appear?”

    SELECT Url FROM Keywords WHERE Word = 'Paris'

Result:
Url
one.html
two.html
four.html

SQL WHERE clause

The SQL clause WHERE is used to select only those records that satisfy a condition.

    SELECT Column(s) FROM Table WHERE Column operator value
    SELECT Column(s) FROM Table WHERE Column BETWEEN value1 AND value2

Operator    Explanation
=           Equal
<>          Not equal
>           Greater than
<           Less than
>=          Greater than or equal
<=          Less than or equal
BETWEEN     Within an inclusive range
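
As a rough illustration of these operators, the sketch below loads the Keywords table into an in-memory SQLite database and counts how many records satisfy a few conditions. The helper count_where is hypothetical, written only for this example.

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE Keywords (Url text, Word text, Freq int)")
cur.executemany("INSERT INTO Keywords VALUES (?, ?, ?)", [
    ("one.html", "Beijing", 3), ("one.html", "Paris", 5), ("one.html", "Chicago", 5),
    ("two.html", "Bogota", 3), ("two.html", "Beijing", 2), ("two.html", "Paris", 1),
    ("three.html", "Chicago", 3), ("three.html", "Beijing", 6),
    ("four.html", "Chicago", 3), ("four.html", "Paris", 2), ("four.html", "Nairobi", 5),
    ("five.html", "Nairobi", 7), ("five.html", "Bogota", 2),
])


def count_where(condition):
    """Hypothetical helper: number of Keywords records satisfying condition."""
    # String formatting is acceptable here only because the conditions are
    # fixed strings written in this file, not user input.
    query = f"SELECT * FROM Keywords WHERE {condition}"
    return len(cur.execute(query).fetchall())


print(count_where("Word = 'Paris'"))        # 3
print(count_where("Word <> 'Beijing'"))     # 10
print(count_where("Freq >= 5"))             # 5
print(count_where("Freq BETWEEN 3 AND 5"))  # 7 (both 3 and 5 are included)
```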

Exercise

Using the Hyperlinks and Keywords tables, write an SQL query that returns:

1. The URL of every page that has a link to web page four.html

    SELECT DISTINCT Url FROM Hyperlinks WHERE Link = 'four.html'

2. The URL of every page that has an incoming link from page four.html

    SELECT DISTINCT Link FROM Hyperlinks WHERE Url = 'four.html'

3. The URL and word for every word that appears exactly three times in the web page associated with the URL

    SELECT Url, Word FROM Keywords WHERE Freq = 3

4. The URL, word, and frequency for every word that appears between 3 and 5 times, inclusive, in the web page associated with the URL

    SELECT * FROM Keywords WHERE Freq BETWEEN 3 AND 5

SQL built-in functions

SQL includes built-in math functions such as COUNT(), SUM() and AVG().

“How many pages contain the word Paris?”

    SELECT COUNT(*) FROM Keywords WHERE Word = 'Paris'

Result: 3
There are 3 web pages that mention Paris.

    SELECT SUM(Freq) FROM Keywords WHERE Word = 'Paris'

Result: 8
There are a total of 8 occurrences of ‘Paris’ on these web pages.
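
The three functions can also be combined in one query. The following sketch, which uses an in-memory SQLite database and only a handful of the Keywords records, is meant as an illustration rather than the course's own solution.

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE Keywords (Url text, Word text, Freq int)")
# The three Paris records from the Keywords table, plus two others.
cur.executemany("INSERT INTO Keywords VALUES (?, ?, ?)", [
    ("one.html", "Paris", 5), ("two.html", "Paris", 1), ("four.html", "Paris", 2),
    ("one.html", "Beijing", 3), ("five.html", "Nairobi", 7),
])

cur.execute("SELECT COUNT(*), SUM(Freq), AVG(Freq) FROM Keywords WHERE Word = 'Paris'")
page_count, total, average = cur.fetchone()  # a single record with three values
print(page_count, total, round(average, 2))  # 3 8 2.67
```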

Another example database

weather.db contains two tables:
  weatherdata (city text, country text, season int, temperature float)
  seasons (name text, number int)

seasons
name      number
winter    1
spring    2
summer    3
fall      4

weatherdata
city      season    temperature
Mumbai    1         24.8
Mumbai    2         28.4
Mumbai    3         27.9
Mumbai    4         27.6
London    1         4.2
London    2         8.3
London    3         15.7
London    4         10.4
Cairo     1         13.6
Cairo     2         20.7
Cairo     3         27.7
Cairo     4         22.2

SQL queries involving multiple tables

“What is the average summer temperature in Mumbai?”

Assume we don’t know the number coding of seasons. Then this question requires a lookup in both tables:
  Use seasons to find the number that matches the season name
  Use weatherdata to find the temperature
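
One way to perform both lookups in a single query is to list both tables after FROM and match them in the WHERE clause. The sketch below is one possible formulation, not necessarily the one intended by the course. It uses an in-memory SQLite database, a subset of the weatherdata rows, and omits the country column because the table shown has no values for it.

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE seasons (name text, number int)")
# The country column from the schema is left out of this sketch.
cur.execute("CREATE TABLE weatherdata (city text, season int, temperature float)")
cur.executemany("INSERT INTO seasons VALUES (?, ?)",
                [("winter", 1), ("spring", 2), ("summer", 3), ("fall", 4)])
cur.executemany("INSERT INTO weatherdata VALUES (?, ?, ?)", [
    ("Mumbai", 1, 24.8), ("Mumbai", 2, 28.4), ("Mumbai", 3, 27.9), ("Mumbai", 4, 27.6),
    ("London", 3, 15.7), ("Cairo", 3, 27.7),
])

# Listing both tables after FROM pairs every weatherdata record with every
# seasons record; the WHERE clause keeps only the pairs that belong together.
cur.execute("""
    SELECT AVG(temperature)
    FROM weatherdata, seasons
    WHERE weatherdata.season = seasons.number
      AND seasons.name = 'summer'
      AND weatherdata.city = 'Mumbai'
""")
mumbai_summer = cur.fetchone()[0]
print(mumbai_summer)  # 27.9
```

The query never mentions the number 3: the season name is translated to its number by the seasons table.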

Standard Library module sqlite3

The Python Standard Library includes module sqlite3, which provides an API that allows Python programs to access database files. It is an interface to a library of functions that accesses the database files directly.

    >>> import sqlite3
    >>> con = sqlite3.connect('web.db')

The sqlite3 function connect() takes as input the name of a database file and returns an object of type Connection, a type defined in module sqlite3. The Connection object con is associated with database file web.db. If database file web.db does not exist in the current working directory, a new database file web.db is created.

    >>> cur = con.cursor()

Connection method cursor() returns an object of type Cursor, another type defined in module sqlite3. Cursor objects are responsible for executing SQL statements.

    >>> cur.execute("CREATE TABLE Keywords (Url text, Word text, Freq int)")
    >>> cur.execute("INSERT INTO Keywords VALUES ('one.html', 'Beijing', 3)")

The Cursor class supports method execute(), which takes an SQL statement as a string and executes it. In the INSERT statement above, the values are hardcoded.
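
The steps above can be put together in a short script. This is a sketch under a few assumptions: the database file is created in a throwaway temporary directory rather than the current working directory, and the script also calls commit() and close(), which the session above does not show but which are needed for the inserted record to be saved to the file.

```python
import os
import sqlite3
import tempfile

# Hypothetical location: a throwaway directory instead of the working directory.
db_path = os.path.join(tempfile.mkdtemp(), "web.db")

con = sqlite3.connect(db_path)  # creates web.db because it does not exist yet
cur = con.cursor()
cur.execute("CREATE TABLE Keywords (Url text, Word text, Freq int)")
cur.execute("INSERT INTO Keywords VALUES ('one.html', 'Beijing', 3)")
con.commit()                    # uncommitted changes are discarded on close
con.close()

# Reopen the file to confirm the record was written to disk.
con = sqlite3.connect(db_path)
rows = con.execute("SELECT * FROM Keywords").fetchall()
con.close()
print(rows)  # [('one.html', 'Beijing', 3)]
```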

Parameter substitution

In general, the values used in an SQL statement will not be hardcoded in the program but will come from Python variables.

    >>> cur.execute("INSERT INTO Keywords VALUES ('one.html', 'Beijing', 3)")
    >>> url, word, freq = 'one.html', 'Paris', 5
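
In module sqlite3, values from Python variables are supplied through ? placeholders in the SQL string, with the values passed as a tuple in a second argument to execute(). The sketch below shows this with the variables from the session above; the in-memory database is used only to keep the example self-contained.

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE Keywords (Url text, Word text, Freq int)")

url, word, freq = 'one.html', 'Paris', 5

# Each ? is filled, in order, from the tuple passed as the second argument.
cur.execute("INSERT INTO Keywords VALUES (?, ?, ?)", (url, word, freq))
con.commit()

# Placeholders work in queries too; note the one-element tuple (word,).
cur.execute("SELECT Url, Freq FROM Keywords WHERE Word = ?", (word,))
matches = cur.fetchall()
print(matches)  # [('one.html', 5)]
```

Placeholders are preferable to building the SQL string with string formatting, because the module handles quoting of the values itself.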

Querying a database

The result of a query is stored in the Cursor object. To obtain the result as a list of tuple objects, Cursor method fetchall() is used.

    >>> import sqlite3
    >>> con = sqlite3.connect('links.db')
    >>> cur = con.cursor()
    >>> cur.execute('SELECT * FROM Keywords')
    >>> cur.fetchall()
    [('one.html', 'Beijing', 3), ('one.html', 'Paris', 5),
     ('one.html', 'Chicago', 5), ('two.html', 'Bogota', 5),
     ('two.html', 'Beijing', 2), ('two.html', 'Paris', 1),
     ('three.html', 'Chicago', 3), ('three.html', 'Beijing', 6),
     ('four.html', 'Chicago', 3), ('four.html', 'Paris', 2),
     ('four.html', 'Nairobi', 5), ('five.html', 'Nairobi', 7),
     ('five.html', 'Bogota', 2)]

An alternative is to iterate over the Cursor object.

    >>> cur.execute('SELECT * FROM Keywords')
    >>> for record in cur:
    ...     print(record)
    ...
    ('one.html', 'Beijing', 3)
    ('one.html', 'Paris', 5)
    ('one.html', 'Chicago', 5)
    ('two.html', 'Bogota', 5)
    ('two.html', 'Beijing', 2)
    ('two.html', 'Paris', 1)
    ('three.html', 'Chicago', 3)
    ('three.html', 'Beijing', 6)
    ('four.html', 'Chicago', 3)
    ('four.html', 'Paris', 2)
    ('four.html', 'Nairobi', 5)
    ('five.html', 'Nairobi', 7)
    ('five.html', 'Bogota', 2)
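
The two ways of reading a result can be compared side by side. The sketch below uses an in-memory database with three of the Keywords records; it also shows that a result can be read only once, so the query has to be executed again before iterating.

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE Keywords (Url text, Word text, Freq int)")
cur.executemany("INSERT INTO Keywords VALUES (?, ?, ?)", [
    ("one.html", "Beijing", 3), ("one.html", "Paris", 5), ("one.html", "Chicago", 5),
])

cur.execute("SELECT * FROM Keywords")
as_list = cur.fetchall()   # all records at once, as a list of tuples
leftover = cur.fetchall()  # the result has been consumed, so this is empty

cur.execute("SELECT * FROM Keywords")         # run the query again
as_iterated = [record for record in cur]      # one record per iteration

print(len(as_list), leftover, as_list == as_iterated)  # 3 [] True
```

Iterating is useful when the result is large, since the records do not all have to be held in a list at the same time.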

Exercises

In week10exercisesstart.py