Published by Vivien Whitehead. Modified over 9 years ago.
1
SQL (3): Research questions, databases, and analytics; importing data, exporting data, using other tools
Information Structures and Implications 2015
Bettina Berendt
Last updated: 2015-10-30
2
Where are we?
3
Agenda
1. Our goal: answer interesting questions
2. Changing databases – a design view
3. Importing, and more on combining data
4. Creating analytics and storing their values
5. Exporting data
6. Putting it all together: From goal to flowchart of data and processing steps
7. Preview: Database connectivity – (Python and other) programs and databases
4
How many parliamentarians does each country have?
5
How long are political functions held, on average?
6
How often do countries vote for/against things? (Note: artificial data!)
7
Is there a relation between length of time in office and age?
9
Incremental changes to databases
Let us see how we can add information to an existing database. Let us modify, in turn:
– The conceptual model (EER)
– The logical model (relations)
– The physical model (database)
10
The diagram
11
Assume we have voting data
Just some examples of real EU voting data:
– http://www.elprg.eu/data.htm
– http://personal.lse.ac.uk/hix/ (overview, link to the next one)
– http://www.votewatch.eu/
– http://www.itsyourparliament.eu/api/
For simplicity, assume we have a CSV file
– If it's a different format, some more transformation is needed
For simplicity, I generated random data
12
Artificial voting data (votes2.csv: 504 votes)
14
Adding these data to the database: (1) Creating a new table
Table a_votes (Missing: primary and foreign keys)
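The slide notes that a_votes still lacks primary and foreign keys. A minimal sketch of what adding them could look like follows. It rests on several assumptions: Python's built-in sqlite3 module stands in for the MySQL server used in the course, vote_id is a hypothetical column, and the composite primary key is only one plausible choice. MEP_ID and member_vote are the column names used in the later queries.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the MySQL server.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only on request

# Minimal stand-in for the existing table; only the key column matters here.
conn.execute("CREATE TABLE parliament_member (MEP_ID INTEGER PRIMARY KEY)")

# a_votes with the keys the slide lists as missing.
# vote_id is a hypothetical column name, not taken from the slides.
conn.execute("""
    CREATE TABLE a_votes (
        vote_id     INTEGER NOT NULL,
        MEP_ID      INTEGER NOT NULL,
        member_vote TEXT,
        PRIMARY KEY (vote_id, MEP_ID),
        FOREIGN KEY (MEP_ID) REFERENCES parliament_member (MEP_ID)
    )
""")

conn.execute("INSERT INTO parliament_member VALUES (1)")
conn.execute("INSERT INTO a_votes VALUES (100, 1, 'yes')")

# A vote by an unknown member violates the foreign key and is rejected.
try:
    conn.execute("INSERT INTO a_votes VALUES (100, 999, 'no')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```

With the foreign key in place, the database itself refuses votes that cannot be linked to a known parliamentarian.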
15
Adding these data to the database: (2) Importing the data into the table
LOAD DATA INFILE 'C:\\Users\\kurt\\Documents\\Lehre\\ISI15\\Session 7 - SQL3\\votes2.csv'
INTO TABLE a_votes
FIELDS TERMINATED BY ';'
LINES TERMINATED BY '\n'
(Note: The file path specification is different on Mac.)
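For comparison, the same kind of import can be done from a program. The sketch below is an illustration, not the course's method: it uses Python's csv and sqlite3 modules, SQLite stands in for MySQL, and the three rows and the column vote_id are made up. Only the semicolon separator is taken from the statement above.

```python
import csv
import io
import sqlite3

conn = sqlite3.connect(":memory:")
# vote_id is a hypothetical column; MEP_ID and member_vote appear in the slides' queries.
conn.execute("CREATE TABLE a_votes (vote_id INTEGER, MEP_ID INTEGER, member_vote TEXT)")

# Made-up stand-in for votes2.csv: semicolon-separated, one vote per line.
csv_text = "1;101;yes\n1;102;no\n2;101;abstain\n"

# For a real file, open("votes2.csv", newline="") would replace io.StringIO(csv_text).
reader = csv.reader(io.StringIO(csv_text), delimiter=";")

# Each CSV row becomes one INSERT; the placeholders avoid quoting problems.
conn.executemany("INSERT INTO a_votes VALUES (?, ?, ?)", reader)
conn.commit()

row_count = conn.execute("SELECT COUNT(*) FROM a_votes").fetchone()[0]
```

Unlike LOAD DATA INFILE, this reads the file on the machine where the program runs, so no server-side file path is needed.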
16
Note
LOAD DATA INFILE is of course not only useful for adding data to an existing database. You could also build a database from scratch in this way.
17
Linking the new to the old data (just another join)
18
Scenario 2 (more common in real life): Our new data do not have the same key information as the old data
19
New table & data import for scenario 2
Table a_votes2 (Missing: primary and foreign keys)
LOAD DATA INFILE 'C:\\Users\\kurt\\Documents\\Lehre\\ISI15\\Session 7 - SQL3\\artificial_votes2.txt'
INTO TABLE a_votes2
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
20
Sample data for scenario 2
21
Linking the new to the old data (record linkage – not necessarily via the primary keys)
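A small sketch of record linkage without shared keys, under loudly stated assumptions: the column names full_name and voter_name and all rows are invented for illustration (the slides do not show the columns of a_votes2), and SQLite stands in for MySQL. The join condition compares normalised names instead of keys, and a second query lists the votes that could not be linked.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema and made-up rows, for illustration only.
conn.executescript("""
    CREATE TABLE parliament_member (MEP_ID INTEGER PRIMARY KEY, full_name TEXT);
    CREATE TABLE a_votes2 (vote_id INTEGER, voter_name TEXT, member_vote TEXT);
    INSERT INTO parliament_member VALUES (1, 'Anna Example'), (2, 'Bob Sample');
    INSERT INTO a_votes2 VALUES
        (100, 'Anna Example', 'yes'),
        (100, 'bob sample ', 'no'),
        (101, 'Carl Unknown', 'yes');
""")

# Link on a normalised name (trimmed, lower case) instead of a key.
linked = conn.execute("""
    SELECT m.MEP_ID, v.vote_id, v.member_vote
    FROM a_votes2 AS v
    JOIN parliament_member AS m
      ON lower(trim(v.voter_name)) = lower(trim(m.full_name))
    ORDER BY m.MEP_ID
""").fetchall()

# Votes without a matching member: the LEFT JOIN keeps them, with NULL on the member side.
unlinked = conn.execute("""
    SELECT v.voter_name
    FROM a_votes2 AS v
    LEFT JOIN parliament_member AS m
      ON lower(trim(v.voter_name)) = lower(trim(m.full_name))
    WHERE m.MEP_ID IS NULL
""").fetchall()
```

The unlinked list makes one risk of this scenario visible: rows can silently drop out of a plain join, and two different people with the same name would be merged.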
22
What are the risks and opportunities of scenario 2?
24
How many parliamentarians does each country have? (Order the result by country name)
25
For how long do parliamentarians hold a political function (in days)?
(21,852 rows …)
MySQL date and time functions: http://dev.mysql.com/doc/refman/5.7/en/date-and-time-functions.html
26
How long are positions held, on average?
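The queries on these slides are shown only as screenshots, so here is a hedged sketch of how the duration and its average could be computed. It assumes SQLite in place of MySQL: MySQL offers DATEDIFF(End_date, Start_date), while SQLite has no such function, so the difference of two Julian day numbers is used instead. The two rows are made up; the table and column names follow the later export slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE in_political_function (MEP_ID INTEGER, Start_date TEXT, End_date TEXT)"
)
# Made-up rows: one function held for 30 days, one for 10 days.
conn.executemany(
    "INSERT INTO in_political_function VALUES (?, ?, ?)",
    [
        (1, "2014-01-01", "2014-01-31"),
        (2, "2014-01-01", "2014-01-11"),
    ],
)

# Duration of each function in days (MySQL equivalent: DATEDIFF(End_date, Start_date)).
days_per_function = conn.execute(
    "SELECT MEP_ID, CAST(julianday(End_date) - julianday(Start_date) AS INTEGER) "
    "FROM in_political_function ORDER BY MEP_ID"
).fetchall()

# Average duration over all functions.
average_days = conn.execute(
    "SELECT AVG(julianday(End_date) - julianday(Start_date)) FROM in_political_function"
).fetchone()[0]
```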
27
Subtraction, AVG, and COUNT are fine, but what about more complex measures? For example, is there a relation between
– length of time in office
– and age?
(Do older parliamentarians stay in office longer than younger ones, or vice versa?)
You could investigate this hypothesis with the help of the Pearson correlation coefficient.
28
Pearson correlation in SQL (1)
https://www.vanheusden.com/misc/pearson.php
29
Pearson correlation in SQL (2)
SELECT user1, user2,
  ((psum - (sum1 * sum2 / n)) / sqrt((sum1sq - pow(sum1, 2.0) / n) * (sum2sq - pow(sum2, 2.0) / n))) AS r,
  n
FROM (
  SELECT n1.user AS user1, n2.user AS user2,
    SUM(n1.rating) AS sum1,
    SUM(n2.rating) AS sum2,
    SUM(n1.rating * n1.rating) AS sum1sq,
    SUM(n2.rating * n2.rating) AS sum2sq,
    SUM(n1.rating * n2.rating) AS psum,
    COUNT(*) AS n
  FROM testdata AS n1
  LEFT JOIN testdata AS n2 ON n1.movie = n2.movie
  WHERE n1.user > n2.user
  GROUP BY n1.user, n2.user
) AS step1
ORDER BY r DESC, n DESC
Don't worry, you will probably never have to do such a thing...
https://www.vanheusden.com/misc/pearson.php
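To make the formula in the query easier to read, here is the same computation as a small Python function. It is a sketch: the variable names mirror the aliases in the SQL (sum1, sum2, sum1sq, sum2sq, psum, n), and the two sample lists are made-up numbers, not results from the EUP database.

```python
import math


def pearson(xs, ys):
    """Pearson correlation, written with the same sums as the SQL query."""
    n = len(xs)
    sum1 = sum(xs)
    sum2 = sum(ys)
    sum1sq = sum(x * x for x in xs)
    sum2sq = sum(y * y for y in ys)
    psum = sum(x * y for x, y in zip(xs, ys))
    denominator = math.sqrt((sum1sq - sum1 ** 2 / n) * (sum2sq - sum2 ** 2 / n))
    if denominator == 0:
        return None  # undefined when one of the variables is constant
    return (psum - sum1 * sum2 / n) / denominator


# Made-up example values: days in office and age in days for four people.
days_in_office = [400, 800, 1200, 1600]
age_in_days = [12000, 15000, 14000, 20000]
r = pearson(days_in_office, age_in_days)
```

A value of r close to 1 or -1 indicates a strong linear relation, a value near 0 indicates none.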
30
A general question: Can you compute anything in SQL? That is, can you compute anything that can be computed (by a programming language such as Python)?
In principle, yes (theoretical result about Turing equivalence: cf. http://stackoverflow.com/questions/900055/is-sql-or-even-tsql-turing-complete).
So what do you need (e.g. Python) programs and other software for?
31
Answer: For example, to calculate your analytics in more comfortable ways
Excel makes it very easy to calculate a correlation:
1. Create (in SQL) one or more tables with the information
2. Export to CSV
3. Import/Load into Excel
4. Calculate the correlation coefficient there
32
Answer (2): or for generating a chart
1. Create (in SQL) one or more tables with the information
2. Export to CSV
3. Import/Load into Excel
4. Create a chart
34
How many parliamentarians does each country have? (1)
SELECT name, count( * )
INTO OUTFILE 'C:\\Users\\kurt\\Documents\\Lehre\\ISI15\\Session 7 - SQL3\\countries_parliamentarians.csv'
FIELDS TERMINATED BY ';'
LINES TERMINATED BY '\n'
FROM represents, country
WHERE represents.countryacronym = country.acronym
GROUP BY countryacronym
ORDER BY name
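INTO OUTFILE writes the file on the database server. As an alternative sketch, a program can run the same query and write the CSV itself. The assumptions here: SQLite stands in for MySQL, the three rows are made up, and the file goes to a temporary directory rather than the course folder. The semicolon separator and the file name follow the slide.

```python
import csv
import os
import sqlite3
import tempfile

conn = sqlite3.connect(":memory:")
# Made-up miniature versions of the two tables used in the query.
conn.executescript("""
    CREATE TABLE country (acronym TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE represents (MEP_ID INTEGER, countryacronym TEXT);
    INSERT INTO country VALUES ('BE', 'Belgium'), ('FR', 'France');
    INSERT INTO represents VALUES (1, 'BE'), (2, 'BE'), (3, 'FR');
""")

# Same query as on the slide, without the INTO OUTFILE clause.
rows = conn.execute("""
    SELECT name, COUNT(*)
    FROM represents, country
    WHERE represents.countryacronym = country.acronym
    GROUP BY country.acronym, name
    ORDER BY name
""").fetchall()

# Write the result as a semicolon-separated file, one line per country.
out_path = os.path.join(tempfile.mkdtemp(), "countries_parliamentarians.csv")
with open(out_path, "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f, delimiter=";", lineterminator="\n")
    writer.writerows(rows)

# Read the file back to check what was exported.
with open(out_path, encoding="utf-8") as f:
    exported = f.read()
```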
35
How many parliamentarians does each country have? (2)
36
How long are political functions held, on average?
37
Is there a relation between length of time in office and age?
38
TimeInOffice / age (1): Option 1: export the new table directly
SELECT datediff( End_date, Start_date ), datediff( Start_date, date_of_birth )
INTO OUTFILE 'C:\\Users\\kurt\\Documents\\Lehre\\ISI15\\Session 7 - SQL3\\time2age.csv'
FIELDS TERMINATED BY ';'
LINES TERMINATED BY '\n'
FROM parliament_member, in_political_function
WHERE parliament_member.MEP_ID = in_political_function.MEP_ID
39
TimeInOffice / age (2): And then compute the correlation with Excel...
40
TimeInOffice / age (3): Option 2: Create a new table in the database (which you can later export)
CREATE TABLE time_in_office2age
SELECT datediff( End_date, Start_date ), datediff( Start_date, date_of_birth )
FROM parliament_member, in_political_function
WHERE parliament_member.MEP_ID = in_political_function.MEP_ID
41
How often do countries vote for/against things? (1)
Basic queries (combining these into one query is a bit tricky, so I recommend querying and exporting them separately): number of YES votes grouped by country, number of NO votes grouped by country
SELECT countryacronym, count( * )
FROM parliament_member, represents, a_votes
WHERE parliament_member.MEP_ID = represents.MEP_ID
AND parliament_member.MEP_ID = a_votes.MEP_ID
AND member_vote LIKE 'yes%'
GROUP BY countryacronym
ORDER BY countryacronym

SELECT countryacronym, count( * )
FROM parliament_member, represents, a_votes
WHERE parliament_member.MEP_ID = represents.MEP_ID
AND parliament_member.MEP_ID = a_votes.MEP_ID
AND member_vote LIKE 'no%'
GROUP BY countryacronym
ORDER BY countryacronym
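For readers who want to try the combined query anyway, one common way is conditional aggregation: a CASE expression inside SUM counts only the matching rows. This is a sketch, not the approach of the slides. It assumes SQLite in place of MySQL (the SELECT statement itself uses only standard SQL), made-up rows, and a hypothetical vote_id column; the join with parliament_member is left out because represents already carries MEP_ID.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Made-up miniature data; vote_id is a hypothetical column.
conn.executescript("""
    CREATE TABLE represents (MEP_ID INTEGER, countryacronym TEXT);
    CREATE TABLE a_votes (vote_id INTEGER, MEP_ID INTEGER, member_vote TEXT);
    INSERT INTO represents VALUES (1, 'BE'), (2, 'BE'), (3, 'FR');
    INSERT INTO a_votes VALUES
        (100, 1, 'yes'), (100, 2, 'no'), (100, 3, 'yes'),
        (101, 1, 'yes'), (101, 2, 'abstain'), (101, 3, 'no');
""")

# One row per country with both counts: each CASE yields 1 for a matching vote, else 0.
votes_by_country = conn.execute("""
    SELECT r.countryacronym,
           SUM(CASE WHEN v.member_vote LIKE 'yes%' THEN 1 ELSE 0 END) AS yes_votes,
           SUM(CASE WHEN v.member_vote LIKE 'no%'  THEN 1 ELSE 0 END) AS no_votes
    FROM represents AS r
    JOIN a_votes AS v ON r.MEP_ID = v.MEP_ID
    GROUP BY r.countryacronym
    ORDER BY r.countryacronym
""").fetchall()
```

One difference from the two separate queries: a country without any YES votes still appears here, with a count of 0, instead of being absent from the result.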
42
How often do countries vote for/against things? (2)
44
What data and operations to answer our research question?
Flowchart: EUP database → (SQL query + export) → Role to duration (CSV) → (Import) → Role to duration (XLS) → (Excel command)
45
What data and operations to answer our research question?
Flowchart: Voting data (CSV) → (Import) → EUP database → (SQL query + export) → Y/N Votes by Country (CSV) → (Import) → Y/N Votes by Country (XLS) → (Excel command)
47
Python and other programs can ...
... access databases:
– “import” data from the database while the program is running
– compute something with it
– “export” (write something) to the database
– Show selected database content to users
– Ask for their input
– Do something accordingly
Examples? E.g. Web search engines, e-commerce sites, ...
Mechanics? See later in the term, Scripting Languages!
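As a small preview of the read, compute, write cycle described above, here is a hedged sketch using Python's built-in sqlite3 module. It is not the setup of the later course sessions: SQLite stands in for MySQL, the rows are made up, and time_in_office is a hypothetical table name.

```python
import sqlite3
from datetime import date

conn = sqlite3.connect(":memory:")
# Made-up data; time_in_office is a hypothetical result table.
conn.executescript("""
    CREATE TABLE in_political_function (MEP_ID INTEGER, Start_date TEXT, End_date TEXT);
    INSERT INTO in_political_function VALUES
        (1, '2014-01-01', '2014-01-31'),
        (2, '2014-03-01', '2014-03-11');
    CREATE TABLE time_in_office (MEP_ID INTEGER, days INTEGER);
""")

# 1. "Import": read rows from the database while the program is running.
rows = conn.execute(
    "SELECT MEP_ID, Start_date, End_date FROM in_political_function"
).fetchall()

# 2. Compute something with them: here, the number of days in office.
results = [
    (mep_id, (date.fromisoformat(end) - date.fromisoformat(start)).days)
    for mep_id, start, end in rows
]

# 3. "Export": write the computed values back to the database.
conn.executemany("INSERT INTO time_in_office VALUES (?, ?)", results)
conn.commit()

stored = conn.execute(
    "SELECT MEP_ID, days FROM time_in_office ORDER BY MEP_ID"
).fetchall()
```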
48
Next 3 weeks
Continuing this
Bringing in text analytics
– How long are speeches on average, by country?
– Do people from different countries use certain words/terms more often than others?
– ...
49
Reading
For details of all commands, see the MySQL documentation: http://dev.mysql.com/doc/refman/5.7/en/