Presentation is loading. Please wait.

Presentation is loading. Please wait.

SQL (3) Research questions, databases, and analytics; Importing data, exporting data, using other tools Information Structures and Implications 2015 Bettina.

Similar presentations


Presentation on theme: "SQL (3) Research questions, databases, and analytics; Importing data, exporting data, using other tools Information Structures and Implications 2015 Bettina."— Presentation transcript:

1 SQL (3) Research questions, databases, and analytics; Importing data, exporting data, using other tools Information Structures and Implications 2015 Bettina Berendt Last updated: 2015-10-30 1

2 Where are we? 2

3 Agenda 1.Our goal: answer interesting questions 2.Changing databases – a design view 3.Importing, and more on combining data 4.Creating analytics and storing their values 5.Exporting data 6.Putting it all together: From goal to flowchart of data and processing steps 7.Preview: Database connectivity – (Python and other) programs and databases 3

4 How many parliamentarians does each country have? 4

5 How long are political functions held, on average? 5

6 How often do countries vote for/against things? (Note: artificial data!) 6

7 Is there a relation between length of time in office and age? 7

8 Agenda 1.Our goal: answer interesting questions 2.Changing databases – a design view 3.Importing, and more on combining data 4.Creating analytics and storing their values 5.Exporting data 6.Putting it all together: From goal to flowchart of data and processing steps 7.Preview: Database connectivity – (Python and other) programs and databases 8

9 Incremental changes to databases Let us see how we can add information to an existing database. Let us modify – The conceptual model (EER) – The logical model (relations) – The physical model (database) in turn 9

10 The diagram 10

11 Assume we have voting data Just some examples of real EU voting data – http://www.elprg.eu/data.htm http://www.elprg.eu/data.htm – http://personal.lse.ac.uk/hix/ (overview, link to the next one) http://personal.lse.ac.uk/hix/ – http://www.votewatch.eu/ http://www.votewatch.eu/ – http://www.itsyourparliament.eu/api/ http://www.itsyourparliament.eu/api/ For simplicity, assume we have a CSV file – If it‘s a different format, need some more transformation For simplicity, I generated random data 11

12 Artifical voting data (votes2.csv: 504 votes) 12

13 Agenda 1.Our goal: answer interesting questions 2.Changing databases – a design view 3.Importing, and more on combining data 4.Creating analytics and storing their values 5.Exporting data 6.Putting it all together: From goal to flowchart of data and processing steps 7.Preview: Database connectivity – (Python and other) programs and databases 13

14 Adding these data to the database: (1) Creating a new table Table a_votes (Missing: primary and foreign keys) 14

15 Adding these data to the database: (2) Importing the data into the table LOAD DATA INFILE 'C:\\Users\\kurt\\Documents\\Lehre\\ISI15\\Session 7 - SQL3\\votes2.csv' INTO TABLE a_votes FIELDS TERMINATED BY ';' LINES TERMINATED BY '\n' (Note: The file path specification is different on Mac.) 15

16 Note LOAD DATA INFILE is of course not only useful for adding data to an existing database. You could also build a database from scratch in this way. 16

17 Linking the new to the old data (just another join) 17

18 Scenario 2 (more common in real life): Our new data do not have the same key information as the old data 18

19 New table & data import for scenario 2 Table a_votes2 (Missing: primary and foreign keys) 19 LOAD DATA INFILE 'C:\\Users\\kurt\\Documents\\Lehr e\\ISI15\\Session 7 - SQL3\\artificial_votes2.txt' INTO TABLE a_votes2 FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n'

20 Sample data for scenario 2 20

21 Linking the new to the old data (record linkage – not necessarily via the primary keys) 21

22 What are the risks and opportunities of scenario 2? 22

23 Agenda 1.Our goal: answer interesting questions 2.Changing databases – a design view 3.Importing, and more on combining data 4.Creating analytics and storing their values 5.Exporting data 6.Putting it all together: From goal to flowchart of data and processing steps 7.Preview: Database connectivity – (Python and other) programs and databases 23

24 How many parliamen- tarians does each country have? (Order the result by country name) 24

25 For how long do parliamentarians hold a political function? (in days) 25 http://dev.mysql.com/doc/refman/5.7/en/date-and-time-functions.html 21,852 rows …

26 How long are positions held, on average? 26

27 OK, minus and AVG and COUNTs are fine, but what about more complex measures? For example, is there a relation between – length of time in office – and age? (do older parliamentarians stay longer in office than younger people, or vice versa)? You could investigate this hypothesis with the help of the Pearson correlation coefficient 27

28 Pearson correlation in SQL (1) 28 https://www.vanheusden.com/misc/pearson.php

29 Pearson correlation in SQL (2) SELECT user1, user2, ((psum - (sum1 * sum2 / n)) / sqrt((sum1sq - pow(sum1, 2.0) / n) * (sum2sq - pow(sum2, 2.0) / n))) AS r, n FROM (SELECT n1.user AS user1, n2.user AS user2, SUM(n1.rating) AS sum1, SUM(n2.rating) AS sum2, SUM(n1.rating * n1.rating) AS sum1sq, SUM(n2.rating * n2.rating) AS sum2sq, SUM(n1.rating * n2.rating) AS psum, COUNT(*) AS n FROM testdata AS n1 LEFT JOIN testdata AS n2 ON n1.movie = n2.movie WHERE n1.user > n2.user GROUP BY n1.user, n2.user) AS step1 ORDER BY r DESC, n DESC 29 Don‘t worry, you will probably never have to do such a thing... https://www.vanheusden.com/misc/pearson.php

30 A general question: Can you compute anything in SQL? = Can you compute anything that can be computed (by a programming language such as python)? In principle, yes (Theoretical result about Turing equivalence: cf. http://stackoverflow.com/questions/900055/is-sql-or-even-tsql-turing-complete ) http://stackoverflow.com/questions/900055/is-sql-or-even-tsql-turing-complete So what do you need (e.g. python) programs and other software for? 30

31 Answer: For example, to calculate your analytics in more comfortable ways Excel makes it very easy to calculate a correlation 1.Create (in SQL) one or more tables with the information 2.Export to CSV 3.Import/Load into Excel 4.Calculate the correlation coefficient there 31

32 Answer (2): or for generating a chart 1.Create (in SQL) one or more tables with the information 2.Export to CSV 3.Import/Load into Excel 4.Create a chart 32

33 Agenda 1.Our goal: answer interesting questions 2.Changing databases – a design view 3.Importing, and more on combining data 4.Creating analytics and storing their values 5.Exporting data 6.Putting it all together: From goal to flowchart of data and processing steps 7.Preview: Database connectivity – (Python and other) programs and databases 33

34 How many parliamentarians does each country have? (1) SELECT name, count( * ) INTO OUTFILE 'C:\\Users\\kurt\\Documents\\Lehre\\ISI15\\Session 7 - SQL3\\countries_parliamentarians.csv' FIELDS TERMINATED BY ‘;' LINES TERMINATED BY '\n' FROM represents, country WHERE represents.countryacronym = country.acronym GROUP BY countryacronym ORDER BY name 34

35 How many parliamentarians does each country have? (2) 35

36 How long are political functions held, on average? 36

37 Is there a relation between length of time in office and age? 37

38 TimeInOffice / age (1): Option 1: export the new table directly SELECT datediff( End_date, Start_date), datediff( Start_date, date_of_birth ) INTO OUTFILE 'C:\\Users\\kurt\\Documents\\Lehre\\ISI15\\Sessio n 7 - SQL3\\time2age.csv' FIELDS TERMINATED BY ‘;' LINES TERMINATED BY '\n' FROM parliament_member, in_political_function WHERE parliament_member.MEP_ID= in_political_function.MEP_ID 38

39 TimeInOffice / age (2): And then compute the correlation with Excel... 39

40 TimeInOffice / age (3): Option 2: Create a new table in the database (which you can later export) CREATE TABLE time_in_office2age SELECT datediff( End_date, Start_date`), datediff( Start_date`, date_of_birth ) FROM parliament_member, `in_political_function` WHERE parliament_member.`MEP_ID` = `in_political_function`.`MEP_ID` 40

41 How often do countries vote for/against things? (1) Basic queries (combining these into one query is a bit tricky, so I recommend to query and export this separately): number of YESs grouped by country, number of Nos grouped by country SELECT countryacronym, count( * ) FROM parliament_member, represents, a_votes WHERE parliament_member.MEP_ID = represents.MEP_ID AND parliament_member.MEP_ID = a_votes.MEP_ID AND member_vote LIKE ‘yes%' GROUP BY countryacronym ORDER BY countryacronym SELECT countryacronym, count( * ) FROM parliament_member, represents, a_votes WHERE parliament_member.MEP_ID = represents.MEP_ID AND parliament_member.MEP_ID = a_votes.MEP_ID AND member_vote LIKE 'no%' GROUP BY countryacronym ORDER BY countryacronym 41

42 How often do countries vote for/against things? (2) 42

43 Agenda 1.Our goal: answer interesting questions 2.Changing databases – a design view 3.Importing, and more on combining data 4.Creating analytics and storing their values 5.Exporting data 6.Putting it all together: From goal to flowchart of data and processing steps 7.Preview: Database connectivity – (Python and other) programs and databases 43

44 What data and operations to answer our research question? 44 EUP database Role to duration (CSV) SQL query + export Role to duration (XLS) Import Excel command

45 What data and operations to answer our research question? 45 EUP database Voting data (CSV) Import Y/N Votes by Country (CSV) SQL query + export Y/N Votes by Country (XLS) Import Excel command

46 Agenda 1.Our goal: answer interesting questions 2.Changing databases – a design view 3.Importing, and more on combining data 4.Creating analytics and storing their values 5.Exporting data 6.Putting it all together: From goal to flowchart of data and processing steps 7.Preview: Database connectivity – (Python and other) programs and databases 46

47 Python and other programs can...... access databases: “import“ data from the database while the program is running compute something with it “export“ (write something) to the database Show selected database content to users Ask for their input Do something accordingly Examples? E.g. Web search engines, e-Commerce sites,... Mechanics? See later in the term, Scripting Languages! 47

48 Next 3 weeks Continuing this Bringing in text analytics – How long are speeches on average, by country? – Do people from different countries use certain words/terms more often than others? –... 48

49 Reading For details of all commands, see the MySQL documentation: http://dev.mysql.com/doc/refman/5.7/en/ http://dev.mysql.com/doc/refman/5.7/en/ 49


Download ppt "SQL (3) Research questions, databases, and analytics; Importing data, exporting data, using other tools Information Structures and Implications 2015 Bettina."

Similar presentations


Ads by Google