SQL (3) Research questions, databases, and analytics; Importing data, exporting data, using other tools Information Structures and Implications 2015 Bettina.

Presentation transcript:

SQL (3) Research questions, databases, and analytics; Importing data, exporting data, using other tools Information Structures and Implications 2015 Bettina Berendt Last updated:

Where are we? 2

Agenda
1. Our goal: answer interesting questions
2. Changing databases – a design view
3. Importing, and more on combining data
4. Creating analytics and storing their values
5. Exporting data
6. Putting it all together: From goal to flowchart of data and processing steps
7. Preview: Database connectivity – (Python and other) programs and databases

How many parliamentarians does each country have? 4

How long are political functions held, on average? 5

How often do countries vote for/against things? (Note: artificial data!) 6

Is there a relation between length of time in office and age? 7

Incremental changes to databases
Let us see how we can add information to an existing database. Let us modify, in turn:
– The conceptual model (EER)
– The logical model (relations)
– The physical model (database)

The diagram 10

Assume we have voting data
Just some examples of real EU voting data
–
– (overview, link to the next one)
–
–
For simplicity, assume we have a CSV file
– If it's a different format, some more transformation is needed
For simplicity, I generated random data

Artificial voting data (votes2.csv: 504 votes)

Adding these data to the database: (1) Creating a new table Table a_votes (Missing: primary and foreign keys) 14

Adding these data to the database: (2) Importing the data into the table
LOAD DATA INFILE 'C:\\Users\\kurt\\Documents\\Lehre\\ISI15\\Session 7 - SQL3\\votes2.csv'
INTO TABLE a_votes
FIELDS TERMINATED BY ';'
LINES TERMINATED BY '\n'
(Note: The file path specification is different on Mac.)
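
The same import step can be sketched outside MySQL. The following Python example is only a hedged illustration: it uses the standard library's csv and sqlite3 modules as a stand-in for LOAD DATA INFILE, and the column names vote_id, MEP_ID, and member_vote as well as the three sample rows are assumptions, since the slides do not show the actual layout of votes2.csv.

```python
import csv
import io
import sqlite3

# Hypothetical sample in the shape of a semicolon-separated votes file.
# The real votes2.csv layout is not shown in the slides; these columns are assumed.
SAMPLE_CSV = "1;101;yes\n2;102;no\n3;101;abstain\n"


def load_votes(conn, lines):
    """Create a_votes (if needed) and bulk-insert semicolon-separated rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS a_votes ("
        "vote_id INTEGER, MEP_ID INTEGER, member_vote TEXT)"
    )
    reader = csv.reader(lines, delimiter=";")
    # Skip empty lines; convert the two numeric columns to integers.
    rows = [(int(r[0]), int(r[1]), r[2]) for r in reader if r]
    conn.executemany("INSERT INTO a_votes VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
n_loaded = load_votes(conn, io.StringIO(SAMPLE_CSV))
print(n_loaded, "rows loaded")
```

To read a real file instead of the in-memory sample, pass an open file object (opened with newline="") to load_votes.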

Note LOAD DATA INFILE is of course not only useful for adding data to an existing database. You could also build a database from scratch in this way. 16

Linking the new to the old data (just another join) 17

Scenario 2 (more common in real life): Our new data do not have the same key information as the old data 18

New table & data import for scenario 2
Table a_votes2 (Missing: primary and foreign keys)
LOAD DATA INFILE 'C:\\Users\\kurt\\Documents\\Lehre\\ISI15\\Session 7 - SQL3\\artificial_votes2.txt'
INTO TABLE a_votes2
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'

Sample data for scenario 2 20

Linking the new to the old data (record linkage – not necessarily via the primary keys) 21
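
A minimal sketch of such a record linkage, assuming the new data identify voters only by name: the columns full_name and voter_name and all sample rows are hypothetical, and sqlite3 stands in for MySQL. The join compares normalized names rather than keys, and a second query lists the votes that found no partner.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE parliament_member (MEP_ID INTEGER PRIMARY KEY, full_name TEXT);
CREATE TABLE a_votes2 (voter_name TEXT, member_vote TEXT);
INSERT INTO parliament_member VALUES (1, 'Anna Example'), (2, 'Ben Sample');
INSERT INTO a_votes2 VALUES ('anna example', 'yes'), ('Ben Sample ', 'no'),
                            ('Carl Unknown', 'yes');
""")

# Link on a normalized name instead of a key: trim whitespace, ignore case.
linked = conn.execute("""
    SELECT m.MEP_ID, v.voter_name, v.member_vote
    FROM a_votes2 AS v
    JOIN parliament_member AS m
      ON LOWER(TRIM(v.voter_name)) = LOWER(TRIM(m.full_name))
    ORDER BY m.MEP_ID
""").fetchall()

# The inner join silently drops votes without a matching member, so list them.
unmatched = conn.execute("""
    SELECT v.voter_name
    FROM a_votes2 AS v
    LEFT JOIN parliament_member AS m
      ON LOWER(TRIM(v.voter_name)) = LOWER(TRIM(m.full_name))
    WHERE m.MEP_ID IS NULL
""").fetchall()

print(linked)
print(unmatched)
```

Checking the unmatched rows matters because a join on non-key attributes can lose records (spelling variants) or, if two members share a name, duplicate them.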

What are the risks and opportunities of scenario 2? 22

How many parliamentarians does each country have? (Order the result by country name)

For how long do parliamentarians hold a political function? (in days) ,852 rows …

How long are positions held, on average? 26

OK, minus, AVG, and COUNT are fine, but what about more complex measures?
For example, is there a relation between
– length of time in office
– and age?
(Do older parliamentarians stay in office longer than younger ones, or vice versa?)
You could investigate this hypothesis with the help of the Pearson correlation coefficient.

Pearson correlation in SQL (1) 28

Pearson correlation in SQL (2)
SELECT user1, user2,
       ((psum - (sum1 * sum2 / n)) /
        sqrt((sum1sq - pow(sum1, 2.0) / n) * (sum2sq - pow(sum2, 2.0) / n))) AS r,
       n
FROM (SELECT n1.user AS user1, n2.user AS user2,
             SUM(n1.rating) AS sum1,
             SUM(n2.rating) AS sum2,
             SUM(n1.rating * n1.rating) AS sum1sq,
             SUM(n2.rating * n2.rating) AS sum2sq,
             SUM(n1.rating * n2.rating) AS psum,
             COUNT(*) AS n
      FROM testdata AS n1
      LEFT JOIN testdata AS n2 ON n1.movie = n2.movie
      WHERE n1.user > n2.user
      GROUP BY n1.user, n2.user) AS step1
ORDER BY r DESC, n DESC
Don't worry, you will probably never have to do such a thing...
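
For comparison, the same sum-based formula can be written as a short Python function. This is a sketch under assumptions: the variable names mirror the aliases in the query (sum1, sum2, sum1sq, sum2sq, psum, n), and the two sample lists are made-up illustration values, not data from the EUP database.

```python
import math


def pearson_from_sums(xs, ys):
    """Pearson's r via the same running sums the SQL query computes."""
    n = len(xs)
    sum1 = sum(xs)
    sum2 = sum(ys)
    sum1sq = sum(x * x for x in xs)
    sum2sq = sum(y * y for y in ys)
    psum = sum(x * y for x, y in zip(xs, ys))
    numerator = psum - (sum1 * sum2 / n)
    denominator = math.sqrt(
        (sum1sq - sum1 ** 2 / n) * (sum2sq - sum2 ** 2 / n)
    )
    if denominator == 0:
        return None  # undefined when one variable has no variance
    return numerator / denominator


# Made-up illustration values (e.g. age vs. time in office, in arbitrary units).
age = [1, 2, 3, 4, 5]
time_in_office = [2, 4, 6, 8, 10]
r_up = pearson_from_sums(age, time_in_office)
r_down = pearson_from_sums(age, list(reversed(time_in_office)))
print(r_up, r_down)
```

A perfectly increasing relation gives r = 1 and a perfectly decreasing one gives r = -1; real data will fall somewhere in between.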

A general question: Can you compute anything in SQL?
= Can you compute anything that can be computed (by a programming language such as Python)?
In principle, yes (theoretical result about Turing equivalence: cf. )
So what do you need (e.g. Python) programs and other software for?

Answer: For example, to calculate your analytics in more comfortable ways
Excel makes it very easy to calculate a correlation:
1. Create (in SQL) one or more tables with the information
2. Export to CSV
3. Import/Load into Excel
4. Calculate the correlation coefficient there

Answer (2): or for generating a chart
1. Create (in SQL) one or more tables with the information
2. Export to CSV
3. Import/Load into Excel
4. Create a chart

How many parliamentarians does each country have? (1)
SELECT name, count( * )
INTO OUTFILE 'C:\\Users\\kurt\\Documents\\Lehre\\ISI15\\Session 7 - SQL3\\countries_parliamentarians.csv'
FIELDS TERMINATED BY ';'
LINES TERMINATED BY '\n'
FROM represents, country
WHERE represents.countryacronym = country.acronym
GROUP BY countryacronym
ORDER BY name

How many parliamentarians does each country have? (2) 35

How long are political functions held, on average? 36

Is there a relation between length of time in office and age? 37

TimeInOffice / age (1): Option 1: export the new table directly
SELECT datediff( End_date, Start_date ), datediff( Start_date, date_of_birth )
INTO OUTFILE 'C:\\Users\\kurt\\Documents\\Lehre\\ISI15\\Session 7 - SQL3\\time2age.csv'
FIELDS TERMINATED BY ';'
LINES TERMINATED BY '\n'
FROM parliament_member, in_political_function
WHERE parliament_member.MEP_ID = in_political_function.MEP_ID
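
The query-plus-export step can also be sketched in Python. The example below is a hedged illustration using sqlite3 and the csv module instead of MySQL's INTO OUTFILE: the table and column names come from the query above, the two sample rows are invented, and julianday() differences replace datediff(), which SQLite does not provide. The output goes to an in-memory buffer rather than a file.

```python
import csv
import io
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE parliament_member (MEP_ID INTEGER PRIMARY KEY, date_of_birth TEXT);
CREATE TABLE in_political_function (MEP_ID INTEGER, Start_date TEXT, End_date TEXT);
INSERT INTO parliament_member VALUES (1, '1960-01-01'), (2, '1970-06-15');
INSERT INTO in_political_function VALUES
    (1, '2010-01-01', '2010-01-11'),
    (2, '2012-03-01', '2012-03-31');
""")

# Days in function and age (in days) at the start of the function.
rows = conn.execute("""
    SELECT CAST(julianday(f.End_date) - julianday(f.Start_date) AS INTEGER),
           CAST(julianday(f.Start_date) - julianday(m.date_of_birth) AS INTEGER)
    FROM parliament_member AS m
    JOIN in_political_function AS f ON m.MEP_ID = f.MEP_ID
    ORDER BY m.MEP_ID
""").fetchall()

# Write semicolon-separated lines, matching FIELDS TERMINATED BY ';'.
buffer = io.StringIO()
csv.writer(buffer, delimiter=";", lineterminator="\n").writerows(rows)
exported = buffer.getvalue()
print(exported)
```

Replacing the buffer with a file opened via open(path, "w", newline="") would produce a CSV that Excel can import.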

TimeInOffice / age (2): And then compute the correlation with Excel... 39

TimeInOffice / age (3): Option 2: Create a new table in the database (which you can later export)
CREATE TABLE time_in_office2age
SELECT datediff( `End_date`, `Start_date` ), datediff( `Start_date`, `date_of_birth` )
FROM parliament_member, `in_political_function`
WHERE parliament_member.`MEP_ID` = `in_political_function`.`MEP_ID`

How often do countries vote for/against things? (1)
Basic queries (combining these into one query is a bit tricky, so I recommend querying and exporting them separately): number of YESs grouped by country, number of NOs grouped by country

SELECT countryacronym, count( * )
FROM parliament_member, represents, a_votes
WHERE parliament_member.MEP_ID = represents.MEP_ID
AND parliament_member.MEP_ID = a_votes.MEP_ID
AND member_vote LIKE 'yes%'
GROUP BY countryacronym
ORDER BY countryacronym

SELECT countryacronym, count( * )
FROM parliament_member, represents, a_votes
WHERE parliament_member.MEP_ID = represents.MEP_ID
AND parliament_member.MEP_ID = a_votes.MEP_ID
AND member_vote LIKE 'no%'
GROUP BY countryacronym
ORDER BY countryacronym
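
One possible way to get both counts in a single query is conditional aggregation. This is a sketch of an alternative technique, not the approach the slides recommend: SUM(CASE ...) counts YES and NO votes side by side. The example runs on sqlite3 with invented sample rows and, for brevity, joins represents and a_votes directly without parliament_member.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE represents (MEP_ID INTEGER, countryacronym TEXT);
CREATE TABLE a_votes (MEP_ID INTEGER, member_vote TEXT);
INSERT INTO represents VALUES (1, 'BE'), (2, 'BE'), (3, 'DE');
INSERT INTO a_votes VALUES (1, 'yes'), (1, 'no'), (2, 'yes'), (3, 'no'), (3, 'no');
""")

# Each CASE yields 1 for a matching vote and 0 otherwise; SUM adds them per country.
counts = conn.execute("""
    SELECT r.countryacronym,
           SUM(CASE WHEN v.member_vote LIKE 'yes%' THEN 1 ELSE 0 END) AS yes_votes,
           SUM(CASE WHEN v.member_vote LIKE 'no%' THEN 1 ELSE 0 END) AS no_votes
    FROM represents AS r
    JOIN a_votes AS v ON r.MEP_ID = v.MEP_ID
    GROUP BY r.countryacronym
    ORDER BY r.countryacronym
""").fetchall()
print(counts)
```

Unlike the two separate queries, this form also reports a country that has no YES votes at all, with a count of 0 instead of a missing row.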

How often do countries vote for/against things? (2) 42

What data and operations to answer our research question?
EUP database -> [SQL query + export] -> Role to duration (CSV) -> [Import] -> Role to duration (XLS) -> [Excel command]

What data and operations to answer our research question?
Voting data (CSV) -> [Import] -> EUP database -> [SQL query + export] -> Y/N Votes by Country (CSV) -> [Import] -> Y/N Votes by Country (XLS) -> [Excel command]

Python and other programs can access databases:
– “import” data from the database while the program is running
– compute something with it
– “export” (write something) to the database
– Show selected database content to users
– Ask for their input
– Do something accordingly
Examples? E.g. Web search engines, e-commerce sites, ...
Mechanics? See later in the term, Scripting Languages!
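
The import, compute, export cycle can be previewed with a small hedged sketch. It uses Python's built-in sqlite3 module rather than a MySQL connector; the in_political_function columns follow the earlier queries, while the analytics table, the measure name avg_days_in_function, and the sample rows are assumptions made for illustration.

```python
import sqlite3
from datetime import date

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE in_political_function (MEP_ID INTEGER, Start_date TEXT, End_date TEXT);
INSERT INTO in_political_function VALUES
    (1, '2010-01-01', '2010-01-11'),
    (2, '2012-03-01', '2012-03-31');
CREATE TABLE analytics (measure TEXT PRIMARY KEY, value REAL);
""")

# "Import": read rows from the database while the program is running.
periods = conn.execute(
    "SELECT Start_date, End_date FROM in_political_function"
).fetchall()

# Compute something with the data in Python.
durations = [
    (date.fromisoformat(end) - date.fromisoformat(start)).days
    for start, end in periods
]
avg_days = sum(durations) / len(durations)

# "Export": write the result back to the database.
conn.execute(
    "INSERT INTO analytics (measure, value) VALUES (?, ?)",
    ("avg_days_in_function", avg_days),
)
conn.commit()

stored = conn.execute(
    "SELECT value FROM analytics WHERE measure = 'avg_days_in_function'"
).fetchone()[0]
print(stored)
```

The ? placeholders pass values separately from the SQL text, which is the usual way to keep user input from being interpreted as SQL.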

Next 3 weeks
Continuing this
Bringing in text analytics
– How long are speeches on average, by country?
– Do people from different countries use certain words/terms more often than others?
– ...

Reading For details of all commands, see the MySQL documentation: