Merging data using Excel & Stata Mark Bruyneel & Matthijs de Zwaan


Merging data using Excel & Stata
Mark Bruyneel & Matthijs de Zwaan, Research Data Services
- Welcome to the course / this web lecture on Data retrieval skills
- My name is . . .

Program:
Background (30 min.)
- Getting your data
- Working with data
Exercise 1: Excel (30 min.)
Exercise 2: Stata (40 min.)

Getting your data
Rule 1: think now, download later

Getting your data
- Formulate a research question: What was the influence of using governance reporting standards on earnings management of companies?
- What data do I need? Variables? Sample: geography? Time period?
- Is the data available?

What data do I need?
Research question: What was the influence of using governance reporting standards on earnings management of companies?
Variables:
- What reporting standards?
- How to measure earnings management?
- Control variables: firm size, board members, …
Sample:
- Which countries: USA, Europe, the Netherlands
- Time period: recent, historical?
- Company type: public, SMEs?
Model? Relationships?

Remarks:
- Is the data for each database in a single currency?
- What control variables do I need?
- Do I need to download components to calculate variables I could not download? (Ratios etc.)
- Is the data for each database comparable in time?
- If you need more than 1 database: do you need company identifiers? (see: Blackboard)

Which databases are relevant?
- Do I need several databases?
- Do I need to combine datasets?
- Do I need just one database?

Which databases are available?

Research Data Services

Data Center on Blackboard

Research Data Services on Blackboard

Research Data Services on Blackboard Help on software: manuals/websites

Using several databases
(Diagram: Search 1, Data 1, Company identifiers, Search 2, Data 2)
Company identifiers: codes that uniquely identify a company in one or more databases

Combining data(sets)
(Diagram: Data 1, Company identifiers, Data 2)
Find out which (company) identification codes are available in all relevant databases!
Examples: ISIN, SEDOL, CUSIP, tickers.

Research Data Services on Blackboard Additional information or tools

Blackboard file:

Working with data
There are many different ways to organize data, and the best one depends on what you want to do. For example, the clearest way to present data in a thesis is not the best way to organize data for analysis. Software requires a certain structure in the data, and that structure can differ between software packages or even versions. Try to keep the actual data/observations separate from comments etc.
For analysis, use “tidy” data:
- One line (row) = one observation
- One column = one variable
Stata expects data in this form: one observation on each line, and variables in columns.

“Untidy” data: example 1
Untidy:
Name               y2001  y2002
Alphabet           -      2
Johnson & Johnson  16     11
Pfizer             3      1
Tidy:
Name               Y(ear)  Result
Alphabet           2001    -
Johnson & Johnson  2001    16
Pfizer             2001    3
Alphabet           2002    2
Johnson & Johnson  2002    11
Pfizer             2002    1
The headers combine a name and data in one cell: ‘y’ is the variable (year), while 2001 and 2002 are values.
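
The reshaping in example 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method used in the course (which works in Excel and Stata): the function name `wide_to_long` is hypothetical, and the "-" on the slide is represented as `None`, on the assumption that it marks a missing value.

```python
# Wide ("untidy") layout from example 1: one row per company, one column per year.
# Assumption: the "-" on the slide marks a missing value, stored here as None.
wide = [
    {"Name": "Alphabet", "y2001": None, "y2002": 2},
    {"Name": "Johnson & Johnson", "y2001": 16, "y2002": 11},
    {"Name": "Pfizer", "y2001": 3, "y2002": 1},
]


def wide_to_long(rows, id_col="Name", prefix="y"):
    """Turn every year column into its own row: (Name, Year, Result)."""
    long_rows = []
    for row in rows:
        for col, value in row.items():
            if col == id_col:
                continue
            # "y2001" -> 2001: the year moves from the header into a variable.
            year = int(col[len(prefix):])
            long_rows.append({id_col: row[id_col], "Year": year, "Result": value})
    # Sort by year, then name, to match the order of the tidy table.
    long_rows.sort(key=lambda r: (r["Year"], r[id_col]))
    return long_rows


tidy = wide_to_long(wide)
for r in tidy:
    print(r["Name"], r["Year"], r["Result"])
```

Each output row is now one observation, identified by Name and Year together.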

“Untidy” data: example 2
Untidy (one column, two separate variable values: Name and Year):
Company    Result
T-2001     -
MSFT-2001  16
GOOG-2001  3
T-2002     2
MSFT-2002  11
GOOG-2002  1
Still untidy (years as column headers):
Company  Year 2001  Year 2002
T        -          2
MSFT     16         11
GOOG     3          1
Tidy:
Company  Year  Result
T        2001  -
MSFT     2001  16
GOOG     2001  3
T        2002  2
MSFT     2002  11
GOOG     2002  1
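
Splitting the combined column of example 2 into two variables could look roughly like the Python sketch below. The helper name `split_key` is hypothetical, and "-" is again assumed to mean a missing value (`None`). Splitting at the last hyphen is a design choice made here, so that a ticker that itself contains a hyphen would stay in one piece.

```python
# Combined "Company" column from example 2: ticker and year share one cell.
# Assumption: "-" on the slide is a missing value, stored here as None.
combined = [
    ("T-2001", None),
    ("MSFT-2001", 16),
    ("GOOG-2001", 3),
    ("T-2002", 2),
    ("MSFT-2002", 11),
    ("GOOG-2002", 1),
]


def split_key(key):
    """Split 'TICKER-YEAR' at the LAST hyphen into (ticker, year)."""
    company, _, year = key.rpartition("-")
    return company, int(year)


tidy = []
for key, result in combined:
    company, year = split_key(key)
    tidy.append({"Company": company, "Year": year, "Result": result})

for row in tidy:
    print(row)
```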

Working with data
Basics of data merges:
- Merging data ≠ appending data
- Merging = adding variables
- Appending = adding observations
Merge data on key variables (ID / codes). The key variables:
- Must be available in all data files / datasets
- Must uniquely identify observations (a key can be a combination of items)
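
The difference between appending and merging can be made concrete with a small Python sketch. All values below are made up for illustration (the IDs "A" and "B" and the variable `Size` do not come from the slides); the point is only that appending changes the number of rows, while merging changes the number of columns.

```python
# Hypothetical example data: two files with the same variables for different years.
part_1 = [
    {"ID": "A", "Year": 2001, "Result": 16},
    {"ID": "B", "Year": 2001, "Result": 3},
]
part_2 = [
    {"ID": "A", "Year": 2002, "Result": 11},
    {"ID": "B", "Year": 2002, "Result": 1},
]

# Appending = adding observations: more rows, same variables.
appended = part_1 + part_2

# A second (hypothetical) dataset with one extra variable, keyed on (ID, Year).
extra = {
    ("A", 2001): {"Size": 100},
    ("B", 2001): {"Size": 250},
    ("A", 2002): {"Size": 120},
    ("B", 2002): {"Size": 240},
}

# Merging = adding variables: same rows, one more column,
# matched on the key variables ID and Year.
merged = [{**row, **extra[(row["ID"], row["Year"])]} for row in appended]

print(len(part_1), "->", len(appended), "rows after appending")
print(sorted(appended[0]), "->", sorted(merged[0]), "after merging")
```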

A tidy dataset: example 1
Name               Year  Result
Alphabet           2001  -
Johnson & Johnson  2001  16
Pfizer             2001  3
Alphabet           2002  2
Johnson & Johnson  2002  11
Pfizer             2002  1
- ‘Name’ and ‘Year’ together uniquely identify a single observation
- The ‘Result’ column gives variable values
N.B.: Unique company ID codes are often better than the name!
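
Whether a set of columns really identifies each observation uniquely can be checked before merging. The Python sketch below is one possible way to do that; the function name `duplicate_keys` is hypothetical, and "-" is assumed to be a missing value (`None`).

```python
from collections import Counter

# The tidy table from example 1 ("-" assumed missing, stored as None).
rows = [
    {"Name": "Alphabet", "Year": 2001, "Result": None},
    {"Name": "Johnson & Johnson", "Year": 2001, "Result": 16},
    {"Name": "Pfizer", "Year": 2001, "Result": 3},
    {"Name": "Alphabet", "Year": 2002, "Result": 2},
    {"Name": "Johnson & Johnson", "Year": 2002, "Result": 11},
    {"Name": "Pfizer", "Year": 2002, "Result": 1},
]


def duplicate_keys(rows, key_cols):
    """Return every key value that occurs in more than one row."""
    counts = Counter(tuple(row[c] for c in key_cols) for row in rows)
    return [key for key, n in counts.items() if n > 1]


# 'Name' alone is not a valid key: every company appears twice.
print(duplicate_keys(rows, ["Name"]))
# 'Name' and 'Year' together identify each observation: no duplicates.
print(duplicate_keys(rows, ["Name", "Year"]))
```

An empty result for the chosen key columns means the combination is safe to merge on.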

Working with data
Warning: make sure to keep key ID codes intact! Leading zeros are easily lost when a code is treated as a number:
ID (original)  ID (leading zeros lost)
00001324       1324
00021234       21234
03441234       3441234

Restoring the ID
Restore the original length in Excel with REPT & LEN:
- REPT() = repeat: repeats a text a given number of times
- LEN() = length: the number of characters in a cell
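
In Excel, the two functions are typically combined along the lines of `=REPT("0", 8-LEN(A2))&A2`, where `A2` is a hypothetical cell holding the shortened ID and 8 is the original length of the IDs shown earlier. The Python sketch below mirrors the same logic step by step; the function name `restore_id` is an assumption made for this illustration.

```python
def restore_id(value, width=8):
    """Pad an ID with leading zeros back to its original length.

    Mirrors the Excel idea REPT("0", width - LEN(cell)) & cell.
    """
    text = str(value)
    if len(text) > width:
        raise ValueError(f"ID {text!r} is longer than {width} characters")
    # "0" * n plays the role of REPT("0", n); len() plays the role of LEN().
    return "0" * (width - len(text)) + text


for short_id in (1324, 21234, 3441234):
    print(short_id, "->", restore_id(short_id))
```

Python's built-in `str.zfill` gives the same result in one call; the longer version is shown only because it maps directly onto REPT and LEN.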

Merging data
Dataset 1:
Auditor  Year  GRI Score
John     2001  -
Jane     2001  16
Mary     2001  3
John     2002  2
Jane     2002  11
Mary     2002  1
Dataset 2 (Big4?):
Auditor  Year  Big4
John     2001  “Yes”
Jane     2001
Mary     2001  “No”
John     2002
Jane     2002
Mary     2002

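Working with data: merging (“1-to-1”)
Dataset 1:
Auditor  Year  GRI Score
John     2001  -
Jane     2001  16
Mary     2001  3
John     2002  2
Jane     2002  11
Mary     2002  1
Dataset 2:
Auditor  Year  Big4
John     2001  “Yes”
Jane     2001
Mary     2001  “No”
John     2002
Jane     2002
Mary     2002
Merged:
Auditor  Year  GRI Score  Big4
John     2001  -          “Yes”
Jane     2001  16
Mary     2001  3          “No”
John     2002  2
Jane     2002  11
Mary     2002  1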

Working with data: merging
Dataset 1:
Auditor  Year  GRI Score
John     2001  -
Jane     2001  16
Mary     2001  3
John     2002  2
Jane     2002  11
Mary     2002  1
Dataset 2:
Name  Gender
John  “M”
Jane  “F”
Mary

Working with data: merging (“many-to-1”)
Dataset 1:
Auditor  Year  GRI Score
John     2001  -
Jane     2001  16
Mary     2001  3
John     2002  2
Jane     2002  11
Mary     2002  1
Dataset 2:
Name  Gender
John  “M”
Jane  “F”
Mary
Merged:
Auditor  Year  GRI Score  Gender
John     2001  -          “M”
Jane     2001  16         “F”
Mary     2001  3
John     2002  2
Jane     2002  11
Mary     2002  1
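
A many-to-1 merge of this kind can be sketched in Python as a lookup: many rows in the first dataset share one row in the second. This is an illustration only, not the Stata procedure of the exercise. The function name `merge_many_to_one` is hypothetical, "-" is assumed to be a missing value, and Mary is deliberately left out of the lookup because her value is not legible in the slide, which also shows what happens to an unmatched key.

```python
# Dataset 1: several rows per auditor ("-" assumed missing, stored as None).
scores = [
    {"Auditor": "John", "Year": 2001, "GRI Score": None},
    {"Auditor": "Jane", "Year": 2001, "GRI Score": 16},
    {"Auditor": "Mary", "Year": 2001, "GRI Score": 3},
    {"Auditor": "John", "Year": 2002, "GRI Score": 2},
    {"Auditor": "Jane", "Year": 2002, "GRI Score": 11},
    {"Auditor": "Mary", "Year": 2002, "GRI Score": 1},
]

# Dataset 2: one row per person. Mary is omitted on purpose (value not
# legible in the slide), so she acts as an unmatched key below.
gender = {"John": "M", "Jane": "F"}


def merge_many_to_one(master, lookup, key, new_col):
    """Add new_col to every master row by looking up its key value."""
    merged = []
    for row in master:
        # .get() returns None for keys missing from the lookup table.
        merged.append({**row, new_col: lookup.get(row[key])})
    return merged


merged = merge_many_to_one(scores, gender, "Auditor", "Gender")
unmatched = sorted({r["Auditor"] for r in merged if r["Gender"] is None})

for r in merged:
    print(r)
print("No match found for:", unmatched)
```

Listing the unmatched keys after every merge is a cheap way to notice identifiers that differ between datasets.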

Working with data: merging (“1-to-m”)
Dataset 1:
Auditor  Year  GRI Score
John     2001  -
Jane     2001  16
Mary     2001  3
John     2002  2
Jane     2002  11
Mary     2002  1
Dataset 2:
Name  Gender
John  “M”
Jane  “F”
Mary
Merged:
Auditor  Year  GRI Score  Gender
John     2001  -          “M”
Jane     2001  16         “F”
Mary     2001  3
John     2002  2
Jane     2002  11
Mary     2002  1

Exercise 1: Combining data using Excel
- Compustat Global data
- Preparing the Datastream data
- Combining both datasets

Exercise 2: Combining data using Stata
- Introduction
- Exercise: Compustat Global & Datastream

Exercise 2: Combining data using Stata https://download.vu.nl/

Exercise 2: Stata – Command line

Exercise 2: Stata – Scripts / Do files
Basics about .do files:
- Text files with the .do file extension
- Commands are handled as if they were typed in on the Command line interface
- Typing “doedit” calls up the do-file editor
Advantages of scripting:
- It documents what you have done
- It makes finding mistakes and repairing them easier
Add comments to your script(s) (your future self & your supervisor will be grateful)

Exercise 2: Stata – Combining the data
Let’s get to work!
- Go to the Data Center Blackboard course
- Download the data files
- Start up the program Stata

Need help? The library is there for you!
Website: http://ub.vu.nl
Blackboard: http://bb.vu.nl
Email: ResearchDataServices.ub@vu.nl