Merging data using Excel & Stata Mark Bruyneel & Matthijs de Zwaan


1 Merging data using Excel & Stata Mark Bruyneel & Matthijs de Zwaan
Research Data Services
- Welcome to the course / this web lecture on Data retrieval skills
- My name is . . .

2 Program
Background (30 min.)
- Getting your data
- Working with data
Exercise 1: Excel (30 min.)
Exercise 2: Stata (40 min.)

3 Getting your data
Rule 1: think now, download later

4 Getting your data
Formulate a research question: What was the influence of using governance reporting standards on earnings management of companies?
- What data do I need? Variables? Sample: geography? Time period?
- Is the data available?

5 What data do I need?
Research question: What was the influence of using governance reporting standards on earnings management of companies?
Variables:
- What reporting standards?
- How to measure earnings management?
- Control variables: firm size, board members, …
Sample:
- Which countries: USA, Europe, the Netherlands?
- Time period: recent, historical?
- Company type: public, SMEs?
Model? Relationships?

6 Remarks
- Is the data for each database in a single currency?
- What control variables do I need?
- Do I need to download components to calculate variables I could not download (ratios etc.)?
- Is the data for each database comparable in time?
- If you need more than one database: do you need company identifiers? (see: Blackboard)

7 Which databases are relevant?
- Do I need several databases?
- Do I need to combine datasets?
- Do I need just one database?

8 Which databases are available?

9 Research Data Services

10 Data Center on Blackboard

11 Research Data Services on Blackboard

12 Research Data Services on Blackboard

13 Research Data Services on Blackboard
Help on software: manuals/websites

14 Research Data Services on Blackboard

15 Using several databases
Diagram labels: Search 1, Data 1, Search 2, Data 2, Company identifiers.
Company identifiers: codes that uniquely identify a company in one or more databases.

16 Combining data(sets)
Diagram labels: Company identifiers, Data 1, Data 2.
Find out which (company) identification codes are available in all relevant databases! Examples: ISIN, SEDOL, CUSIP, tickers.

17 Research Data Services on Blackboard
Additional information or tools

18 Research Data Services on Blackboard

19 Blackboard file:

20 Working with data
There are many different ways to organize data. For analysis:
- One line (row) = one observation
- One column = one variable
- “Tidy” data
The best way to organize data depends on what you want to do. For example, the clearest way to present data in a thesis is not the best way to organize data for analysis. Software requires a certain structure in the data, and that structure can differ between software packages or even versions. Try to keep the actual data/observations separate from comments etc. Stata expects one observation in each line and variables in columns: ‘tidy data’.

21 “Untidy” data: example 1
Years as columns:
Name                 y2001   y2002
Alphabet             -       2
Johnson & Johnson    16      11
Pfizer               3       1
One row per company and year:
Name                 Y(ear)  Result
Alphabet             2001    -
Johnson & Johnson    2001    16
Pfizer               2001    3
Alphabet             2002    2
Johnson & Johnson    2002    11
Pfizer               2002    1
Headers hold names and data in one cell: ‘y’ stands for the variable year; 2001 and 2002 are values.
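The move from year columns to one row per company and year can be sketched in a few lines of Python. This is only an illustration of the idea, not part of the course material: the function name `wide_to_long` is made up here, and the "-" on the slide is assumed to mean a missing value. Stata's `reshape long` command performs the same kind of operation.

```python
# Wide layout from the slide: one row per company, one column per year.
# Assumption: the "-" on the slide is a missing value (None here).
wide = [
    {"Name": "Alphabet", "y2001": None, "y2002": 2},
    {"Name": "Johnson & Johnson", "y2001": 16, "y2002": 11},
    {"Name": "Pfizer", "y2001": 3, "y2002": 1},
]


def wide_to_long(rows, id_col="Name", prefix="y"):
    """Turn year columns such as y2001 into (Name, Year, Result) rows."""
    long_rows = []
    for row in rows:
        for col, value in row.items():
            if col == id_col or not col.startswith(prefix):
                continue
            # The year is data hidden in the header: strip the prefix.
            year = int(col[len(prefix):])
            long_rows.append({id_col: row[id_col], "Year": year, "Result": value})
    # Order by year, then name, as on the slide.
    return sorted(long_rows, key=lambda r: (r["Year"], r[id_col]))


tidy = wide_to_long(wide)
for r in tidy:
    print(r["Name"], r["Year"], r["Result"])
```

After the reshape, ‘Name’ and ‘Year’ together identify each row, which is the structure needed for merging later on.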

22 “Untidy” data: example 2
Company and year combined in one column:
Company      Result
T-2001       -
MSFT-2001    16
GOOG-2001    3
T-2002       2
MSFT-2002    11
GOOG-2002    1
Years as columns:
Company   Year 2001   Year 2002
T         -           2
MSFT      16          11
GOOG      3           1
One row per company and year:
Company   Year   Result
T         2001   -
MSFT      2001   16
GOOG      2001   3
T         2002   2
MSFT      2002   11
GOOG      2002   1
One column, two separate variable values: Name and Year
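Splitting a combined column such as "MSFT-2001" into two variables might look like the following Python sketch. The helper name `split_key` is hypothetical, the "-" result is again assumed to be a missing value, and the split is made at the last hyphen on the assumption that the year always comes last.

```python
# Combined "Company-Year" values and results, as listed on the slide.
combined = [
    ("T-2001", None), ("MSFT-2001", 16), ("GOOG-2001", 3),
    ("T-2002", 2), ("MSFT-2002", 11), ("GOOG-2002", 1),
]


def split_key(key):
    """Split 'MSFT-2001' into ('MSFT', 2001) at the last hyphen."""
    company, year = key.rsplit("-", 1)
    return company, int(year)


# One column per variable: Company, Year, Result.
tidy = [
    {"Company": company, "Year": year, "Result": result}
    for (company, year), result in ((split_key(k), v) for k, v in combined)
]

for row in tidy:
    print(row["Company"], row["Year"], row["Result"])
```

Splitting at the last hyphen keeps company codes that themselves contain a hyphen in one piece.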

23 Working with data
Basics of data merges: Merging data ≠ Appending data
- Merging = adding variables
- Appending = adding observations
Merge data on key variables (ID / codes). Key variables:
- Must be available in all data files / datasets
- Must uniquely identify observations (can be a combination of items)
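The difference between appending and merging, and the requirement that key variables identify observations uniquely, can be illustrated with a small Python sketch. The figures reuse the earlier example; the extra variable "Flag" and its values are made up purely for illustration, as is the helper name `is_unique_key`.

```python
from collections import Counter


def is_unique_key(rows, key_cols):
    """Return True if key_cols identify each row exactly once."""
    counts = Counter(tuple(row[c] for c in key_cols) for row in rows)
    return all(n == 1 for n in counts.values())


rows_2001 = [
    {"Name": "Johnson & Johnson", "Year": 2001, "Result": 16},
    {"Name": "Pfizer", "Year": 2001, "Result": 3},
]
rows_2002 = [
    {"Name": "Johnson & Johnson", "Year": 2002, "Result": 11},
    {"Name": "Pfizer", "Year": 2002, "Result": 1},
]

# Appending = adding observations: more rows, same columns.
appended = rows_2001 + rows_2002

# Merging = adding variables: same rows, one more column.
# "Flag" is a hypothetical variable, keyed on (Name, Year).
flags = {
    ("Johnson & Johnson", 2001): "A",
    ("Pfizer", 2001): "B",
    ("Johnson & Johnson", 2002): "A",
    ("Pfizer", 2002): "B",
}
merged = [{**row, "Flag": flags[(row["Name"], row["Year"])]} for row in appended]

# Name alone repeats across years; Name and Year together do not.
print(is_unique_key(appended, ["Name"]))
print(is_unique_key(appended, ["Name", "Year"]))
```

Checking uniqueness before a merge is a cheap way to catch duplicated or truncated identifiers early.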

24 A tidy dataset: example 1
‘Name’ and ‘Year’ together uniquely identify a single observation. The ‘Result’ column gives variable values.
Name                 Year   Result
Alphabet             2001   -
Johnson & Johnson    2001   16
Pfizer               2001   3
Alphabet             2002   2
Johnson & Johnson    2002   11
Pfizer               2002   1
N.B.: Unique company ID codes are often better than the name!

25 Working with data
Warning: make sure to keep key ID codes intact!
ID column examples: 1324, 21234

26 Restoring the ID
Restore the original length with Excel: REPT & LEN
- REPT() repeats a text string a given number of times
- LEN() returns the number of characters in a cell
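In Excel, the two functions are typically combined along the lines of =REPT("0",6-LEN(A2))&A2, where the cell A2 and the target length of 6 are assumptions for illustration. The same logic is sketched below in Python, assuming the IDs lost leading zeros on import and the original codes were six characters long; the function name `restore_id` is hypothetical.

```python
def restore_id(value, length):
    """Pad an ID with leading zeros back to its original length.

    Mirrors the Excel pattern REPT("0", length - LEN(cell)) & cell.
    """
    text = str(value)
    if len(text) > length:
        raise ValueError(f"ID {text!r} is longer than {length} characters")
    # Repeat "0" for every missing character, then append the ID itself.
    return "0" * (length - len(text)) + text


# Assumed original length of 6; the two IDs are the ones on the slide.
for raw in (1324, 21234):
    print(raw, "->", restore_id(raw, 6))
```

Storing the restored IDs as text rather than numbers keeps the leading zeros from being dropped again.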

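27 Merging data
First dataset:
Auditor   Year   GRI Score
John      2001   -
Jane      2001   16
Mary      2001   3
John      2002   2
Jane      2002   11
Mary      2002   1
Second dataset:
Auditor   Year   Big4?
John      2001   “Yes”
Jane      2001
Mary      2001   “No”
John      2002
Jane      2002
Mary      2002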

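28 Working with data: merging
First dataset:
Auditor   Year   GRI Score
John      2001   -
Jane      2001   16
Mary      2001   3
John      2002   2
Jane      2002   11
Mary      2002   1
Second dataset:
Auditor   Year   Big4
John      2001   “Yes”
Jane      2001
Mary      2001   “No”
John      2002
Jane      2002
Mary      2002
Merged (“1-to-1”):
Auditor   Year   GRI Score   Big4
John      2001   -           “Yes”
Jane      2001   16
Mary      2001   3           “No”
John      2002   2
Jane      2002   11
Mary      2002   1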
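A 1-to-1 merge on the combined key (Auditor, Year) can be sketched in Python as follows. This is a minimal illustration, not the course's own method: the function name `merge_one_to_one` is hypothetical, the "-" score is assumed to be a missing value, and the Big4 values are assumptions because the transcript does not show them for every row. In Stata the corresponding command is `merge 1:1` followed by the key variables.

```python
def merge_one_to_one(master, using, keys):
    """Add the columns of `using` to `master`, matching rows on `keys`.

    Raises ValueError if the key is not unique in either dataset.
    Master rows without a match keep only their own columns.
    """
    def key_of(row):
        return tuple(row[k] for k in keys)

    lookup = {}
    for row in using:
        k = key_of(row)
        if k in lookup:
            raise ValueError(f"key {k} is not unique in the using data")
        lookup[k] = row

    seen = set()
    merged = []
    for row in master:
        k = key_of(row)
        if k in seen:
            raise ValueError(f"key {k} is not unique in the master data")
        seen.add(k)
        merged.append({**row, **lookup.get(k, {})})
    return merged


# GRI scores as on the slide; "-" is assumed to mean missing (None).
gri = [
    {"Auditor": "John", "Year": 2001, "GRI Score": None},
    {"Auditor": "Jane", "Year": 2001, "GRI Score": 16},
    {"Auditor": "Mary", "Year": 2001, "GRI Score": 3},
    {"Auditor": "John", "Year": 2002, "GRI Score": 2},
    {"Auditor": "Jane", "Year": 2002, "GRI Score": 11},
    {"Auditor": "Mary", "Year": 2002, "GRI Score": 1},
]
# Big4 values below are illustrative assumptions, not taken from the slide.
big4 = [
    {"Auditor": "John", "Year": 2001, "Big4": "Yes"},
    {"Auditor": "Jane", "Year": 2001, "Big4": "Yes"},
    {"Auditor": "Mary", "Year": 2001, "Big4": "No"},
    {"Auditor": "John", "Year": 2002, "Big4": "Yes"},
    {"Auditor": "Jane", "Year": 2002, "Big4": "Yes"},
    {"Auditor": "Mary", "Year": 2002, "Big4": "No"},
]

merged = merge_one_to_one(gri, big4, ["Auditor", "Year"])
for row in merged:
    print(row)
```

Raising an error on duplicate keys is a deliberate choice in this sketch: a silent match on a non-unique key is one of the easiest ways to corrupt a merged dataset.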

29 Working with data: merging
First dataset:
Auditor   Year   GRI Score
John      2001   -
Jane      2001   16
Mary      2001   3
John      2002   2
Jane      2002   11
Mary      2002   1
Second dataset:
Name   Gender
John   “M”
Jane   “F”
Mary

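30 Working with data: merging
First dataset:
Auditor   Year   GRI Score
John      2001   -
Jane      2001   16
Mary      2001   3
John      2002   2
Jane      2002   11
Mary      2002   1
Second dataset:
Name   Gender
John   “M”
Jane   “F”
Mary
Merged (“many-to-1”):
Auditor   Year   GRI Score   Gender
John      2001   -           “M”
Jane      2001   16          “F”
Mary      2001   3
John      2002   2
Jane      2002   11
Mary      2002   1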
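In a many-to-1 merge, several rows of the first dataset share one row of the second. A minimal Python sketch, under a few stated assumptions: the function name `merge_many_to_one` is hypothetical, "-" is treated as missing, and Mary's gender is an assumed value because the transcript does not show it. Stata offers `merge m:1` for this case.

```python
def merge_many_to_one(master, lookup_rows, master_key, lookup_key):
    """Attach lookup columns to every master row with a matching key.

    The key must be unique in the lookup data but may repeat in master.
    Master rows without a match keep only their own columns.
    """
    lookup = {}
    for row in lookup_rows:
        key = row[lookup_key]
        if key in lookup:
            raise ValueError(f"key {key!r} is not unique in the lookup data")
        # Keep every column except the key itself.
        lookup[key] = {c: v for c, v in row.items() if c != lookup_key}
    return [{**row, **lookup.get(row[master_key], {})} for row in master]


# GRI scores as on the slide; "-" is assumed to mean missing (None).
gri = [
    {"Auditor": "John", "Year": 2001, "GRI Score": None},
    {"Auditor": "Jane", "Year": 2001, "GRI Score": 16},
    {"Auditor": "Mary", "Year": 2001, "GRI Score": 3},
    {"Auditor": "John", "Year": 2002, "GRI Score": 2},
    {"Auditor": "Jane", "Year": 2002, "GRI Score": 11},
    {"Auditor": "Mary", "Year": 2002, "GRI Score": 1},
]
gender = [
    {"Name": "John", "Gender": "M"},
    {"Name": "Jane", "Gender": "F"},
    {"Name": "Mary", "Gender": "F"},  # assumed value, not shown in the transcript
]

merged = merge_many_to_one(gri, gender, master_key="Auditor", lookup_key="Name")
for row in merged:
    print(row)
```

Each auditor's gender is repeated on both of that auditor's yearly rows, which is what distinguishes this case from a 1-to-1 merge.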

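31 Working with data: merging
First dataset:
Auditor   Year   GRI Score
John      2001   -
Jane      2001   16
Mary      2001   3
John      2002   2
Jane      2002   11
Mary      2002   1
Second dataset:
Name   Gender
John   “M”
Jane   “F”
Mary
Merged (“1-to-m”):
Auditor   Year   GRI Score   Gender
John      2001   -           “M”
Jane      2001   16          “F”
Mary      2001   3
John      2002   2
Jane      2002   11
Mary      2002   1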

32 Exercise 1: Combining data using Excel
- Compustat Global data
- Preparing the Datastream data
- Combining both datasets

33 Exercise 2: Combining data using Stata
- Introduction
- Exercise: Compustat Global & Datastream

34 Exercise 2: Combining data using Stata

35

36 Exercise 2: Stata – Command line

37 Exercise 2: Stata – Scripts / Do files
Basics about .do files:
- Text files with the .do file extension
- Commands are handled as if they were typed in on the Command line interface
- Typing “doedit” calls up the do-file editor
Advantages of scripting:
- Documents what you have done
- Makes finding mistakes and repairing them easier
- Add comments to your script(s) (your future self & your supervisor will be grateful)

38 Exercise 2: Stata – Combining the data
Let’s get to work!
- Go to the Data Center Blackboard course
- Download the data files
- Start up the program Stata

39 Need help? The library is there for you!
Website: Blackboard:

