Before the class starts: 1) login to a computer 2) start Stata 13.

1 Before the class starts: 1) log in to a computer 2) start Stata 13

2 Statistical software: SPSS, Stata, and R
Feature | SPSS | Stata | R
Description | Command-driven statistical program | Command-driven statistical program | Statistical programming environment that also allows interactive use
Audience | Designed for corporate use | Designed for researchers/scientists | Designed to be general
Documentation | Explains how to use SPSS | Explains the analyses | Points to original sources
Availability | Installed on all Aalto computers? | Installed on all TUAS computers | Installed on all Aalto computers
Cost | Aalto has a site license | Student version $35 | Free

3 My take on the software
I use Stata and R. I am more productive with Stata in the tasks that it is designed for (and Stata has excellent documentation). R is more flexible, better for data management, and better for making examples. People in the DIEM department mainly use SPSS and Stata. Some are moving from SPSS to Stata, but no one moves the other way. Students on my courses tend to slightly prefer R because they can install it (legally) on their home computers, and they do just fine with it. But R is not the best choice for everyone. You cannot go wrong with Stata.

4 Datasets and command files
Datasets: observations are on rows and variables on columns. Stata works with one dataset at a time; R can work with multiple datasets at a time. Datasets are manipulated with commands. Data files are never edited!
Command files: a sequence of data manipulation and analysis commands to be applied to the data. A command file stores the logic of your analysis and should contain a lot of comments that explain that logic.

5 Using the software: menus vs. typing commands vs. command file
Menus: good for learning the program, and good if you do not remember the command for a particular analysis. (The lack of menus is one of the reasons why R has a steeper learning curve.)
Typing commands: normally the fastest way to explore the data and experiment with the analyses.
Command file: should always be used for the analyses that you want to publish.

6 Open the Getting Started manual and load the auto.dta dataset by following the instructions on page 1.

8 Introduction to Stata

9
1. Using the software as a calculator
2. Accessing and reading the documentation
3. Creating and running projects as analysis files
4. Loading and manipulating datasets (e.g. merging, sorting, filtering)
5. Basic exploratory data analysis, including means, correlations, etc.
6. Basics of graphics
7. Generating data and running simple simulations
8. Creating loops in analysis files and other very basic automation

10 Using Stata as a calculator
Type this | Explanation
100+2/3 | Basic math
(100+2)/3 | You can use round brackets to group operations so that they are carried out first
5*10^2 | The symbol * means multiply and ^ means "to the power", so this gives 5 times (10 squared), i.e. 500
1/0 | Undefined results take the value . (missing data)
sqrt(4) | Square root function
https://en.wikibooks.org/wiki/Statistical_Analysis:_an_Introduction_using_R/R/R_as_a_calculator
Type display (or di) followed by some math.
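The rows of the calculator table can be collected into a short do-file. This is a minimal sketch; note that `//` comments are recognized only in do-files, not when typed in the Command window, so type the commands without the comments there.

```stata
* Stata as a calculator: display evaluates an expression and prints the result
display 100+2/3       // division is carried out before addition
display (100+2)/3     // brackets are evaluated first: 34
display 5*10^2        // 5 times 10 squared: 500
display 1/0           // undefined, so the result is missing (.)
di sqrt(4)            // di abbreviates display: 2
```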

11 Continue working through the “1 Introducing Stata – sample session”. Stop when you reach “A simple hypothesis test” on page 13.

12 T-test
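A two-sample t-test along the lines of the hypothesis test in the sample session can be sketched as follows, assuming the auto data shipped with Stata and a comparison of mileage between domestic and foreign cars:

```stata
sysuse auto, clear
* Two-sample t test of mean mpg: group 1 is foreign==0 (domestic),
* group 2 is foreign==1; equal variances are assumed by default
ttest mpg, by(foreign)
* unequal relaxes the equal-variance assumption (Satterthwaite's df)
ttest mpg, by(foreign) unequal
* ttest leaves its results in r(), e.g. the two-sided p-value
display r(p)
```

Typing `return list` after `ttest` shows all stored results, such as the group sizes `r(N_1)` and `r(N_2)` and the group means `r(mu_1)` and `r(mu_2)`.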

13 Continue working through the “1 Introducing Stata – sample session”. Stop when you have drawn the graph on page 19.

15 Continue working through the “1 Introducing Stata – sample session”

16 Using the help (Chapter 4)
Try the following commands:
    help regression
    help regression diagnostics
    help regress

17 Using the Do-file Editor
Work through the short example in Chapter 13.
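Slides 4 and 5 recommend keeping the analysis in a well-commented command file. A minimal do-file along those lines might look like this; the file names myAnalysis.do and myAnalysis.log are hypothetical. The log file is also what you would share when asking for help online.

```stata
* myAnalysis.do: a minimal command file (file and log names are hypothetical)
version 13              // interpret the commands as Stata 13 would
clear all
capture log close       // close a log left open by an earlier run
log using myAnalysis.log, text replace

* Load the example data; the data file itself is never edited
sysuse auto, clear

* Create new variables with commands instead of editing the data by hand
generate priceOfPound = price/weight
summarize priceOfPound

log close
```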

18 Working with datasets (5-12)

19 Loading CSV files
Load a dataset from the UCLA website:
    import delimited using "http://www.ats.ucla.edu/stat/data/test.csv", clear
Inspect the dataset:
    describe
    summarize
    codebook
http://www.ats.ucla.edu/stat/r/modules/raw_data.htm

20 Loading CSV files from your computer
Stata loads files from and saves files to the working directory.
Download the datasets for Data Analysis Assignment 4 (optional) from MyCourses and unzip the file. Set your working directory (with cd) to the directory where you unzipped the files, and load the CSV file:
    import delimited using "Orbis_Export_1.csv", clear
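Without the course files at hand, a round trip through a CSV file in the working directory can be sketched as follows. The file name myAuto.csv is hypothetical, and the file is deleted at the end.

```stata
* Show the current working directory
pwd
* Write the auto data to a CSV file in the working directory
sysuse auto, clear
export delimited using "myAuto.csv", replace
* Read the CSV file back in, replacing the data in memory
import delimited using "myAuto.csv", clear
describe
* Remove the temporary file again
erase "myAuto.csv"
```

By default, export delimited writes value labels rather than numeric codes (the nolabel option keeps the codes), so foreign comes back as a string variable.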

21 Renaming variables
Load the auto dataset:
    sysuse auto
    describe
Rename one of the variables:
    rename gear_ratio gears

22 Listing data
List subsets of the observations:
    list
    list in 1/10
    list in -1
    list in -10/-1
    list if foreign == 1

23 More on selecting cases
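Conditions in an if qualifier can be combined, and missing values need care: Stata treats a missing value (.) as larger than any number. A minimal sketch using the auto data, where rep78 is missing for five cars:

```stata
sysuse auto, clear
* Combine conditions with & (and), | (or) and ! (not)
list make price mpg if foreign == 1 & mpg > 25
* Missing counts as larger than any number, so this also selects
* the cars whose rep78 is missing
count if rep78 > 3
* Exclude missing values explicitly
count if rep78 > 3 & !missing(rep78)
* if and in can be combined in one command
list make foreign if foreign == 1 in 1/60
```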

24 Listing data
List subsets of the variables:
    help varlist
    list make price
    list m*
    list m??
    list m~
    list headroom-turn
You can also try describe instead of list.

25 Dropping variables
drop deletes the specified variables or cases; keep deletes all but the specified variables or cases.
    drop in -1
    keep in 1/20
    drop price
    keep m*
    sysuse auto, clear

26 Manipulating data (11)
generate creates new variables and replace modifies existing variables:
    generate priceOfPound = price/weight
    replace weight = weight * 0.453592
egen provides additional functions for generating data:
    egen id = seq()
Both can be used with if and in:
    generate priceOfForeign = price if foreign == 1
    sysuse auto, clear
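Put together with comments, the commands on this slide, plus one more egen function (a by-group mean), might look like this. This is a sketch; meanPrice is a hypothetical variable name.

```stata
sysuse auto, clear
* weight is recorded in pounds, so this is the price per pound
generate priceOfPound = price/weight
* Convert weight to kilograms; replace promotes the storage type if needed
replace weight = weight * 0.453592
* egen functions: a running id, and the mean price within each origin group
egen id = seq()
bysort foreign: egen meanPrice = mean(price)
* With an if qualifier, the other observations get a missing value
generate priceOfForeign = price if foreign == 1
summarize priceOfPound meanPrice priceOfForeign
```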

27 Sorting datasets
sort sorts the dataset in ascending order; gsort lets you choose the direction for each variable (a minus sign before a variable means descending):
    list in 1/10
    sort mpg foreign
    list in 1/10
    gsort -mpg -foreign
    list in 1/10

28 Combining datasets: append, merge, joinby (U22)

29 Append
    sysuse auto, clear
    pwd
    save myAuto.dta
    append using myAuto.dta
    list
    erase myAuto.dta
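Appending a dataset to itself doubles the number of observations. A sketch that also records where each observation came from; source is a hypothetical variable name, and replace is added so that the file can be rewritten on reruns:

```stata
sysuse auto, clear
save myAuto.dta, replace
* generate() adds an indicator: 0 = data in memory, 1 = appended file
append using myAuto.dta, generate(source)
* 74 original + 74 appended observations
count
tabulate source
erase myAuto.dta
```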

30 Merge
    webuse dollars, clear
    list
    webuse sforce
    list
    merge m:1 region using http://www.stata-press.com/data/r13/dollars
    list
Never use the m:m option in merge! (To form all pairwise combinations within groups, use joinby instead.)
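After a merge, the _merge variable shows how each observation was matched, and it is worth checking every time. A minimal sketch using the same example datasets:

```stata
webuse sforce, clear
* m:1 = many salespeople per region in memory, one row per region in dollars
merge m:1 region using http://www.stata-press.com/data/r13/dollars
* _merge: 1 = in memory only, 2 = in the using file only, 3 = matched
tabulate _merge
```

merge also has an assert() option, for example assert(match), which stops with an error if any observation does not match.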

31 Joinby
    webuse child
    describe
    list
    webuse parent
    describe
    list, sep(0)
    sort family_id
    joinby family_id using http://www.stata-press.com/data/r13/child
    describe
    list, sepby(family_id) abbrev(12)

32 Useful commands for exploratory data analysis

33
    sysuse auto, clear
    summarize, detail
    codebook
    inspect
    correlate
    table foreign, contents(mean price sd price mean weight sd weight)
    tabulate mpg foreign
    tabstat price-gear_ratio, by(foreign)
    stem mpg
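correlate and pwcorr treat missing values differently, which matters in the auto data because rep78 has missing values. A minimal sketch:

```stata
sysuse auto, clear
* correlate uses casewise deletion: an observation with any missing value
* is dropped, so including rep78 (5 missing) leaves 69 cars
correlate price mpg weight rep78
* pwcorr uses all available pairs and can show significance levels and counts
pwcorr price mpg weight rep78, sig obs
* With two variables, the correlation is stored in r(rho)
correlate mpg weight
display r(rho)
```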

34 Basics of graphics

35 Examples
Browse graph examples at: http://www.ats.ucla.edu/Stat/stata/library/GraphExamples/default.htm

36 Exporting graphics as files
    sysuse auto, clear
    twoway (scatter mpg weight) (lowess mpg weight), by(foreign)
    graph export myCarPlot.pdf

37 Kernel density plot
    kdensity mpg

38 Scatter plot matrix
    graph matrix price-foreign

39 Scatter plot matrix
    graph matrix price mpg weight

40 Aggregating and restructuring data

41 Aggregating data
    preserve
    collapse (mean) mpg_m = mpg price_m = price (sd) mpg_sd = mpg price_sd = price, by(foreign)
    list
    restore
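preserve and restore make the collapse temporary, so the original data come back afterwards. A sketch that also adds a (count) statistic; n is a hypothetical variable name:

```stata
sysuse auto, clear
* preserve takes a snapshot of the data so that collapse can be undone
preserve
* One row per value of foreign: mean, SD and number of nonmissing mpg values
collapse (mean) mpg_m = mpg (sd) mpg_sd = mpg (count) n = mpg, by(foreign)
list
* restore brings back the original 74 observations
restore
```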

42 Reshaping data between long and wide
    webuse reshape1, clear
    list
    reshape long inc ue, i(id) j(year)
    list, sepby(id)
    reshape wide inc ue, i(id) j(year)

43 Simple simulations

44 Generating random numbers
Throw ten dice:
    clear
    set obs 10
    generate die = floor(runiform()*6+1)
    list
Generate ten draws from the standard normal distribution (mean = 0, SD = 1):
    generate normal = rnormal()
    list
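Random draws change every time the commands are run unless the seed is set. A sketch with more throws; the seed value 12345 is arbitrary:

```stata
clear
* set seed makes the draws reproducible: the same seed gives the same dice
set seed 12345
set obs 1000
* runiform() is in [0,1), so floor(runiform()*6+1) is an integer from 1 to 6
generate die = floor(runiform()*6+1)
* Each face should appear roughly 1000/6, i.e. about 167 times
tabulate die
generate normal = rnormal()
summarize normal
```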

45 Effects of model misspecification on regression
    clear
    set obs 1000
    generate x1 = rnormal()
    generate x2 = x1 + rnormal()
    generate y = x1 + x2 + rnormal()
    regress y x1 x2
    regress y x1
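Because x2 = x1 + e, the true model y = x1 + x2 + u can be rewritten as y = 2*x1 + (e + u), where e and u are independent of x1. Dropping x2 therefore pushes the slope on x1 towards 2 instead of 1 (omitted-variable bias). A sketch that adds an arbitrary seed and stores both estimates; b_full and b_short are hypothetical scalar names:

```stata
clear
set seed 12345
set obs 1000
generate x1 = rnormal()
generate x2 = x1 + rnormal()
generate y = x1 + x2 + rnormal()
* Correctly specified model: both slopes should be close to 1
regress y x1 x2
scalar b_full = _b[x1]
* x2 omitted: its effect loads onto x1, whose slope moves towards 2
regress y x1
scalar b_short = _b[x1]
display "x1 slope with x2: " b_full "   without x2: " b_short
```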

46 Mean of ten dice
    capture program drop dice
    program dice
        clear
        set obs 10
        generate die = floor(runiform()*6+1)
        summarize
    end
    dice
    simulate, reps(10000): dice
    describe
    kdensity mean
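A variant that declares the program rclass, returns only the mean, and fixes the seed so the simulation is reproducible. This is a sketch; dice2 is a hypothetical program name and 12345 an arbitrary seed. The mean of ten dice should center on 3.5 with a standard deviation of about sqrt(35/12)/sqrt(10), roughly 0.54.

```stata
capture program drop dice2
* rclass lets the program post its own results in r()
program dice2, rclass
    clear
    set obs 10
    generate die = floor(runiform()*6+1)
    summarize die
    return scalar mean = r(mean)
end

* Keep only the mean; seed() makes the simulation reproducible
simulate mean = r(mean), reps(1000) seed(12345): dice2
summarize mean
kdensity mean
```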

47 Loops and other basic automation

48 Loops and conditions
    foreach counter of numlist 1/10 {
        if `counter' == 5 {
            display "Five"
        }
        else {
            display "Not five"
        }
    }
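The same foreach construct also loops over variables. A minimal sketch that counts the missing values of each auto variable and accumulates a total in a local macro; totalMissing is a hypothetical name:

```stata
sysuse auto, clear
local totalMissing = 0
* Loop over every variable from price to gear_ratio
foreach var of varlist price-gear_ratio {
    quietly count if missing(`var')
    display "`var': " r(N) " missing values"
    local totalMissing = `totalMissing' + r(N)
}
display "Total missing values: `totalMissing'"
```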

49 Conclusion

50 Getting started
1. Study the Stata Getting Started manual and then the user manual.
2. Search for online examples.
3. Ask for help online (e.g. on the course forum).
   a. If you have a problem, it often helps to post your full analysis file or log (e.g. at https://gist.github.com).

51 http://www.ats.ucla.edu/stat/dae/

