Various topics. Petter Mostad, 2005.11.14.


Overview
Epidemiology
Study types / data types
Econometrics
Time series data
More about sampling
–Estimation of required sample size

Epidemiology
Epidemiology is the study of diseases in a population
–prevalence
–incidence, mortality
–survival
Goals
–describe occurrence and distribution
–search for causes
–determine effects in experiments

Some study types
Observational studies
–Cross-sectional studies
–Cohort studies
–Longitudinal studies
–Case-control studies
Experimental studies
–Randomized, controlled experiments
–Interventions

Cross-sectional studies
Examine a sample of persons at a single time point
Time effects rely on the memory of respondents
Good for estimating prevalence
Difficult for rare diseases
Response rate bias

Cohort studies and longitudinal studies
A sample (cohort) is followed over some time period. If it is queried at specific time points: longitudinal study
Gives better information about causal effects, as the reporting of events is not based on memory
Requires that a substantial group develops the disease, and that substantial groups differ with respect to risk factors
Problem: Long time perspective

Case-control studies
Start with a set of sick individuals (cases), and add a set of controls for comparison
Cases and controls should be from the same population
Matching controls
Good method for rare diseases
Problem: Bias from selection

Measures of risk
Relative risk
Odds ratio
Incidence rate ratio
Attributable risk
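
The four measures can be sketched from a 2x2 table of exposure against disease. This is a minimal illustration, not taken from the slides: the function names, the table layout (a, b, c, d) and the example counts are assumptions, and attributable risk is taken here to mean the risk difference between exposed and unexposed, which is one common definition.

```python
def risk_measures(a, b, c, d):
    """Risk measures from a 2x2 table (layout is an assumption of this sketch).

    a = exposed cases,   b = exposed non-cases,
    c = unexposed cases, d = unexposed non-cases.
    """
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    return {
        "relative_risk": risk_exposed / risk_unexposed,
        "odds_ratio": (a * d) / (b * c),
        # Attributable risk taken as the risk difference (assumed definition).
        "attributable_risk": risk_exposed - risk_unexposed,
    }


def incidence_rate_ratio(cases_exposed, time_exposed, cases_unexposed, time_unexposed):
    """Ratio of incidence rates, each rate being cases per unit of person-time."""
    return (cases_exposed / time_exposed) / (cases_unexposed / time_unexposed)


# Hypothetical counts: 30 of 100 exposed and 10 of 100 unexposed fall ill.
m = risk_measures(30, 70, 10, 90)
print(m)
print(incidence_rate_ratio(30, 1000.0, 10, 1000.0))
```

Note that the odds ratio can be computed from case-control data, whereas the relative risk needs the risks themselves, as estimated in a cohort study.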

Econometrics
"Econometrics is the field of economics that concerns itself with the application of mathematical statistics and the tools of statistical inference to the empirical measurement of relationships postulated by economic theory"
It is the unification of
–economic statistics
–quantitative economic theory
–mathematical economics

About econometrics
Variations and extensions of the regression model
–heteroscedasticity
–autocorrelation models
–panel data
–logistic regression
–non-linear regression models
–multivariate regression
Matrix computation (linear algebra) is an almost indispensable tool
Time series data
Simultaneous equations models

Heteroscedasticity
Recall: When the variances of the independent errors in the model vary, the model is heteroscedastic.
Example: In a regression model of house size against income, the variance of house sizes might increase with income
In case of heteroscedasticity, ordinary regression models are not optimal.
Previously, we mentioned variable transformation as a possible solution
Much more advanced solutions exist when the heteroscedasticity is known or can be estimated: Generalized least squares,…
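
When the error variances are known and the errors are independent, generalized least squares reduces to weighted least squares, with each observation weighted by the inverse of its error variance. The sketch below shows this for a single predictor. The function name and the data are hypothetical, and the variances are simply assumed known.

```python
def weighted_least_squares(x, y, weights):
    """Fit y = intercept + slope * x, minimizing sum(w_i * residual_i ** 2).

    With weights w_i = 1 / variance_i this is the special case of
    generalized least squares for independent errors with known variances.
    """
    sw = sum(weights)
    xbar = sum(w * xi for w, xi in zip(weights, x)) / sw
    ybar = sum(w * yi for w, yi in zip(weights, y)) / sw
    sxy = sum(w * (xi - xbar) * (yi - ybar) for w, xi, yi in zip(weights, x, y))
    sxx = sum(w * (xi - xbar) ** 2 for w, xi in zip(weights, x))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    return intercept, slope


# Hypothetical data where the error variance is assumed to grow as x squared.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [5.1, 7.8, 11.9, 13.0, 18.5]
weights = [1.0 / xi ** 2 for xi in x]
print(weighted_least_squares(x, y, weights))
```

Observations with large error variance get small weights, so they pull less on the fitted line than they would in ordinary least squares.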

Autocorrelations
Recall: When the data come from, for example, a time series, the random errors for adjacent time steps might be correlated!
Improvements in the model might reduce the problem
Standard regression methods are not optimal
Modelling and estimating the autoregression gives improved results

Panel data
Data collected for the same sample, at repeated time points
Corresponds to longitudinal epidemiological studies
A combination of cross-sectional data and time series data
Increasingly popular study type

Analyzing panel data
Fixed effects: Standard regression, but using a constant term differing for each individual
–We get a parameter for each person!
Random effects: A stochastic variable models variation connected to the individual
–The individual variation is assumed drawn from a distribution with fixed variance
–A generalization of least squares is needed for computations
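
A fixed effects fit with one predictor can be sketched by subtracting each individual's own means from x and y and regressing the centered values; the individual constant terms then follow from the means. This is one standard way to compute the fit, not necessarily the one used in the lecture, and the function name and data are hypothetical.

```python
from collections import defaultdict


def fixed_effects_fit(ids, x, y):
    """Fit y = alpha_i + slope * x with a separate constant alpha_i per individual.

    Centering x and y within each individual removes alpha_i, so the common
    slope can be found by ordinary least squares on the centered values.
    """
    groups = defaultdict(list)
    for i, xi, yi in zip(ids, x, y):
        groups[i].append((xi, yi))

    sxy = sxx = 0.0
    means = {}
    for i, obs in groups.items():
        mx = sum(xi for xi, _ in obs) / len(obs)
        my = sum(yi for _, yi in obs) / len(obs)
        means[i] = (mx, my)
        for xi, yi in obs:
            sxy += (xi - mx) * (yi - my)
            sxx += (xi - mx) ** 2

    slope = sxy / sxx
    # One constant term (parameter) per individual.
    intercepts = {i: my - slope * mx for i, (mx, my) in means.items()}
    return slope, intercepts


# Hypothetical panel: two persons observed at three time points each.
ids = ["A", "A", "A", "B", "B", "B"]
x = [1, 2, 3, 2, 3, 4]
y = [3, 5, 7, 9, 11, 13]
print(fixed_effects_fit(ids, x, y))
```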

Analyzing panel data
Heteroscedasticity might be a problem here as well
Autocorrelations
Dynamic models: Lagged variables

Logistic regression
What if the dependent variable is an indicator variable?
The model then has two stages: First, we predict a value z_i from the predictors as before; then the probability of indicator value 1 is given by the logistic function, P(y_i = 1) = 1 / (1 + e^(-z_i))
Given data, we can estimate the coefficients in a similar way as before
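
The two stages can be sketched for a single predictor as follows. The coefficients are estimated here by plain gradient ascent on the log-likelihood, chosen only because it is short to write down; statistical software typically uses other algorithms. The function names, step size and data are assumptions of this sketch.

```python
import math


def logistic(z):
    """Map a linear predictor z to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(x, y, steps=5000, rate=0.1):
    """Estimate b0, b1 in P(y = 1) = logistic(b0 + b1 * x).

    Uses gradient ascent on the average log-likelihood (an illustrative
    choice, not necessarily what a statistics package does).
    """
    b0 = b1 = 0.0
    n = len(x)
    for _ in range(steps):
        g0 = g1 = 0.0
        for xi, yi in zip(x, y):
            err = yi - logistic(b0 + b1 * xi)  # observed minus predicted probability
            g0 += err
            g1 += err * xi
        b0 += rate * g0 / n
        b1 += rate * g1 / n
    return b0, b1


# Hypothetical data: the indicator tends to be 1 for larger x.
x = [0, 1, 2, 3, 4, 5]
y = [0, 0, 1, 0, 1, 1]
b0, b1 = fit_logistic(x, y)
print(b0, b1, logistic(b0 + b1 * 5))
```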

Non-linear regression models
Ordinary regression is very useful, but it is limited by the linear form of the equations
Sometimes, variable transformations can bring the connection between variables to a linear form
Other times, this is not possible: The relationship describes the dependent variable as some function of independent variables and some random error. The model may still be estimated by minimizing the errors. This is non-linear regression.

Multivariate regression
Instead of one dependent variable, one can have a vector of dependent variables
A theory of multivariate multiple regression can be developed (with the help of matrix algebra):
–Many results similar to those of ordinary multiple regression
–Captures the dependencies between dependent variables

Simultaneous equations models
Often, you want to describe interdependencies between variables, rather than explaining one variable in terms of others
Example:
–Demand is a function of various variables, including price
–The same is the case with supply
–Setting demand = supply creates simultaneous equations
Identifiability?
Estimation: Least squares is not optimal; other methods exist

Time series models
Time series issues:
–Identifying trends, cycles, etc.
–Predicting future values
Autoregressive models:
–Explicit models for time dependencies (Box-Jenkins, ARIMA models), for example AR(1) and AR(2)
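
As an illustration of an autoregressive model, the sketch below simulates an AR(1) series and estimates its coefficient by least squares. It assumes the common notation x_t = phi * x_(t-1) + e_t with independent normal errors e_t and no constant term; this notation, the function names and the parameter values are assumptions and are not taken from the slides.

```python
import random


def simulate_ar1(phi, n, sigma=1.0, seed=0):
    """Simulate x_t = phi * x_(t-1) + e_t, with e_t ~ N(0, sigma^2) and x_0 = 0."""
    rng = random.Random(seed)
    x = [0.0]
    for _ in range(n - 1):
        x.append(phi * x[-1] + rng.gauss(0.0, sigma))
    return x


def estimate_ar1(x):
    """Least squares estimate of phi from regressing x_t on x_(t-1), no intercept."""
    num = sum(cur * prev for cur, prev in zip(x[1:], x[:-1]))
    den = sum(prev * prev for prev in x[:-1])
    return num / den


series = simulate_ar1(phi=0.7, n=5000, seed=1)
print(estimate_ar1(series))  # should be close to 0.7
```

An AR(2) model adds a second term that depends on x_(t-2); it can be estimated in the same way with two lagged predictors.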

The runs test (for random samples)
In a random sample, the probability that an observation is above or below the median is independent of whether the previous observation is.
A run is a (maximal) sequence of consecutive observations such that all are above the median, or all are below.
For n observations, the number of runs has a known null distribution under the assumption of no autocorrelation.
With too few runs, the null hypothesis of no autocorrelation can be rejected. (Table in Newbold).
For large samples, a formula based on a normal approximation can be used.
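
The large-sample version of the test can be sketched as below. It uses the standard normal approximation for the number of runs given n1 observations above and n2 below the median; observations exactly equal to the median are dropped, which is a choice made for this sketch. The function name and the example series are hypothetical.

```python
import math
import statistics


def runs_test(series):
    """Count runs above/below the median and return (runs, z, lower-tail p-value).

    Under no autocorrelation the number of runs R has
      mean     = 2*n1*n2/n + 1
      variance = 2*n1*n2*(2*n1*n2 - n) / (n^2 * (n - 1)),  n = n1 + n2.
    A small p-value means too few runs.
    """
    med = statistics.median(series)
    above = [v > med for v in series if v != med]  # ties with the median dropped
    n1 = sum(above)
    n2 = len(above) - n1
    n = n1 + n2
    runs = 1 + sum(1 for a, b in zip(above, above[1:]) if a != b)
    mean = 2 * n1 * n2 / n + 1
    var = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n * n * (n - 1))
    z = (runs - mean) / math.sqrt(var)
    p_lower = 0.5 * (1 + math.erf(z / math.sqrt(2)))  # P(Z <= z) for standard normal Z
    return runs, z, p_lower


# A steadily increasing series has only two runs: strong evidence of autocorrelation.
print(runs_test(list(range(1, 21))))
```

When n is even and exactly half the observations lie on each side of the median, the mean reduces to n/2 + 1 and the variance to (n^2 - 2n) / (4(n - 1)).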

Sampling in practice
Newbold mentions:
1. Information required?
2. Relevant population?
3. Sample selection?
4. Obtaining information?
5. Inferences from sample?
6. Conclusions?
Sampling / nonsampling errors

Types of sampling
Simple random sampling
Stratified sampling
Cluster sampling
Two-phase sampling (using pilot studies)
Each requires somewhat adjusted formulas for estimation

Correcting for finite population in estimations
Our estimates of, for example, population variances and population proportions assumed an "infinite" population
When the population size N is comparable to the sample size n, a correction factor is necessary. (Why?)
Examples:
–Variance of population mean estimate
–Variance of population proportion estimate
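
A sketch of the two corrected variance estimates, assuming simple random sampling without replacement and the common textbook correction factor (N - n) / N applied to the usual estimates. The function names and numbers are hypothetical. The factor also suggests an answer to the "Why?": when n = N the whole population has been observed, so the estimate has no sampling variance left.

```python
def mean_variance_estimate(sample, N):
    """Estimated variance of the sample mean for a population of size N.

    Uses (s^2 / n) * (N - n) / N, with s^2 the sample variance (divisor n - 1).
    Assumes simple random sampling without replacement.
    """
    n = len(sample)
    mean = sum(sample) / n
    s2 = sum((v - mean) ** 2 for v in sample) / (n - 1)
    return (s2 / n) * (N - n) / N


def proportion_variance_estimate(p_hat, n, N):
    """Estimated variance of a sample proportion p_hat for a population of size N."""
    return p_hat * (1 - p_hat) / (n - 1) * (N - n) / N


sample = [2, 4, 6, 8]
print(mean_variance_estimate(sample, N=8))      # half of the population sampled
print(mean_variance_estimate(sample, N=10**6))  # correction factor close to 1
print(mean_variance_estimate(sample, N=4))      # full census: no sampling variance
```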

Estimation of required sample size
An important part of experimental planning
The answer will generally depend on the parameters you want to estimate in the first place, so only a rough estimate is possible
Even so, making a rough estimate can be very important
A pilot study may be very helpful

Example: Estimating the mean of a normally distributed population
We want to estimate the mean
We want a confidence interval to extend a distance a from the estimate
We guess at the population variance
These quantities give a sample size estimate at 95% confidence
If we have a population of size N and want a specified precision, we get a finite-population version of the estimate
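
A sketch of both sample size estimates, assuming the usual normal-theory formulas. For an "infinite" population, n = (1.96 * sigma / a)^2. For a population of size N, solving (sigma^2 / n) * (N - n) / (N - 1) = (a / 1.96)^2 for n gives n = N * sigma^2 / ((N - 1) * (a / 1.96)^2 + sigma^2). The function names and the numbers are hypothetical, and sigma is the guessed population standard deviation.

```python
import math


def sample_size_infinite(sigma, a, z=1.96):
    """Smallest n such that the confidence interval half-width z*sigma/sqrt(n) is at most a."""
    return math.ceil((z * sigma / a) ** 2)


def sample_size_finite(sigma, a, N, z=1.96):
    """Sample size for a population of size N, using the finite population correction.

    Solves (sigma^2 / n) * (N - n) / (N - 1) = (a / z)^2 for n.
    """
    target_var = (a / z) ** 2  # desired variance of the mean estimate
    return math.ceil(N * sigma ** 2 / ((N - 1) * target_var + sigma ** 2))


# Guessing sigma = 10 and asking for a 95% interval of plus or minus 2:
print(sample_size_infinite(10, 2))       # 97
print(sample_size_finite(10, 2, N=500))  # 81
```

The finite-population answer is never larger than the infinite-population one, and the two agree when N is very large.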