Lecture 4 Ways to get data into SAS Some practice programming


Lecture 4
- Ways to get data into SAS
- Some practice programming
- Review of statistical concepts

Getting data into SAS
- DATALINES statement: data is contained within the DATA step
- INFILE statement: data is contained in a separate file
- PROC IMPORT

* List Directed Input: Reading data values separated by spaces.;
DATA bp;
  INFILE DATALINES;
  INPUT clinic $ dbp6 sbp6 dbpbl sbpbl;
DATALINES;
C 84 138 93 143
D 89 150 91 140
A 78 116 100 162
A . . 86 155
C 81 145 86 140
;
RUN;

TITLE 'Data Separated by Spaces';
PROC PRINT DATA=bp;
RUN;

Output:

Obs  clinic  dbp6  sbp6  dbpbl  sbpbl
  1  C         84   138     93    143
  2  D         89   150     91    140
  3  A         78   116    100    162
  4  A          .     .     86    155
  5  C         81   145     86    140

* List Directed Input: Reading data values separated by commas;
DATA bp;
  INFILE DATALINES DLM = ',';
  INPUT clinic $ dbp6 sbp6 dbpbl sbpbl;
DATALINES;
C,84,138,93,143
D,89,150,91,140
A,78,116,100,162
A,.,.,86,155
C,81,145,86,140
;
RUN;

TITLE 'Data separated by a comma';
PROC PRINT DATA=bp;
RUN;

* List Directed Input: Reading data values from a .csv type file;
* With DSD, two consecutive delimiters are read as a missing value;
DATA bp;
  INFILE DATALINES DLM = ',' DSD;
  INPUT clinic $ dbp6 sbp6 dbpbl sbpbl;
DATALINES;
C,84,138,93,143
D,89,150,91,140
A,78,116,100,162
A,,,86,155
C,81,145,86,140
;
RUN;

TITLE 'Reading in Data using the DSD Option';
PROC PRINT DATA=bp;
RUN;

* List Directed Input: Reading data values separated by tabs (.txt files);
* Values on each data line are separated by tab characters, and consecutive tabs mark missing values;
DATA bp;
  INFILE DATALINES DLM = '09'x DSD;
  INPUT clinic $ dbp6 sbp6 dbpbl sbpbl;
DATALINES;
C	84	138	93	143
D	89	150	91	140
A	78	116	100	162
A			86	155
C	81	145	86	140
;
RUN;

TITLE 'Reading in Data separated by a tab';
PROC PRINT DATA=bp;
RUN;

* Reading data from an external file;
* FIRSTOBS = 2 skips the header row of variable names;
DATA bp;
  INFILE '/home/ph5415/data/bp.csv' DSD FIRSTOBS = 2;
  INPUT clinic $ dbp6 sbp6 dbpbl sbpbl;
RUN;

TITLE 'Reading in Data from an External File';
PROC PRINT DATA=bp;
RUN;

Content of bp.csv:

clinic,dbp6,sbp6,dbpbl,sbpbl
C,84,138,93,143
D,89,150,91,140
A,78,116,100,162
A,,,86,155
C,81,145,86,140

* Using PROC IMPORT to read in data;
PROC IMPORT DATAFILE='/home/ph5415/data/bp.csv'
            OUT = bp
            DBMS = csv
            REPLACE;
  GETNAMES = yes;
RUN;

TITLE 'Reading in Data Using PROC IMPORT';
PROC PRINT DATA=bp;
RUN;

PROC CONTENTS DATA=bp;
RUN;
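PROC IMPORT can also read the tab-delimited layout shown earlier. The following is a minimal sketch, assuming a tab-separated copy of the data exists; the file name bp.txt and the output dataset name bp_tab are hypothetical and do not appear in the lecture.

```sas
/* Sketch only: bp.txt is a hypothetical tab-delimited copy of bp.csv */
PROC IMPORT DATAFILE='/home/ph5415/data/bp.txt'
            OUT = bp_tab      /* hypothetical output dataset name */
            DBMS = tab        /* tab-delimited file */
            REPLACE;
  GETNAMES = yes;             /* take variable names from the first row */
RUN;

PROC PRINT DATA=bp_tab;
RUN;
```

Only the DBMS value changes relative to the .csv example; PROC IMPORT still infers variable types from the data.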

The CONTENTS Procedure

Data Set Name:  WORK.BP                            Observations:          5
Member Type:    DATA                               Variables:             5
Engine:         V8                                 Indexes:               0
Created:        18:15 Tuesday, January 25, 2005    Observation Length:    40
Last Modified:  18:15 Tuesday, January 25, 2005    Deleted Observations:  0
Protection:                                        Compressed:            NO
Data Set Type:                                     Sorted:                NO
Label:

-----Alphabetic List of Variables and Attributes-----

#  Variable  Type  Len  Pos
1  clinic    Char    8   32
2  dbp6      Num     8    0
4  dbpbl     Num     8   16
3  sbp6      Num     8    8
5  sbpbl     Num     8   24

Some Definitions
- Statistics: the art and science of collecting, analyzing, presenting, and interpreting numerical data
- Data: facts and figures that are analyzed
- Dataset: all the data collected for a study
- Elements: units on which data are collected (people, companies, schools, households)
- Variables: characteristics measured on elements
  - People (height, weight)
  - Company (number of employees)
  - Schools (percentage of students who graduate in 5 years)
  - Households (number of computers owned)

Informal Definition
Statistics: gaining information, in a scientific way, about something you do not know

Start With Research Question
- What is the proportion of persons without health insurance in Minnesota?
- Do newer BP medications prevent heart disease better than older medications?
- What is the relationship between grade point average and SAT scores?
- Do persons who eat more fruits and vegetables have a lower risk of developing colon cancer?
- Does the DARE program reduce the risk of young persons trying drugs?

Statistics
1. Start with a question
2. Design the study and collect data
3. Compute summary data to assess the question
4. Make conclusions (inference)

Statistical Inference
- Estimation (Chapter 4)
- Hypothesis testing (Chapter 5)
- Comparing population proportions (Chapter 6)
- Comparing population means (Chapter 7)

Common Parameters to Estimate

Parameter   Description
μ           Mean of population
p           Proportion with a certain trait
ρ           Correlation between 2 variables
μ1 - μ2     Difference between 2 means
p1 - p2     Difference between 2 proportions
σ           Population standard deviation

Statistical Inference
1. Population with mean μ = ?
2. A simple random sample of n elements is selected from the population.
3. The sample data provide a value for the sample mean x̄.
4. The value of x̄ is used to make inferences about the value of μ.

Sampling
- Sample: a subset of the target population (usually a simple random sample, in which each possible sample has equal probability of occurring)
- Different samples yield different estimates
- Trying to understand the population parameter (the "true value")
- It's usually not possible to measure the population value

Point Estimate

Parameter   Point Estimate
μ           Sample mean
p           Sample proportion
ρ           Sample correlation
μ1 - μ2     Difference between 2 sample means
p1 - p2     Difference between 2 sample proportions
σ           Sample standard deviation
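As a rough illustration, several of these point estimates can be obtained from the bp dataset created earlier. This is a sketch rather than part of the original slides; the choice of variables is arbitrary, and with only five observations the numbers are for practice only.

```sas
/* Sample mean and sample standard deviation of baseline systolic BP */
PROC MEANS DATA=bp N MEAN STD;
  VAR sbpbl;
RUN;

/* Sample proportions: percent of observations in each clinic */
PROC FREQ DATA=bp;
  TABLES clinic;
RUN;

/* Sample (Pearson) correlation between baseline systolic and diastolic BP */
PROC CORR DATA=bp;
  VAR sbpbl dbpbl;
RUN;
```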

Interval Estimation
In general, 95% confidence intervals are of the form:

  Estimate ± 1.96 × SE

- Estimate = mean, proportion, regression coefficient, odds ratio, ...
- SE = standard error of your estimate
- 1.96 = multiplier for a 95% CI based on the normal distribution

Estimation
"What is the average total cholesterol level for MN residents?"
- Take a random sample of cholesterol levels
- Sample mean: x̄ = sum of values / number of observations
- x̄ estimates the population mean μ

Estimation
"What is the average total cholesterol level for MN residents?"
- Sample standard deviation: s
- s estimates the population standard deviation σ
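The formulas on these two slides did not survive in the transcript. The usual definitions, for a sample x_1, ..., x_n, are given below; the n - 1 divisor is an assumption about what the slide showed, though it is the standard choice and the one PROC MEANS uses by default.

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i
\qquad\text{estimates } \mu,
\qquad
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}
\qquad\text{estimates } \sigma .
```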

Confidence Interval Example
Suppose a sample of n = 100 has mean x̄ = 215 mg/dL and standard deviation s = 20 mg/dL.
- Standard error of the mean: SE = s / sqrt(n) = 20/10 = 2
- 95% CI = x̄ ± 1.96 × SE
         = (215 - 1.96*20/10, 215 + 1.96*20/10)
         = (211.08, 218.92), approximately (211, 219)
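The same arithmetic can be done in a DATA step. This is a small sketch using the numbers from the example; the dataset and variable names (ci, xbar, lower, upper) are made up for illustration.

```sas
/* Compute the 95% CI from the summary numbers in the example */
DATA ci;
  n    = 100;                 /* sample size */
  xbar = 215;                 /* sample mean, mg/dL */
  s    = 20;                  /* sample standard deviation, mg/dL */
  se   = s / SQRT(n);         /* standard error of the mean = 2 */
  lower = xbar - 1.96 * se;   /* 211.08 */
  upper = xbar + 1.96 * se;   /* 218.92 */
RUN;

PROC PRINT DATA=ci NOOBS;
RUN;
```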

Properties of Confidence Intervals
- As sample size increases, the CI gets narrower
  - If you could sample the whole population, there would be no sampling error and the estimate would equal the parameter
- Can use different levels of confidence
  - 90%, 95%, and 99% are common
  - More confidence means a wider interval, so a 90% CI is narrower than a 99% CI
- Changes with the population standard deviation
  - A more variable population means a wider interval
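These properties can be seen by tabulating the CI width for a few sample sizes and confidence levels. The sketch below assumes a standard deviation of 20 (as in the cholesterol example) and the normal-based interval; the sample sizes and the dataset name ci_width are arbitrary choices for illustration.

```sas
/* Width of a normal-based CI for several sample sizes and confidence levels */
DATA ci_width;
  s = 20;                               /* assumed standard deviation */
  DO n = 25, 100, 400;
    DO level = 0.90, 0.95, 0.99;
      z     = PROBIT(1 - (1 - level)/2); /* normal quantile, e.g. 1.96 for 95% */
      se    = s / SQRT(n);               /* standard error of the mean */
      width = 2 * z * se;                /* upper limit minus lower limit */
      OUTPUT;
    END;
  END;
RUN;

PROC PRINT DATA=ci_width NOOBS;
  FORMAT z 5.3 width 6.2;
RUN;
```

Within each sample size the width grows with the confidence level, and quadrupling n halves the width.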

Caution with Confidence Intervals
- Data should be from a random sample
  - More complicated sampling requires different methods
  - Example: multistage or stratified sampling
- Outliers can cause problems
- Non-normal data can change the confidence level
  - Skewed data are a big problem
- Bias is not accounted for
  - Non-responders
  - Target and sampled populations differ

95% Confidence Intervals with SAS
1) Construct from output: estimate +/- 1.96*SE
2) Provided automatically by some procedures

PROC MEANS DATA = STUDENTS LCLM UCLM;
  VAR AGE;
RUN;
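A slightly fuller version of the PROC MEANS call is sketched below, reusing the STUDENTS dataset and AGE variable named on the slide. The choice of a 90% level is only for illustration. Note that PROC MEANS bases its limits on the t distribution, so for small samples they will be somewhat wider than estimate +/- 1.96*SE.

```sas
/* Mean, standard error, and two-sided confidence limits for AGE */
/* ALPHA=0.10 requests a 90% interval; the default ALPHA=0.05 gives 95% */
PROC MEANS DATA = STUDENTS N MEAN STDERR CLM ALPHA = 0.10;
  VAR AGE;
RUN;
```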