Paf 203 Data Analysis and Modeling for Public Affairs


Lecture on Statistics

Dedication of the Sanders and Smidt book: "To those who open this book with dismay."
Quote in a university student calendar: "If I had only one day left to live, I would live it in my statistics class... it would seem so much longer."
"It's not the figures themselves, it's what you do with them that matters."

Learning points in our study of statistics:
What is statistics?
What is descriptive versus inferential statistics?
What is a mean, a median, a mode?
What is a standard deviation?
Why is the frequency distribution important in data presentation and analysis?
How do we present data in a histogram, pie chart, pictogram, or bar chart?
What is the difference between a population and a sample?
What is a parameter? A statistic?
When do we use a t-statistic versus a chi-square?
Why do we need to know about regression analysis?

What is statistics? Statistics is the science of designing studies, gathering data, and then classifying, summarizing, interpreting, and presenting these data to explain and support the decisions that are reached. A population is the complete collection of measurements, objects, or individuals under study. A sample is a portion or subset taken from the population. A parameter is a number that describes a population characteristic. A statistic is a number that describes a sample characteristic.
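The distinction between a parameter and a statistic can be illustrated with a short Python sketch. The population below is made up for illustration (randomly generated ages, not data from the lecture), and the names `population` and `sample` are assumptions of this example.

```python
import random
import statistics

# Hypothetical population: ages of 1,000 people (made-up values).
random.seed(42)
population = [random.randint(20, 65) for _ in range(1000)]

# Parameter: a number that describes the whole population.
population_mean = statistics.mean(population)

# Statistic: the same summary computed from a sample,
# which is a subset drawn from the population.
sample = random.sample(population, 50)
sample_mean = statistics.mean(sample)

print(f"parameter (population mean): {population_mean:.2f}")
print(f"statistic (sample mean):     {sample_mean:.2f}")
```

The two numbers will usually be close but not equal, which is why inferential statistics is needed to reason from the sample back to the population.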

What is descriptive versus inferential statistics? Descriptive statistics includes the procedures for collecting, classifying, summarizing, and presenting data. Charts, tables, and summary measures such as averages are used to describe the basic structure of the study subject. Inferential statistics is the process of arriving at a conclusion about a population parameter (which is usually an unknown quantity) on the basis of information obtained from a sample statistic (a known value).

Why do we want to know about statistics? You need a knowledge of statistics to help you describe and understand numerical relationships and make better decisions.

Example: Describing Relationships between Variables A college admissions officer needs to find an effective way of selecting student applicants. He/she designs a statistical study to see if there is a significant relationship between UPCAT scores and the grade point average achieved by freshmen at the school. If there is a strong relationship, high UPCAT scores will become an important criterion for acceptance. A public health official decides to see if there is any connection between inhaling the smoke produced by cigarette smokers and the incidence of asthma in young children. She applies statistical techniques to large amounts of data and reaches conclusions that will affect the health of large numbers of people.

Example: Aiding in Decision Making A personnel manager has noted that job applicants who score high on a manual dexterity test later tend to perform well in the assembling of a product, while those with low test scores tend to be less productive. By applying the statistical technique known as regression analysis, the manager can forecast how productive a new applicant will be on the job on the basis of how well he or she performs on the test.
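A minimal sketch of such a forecast, assuming a simple linear regression fitted by ordinary least squares. The test scores and productivity figures are hypothetical, invented only to show the arithmetic.

```python
# Hypothetical data: dexterity test scores and units assembled per hour.
scores = [50, 60, 70, 80, 90]
units_per_hour = [20, 24, 27, 33, 36]

n = len(scores)
mean_x = sum(scores) / n
mean_y = sum(units_per_hour) / n

# Least-squares slope and intercept for the line y = intercept + slope * x.
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(scores, units_per_hour))
sxx = sum((x - mean_x) ** 2 for x in scores)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

def forecast(test_score):
    """Predict productivity for a new applicant from the fitted line."""
    return intercept + slope * test_score

print(f"slope = {slope:.2f}, intercept = {intercept:.2f}")
print(f"forecast for a score of 75: {forecast(75):.2f} units per hour")
```

The forecast is only as good as the relationship in the data; a weak relationship between test score and productivity would make the prediction unreliable.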

Statistical Problem-Solving Methodology
1. Identifying the problem or opportunity.
2. Deciding the method of data collection.
3. Collecting the data.
4. Classifying and summarizing the data.
5. Presenting and analyzing the data.
6. Making the decision.

Descriptive Statistics The following array of data characterizes the ISPPS staff at the UPLB for the year 2004. Let’s use this data series to learn about descriptive statistics.

To present the data in a pictogram, we use symbols to represent a unit of measurement for each classification that we want to show. For example, to present a pictogram of the classification of ISPPS employees (faculty, REPS, administrative staff), we use the following:
Pictogram of employee classification of ISPPS staff (REPS, Faculty, Admin). Each picture represents 2 persons.

Pictogram of educational attainment of ISPPS staff (PhD, MS, BS, HS). Each picture represents one person.

We can also present data in terms of bar graphs. There are two types of bar graphs: vertical and horizontal.
Vertical bar graphs

Horizontal bar graphs

We can also use a pie chart to present our data. To derive the figures to be used in the pie chart, we first get the proportion of each class to the total, and then draw the pie chart, as follows:
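The proportion step can be sketched in Python. The staff counts below are hypothetical placeholders, not the actual ISPPS figures; each proportion is multiplied by 360 to get the angle of its slice.

```python
# Hypothetical counts per employee class (not the actual ISPPS data).
counts = {"Faculty": 12, "REPS": 8, "Admin": 5}

total = sum(counts.values())

# Proportion of each class to the total, and the matching slice angle.
proportions = {name: count / total for name, count in counts.items()}
angles = {name: share * 360 for name, share in proportions.items()}

for name in counts:
    print(f"{name}: {proportions[name]:.0%} of staff, {angles[name]:.1f} degrees")
```

The proportions always sum to 1 and the angles to 360 degrees, which is a quick check that no class was left out.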

Measures of Central Tendency: In layman's terms, central tendency means an average. But there are several ways of computing this average. The three most common are the mean, the median, and the mode.

Mean The mean is the sum of the scores divided by the number of items. For example, if we have an array as follows: X: 0,5,3,9,8, the arithmetic mean would be (0+5+3+9+8)/5 = 25/5 = 5. The formula for this is:
$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$
where n = number of observations in the sample and X_i = age of staff member i, i = 1, …, n.
Mean age of ISPPS staff: 47
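As a quick check of the arithmetic, the mean of the example array can be computed in a few lines of Python (a sketch; the variable names are chosen for this example only).

```python
# The example array from the text.
scores = [0, 5, 3, 9, 8]

n = len(scores)          # number of items
mean = sum(scores) / n   # sum of the scores divided by the number of items

print(mean)  # 5.0
```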

Median The median is the point that divides the array such that 50% of the cases fall below it and 50% fall above it. Example: Given the array 0,5,3,9,8, the median is the middle value of the array after the numbers have been ordered from lowest to highest (or highest to lowest): 0,3,5,8,9. The median of this distribution is 5.

(cont.) Median For an array with an even number of values, such as 0,3,5,8,9,12, there is no single middle value, so the median is the average of the two middlemost values: 0,3,[5,8],9,12. Hence the median is the average of 5 and 8, which is 6.5. Now, what is the median age of the staff of the ISPPS?
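Both cases, an odd and an even number of values, can be handled by one small function. This is a sketch written for illustration; Python's standard `statistics.median` gives the same results.

```python
def median(values):
    """Return the middle value of the sorted data, or the average
    of the two middlemost values when the count is even."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2

print(median([0, 5, 3, 9, 8]))       # 5
print(median([0, 3, 5, 8, 9, 12]))   # 6.5
```

Sorting first matters: taking the middle element of the unsorted array 0,5,3,9,8 would give 3, not 5.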

Mode The mode is the most frequent value in the distribution. Since it is simply the most frequent value, it dispenses with the idea of a point of balance. In the following array, what is the mode? 1,2,5,1,3,5,1,9,1 The mode is 1 because it is the most frequent value in the distribution (it occurs four times).

(cont.) Mode In the following array, what is the mode? 1,2,5,1,3,5,1,9,2,5,2 When several values share the highest frequency, each of them is a mode. Here 1, 2, and 5 each occur three times, so the modes are 1, 2, and 5. What is the mode of the age distribution of the ISPPS staff?
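Counting frequencies by hand is error-prone, so a short Python sketch may help. The function name `modes` and the tied array are hypothetical, introduced only for this example; the first array is the single-mode example from the text.

```python
from collections import Counter

def modes(values):
    """Return every value that shares the highest frequency."""
    counts = Counter(values)
    highest = max(counts.values())
    return sorted(value for value, count in counts.items() if count == highest)

# Single mode: 1 occurs four times.
print(modes([1, 2, 5, 1, 3, 5, 1, 9, 1]))   # [1]

# Hypothetical array with a tie: 4 and 7 each occur twice.
print(modes([4, 4, 7, 7, 9]))               # [4, 7]
```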

Showing variability: It may also be useful to show, apart from typicality, variability within a group. This is done by computing one or more measures of variability, also called measures of spread or measures of dispersion. There are different measures of variability, some of which are the following: the range, the average deviation, and the standard deviation.

Range It is the difference between the highest value and the lowest value in the array. Again, given the ordered array 0,3,5,8,9, the range is 9 (9-0). The more common way of expressing the range is to cite the lowest and the highest values. In the above example, the range would be R=[0,9].
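Both ways of expressing the range can be computed directly, as in this brief Python sketch using the example array.

```python
data = [0, 5, 3, 9, 8]

lowest, highest = min(data), max(data)

range_width = highest - lowest      # the range as a single number
range_limits = (lowest, highest)    # the range as the two extreme values

print(range_width)    # 9
print(range_limits)   # (0, 9)
```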

Average Deviation or the Mean Absolute Deviation The average deviation gives you a sense of how far away the individual values are from the mean. It is not a commonly used measure for showing variability but will facilitate our learning of the very important measure, the standard deviation. It is the average of the numeric difference of each item from the mean, without regard to the algebraic sign. It is represented by the following formula:
$AD = \frac{1}{n}\sum_{i=1}^{n} \left| X_i - \bar{X} \right|$
where $\bar{X}$ is the mean and n is the number of items.
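A worked example may make the definition concrete. The sketch below applies it to the array 0,5,3,9,8 used earlier in the lecture, whose mean is 5.

```python
data = [0, 5, 3, 9, 8]

n = len(data)
mean = sum(data) / n

# Distance of each item from the mean, ignoring the algebraic sign.
absolute_deviations = [abs(x - mean) for x in data]

average_deviation = sum(absolute_deviations) / n

print(absolute_deviations)   # [5.0, 0.0, 2.0, 4.0, 3.0]
print(average_deviation)     # 2.8
```

Without the absolute value the deviations would be -5, 0, -2, 4, 3, which sum to zero; that is why the sign has to be dropped (or, for the standard deviation, squared away).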

Standard deviation The most common measure of dispersion is the standard deviation. The standard deviation is the square root of the average of the squared deviations from the mean. Standard deviation of age: 8.14. Standard deviation of income: 6,174.
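The definition above can be followed step by step in Python. This sketch uses the small example array 0,5,3,9,8 rather than the ISPPS data, and it divides by n exactly as the definition states; many textbooks and software packages divide by n - 1 instead when the data are a sample.

```python
import math

data = [0, 5, 3, 9, 8]

n = len(data)
mean = sum(data) / n

# Squared deviations from the mean.
squared_deviations = [(x - mean) ** 2 for x in data]

# Average of the squared deviations (the variance), then its square root.
variance = sum(squared_deviations) / n
standard_deviation = math.sqrt(variance)

print(squared_deviations)            # [25.0, 0.0, 4.0, 16.0, 9.0]
print(round(standard_deviation, 2))  # 3.29
```

The standard library function `statistics.pstdev` computes the same divide-by-n version, while `statistics.stdev` uses n - 1.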

Frequency Distribution We can summarize data using an interval scale. Decisions need to be made on how many categories will be used and where to establish cut-off points. There are no simple rules for doing this. A lot of the decision will depend on the purposes to be served by the classification. There are some guidelines that can be followed in constructing frequency distributions. If the data is given in whole numbers, then the end limits or what we call the class limits should be in whole numbers. If these are given to one decimal point, then the class limits should be to one decimal point. In other words, our class limits should follow the number of decimal points that the data follow.

(cont.) Frequency Distribution
The size or width of the interval should be some convenient number, such as 1, 5, 10, 20, 25, 50, or 100.
The class limits should also be convenient numbers. It makes no sense to have class limits like 8.4-13.8.
Avoid intervals so narrow that some categories have zero observations.
As much as possible, use equal-size intervals.
As much as possible, use closed intervals. You may use open intervals only when closed intervals would result in class frequencies of zero.
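These guidelines can be sketched in Python. The ages below are hypothetical (not the ISPPS data), and the width of 10 and starting limit of 20 are assumptions chosen because they are convenient numbers that give equal-size, closed intervals with whole-number class limits.

```python
from collections import Counter

# Hypothetical ages in whole years (not the actual ISPPS data).
ages = [28, 33, 35, 41, 42, 44, 47, 48, 51, 53, 55, 58, 62]

width = 10   # a convenient interval width
start = 20   # a convenient lower class limit

def class_interval(value):
    """Return the closed class interval (lower, upper) containing the value."""
    lower = start + ((value - start) // width) * width
    return (lower, lower + width - 1)

# Count how many observations fall into each class.
frequency = Counter(class_interval(age) for age in ages)

for (lower, upper), count in sorted(frequency.items()):
    print(f"{lower}-{upper}: {count}")
```

Because the data are whole numbers, the class limits are whole numbers too (20-29, 30-39, and so on), following the rule that class limits match the precision of the data.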

Table 2. Frequency distribution of monthly incomes of ISPPS staff, 2004, College, Laguna

Histogram In a histogram, a bar can be used to represent each category. The height of the bar indicates its size. If the scale is nominal, the actual ordering of bars will not matter. For ordinal or interval scales, the bars are to be arranged in their proper order, giving a good visual indication of the frequency distribution.
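The idea of one bar per category, with height showing frequency, can be illustrated with a simple text histogram in Python. The frequency table is hypothetical, and the bars are drawn sideways with `#` characters instead of as a proper chart.

```python
# Hypothetical frequency table for an interval scale (age classes).
frequency_table = [
    ("20-29", 1),
    ("30-39", 2),
    ("40-49", 5),
    ("50-59", 4),
    ("60-69", 1),
]

def text_histogram(table):
    """Build one line per category; the bar length equals the frequency."""
    return [f"{label} | {'#' * count} ({count})" for label, count in table]

# The classes are kept in their proper order because the scale is interval.
for line in text_histogram(frequency_table):
    print(line)
```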

(cont.) Histogram Table 3. Frequency distribution of age of ISPPS staff, 2004, College, Laguna

(cont.) Histogram