Statistics and Variables
Statistics and Data
Statistics: numbers that summarize information quantitatively. For example: on average, how many hours per day do Americans watch TV?
Statistics: the methods used to calculate summary numbers and to generalize from them.
A more accurate account of what we are going to do with statistics: quantitatively summarizing and generalizing information.
Statistics and Variables 2
Data are records of observations. Most sociological data are survey research data. The General Social Survey (GSS), for example, is a large survey of adult Americans conducted annually or biennially since 1972.
Statistics
Descriptive statistics: methods for summarizing information so that it is more intelligible, more descriptive, or can be communicated more effectively. Methods include averages, or measures of central tendency (mean, median, and mode), and graphs (bar charts, line graphs, etc.).
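As a small illustration of the three averages, the sketch below uses Python's standard `statistics` module on a handful of made-up answers to "How many hours do you watch TV per day?". The data and the helper name `summarize` are hypothetical, chosen only to show how the mean, median, and mode are calculated.

```python
from statistics import mean, median, mode

def summarize(values):
    """Return three common averages (measures of central tendency)."""
    return {
        "mean": mean(values),      # sum of the values divided by the number of cases
        "median": median(values),  # middle value after sorting (average of the two middle values if n is even)
        "mode": mode(values),      # most frequent value
    }

# Hypothetical answers from eight respondents (hours of TV per day).
tv_hours = [2, 3, 3, 4, 1, 5, 3, 2]
print(summarize(tv_hours))  # {'mean': 2.875, 'median': 3.0, 'mode': 3}
```

Here the mean is 23 / 8 = 2.875 hours, while the median and the mode are both 3 hours.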
Statistics 2
Inferential statistics concerns how well a dataset (a sample) can be generalized to the entire population from which the data are drawn. For example, the General Social Survey (GSS) interviews about 1,500 to 2,500 American adults. But the GSS is not interested in information about those 1,500 to 2,500 people themselves; it aims to generalize that information to the entire adult U.S. population.
Samples and Populations
Researchers normally use a sample to infer information about the population from which the sample was drawn, because surveying everybody in the entire population is almost impossible. Random sampling methods are used to ensure that cases are randomly selected. The National Opinion Research Center (NORC) conducts the GSS using a very sophisticated random sampling method to produce high-quality social science research data.
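To make the sample/population idea concrete, here is a minimal sketch of simple random sampling in Python. It assumes a made-up population of 100,000 TV-hours values and a hypothetical helper `simple_random_sample`; the GSS's real sampling design is far more elaborate than this. The point is only that a randomly drawn sample's mean tends to land close to the population mean.

```python
import random
from statistics import mean

def simple_random_sample(population, n, seed=None):
    """Draw n cases without replacement; every case has the same chance of selection."""
    rng = random.Random(seed)
    return rng.sample(population, n)

# Hypothetical population: daily TV hours (0 to 8) for 100,000 people.
pop_rng = random.Random(1)
population = [pop_rng.randint(0, 8) for _ in range(100_000)]

# Draw a sample about the size of a GSS wave and compare the two means.
sample = simple_random_sample(population, 2000, seed=42)
print("population mean:", round(mean(population), 2))
print("sample mean:    ", round(mean(sample), 2))
```

Changing the seed gives a different sample and a slightly different sample mean; inferential statistics is about quantifying how large that sample-to-sample variation is likely to be.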
Variables
A variable is a characteristic or property that differs in value from one case to another, e.g., education, income, number of children, age, race.
Measurement: the process of finding the values of a variable for different cases.
Levels of measurement
Level of measurement describes how much information is conveyed by the differences between the values of a variable.
Nominal variable: its values (attributes) are merely different from one another. The differences cannot be ranked or ordered, nor can they be captured by numbers. Examples: race, color, region, religion.
Levels of measurement 2
Ordinal variable: its values can be ranked, e.g., social class, occupations. But the size of the differences between values cannot be captured by numbers.
Interval variable: its values can be ranked, and the differences between them can be measured in standardized units such as years, degrees, or inches.
Ratio variable: the same as an interval variable with one exception: a ratio variable has a nonarbitrary zero point, e.g., income or education. The difference between interval and ratio variables is so minor that most of the time we simply call them interval/ratio variables. Researchers sometimes also call interval/ratio variables continuous variables.
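One way to remember the four levels is that each level allows every comparison of the level below it plus one more. The Python sketch below encodes that cumulative idea; the `Level` enum, the `meaningful_operations` helper, and the classification of the example variables are illustrative assumptions, not a standard library feature.

```python
from enum import IntEnum

class Level(IntEnum):
    """Levels of measurement, ordered from least to most information."""
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4

def meaningful_operations(level):
    """Return the comparisons that make sense at a given level (cumulative)."""
    ops = ["equal / not equal"]                # all levels: values are the same or different
    if level >= Level.ORDINAL:
        ops.append("greater / less than")      # values can be ranked
    if level >= Level.INTERVAL:
        ops.append("difference in units")      # e.g., two temperatures 5 degrees apart
    if level >= Level.RATIO:
        ops.append("ratio")                    # nonarbitrary zero: twice as much income
    return ops

# Hypothetical classification of variables mentioned on these slides.
examples = {
    "race": Level.NOMINAL,
    "social class": Level.ORDINAL,
    "temperature in degrees": Level.INTERVAL,
    "income": Level.RATIO,
}
for name, level in examples.items():
    print(f"{name:24} {level.name:9} {meaningful_operations(level)}")
```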
Mutually exclusive and collectively exhaustive
Mutually exclusive: values do not overlap. For example, to measure education, a respondent's highest level of schooling can be recorded as primary school, middle school, high school, college, vocational school, master's, or PhD.
Collectively exhaustive: the set of values includes all cases. For example, if a survey measuring religion asks respondents only whether they are Protestant or Catholic, where does a Buddhist respondent fit?
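Both properties can be checked mechanically once a coding scheme is written down. In the minimal Python sketch below, a scheme is assumed to be a dictionary mapping each category label to the set of raw answers it covers; the helper `check_categories` and the answers are hypothetical. An answer matching more than one category breaks mutual exclusivity, and an answer matching none breaks collective exhaustiveness.

```python
def check_categories(categories, responses):
    """Check a coding scheme against observed responses.

    categories: dict mapping a category label to the set of raw answers it covers.
    responses: list of raw answers from respondents.
    Returns (overlapping answers, uncovered answers).
    """
    overlaps = set()
    uncovered = set()
    for answer in set(responses):
        matches = [label for label, answers in categories.items() if answer in answers]
        if len(matches) > 1:
            overlaps.add(answer)   # fits several categories: not mutually exclusive
        elif not matches:
            uncovered.add(answer)  # fits no category: not collectively exhaustive
    return overlaps, uncovered

# The religion question from the slide, offering only two categories.
religion = {"Protestant": {"Protestant"}, "Catholic": {"Catholic"}}
answers = ["Protestant", "Catholic", "Buddhist", "Catholic"]
print(check_categories(religion, answers))  # (set(), {'Buddhist'})
```

Adding a category such as "Other" that covers the remaining answers is a common way to make a set of values collectively exhaustive.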
A survey
Gender
Race
Age
Year in the university
What do you study?
Is this class required or elective?
What do you want to learn from this class?
If this class is elective, what motivates you to take it?
How much do you know about China? (from 0 to 10: 0 means nothing; 10 means everything)
Level of analysis
Individual-level data: gender, education, income, race, religion, etc.
Aggregate data at the regional or state level: murder rate, percentage of the population that is nonwhite, percentage of employees who are women, etc.
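Aggregate variables are often built from individual-level records. The Python sketch below, using hypothetical records and a hypothetical helper `percent_by_group`, turns individual-level race data into a state-level "nonwhite population percentage". The unit of analysis changes from the person to the state.

```python
from collections import defaultdict

def percent_by_group(records, group_key, predicate):
    """Aggregate individual-level records into a group-level percentage."""
    totals = defaultdict(int)  # number of individuals in each group
    hits = defaultdict(int)    # number of individuals satisfying the predicate
    for rec in records:
        group = rec[group_key]
        totals[group] += 1
        if predicate(rec):
            hits[group] += 1
    return {group: 100 * hits[group] / totals[group] for group in totals}

# Hypothetical individual-level records (one dict per person).
people = [
    {"state": "A", "race": "white"},
    {"state": "A", "race": "black"},
    {"state": "A", "race": "white"},
    {"state": "A", "race": "asian"},
    {"state": "B", "race": "white"},
    {"state": "B", "race": "white"},
]
print(percent_by_group(people, "state", lambda p: p["race"] != "white"))
# {'A': 50.0, 'B': 0.0}
```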
Independent/dependent variables
Variables include independent variables (sometimes also called exogenous variables or predictors) and dependent variables. Independent variables are antecedent, causal factors, whereas dependent variables are the consequences of the independent variables. Most variables can serve as either independent or dependent variables, depending on the relationship being studied:
Education -> income
Income -> housing
Propositions/hypotheses
Formal propositions/hypotheses are at the core of social research. They are statements about the relationship between social variables, for example:
The higher the education, the higher the income.
The more central the workplace, the lower the workers’ commitment.
Exercise 1
Identify variables:
1. Stanford University
2. The U.S. president’s popularity
3. SAT score
4. Elvis Presley’s age
5. Support for legalization of marijuana
6. Americans’ attitudes toward Iraq
7. Burglary rates of the 50 states
Exercise 2
Identify each variable’s level of measurement:
1. Dwelling type
2. Number of children in a family
3. Owning a gun?
4. Number of hours watching TV
5. Percentage of a state’s population living in urban areas
6. Occupation
Exercise 3
State two hypotheses for each variable. Use the given variable as the independent variable in the first hypothesis and as the dependent variable in the second hypothesis.
a. Variable: political party (Democrat, Independent, Republican)
First hypothesis:
Second hypothesis:
Exercise 3 continues
b. Variable: marital happiness
First hypothesis:
Second hypothesis:
c. Variable: success of a college football team
First hypothesis:
Second hypothesis: