Elementary Statistics as Basic Research Tools
Md. Ashraful Islam Khan, PhD
Department of Population Science and Human Resource Development, Rajshahi University
15 July 2018
Statistics
Descriptive statistics: Consists of methods for organizing and summarizing information. Descriptive statistics includes the construction of graphs, charts, and tables and the calculation of various descriptive measures such as averages, measures of variation, and percentiles.
Inferential statistics
The 1948 Presidential Election: In the fall of 1948, President Truman was concerned about statistics. The statisticians had predicted that the Republican nominee, Thomas E. Dewey, would win. They were wrong: Truman won more than 49% of the vote and, with it, the presidency.
[Figure: Sampling diagram with the labels TARGET POPULATION, STUDY POPULATION, and SAMPLE]
Quantitative Versus Qualitative Approaches
Quantitative approaches: measuring and counting phenomena
  + Precise, controlled, supports claims about causation/prediction
  - Oversimplification, no recognition of subjectivity/individuality
Qualitative approaches: capturing qualities of phenomena
  + Recognition of subjectivity/individuality, openness
  - Difficult to establish internal and external validity, difficult to separate data analysis and interpretation
Commonly Used Qualitative Analysis
  Frequency
  Proportion / Rate
  Rank
  Association
  Contingency Analysis
In social science studies it is very common practice to convert quantitative data to qualitative data.
Advantage: Ease in collecting data
Disadvantage: Information loss, misleading inference
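The conversion from quantitative to qualitative data mentioned on this slide can be sketched as follows. The ages and the category cut-offs are hypothetical, chosen only to illustrate frequencies, proportions, and the information loss that comes with grouping.

```python
from collections import Counter

# Hypothetical ages in years; illustrative values, not data from the talk.
ages = [19, 23, 31, 35, 42, 47, 58, 64, 67, 71]


def age_group(age):
    """Map a quantitative age to a qualitative category (cut-offs are assumed)."""
    if age < 30:
        return "young"
    if age < 60:
        return "middle"
    return "older"


# Convert each measurement to its category.
groups = [age_group(a) for a in ages]

# Frequency and proportion of each category.
frequency = Counter(groups)
proportion = {g: n / len(ages) for g, n in frequency.items()}

print(frequency)
print(proportion)
```

After grouping, ages 31 and 58 are indistinguishable: both are simply "middle". That is the information loss the slide warns about.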
Commonly Used Quantitative Analysis
  Frequency
  Central measures
  Dispersion measures
  Moments
  Correlation
Advantage: Have forecasting ability
Disadvantage: Sometimes model dependent
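A minimal sketch of the measures listed on this slide, using only the Python standard library. The paired values x and y are hypothetical, and the helper names central_moment and pearson_r are introduced here for illustration.

```python
import math
import statistics

# Hypothetical paired measurements; illustrative values only.
x = [2.0, 4.0, 6.0, 8.0, 10.0]
y = [1.0, 3.0, 2.0, 5.0, 4.0]

# Central measures.
mean_x = statistics.mean(x)
median_x = statistics.median(x)

# Dispersion: sample standard deviation (divides by n - 1).
sd_x = statistics.stdev(x)


def central_moment(values, k):
    """k-th central moment: the average of (value - mean) ** k."""
    m = statistics.mean(values)
    return sum((v - m) ** k for v in values) / len(values)


def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    ma, mb = statistics.mean(a), statistics.mean(b)
    sab = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    saa = sum((ai - ma) ** 2 for ai in a)
    sbb = sum((bi - mb) ** 2 for bi in b)
    return sab / math.sqrt(saa * sbb)


r = pearson_r(x, y)
print(mean_x, median_x, sd_x, central_moment(x, 3), r)
```

The third central moment of x is zero because the values are symmetric about their mean.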
Simpson’s Paradox
The University of California at Berkeley was charged with having discriminated against women in its graduate admissions process for the fall quarter of 1973.
Source: Bickel, Hammel, and O’Connell (1975)

            Men     Women
Accepted    533     113
Denied      665     336
Total       1198    449

Percentage of Acceptance: Men: 44.49, Women: 25.17
The test for equality of two proportions yields a z-value of 7.73 (p-value < 0.001).
Fisher’s exact test also yields a p-value < 0.001.
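The acceptance percentages and the z-value on this slide can be checked with a short sketch. It assumes the unpooled standard error, since that choice reproduces a z-value of about 7.73; the slide does not say which form was used, and a pooled standard error gives a somewhat smaller z.

```python
import math

# Counts taken from the admissions table on this slide.
men_accepted, men_total = 533, 1198
women_accepted, women_total = 113, 449

# Acceptance proportions for each group.
p_men = men_accepted / men_total
p_women = women_accepted / women_total

# Unpooled standard error of the difference (an assumption, see the lead-in).
se = math.sqrt(
    p_men * (1 - p_men) / men_total
    + p_women * (1 - p_women) / women_total
)
z = (p_men - p_women) / se

# Two-sided p-value under the standard normal distribution.
p_value = math.erfc(abs(z) / math.sqrt(2))

print(f"Men: {100 * p_men:.2f}%  Women: {100 * p_women:.2f}%")
print(f"z = {z:.2f}, p-value = {p_value:.3g}")
```

The p-value is far below 0.001, which is why statistical software rounds it to 0.000.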
Percentage of Acceptance

            Program A    Program F
Men         61.94        5.90
Women       82.41        7.04

What a contradiction! Within each program, women were accepted at a higher rate than men, yet their overall acceptance rate was lower.

Lurking Variable: A missing variable that contains very important and relevant information. In our example the lurking variable is ‘the program applied to’. Further analysis shows that women applied at a much higher rate (75.95%, compared with 31.14% for men) to Program F, which is much harder to get into (an acceptance rate of only 6.44%, while Program A has an acceptance rate of 64.31%).

              Men                   Women
              Accepted   Denied     Accepted   Denied
Program A     511        314        89         19
Program F     22         351        24         317
Total         533        665        113        336
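The reversal can be reproduced directly from the program-level counts in the table. This is a minimal sketch; the dictionary layout and the helper name rate are choices made here for illustration.

```python
# (accepted, denied) counts by program and sex, taken from the table.
counts = {
    "Program A": {"Men": (511, 314), "Women": (89, 19)},
    "Program F": {"Men": (22, 351), "Women": (24, 317)},
}


def rate(accepted, denied):
    """Acceptance rate as a percentage of all applicants."""
    return 100 * accepted / (accepted + denied)


# Acceptance rate within each program.
within = {
    program: {sex: rate(*cell) for sex, cell in by_sex.items()}
    for program, by_sex in counts.items()
}

# Acceptance rate after pooling the two programs.
overall = {}
for sex in ("Men", "Women"):
    accepted = sum(counts[p][sex][0] for p in counts)
    denied = sum(counts[p][sex][1] for p in counts)
    overall[sex] = rate(accepted, denied)

for program, by_sex in within.items():
    print(program, {sex: round(r, 2) for sex, r in by_sex.items()})
print("Overall", {sex: round(r, 2) for sex, r in overall.items()})
```

Women have the higher rate in each program, but the lower rate once the programs are pooled, because most women applied to the program with the far lower acceptance rate.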
Steps in Empirical Research
  1. Define general question of interest
  2. Draw on existing knowledge to develop specific questions/hypotheses to be studied
  3. Determine design of study
  4. Data collection
  5. Data analysis
  6. Interpretation/conclusions regarding initial questions/hypotheses
Important Components of Empirical Research
  Problem statement, research questions, purposes, benefits
  Theory, assumptions, background literature
  Variables and hypotheses
  Operational definitions and measurement
  Research design and methodology
  Instrumentation, sampling
  Data analysis
  Conclusions, interpretations, recommendations

PROBLEM STATEMENT, PURPOSES, BENEFITS
  What exactly do I want to find out?
  What is a researchable problem?
  What are the obstacles in terms of knowledge, data availability, time, or resources?
  Do the benefits outweigh the costs?

THEORY, ASSUMPTIONS, BACKGROUND LITERATURE
  What does the relevant literature in the field indicate about this problem?
  Which theory or conceptual framework does the work fit within?
  What are the criticisms of this approach, or how does it constrain the research process?
  What do I know for certain about this area?
  What is the background to the problem that needs to be made available in reporting the work?

VARIABLES AND HYPOTHESES
  What will I take as given in the environment, i.e. what is the starting point?
  Which are the independent and which are the dependent variables?
  Are there control variables?
  Is the hypothesis specific enough to be researchable yet still meaningful?
  How certain am I of the relationship(s) between variables?

OPERATIONAL DEFINITIONS AND MEASUREMENT
  Does the problem need scoping/simplifying to make it achievable?
  Which variables will be measured, and how?
  What degree of error in the findings is tolerable?
  Is the approach defensible?

RESEARCH DESIGN AND METHODOLOGY
  What is my overall strategy for doing this research?
  Will this design permit me to answer the research question?
  What constraints will the approach place on the work?

INSTRUMENTATION/SAMPLING
  How will I get the data I need to test my hypothesis?
  What tools or devices will I use to make or record observations?
  Are valid and reliable instruments available, or must I construct my own?
  How will I choose the sample?
  Am I interested in representativeness? If so, of whom or what, and with what degree of accuracy or level of confidence?

DATA ANALYSIS
  What combinations of analytical and statistical processes will be applied to the data?
  Which of these will allow me to accept or reject my hypotheses?
  Do the findings show numerical differences, and are those differences important?

CONCLUSIONS, INTERPRETATIONS, RECOMMENDATIONS
  Was my initial hypothesis supported?
  What if my findings are negative?
  What are the implications of my findings for the theory base, for the background assumptions, or relevant literature?
  What recommendations result from the work?
  What suggestions can I make for further research on this topic?
Important statistical terms
Population: A set which includes all measurements of interest to the researcher (the collection of all responses, measurements, or counts that are of interest)
Sample: A subset of the population
Picture of sampling breakdown
Why sample?
In reality there is simply not enough time, energy, money, labour/manpower, equipment, or access to suitable sites to measure every single item or site within the parent population or whole sampling frame. Therefore an appropriate sampling strategy is adopted to obtain a representative and statistically valid sample of the whole.
Target Population: The population to be studied, i.e. the population to which the investigator wants to generalize the results
Sampling Unit: Smallest unit from which a sample can be selected
Sampling Frame: List of all the sampling units from which the sample is drawn
Sampling Scheme: Method of selecting sampling units from the sampling frame
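The relationship between a sampling frame and a sampling scheme can be sketched in a few lines. The frame of 100 household IDs, the sample size of 10, and the seed are all hypothetical; simple random sampling and systematic sampling are shown as two common schemes, not as the ones used in any particular study.

```python
import random

# Hypothetical sampling frame: a list of 100 household IDs (sampling units).
sampling_frame = [f"HH{i:03d}" for i in range(1, 101)]

# Fixed seed so the illustration is reproducible.
rng = random.Random(2018)

# Scheme 1: simple random sampling without replacement.
simple_random = rng.sample(sampling_frame, k=10)

# Scheme 2: systematic sampling, a random start and then every 10th unit.
step = len(sampling_frame) // 10
start = rng.randrange(step)
systematic = sampling_frame[start::step]

print(simple_random)
print(systematic)
```

Both schemes draw only from the frame, so any unit missing from the frame can never enter the sample, whichever scheme is chosen.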
Why Is Random Sampling Important?
The myth: "A random sample will be representative of the population."
A slightly better explanation that is partly true but partly urban legend: "Random sampling eliminates bias by giving all individuals an equal chance to be chosen."
The real reason: The mathematical theorems which justify most frequentist statistical procedures apply only to random samples.
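The real reason can be illustrated with a small simulation. This is a sketch under stated assumptions: a hypothetical population of 10,000 uniform values, samples of size 50, and the usual normal-approximation 95% interval for a mean. The helper name interval_covers is introduced here for illustration.

```python
import math
import random
import statistics

rng = random.Random(0)

# Hypothetical population; illustrative values only.
population = [rng.uniform(0, 100) for _ in range(10_000)]
true_mean = statistics.mean(population)


def interval_covers(sample, target):
    """True if the approximate 95% interval for the mean contains target."""
    centre = statistics.mean(sample)
    half_width = 1.96 * statistics.stdev(sample) / math.sqrt(len(sample))
    return centre - half_width <= target <= centre + half_width


# Repeated random samples: the interval should cover in roughly 95% of trials.
trials = 2_000
hits = sum(
    interval_covers(rng.sample(population, 50), true_mean)
    for _ in range(trials)
)
coverage = hits / trials

# A non-random "convenience" sample: the 50 smallest values.
convenience = sorted(population)[:50]
convenience_covers = interval_covers(convenience, true_mean)

print(f"coverage with random samples: {coverage:.3f}")
print(f"convenience sample covers the true mean: {convenience_covers}")
```

Any single random sample may still be unrepresentative, which is why the first statement is a myth. The guarantee is about the procedure: over repeated random samples the interval behaves as the theory says, and no such guarantee exists for the convenience sample.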
Thank You