1
Archival research: Ungraded review questions
2
Can you explain your answer?
Use your fingers to indicate your answer: 1=A, 2=B, 3=C, 4=D. For “check all that apply” questions, use both hands. After viewing the question, show me your answer within 15 seconds. Then turn to your neighbor; you have one minute to convince him/her that you are right.
3
Every year SAS Institute, the world’s largest software company in data analytics, holds a student competition at the SAS Global Forum. Contestants may use only public archival data, not their own data sets. Why?
A. SAS does not believe that students are capable of collecting accurate data.
B. SAS makes it harder in order to limit the number of submissions.
C. The competition aims to test the ability to analyze big data, and big data are usually archival data.
4
In the SAS Global Forum student paper, some required components are not found in a typical academic paper. What are they?
A. Data analysis and generalization
B. Data source and research problem
C. Data cleaning and visualization
D. Programming syntax and suggestions for future study
5
Peter, Paul, and Mary downloaded the test scores of the Programme for International Student Assessment (PISA) and compared the test performance of US and UK students. This is an example of:
A. Archival research
B. Meta-analysis
C. Survey research
D. Quasi-experiment
6
Pam Anne downloaded thirty research articles about Cognitive-Behavioral Therapy (CBT) from the library. She synthesized the findings of these studies in order to obtain a global view of the effectiveness of CBT. This is an example of:
A. Archival research
B. Meta-analysis
C. Observational study
D. Literature review
7
Which of the following is NOT an advantage of archival research?
A. It saves time and money in data collection.
B. You don’t need IRB approval.
C. Data are accurate and data cleaning is not needed.
D. It provides a basis for comparing a local sample against a national or even worldwide sample.
8
Which of the following is a shortcoming of archival research?
A. Data are inconsistent if different sources are used (e.g. different organizations might define wellbeing differently).
B. The sample size is extremely large, and most statistical software packages, such as JMP, cannot handle that many observations.
C. The data are collected at different times (e.g. WVS has six waves and PIAAC has two rounds), and comparison across time is difficult.
9
Which of the following graphing methods can be used to study a trend-based data set?
A. Bubble plot
B. GIS map
C. Histogram
D. Boxplot
10
Which of the following is NOT an archival data set for studying education and skill levels?
A. TIMSS
B. PISA
C. PIAAC
D. HPI
11
Which of the following is NOT an archival data set for studying wellbeing?
A. United Nations Human Development Programme
B. Gallup Global Wellbeing
C. Happy Planet Index
D. Pew Research
12
Which of the following is NOT an archival data set for studying opinions and values?
A. EVS
B. WVS
C. NORC
D. CCMH
13
Which of the following is not assessed by PISA?
A. Math
B. Science
C. Reading
D. Technology-based problem-solving
14
Which of the following is NOT assessed by PIAAC?
A. Numeracy
B. Technological proficiency
C. Ethical values
D. Literacy
15
When the sample size is very large (count in thousands), how can we compare groups?
A. T-test
B. ANOVA
C. Report the confidence intervals
D. All of the above
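As an illustration of the confidence-interval option, the sketch below computes an approximate 95% interval for the difference between two group means from summary statistics. The function name `mean_difference_ci` and all numbers are hypothetical, and the normal approximation is assumed to be adequate because the groups are large.

```python
import math

def mean_difference_ci(mean_a, mean_b, sd_a, sd_b, n_a, n_b, z=1.96):
    """Approximate 95% CI for (mean_b - mean_a) using the normal approximation."""
    diff = mean_b - mean_a
    # Standard error of the difference between two independent means.
    se = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    return diff - z * se, diff + z * se

# Hypothetical summary statistics, not taken from any real data set:
# two groups of 50,000 with means 500 and 502 and a common SD of 100.
low, high = mean_difference_ci(500.0, 502.0, 100.0, 100.0, 50_000, 50_000)
print(f"95% CI for the difference: ({low:.2f}, {high:.2f})")
```

With groups this large the interval is narrow, so it shows both the direction and the (here very small) size of the difference, which a bare p-value would not.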
16
Why shouldn’t we use regression analysis when there are too many predictors (e.g. 10 or more)?
A. The statistical power is too high and a very trivial effect might be mis-identified as significant.
B. Multicollinearity: Predictors are strongly correlated and the result may be inaccurate.
C. The least squares criterion was discovered in 1805/1809, and today we have better algorithms.
D. B and C
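The multicollinearity option can be made concrete with a small sketch. It assumes a model with only two predictors, in which case the variance inflation factor (VIF) of either predictor is 1 / (1 - r^2), where r is their Pearson correlation. The data and the helper names `pearson_r` and `vif_two_predictors` are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def vif_two_predictors(x1, x2):
    """VIF of either predictor in a two-predictor model: 1 / (1 - r^2)."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r ** 2)

# Hypothetical predictors: x2 is almost exactly twice x1.
x1 = [float(i) for i in range(1, 11)]
x2 = [2.0 * v + (0.1 if i % 2 == 0 else -0.1) for i, v in enumerate(x1)]

print(f"r   = {pearson_r(x1, x2):.4f}")
print(f"VIF = {vif_two_predictors(x1, x2):.1f}")
```

A common rule of thumb treats a VIF above 10 as a sign that the coefficient estimates are unstable; the nearly redundant pair here exceeds that by a wide margin.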
17
Why shouldn’t we use regression analysis when there are too many subjects (e.g. count in thousands)?
A. The statistical power is too high and a very trivial effect might be mis-identified as significant.
B. Multicollinearity: Predictors are strongly correlated.
C. The least squares criterion was discovered in 1805/1809, and today we have better algorithms.
D. A and C
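The point about excessive statistical power can be illustrated numerically. The sketch below holds a trivial effect fixed (0.02 standard deviations, a hypothetical value) and shows how the two-sided p-value of a two-sample z-test changes with sample size alone. The function name `two_sample_z_test` is an assumption, and a z-test stands in for the t-test because the two are nearly identical at these sample sizes.

```python
import math

def two_sample_z_test(mean_diff, sd, n_per_group):
    """Return (z, two-sided p-value) for two equal-sized groups with a common SD."""
    se = sd * math.sqrt(2.0 / n_per_group)
    z = mean_diff / se
    # Two-sided tail area of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value

# Same trivial effect at every sample size; only n changes.
for n in (100, 10_000, 100_000):
    z, p = two_sample_z_test(mean_diff=0.02, sd=1.0, n_per_group=n)
    print(f"n per group = {n:>7}: z = {z:5.2f}, p = {p:.6f}")
```

The effect size never changes, yet the result moves from clearly non-significant to highly significant as n grows, which is why significance alone says little in samples counted in the thousands.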
18
Which of the following is/are an ensemble method(s)?
A. Dragging
B. Bagging
C. Boosting (bootstrap forest)
D. B and C
19
In bagging, the big sample is partitioned into subsets in a systematic way.
True False
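For reference when discussing this statement: bagging is short for bootstrap aggregating, and its resamples are conventionally drawn at random with replacement. The sketch below is a deliberately simplified version in which the "model" fitted to each resample is just the sample mean. The scores and the function names `bootstrap_sample` and `bagged_mean` are hypothetical.

```python
import random

def bootstrap_sample(data, rng):
    """Draw len(data) observations at random, with replacement."""
    return rng.choices(data, k=len(data))

def bagged_mean(data, n_resamples, seed=0):
    """Average the per-resample means (the 'aggregating' step of bagging)."""
    rng = random.Random(seed)
    resample_means = []
    for _ in range(n_resamples):
        sample = bootstrap_sample(data, rng)
        resample_means.append(sum(sample) / len(sample))
    return sum(resample_means) / n_resamples

# Hypothetical scores, not taken from any real data set.
scores = [48, 52, 55, 61, 47, 58, 63, 50, 54, 59]

one_resample = bootstrap_sample(scores, random.Random(42))
print("one resample:", one_resample)   # some scores repeat, others are left out
print("bagged mean :", bagged_mean(scores, n_resamples=200))
```

In a real bagging procedure the mean would be replaced by a fitted model such as a decision tree, and the aggregation would be an average or a majority vote over the models' predictions.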
20
In boosting, the observations that are mis-classified (failed to predict) in the previous model will be selected again in the next model.
True False
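One common boosting algorithm, AdaBoost, handles mis-classified observations by increasing their weights before the next model is fitted. The slide does not say which boosting algorithm is meant, so the sketch below is only an AdaBoost-style illustration of a single reweighting round; the function name `adaboost_reweight` and the example data are hypothetical.

```python
import math

def adaboost_reweight(weights, misclassified):
    """One AdaBoost-style round: up-weight errors, down-weight hits, renormalize."""
    # Weighted error rate of the current model.
    err = sum(w for w, m in zip(weights, misclassified) if m)
    if not 0.0 < err < 0.5:
        raise ValueError("weighted error must lie strictly between 0 and 0.5")
    alpha = 0.5 * math.log((1.0 - err) / err)
    updated = [w * math.exp(alpha if m else -alpha)
               for w, m in zip(weights, misclassified)]
    total = sum(updated)
    return [w / total for w in updated]

# Ten observations with equal starting weights; the first two were mis-classified.
weights = [0.1] * 10
misclassified = [True, True] + [False] * 8

new_weights = adaboost_reweight(weights, misclassified)
print("mis-classified weight:", round(new_weights[0], 4))
print("correct weight       :", round(new_weights[2], 4))
```

After the update the mis-classified observations carry more weight than the correctly classified ones, so the next model concentrates on the cases the previous model got wrong.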
21
In the ensemble method, the algorithm can rank order the importance of the predictors by vote counting, but in regression analysis it cannot.
True False
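The idea of ranking predictors by vote counting can be sketched in a highly simplified form. In the version below, each bootstrap resample casts one vote for the predictor with the largest absolute correlation with the outcome. This is a stand-in for what tree ensembles do when they tally how often each predictor is chosen for a split, not a reproduction of any particular software's algorithm. The data and the names `abs_corr` and `vote_count_importance` are hypothetical.

```python
import math
import random

def abs_corr(x, y):
    """Absolute Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return abs(sxy) / math.sqrt(sxx * syy)

def vote_count_importance(predictors, outcome, n_rounds=100, seed=0):
    """Each bootstrap resample votes for its most strongly correlated predictor."""
    rng = random.Random(seed)
    n = len(outcome)
    votes = {name: 0 for name in predictors}
    for _ in range(n_rounds):
        idx = [rng.randrange(n) for _ in range(n)]  # resample rows with replacement
        y = [outcome[i] for i in idx]
        best = max(predictors,
                   key=lambda name: abs_corr([predictors[name][i] for i in idx], y))
        votes[best] += 1
    return votes

# Hypothetical data: the outcome depends on x1 only; x2 is unrelated.
x1 = [float(i) for i in range(30)]
x2 = [float((7 * i) % 11) for i in range(30)]
y = [3.0 * a + (1.0 if i % 2 == 0 else -1.0) for i, a in enumerate(x1)]

votes = vote_count_importance({"x1": x1, "x2": x2}, y)
print(votes)
```

Sorting the predictors by their vote totals gives the rank order of importance.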
22
What is/are the advantage(s) of the ensemble method?
A. The traditional approach is only a single analysis, but the ensemble method replicates the same study and verifies the result by repeated analyses.
B. The ensemble method can be used in both small and big samples.
C. In classical parametric procedures the data structure must meet certain assumptions (e.g. normality), but the ensemble method does not need it.
D. All of the above