Introduction to quantitative methods Dr Siddhi Pittayachawan BEng, GCTTL, MEng, PhD, MACS CP, MSCLAA School of Business IT and Logistics RMIT University, Australia Lectured at Mae Fah Luang University on 28–29th Sep 2016
28th Sep
Who am I?
I am ... Senior Lecturer of Information Systems and Supply Chain Management, School of Business IT and Logistics, College of Business, RMIT University OUA Program Coordinator, Bachelor of Business (Logistics and Supply Chain Management) Teaching: Used to: research courses for UG, PG, and HDR across business disciplines (6 years), data analysis (6 years) Now: operations research (2 years)
Research foci Information system adoption Social media Personal cloud Information security behaviour Sustainable consumption Omni-channel retailing Business education Statistical modelling
Websites RMIT staff page Personal site LinkedIn Academia.edu (for my publications & talks) Google Scholar Citations (for my bibliometrics)
Introduction to Quantitative Methods
What is quantitative research? Research that uses numerical data to represent and explain a phenomenon. It may or may not involve hypothesis testing. Data are collected by coding, observing, asking, and manipulating subjects. Data can be qualitative or quantitative in nature. Analysis involves statistics, probability, and mathematics.
What can quantitative research do? Describe, and visualise, phenomena (but not beyond a sample) Assist decision making process (input–output) Explore data to formulate new hypotheses (knowledge discovery) Develop/test measurements (balanced scorecards, tests, matrices, and questionnaires) Classify/measure subjects/variables Develop models explaining complex phenomena Test hypotheses, especially causality Simulate and predict future under uncertain conditions Produce predictive formulae
Should I use hypotheses?
No:
- You have no clue (or cannot guess) what is happening in reality.
- You want to produce results that are applicable only to the sample that you have (descriptive statistics).
- You want to create a new hypothesis.
- You assume that formulae rest on certainty (mathematical studies).
Yes:
- You already know (or can guess) what is happening in reality based on theories or previous research.
- You want to produce a result that can be generalised to another sample in the same population (inferential statistics).
- You want to test/reject an extant hypothesis.
- You assume that formulae rest on uncertainty (statistical studies).
Hypothesis testing approaches
Frequentist: p(E|H)
- Given a hypothesis, what is the probability of the evidence?
- Confidence intervals (confidence rests on estimation methods)
- Objective
- Ignores a priori knowledge
- Counter-intuitive (tests competing hypotheses)
Bayesian: p(H|E)
- Given the evidence, what is the probability of the hypothesis?
- Credible intervals (confidence rests on data)
- Subjective
- Incorporates a priori knowledge
- Intuitive (tests your own hypotheses)
If a priori knowledge has little effect on an outcome, whatever that knowledge is, the outcome should be the same as the Frequentist one.
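The closing point about a priori knowledge can be illustrated with a small sketch. It assumes a proportion estimated from binomial data with a Beta prior, a setting chosen here for simplicity and not taken from the slides; the data and both priors are hypothetical.

```python
def frequentist_estimate(successes: int, trials: int) -> float:
    """Maximum-likelihood estimate of a proportion: uses the data only."""
    return successes / trials


def bayesian_estimate(successes: int, trials: int,
                      prior_a: float, prior_b: float) -> float:
    """Posterior mean of a proportion under a Beta(prior_a, prior_b) prior.

    With binomial data the posterior is Beta(prior_a + successes,
    prior_b + failures); its mean is returned here.
    """
    return (prior_a + successes) / (prior_a + prior_b + trials)


# Hypothetical data: 60 of 100 respondents adopt a system.
k, n = 60, 100
mle = frequentist_estimate(k, n)
weak = bayesian_estimate(k, n, prior_a=1, prior_b=1)       # flat prior
strong = bayesian_estimate(k, n, prior_a=20, prior_b=80)   # prior belief near 20%

print(f"Frequentist estimate:   {mle:.3f}")
print(f"Bayesian, weak prior:   {weak:.3f}")
print(f"Bayesian, strong prior: {strong:.3f}")
```

With the flat prior the Bayesian estimate stays close to the Frequentist one, whereas the strong prior pulls it towards the prior belief.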
Which design should I use?
Retrospective:
- Reality is static.
- You are not interested in changes over time.
- You do not have enough time.
Prospective:
- Reality is dynamic.
- You are interested in changes/stability over time.
- You have plenty of time.
What methods are there? Secondary analysis: Use extant data Content analysis: Code documents Structured observation: Observe behaviours Survey: Ask questions Q methodology: Sort cards Social network analysis: Observe or ask about relationships Experiment: Manipulate factors Meta analysis: Synthesise results Simulation: Predict results
Commonality between methods The common mechanism among quantitative methods is the use of a standardised measurement to collect data, except secondary analysis, meta analysis, and simulation. To develop a standardised measurement, you must have a measurement model.
Standardised measurement Content analysis: Coding manual Structured observation: Coding manual Survey: Questionnaire Q methodology: Q sets and sorting distribution Social network analysis: Questionnaire Experiment: Questionnaire
How can I know what data to collect? To know what data to collect in your research, you need to create a measurement model. It is the link among a theory, your conceptual model/framework/hypothesis, and data. If your measurement does not make theoretical sense, the validity of your data is questionable, and you cannot link the data back to the theory. A measurement model leads you to develop a measurement for data collection, which is, for example, a questionnaire in survey analysis and a coding book in content analysis. If you plan to use SEM, a measurement model is extremely important.
What data types can be collected? Attitudes: knowledge, feelings, and actions Images: dimensions, profiles, and comparisons Decisions: information sources and evaluative criteria Needs: needs, desires, preferences, motives, and goals Behaviours: actions, locations, persistency, proportion Networks: evaluation, transaction, association, interaction, movement, physical connection, formal relation, biological relationship Lifestyles: activities, interests, opinions, and possessions Affiliations: normative, comparative, informative Demographics: age, sex, status, education, employment, occupation, income, experience, location, health, personality, culture, economy Themes: words, phrases, sentences, paragraphs, meanings, materials
Data Sources
Population is a group of subjects concerned by your research issues and is a source of data to be collected. Ideally, you have a sampling frame, which is a list of the cases in the population that allows you to identify and contact them.
Sample is a group of subjects that is selected, or is accessible, to be used in your research. It is important because:
- Cost & time saving
- Minimise sampling and non-response error
- Does not saturate the population for future research
Unit of analysis is a case of subjects, which can be defined as an individual, a pair, an organisation, or a region.
[Figure: a population from which Sample 1, Sample 2, and Sample 3 are drawn]
How can I select a sample to collect data?
Probability sampling: simple random, systematic, stratified, cluster
Non-probability sampling: convenience, snowball, quota, self-selection, purposive
Probability sampling
Simple random: all cases in the population have an equal chance of being selected
Systematic: a case is selected at every specific interval, e.g. every 9th case
Stratified: the population is divided based on attributes, e.g. the ratio of male to female employees is 2:3, then cases are randomly selected from each stratum to replicate the same ratio as that in the population
Cluster: the population is divided based on a natural boundary, e.g. geography, then cases are randomly selected from each cluster
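The first three techniques can be sketched in a few lines of Python. This is a minimal illustration using a hypothetical sampling frame of 100 employees with a 2:3 male-to-female ratio; the function names are invented for this example, and cluster sampling is left out for brevity.

```python
import random


def simple_random(frame, n, rng):
    """Every case in the frame has an equal chance of being selected."""
    return rng.sample(frame, n)


def systematic(frame, interval, rng):
    """Pick a random starting point, then take every `interval`-th case."""
    start = rng.randrange(interval)
    return frame[start::interval]


def stratified(frame, stratum_of, n, rng):
    """Draw randomly within each stratum, keeping the population's proportions."""
    strata = {}
    for case in frame:
        strata.setdefault(stratum_of(case), []).append(case)
    sample = []
    for members in strata.values():
        share = round(n * len(members) / len(frame))
        sample.extend(rng.sample(members, share))
    return sample


# Hypothetical sampling frame: 40 male and 60 female employees (ratio 2:3).
frame = [("M", i) for i in range(40)] + [("F", i) for i in range(40, 100)]
rng = random.Random(2016)  # fixed seed so the sketch is repeatable

srs = simple_random(frame, 20, rng)
sys_sample = systematic(frame, 9, rng)           # every 9th case
strat = stratified(frame, lambda case: case[0], 20, rng)

print(len(srs), len(sys_sample), len(strat))
print(sum(1 for sex, _ in strat if sex == "M"), "males in the stratified sample")
```

The stratified sample of 20 contains 8 males and 12 females, which replicates the 2:3 ratio in the frame.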
Non-probability sampling
Convenience: select cases that can easily provide data
Snowball: contact a few cases to collect data, then ask them to send research information to others who are likely to be eligible, and so on. This technique is often used when it is difficult to identify and contact the sample
Quota: the logic is the same as that of stratified sampling except that cases are not randomly selected
Self-selection: publicise the study, allowing those who are interested to take part, e.g. online survey
Purposive: select cases that researchers judge to be suitable for the study
Sampling methods
Probability sampling
Pros:
- Generalisability is convincing
- Complies with the assumptions of many statistical techniques
Cons:
- Difficult to obtain a sampling frame
- Difficult to select a sample
Non-probability sampling
Pros:
- Easy to select a sample
- Easy to reach a sample
Cons:
- Generalisability is questionable
- Potential coverage error
- Violates the assumptions of many statistical techniques
Sampling modes Mode: In-person Telephone Mail Internet Those who respond to a survey in a given mode may share a specific characteristic, thereby causing common method bias in the data and introducing coverage error. This issue can be minimised by using a mixed-mode survey (Dillman et al. 2009).
What type of sample can I collect?
Individuals
Pairs: husbands & wives, buyers & sellers
Groups: communities, departments
Clusters: regions, countries
Strata: males & females
Sample vs Unit of analysis What would you do if each sample in your data is an individual but your unit of analysis is an organisation? One possible solution is to use multi-level models. Another solution is to aggregate data from one level to another.
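The second solution, aggregating data from one level to another, can be sketched as follows. The responses and organisation names are hypothetical, and the mean is only one possible aggregation rule.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical individual-level responses: (organisation, satisfaction score).
responses = [
    ("Org A", 4), ("Org A", 5), ("Org A", 3),
    ("Org B", 2), ("Org B", 3),
    ("Org C", 5),
]


def aggregate_by_unit(rows):
    """Collapse individual scores into one mean score per organisation."""
    grouped = defaultdict(list)
    for unit, score in rows:
        grouped[unit].append(score)
    return {unit: mean(scores) for unit, scores in grouped.items()}


org_level = aggregate_by_unit(responses)
for unit, score in org_level.items():
    print(unit, score)
```

After aggregation each organisation is one case, so the effective sample size is the number of organisations (3 here), not the number of individuals (6).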
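How many subjects should I have?
Standard deviation (σ) known:
- Finite population: $n = \dfrac{\sigma^2}{\dfrac{e^2}{z^2} + \dfrac{\sigma^2}{N}}$
- Infinite population: $n = \left(\dfrac{z\sigma}{e}\right)^2$
Standard deviation (σ) unknown:
- Finite population: $n = \dfrac{pq}{\dfrac{e^2}{z^2} + \dfrac{pq}{N}}$
- Infinite population: $n = pq\left(\dfrac{z}{e}\right)^2$
where n = sample size, N = population size, z = z score of confidence level (e.g. 1.96), e = percentage of error margin (e.g. 0.05), p = probability of correct selection (e.g. 0.5), and q = 1 − p.
Ref: Weiers (2008, pp. 291–301)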
How many subjects should I have? z-score is the point on the normal distribution based on the α value that you select (e.g. 1.96 for 5%). e is the error margin, which is a half of the width of a confidence interval. For example, if a variable is a percentage and the estimated value is 50%, 5% margin of error means that, based on 95% confidence interval, the true value lies between 45%–55%. Basically, if you want a confidence interval to be narrower, you must decrease the error margin.
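The sample size formulae can be turned into a short calculator. This is a sketch that assumes the formulae are read as $n = (z\sigma/e)^2$ and $n = pq(z/e)^2$ for an infinite population, with the finite-population versions adding the $\sigma^2/N$ or $pq/N$ term in the denominator. The function names and example inputs are illustrative only.

```python
import math
from statistics import NormalDist


def z_for_confidence(confidence):
    """Two-sided z-score for a confidence level, e.g. 0.95 gives about 1.96."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def n_for_mean(sigma, e, z, N=None):
    """Sample size when the standard deviation (sigma) is known."""
    if N is None:                       # infinite population
        n = (z * sigma / e) ** 2
    else:                               # finite population of size N
        n = sigma ** 2 / (e ** 2 / z ** 2 + sigma ** 2 / N)
    return math.ceil(n)                 # round up to whole subjects


def n_for_proportion(p, e, z, N=None):
    """Sample size when sigma is unknown and a proportion p is used instead."""
    q = 1 - p
    if N is None:
        n = p * q * (z / e) ** 2
    else:
        n = p * q / (e ** 2 / z ** 2 + p * q / N)
    return math.ceil(n)


z = z_for_confidence(0.95)
print(n_for_proportion(0.5, 0.05, z))           # infinite population
print(n_for_proportion(0.5, 0.05, z, N=1000))   # population of 1,000
print(n_for_mean(sigma=10, e=2, z=z))           # sigma and e in the variable's units
```

Halving the error margin roughly quadruples the required sample size, because e enters the formulae squared.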
About σ If σ is unknown but you know the range (i.e. the distance between the minimum and maximum values), you can use the following formula: $\sigma = \dfrac{max - min}{4}$ Ref: Al-Saleh & Yousif (2009)
Sample size Standard deviation can be identified from literature, experts, or your rationale. A simple approximation is to divide the range (i.e. max − min) by 4; some sources divide by 6 instead, on the basis that almost all values of a normal distribution lie within ±3σ of the mean. The sample size formulae apply when you have a single parameter (i.e. a single standard deviation) in a model. It becomes more difficult to determine the sample size when your model has multiple parameters (i.e. multiple standard deviations). Also, the formulae only consider sampling theory, to ensure that you have a large enough sample to represent the whole population. They do not account for the effect size of a parameter to be detected in a model.
Let us recap Nuts and bolts in quantitative research contain a number of concepts that you must be familiar with (more to come!). A quantitative study requires a careful design allowing you to gather evidence to answer your research questions in a rigorous process.
Logic of Hypothesis Testing
What is a hypothesis? “Many scientists define hypotheses as empirically testable statements that are derived from theories and that form a basis for rejecting or not rejecting those theories, depending on the results of empirical testing.” (Jaccard & Jacoby 2010, p. 29)
Nature of scientific practice Since we cannot prove that the theory (i.e. alternative hypothesis) is valid simply because our observation complies with the theory, we measure the probability of how unusual our observation is against another theory that says otherwise (i.e. null hypothesis).
How to test a hypothesis?
1. Set up a null hypothesis (H0) that you plan to test and an alternative hypothesis (Ha) that can be used to explain your idea.
2. Decide on α, β, and n before collecting data.
3. Report the exact p-value.
4. If the p-value falls into the rejection region of H0, reject H0; otherwise do not reject H0. Scientifically, we never accept H0 but simply say that we fail to reject H0.
5. Discuss alternative scenarios. For example, whether Ha is accepted or H0 is not rejected, the result may be due to chance (i.e. low statistical power), coverage error, a bad research process (e.g. the boundary of the study was not controlled well), invalid measurement, unsuitable analysis, or bad theory (i.e. you find something new!).
The procedure above combines Fisher’s and Neyman–Pearson’s procedures (Gigerenzer 2004).
Probability of rejecting a hypothesis Ensure that your hypothesis is theoretically sensible; otherwise you will be at risk of producing non-sensible contributions. Waller (2004) demonstrates that we have a 46% chance to reject random directional null hypotheses. In case of random non-directional null hypotheses, the chance would be 92%.
So we have two worlds
Null world: testable; static; known distribution; known parameter
Alternative world: not testable; dynamic; unknown distribution; unknown parameter
[Figure: population and sample shown for the null world and the alternative world]
α & β
α: Probability of falsely rejecting H0 when H0 is true (null world). It is known as Type I error or false positive. The rule of thumb is 5% and it is controlled by your decision.
β: Probability of failing to reject H0 when H0 is false (alternative world). It is known as Type II error or false negative. The rule of thumb is 20% and it can be controlled by sample size.
1-α: Probability of not rejecting H0 when H0 is true (null world).
1-β: Probability of detecting Ha when Ha is true (alternative world). It is known as statistical power.
Outcomes of hypothesis testing
Outcome | Reality: H0 is true | Reality: H0 is false
H0 is not rejected | Correct outcome [1-α] | False negative (Type II error) [β]
H0 is rejected | False positive (Type I error) [α] | Correct outcome [1-β]
Fire & Sprinkler System
Outcome | Reality: there is no fire (H0 is true) | Reality: there is fire (H0 is false)
System is not activated (H0 is not rejected) | 95% [1-α] | 20% (Type II error) [β]
System is activated (H0 is rejected) | 5% (Type I error) [α] | 80% [1-β]
The "H0 is true" column can be tested using a significance test (i.e. those statistical tests that generate a p-value). The "H0 is false" column can be tested using power analysis.
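The four cells can be approximated by simulation. The sketch below assumes a two-sided z-test on normally distributed data with a known standard deviation of 1; the true mean of 0.5 and n = 32 are hypothetical values chosen so that power lands near 80%.

```python
import random
from statistics import NormalDist

ALPHA = 0.05
Z_CRIT = NormalDist().inv_cdf(1 - ALPHA / 2)    # about 1.96, two-sided


def rejects_h0(sample, sigma=1.0):
    """Two-sided z-test of H0: population mean = 0, with known sigma."""
    sample_mean = sum(sample) / len(sample)
    z = sample_mean / (sigma / len(sample) ** 0.5)
    return abs(z) > Z_CRIT


def rejection_rate(true_mean, n, runs, rng):
    """Share of simulated studies in which H0 is rejected."""
    hits = 0
    for _ in range(runs):
        sample = [rng.gauss(true_mean, 1.0) for _ in range(n)]
        hits += rejects_h0(sample)
    return hits / runs


rng = random.Random(42)  # fixed seed so the sketch is repeatable
type_i_rate = rejection_rate(true_mean=0.0, n=32, runs=2000, rng=rng)  # H0 is true
power = rejection_rate(true_mean=0.5, n=32, runs=2000, rng=rng)        # H0 is false
type_ii_rate = 1 - power

print(f"Type I error rate  (alpha): {type_i_rate:.3f}")
print(f"Type II error rate (beta):  {type_ii_rate:.3f}")
print(f"Power (1 - beta):           {power:.3f}")
```

The simulated rates should land near the 5%, 20%, and 80% cells of the table, with some run-to-run variation because only 2,000 studies are simulated per scenario.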
α & β When α is 0.05 and β is 0.2, the relative seriousness of Type I to Type II errors is β/α = 0.2/0.05 = 4, meaning that falsely rejecting the null hypothesis (Type I error) is considered 4 times as serious as mistakenly failing to reject it (Type II error). In the fire & sprinkler example, having the system activated when there is no fire is treated as 4 times as serious as having the system not activated when there is a fire. These values are normally used in social sciences. They are, however, arbitrary and can be changed based on context. Reference: Cohen (1988)
Effect size Statistical techniques often provide two types of information: Significance value (p-value) is the probability of obtaining an observation at least as unusual as ours, assuming that the null hypothesis is true. It represents statistical significance. Effect size is a measure of the strength of the relationship between 2 variables in the population, assuming that the alternative hypothesis is true. It represents practical significance. Different statistical techniques produce different types of effect sizes.
Effect size A measure of a relationship strength between 2 variables in the population A degree to which the phenomenon is present in the population A degree to which the null hypothesis is false An effect size comes in many forms such as B and r.
Importance of effect size Scenario 1: Before data collection: when you know the effect size of a particular relationship and decide on α and the number of observations (n), you can calculate the power (1-β) that you will get. After data collection: after you have estimated the effect size of a particular relationship, you can use α and n to calculate power (1-β). If it is very low, the current result may capitalise on chance because the probability of detecting the effect is too small.
Importance of effect size Scenario 2: Before data collection: when you decide on α, power (1-β), and the effect size, you can calculate n to be collected. Scenario 3: Before data collection: when you decide on α, power (1-β), and n, you can calculate a minimum detectable effect size. Scenario 4: Before data collection: when you decide on power (1-β), n, and the effect size, you can calculate α. Ref: Cohen (1988, pp. 14–16)
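The four scenarios can be sketched for one specific case: a two-sided test of a correlation r, using the Fisher z approximation. This approximation is an assumption made for this example; it is not the exact routine used by dedicated tools, so results may differ by a case or two. All function names are illustrative.

```python
import math
from statistics import NormalDist

_STD = NormalDist()


def power_for_r(r, n, alpha=0.05):
    """Scenario 1: approximate power given effect size r, n, and alpha."""
    z_crit = _STD.inv_cdf(1 - alpha / 2)
    shift = math.atanh(r) * math.sqrt(n - 3)    # expected test statistic under Ha
    return _STD.cdf(shift - z_crit) + _STD.cdf(-shift - z_crit)


def n_for_r(r, power=0.80, alpha=0.05):
    """Scenario 2: approximate minimum n given alpha, power, and effect size."""
    z_crit = _STD.inv_cdf(1 - alpha / 2)
    z_pow = _STD.inv_cdf(power)
    return math.ceil(((z_crit + z_pow) / math.atanh(r)) ** 2 + 3)


def min_detectable_r(n, power=0.80, alpha=0.05):
    """Scenario 3: approximate minimum detectable r given alpha, power, and n."""
    z_crit = _STD.inv_cdf(1 - alpha / 2)
    z_pow = _STD.inv_cdf(power)
    return math.tanh((z_crit + z_pow) / math.sqrt(n - 3))


def alpha_for_r(r, n, power=0.80):
    """Scenario 4: approximate alpha implied by power, n, and effect size."""
    z_crit = math.atanh(r) * math.sqrt(n - 3) - _STD.inv_cdf(power)
    return 2 * (1 - _STD.cdf(z_crit))


n_needed = n_for_r(0.5)
print("n needed for r = 0.5:", n_needed)
print("power at that n:", round(power_for_r(0.5, n_needed), 3))
print("minimum detectable r at n = 30:", round(min_detectable_r(30), 3))
print("alpha implied at r = 0.5, n = 30:", round(alpha_for_r(0.5, 30), 3))
```

Each function fixes three of the four quantities (α, power, n, effect size) and solves for the fourth.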
Importance of effect size The technique of calculating statistical power is called power analysis. It can be used to complete the sample size section of an ethics application.
If we decrease a sample size (n) …
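If we decrease α … [Figure: α set against β, n, and ES]
If we decrease β … [Figure: β set against α, n, and ES]
If we decrease ES … [Figure: ES set against α, β, and n]
Four-way tug of war [Figure: n, α, β, and ES pulling against one another]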
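The tug of war can be made concrete by changing one quantity at a time and watching power (1-β) respond. The sketch assumes a two-sided test of a correlation with the Fisher z approximation, and the baseline values (r = 0.3, n = 85, α = 0.05) are hypothetical.

```python
import math
from statistics import NormalDist


def power(r, n, alpha):
    """Approximate two-sided power to detect a correlation r with n cases."""
    std = NormalDist()
    z_crit = std.inv_cdf(1 - alpha / 2)
    shift = math.atanh(r) * math.sqrt(n - 3)
    return std.cdf(shift - z_crit) + std.cdf(-shift - z_crit)


baseline = power(r=0.3, n=85, alpha=0.05)
smaller_n = power(r=0.3, n=40, alpha=0.05)       # decrease the sample size
smaller_alpha = power(r=0.3, n=85, alpha=0.01)   # decrease alpha
smaller_es = power(r=0.2, n=85, alpha=0.05)      # decrease the effect size

print(f"baseline power:        {baseline:.2f}")
print(f"with smaller n:        {smaller_n:.2f}")
print(f"with smaller alpha:    {smaller_alpha:.2f}")
print(f"with smaller ES:       {smaller_es:.2f}")
```

In this sketch, lowering n, α, or ES while holding the others fixed lowers power, which is the same as raising β.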
Effect size Let’s assume that we conduct correlation analysis and the result is r = 0.2 at p = 0.01. The result leads us to reject the null hypothesis and to conclude that there is a relationship between two variables (e.g. advertisement and sales) at 5% significance level. However, a manager may decide not to take any action on the basis that the effect size is small, meaning that there is little practical significance. It means that statistical significance does not mean practical significance.
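The gap between statistical and practical significance in this example can be checked numerically. The sketch uses the Fisher z approximation for the p-value of a correlation, which is an assumption of this example, and the sample sizes are hypothetical.

```python
import math
from statistics import NormalDist


def p_value_for_r(r, n):
    """Approximate two-sided p-value for a sample correlation r with n cases."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2 * (1 - NormalDist().cdf(abs(z)))


r = 0.2
shared_variance = r ** 2     # proportion of variance the two variables share

print(f"shared variance: {shared_variance:.0%}")
for n in (50, 100, 170, 500):
    print(f"n = {n:3d}: p = {p_value_for_r(r, n):.4f}")
```

The same r = 0.2 is non-significant with 50 cases and significant at the 1% level with 170, yet in both cases the two variables share only 4% of their variance.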
Effect size & statistical techniques
t-test: d
Analysis of variance: none, but $\eta^2$ and $\omega^2$ can be manually calculated
General linear model: partial $\eta^2$
Correlation: Pearson’s r
Regression: Pearson’s r (individual parameter); $R^2$ can be converted into $f^2$ (model)
Logistic regression: odds ratio (individual parameter)
Factor analysis: $\lambda$
Structural equation modeling: standardised $\gamma$ and $\beta$; $R^2$
Sample size & Effect size When you know what effect size (e.g. r=.5) you expect to detect, you can calculate the minimum sample size that you need to detect the effect. Conversely, when you have a specific sample size, you can calculate the minimum effect size that you can detect. This process is called power analysis.
Power analysis To conduct power analysis, you need to decide the values of α and β. The rules of thumb are 5% and 20% respectively. These values must be decided before collecting data to test a hypothesis. For simple analysis, use G*Power. For complex analysis, use a Monte Carlo study.
Why bother with statistical power? When p < 0.05 and statistical power > 0.80, we minimise the chance of committing both Type I and Type II errors. Since statistics does not prove whether the null or the alternative hypothesis is true, we need reliable and valid results under both scenarios to make a decision. What would you think if another person told you: Given H0, the result is extremely unusual with a probability of 0.1%, thereby rejecting H0. However, given Ha, the probability of detecting the effect is 0%. Note: When studies have statistical power of 80%, there is a 64% (i.e. 80%×80%) chance of replicating the results in a series of 2 consecutive studies. As a result, the lower the statistical power is, the less generalisable the results are. Note: The concept of generalisability also depends on the quality of your sampling processes. If they were badly carried out, results would not be trustworthy even if statistical power is higher than 80%.
Power analysis (G*Power 3)
Activity: Demonstrative learning (15min) Demonstrate correlation analysis and power analysis with dummy data
Let us recap Determining the minimum sample size is a complex issue. You need to determine it from 2 aspects: (1) the minimum number of representatives based on the number of subjects in the population or the sampling frame; (2) power analysis based on α, β, effect size, and the statistical techniques that you plan to use. The commonly used values of α and β are 5% and 20% respectively, although these values are arbitrary. Some people use rules of thumb to determine the minimum sample size, but those stem from power analysis studies and are not reliable since they are not tailored to any specific study.
Correlation & Causation
What kind of claim can you make?
Observational change (correlation):
- What would be the difference in the expected value of $Y$ if we were to observe $X$ at level $x+1$ instead of level $x$?
- $\varepsilon$ is the deviation of $Y$ from its conditional expectation.
- The equality sign is symmetrical.
- Independent variables are normally correlated.
Manipulative change (causation):
- What would be the change in the expected value of $Y$ if we were to intervene and change the value of $X$ from $x$ to $x+1$?
- $\varepsilon$ is the deviation of $Y$ from its controlled expectation.
- The equality sign is asymmetrical.
- Independent variables must be uncorrelated.
Ref: Pearl (2009, pp. 157–163)
Characteristics of causal relationships Definite numbers of causes and effects Artificial isolation (controlled environments, conditions, and samples) Continuity of action Ref: Bunge (2009)
Universe & Causality [Figure: René Descartes’ drawing, with IN and OUT marked for the eye-hand model and the hand-eye model]
Universe & Causality When the whole universe is considered, there is no causality. Causality requires: IN: the focus of the research OUT: the background or the boundary conditions of the research Specification of IN and OUT creates asymmetry in how we perceive. In order to maintain the boundary of the research, intervention (i.e. a controlled study) is required. If there is no intervention, we have no clue whether the change in our observation is due to things inside or outside the model.
How can I control my study? If you conduct a non-experimental study (e.g. survey), use random sampling (i.e. probability sampling techniques). If you conduct an experimental study, use randomisation. Both random sampling and randomisation allow you to even out effect of variables outside your study to improve accuracy of your result. If you do not do this, results may be due to how you select a sample.
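Randomisation in an experimental study can be sketched as a shuffle followed by allocation in turn. The participant labels and the two group names are hypothetical, and real studies may use blocked or stratified schemes instead.

```python
import random


def randomise(participants, groups, rng):
    """Randomly allocate participants to groups of (nearly) equal size."""
    shuffled = list(participants)
    rng.shuffle(shuffled)                       # chance alone decides the order
    allocation = {group: [] for group in groups}
    for position, person in enumerate(shuffled):
        allocation[groups[position % len(groups)]].append(person)
    return allocation


rng = random.Random(7)  # fixed seed so the sketch is repeatable
participants = [f"P{i:02d}" for i in range(1, 21)]
allocation = randomise(participants, ["treatment", "control"], rng)

for group, members in allocation.items():
    print(group, sorted(members))
```

Because allocation depends only on the shuffle, variables outside the study are expected to be spread evenly across the groups on average.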
How can I control my study? [Figure: in a controlled study, confounding variables outside the model are cancelled out by randomisation/random sampling]
What is a theory? A theory is a set of statements about relationships among 2+ concepts/constructs. According to Shaw and Costanzo (1982), the characteristics of good theories which enable theories to be accepted by the scientific community are: Logically consistent Share agreement with data and facts Testable Theories that cannot be tested are metaphysical. Being testable and sharing agreement with data and facts does not mean that a theory is always supported by the data. Rather, theories should be testable to demonstrate when they will fail, so that we have a clear understanding of where their boundary lies.
Reality—Concept—Communication Reality appears complex, dynamic, unique, and obscure. Concepts are generalised abstractions, encompass universes of possibilities, are hypothetical, are learned, are socially shared, are reality oriented, and are selective constructions. Communication is a way for us to transfer what one conceives about reality to another via symbols. If the symbols lead to the original concepts plus other concepts, communication is ambiguous. If the symbols lead to different concepts altogether, communication is inaccurate.
Reality—Concept—Communication System One Armenian Ա Assamese ১ Chinese 一 Devanagari १ Eastern Arabic ١ Greek αʹ Hebrew א Hindu–Arabic 1 Malayalam ൧ Roman I Tamil ௧ Telugu ౧ Thai ๑ Reality Symbol Concept
What types of variables can I hypothesise?
Manifest variables: They are real, observable. They are data that you collect. Symbols: rectangle or square. Analysis: regression.
Latent variables: They are hypothetical, unobservable. They are constructs, factors, residuals, and errors in your model. Symbols: ellipse or circle. Analysis: SEM.
Types of indicators Reflective indicators Formative indicators Reactive indicators [Figure: three path diagrams relating a factor F to indicators X1–X4, with error terms e1–e4] Ref: Hayduk et al. (2007)
Measurement model A map that guides researchers to collect data about variables according to research design based on indicators identified in literature in order to validate hypotheses, models, or frameworks with appropriate analysis. [Figure: diagram linking Theory, Literature, Creativity, Concept, Hypothesis, Model, Framework, Indicator, Measurement, Measurement model, Research design, Research method, Sampling method, Sample size, Analysis, and Finding]
Theories → Measurement model
[Figure: measurement model diagram labelled Trustworthiness, Security, Trust, and Design, with the following indicators]
Trust: TRU1 Seller; TRU2 Environment; TRU3 System; TRU4 Website
Security: SEC1 Authentication; SEC2 Server; SEC3 Transaction; SEC4 Digital certificate
Design: DES1 Professionalism; DES2 Ease-of-use; DES3 Error free; DES4 Information credibility
Measurement model → Measurement
Item | Context | Question | Reference
TRU1 | Seller | I trust sellers. | Einstein (1890)
TRU2 | Environment | I trust the environment in online shopping. |
TRU3 | System | I trust systems used in online shopping. |
TRU4 | Website | I trust e-commerce websites. |
SEC1 | Authentication | I prefer to shop with e-vendors who use reliable authentication systems. | Newton (1675)
SEC2 | Server | I prefer to shop with e-vendors who have secure servers to keep my information confidential. | Pearson (1911)
SEC3 | Transaction | I prefer to shop with e-vendors who support secure transactions. | Galton (1857)
SEC4 | Digital certificate | I prefer to shop with websites which have valid digital certificates. | Locke (1613)
DES1 | Professionalism | I prefer to shop with websites that are professionally designed. | Peirce (1899)
DES2 | Ease-of-use | I prefer to shop with websites that are easy to use. | Kant (1650)
DES3 | Error free | I prefer to shop with websites that do not have any broken links. | Fisher (1921)
DES4 | Information credibility | I prefer to shop with websites that contain accurate, up-to-date information about products. | Luhman (1964)
Measurement → Instrument
Questionnaire
Please indicate to what extent you agree or disagree with the following statements (1 = strongly disagree, 5 = strongly agree):
- I trust sellers.
- I trust the environment in online shopping.
- I trust systems used in online shopping.
- I trust e-commerce websites.
- I prefer to shop with e-vendors who use reliable authentication systems.
- I prefer to shop with e-vendors who have secure servers to keep my information confidential.
- I prefer to shop with e-vendors who support secure transactions.
- I prefer to shop with websites which have valid digital certificates.
- I prefer to shop with websites that are professionally designed.
- I prefer to shop with websites that are easy to use.
- I prefer to shop with websites that do not have any broken links.
- I prefer to shop with websites that contain accurate, up-to-date information about products.
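Types of relationships Direct causal relationship [Diagram: A → B] Indirect causal relationship [Diagram: A → C → B] Jaccard & Jacoby (2010, p. 142)
Types of relationships Spurious relationship [Diagram: nodes B, A, C; two variables are related only through a common cause] Bidirectional causal relationship [Diagram: A and B each affect the other]
Types of relationships Unanalysed relationship (correlation) [Diagram: A and B are correlated, with no causal direction specified] Moderation effect [Diagram: nodes A, B, C; a third variable changes the strength of the relationship between the other two]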
Levels of hypothesis specification
Description: No hypothesis
Relation: A correlates with B.
Direction: A positively/negatively correlates with B.
Causation: A positively/negatively affects B.
Invariation: A positively/negatively affects B at the magnitude of 0.5.
The further down the list, the more informative the hypothesis.
What analysis should I use? It depends on: Research question Hypothesis Model complexity Measurement Frequentist/Bayesian Sampling/randomisation technique Data structure & distribution Data accessibility Unit of analysis
Example of analysis Aim: You have a data set. You have no clue what is happening in the data and want to explore relationships among variables. Analysis: data mining, knowledge discovery Aim: You investigate social behaviours. You want to see how humans interact in the society. Analysis: social network analysis
Example of analysis Aim: You have resources, criteria, and available options to make a decision. Analysis: multi-criteria decision analysis Aim: You want to develop some kind of measurement which can be used for decision making, measuring subjects’ attributes, or classifying subjects. Analysis: factor analysis, item response theory
Example of analysis Aim: You have a hypothesis and want to test whether it can be generalised to a population. Analysis: regression, analysis of variance Aim: You want to see to what extent your idea can improve the current situation, but it is too expensive, or too risky, to change the current natural setting without confidence that it would work. Analysis: simulation
Heuristic methods for idea generation Analyse your own experiences Reflective practice Use case studies A single case (individual, family, group, or organisation) Collect practitioners’ rules of thumb Personal interviews with experts Use of role playing Pretend to be others Conduct a thought experiment Counterfactual thinking, simulation
Heuristic methods for idea generation Engage in participant observation Ethnography Analyse paradoxical incidents Qualitative analysis, role playing, thought experiments Engage in imaging Visualise scenarios Use analogies and metaphors Imagistic simulation and visualisation Reframe the problem in terms of its opposite Understand how the opposite scenario happens
Heuristic methods for idea generation Apply deviant case analysis Understand why unusual cases happen Change the scale Alter the scope from local to global, or vice versa Focus on processes or focus on variables Changing the focus between how and what Consider abstractions or specific instances Look at a construct from general and context-specific viewpoints Make the opposite assumption Change our idea into something different
Heuristic methods for idea generation Apply the continual why and what Ask about things in the investigated phenomenon Consult your grandmother—and prove her wrong From bubba psychology, this approach allows you to deduce a specific statement, which could be wrong, from a general statement, which is believed to be right. Push an established finding to the extremes Observe outcomes from extreme changes Read biographies and literature, and be a well-rounded media consumer Use other non-academic materials as resources
Heuristic methods for idea generation Identify remote and shared/differentiating associates Create as many causes–effects as you can Shift the unit of analysis From individuals to pairs, groups, or organisations Shift the level of analysis From proximal to distal analysis Use both explanations rather than one or the other Think about alternative explanations Capitalise on methodological and technological innovations Use up-to-date approaches and tools to gather evidence
Heuristic methods for idea generation Focus on your emotions Record emotions as evidence What pushes your intellectual hot button? When you disagree with something and it is worth pursuing, do it. Ref: Jaccard & Jacoby (2010, pp.48–67)
Activity: Applied learning (30min) Develop a conceptual model that explains what factors affect your intention to return to a restaurant Explain the model to the class
Let us recap Causality is one of several types of relationships that scientists aim to establish. A quantitative study often requires a theory that allows us to focus and explain what is happening. A good theory is falsifiable. It requires a good understanding of theories (e.g. academic articles) and current trends (e.g. newspaper) to develop a measurement model and a hypothesis.
“I call ‘em as I see ‘em,” said the first. The second replied, “I call ‘em as they are.” The third said, “what I call ‘em makes ‘em what they are.” Theory of Measurement
Activity: Applied learning (30min) As a group, design the measurement model to measure restaurant quality by: Discussing with group members about the definition of restaurant quality; Constructing 5 items measuring restaurant quality based on the definition with a Likert scale; and Reporting the result to the class.
Theory of measurement Theory of measurement deals with reliability and validity of a measurement. It is used in several disciplines such as psychometrics, econometrics, sociometrics, chemometrics, bibliometrics, and scientometrics. In a survey study, we normally use psychometrics which include construct/composite reliability, content/face validity, and construct validity.
Measurement paradigm A 20th century philosophy of measurement called representationalism saw numbers, not as properties inherent in an object, but as the result of relationships between measurement operations and the object (Chrisman, 1995, p. 272). Measurement of magnitudes is, in its most general sense, any method by which a unique and reciprocal correspondence is established between all or some of the magnitudes of a kind and all or some of the numbers, integral, rational, or real, as the case may be … In this general sense, measurement demands some one–one relation between the numbers and magnitudes in question—a relation which may be direct or indirect, important or trivial, according to circumstances (Russell, 1903, p. 176). Different analyses require/support different levels of measurement.
Measurement paradigm
Fundamental measurement theory: Constructivism. Representational structure. To represent reality through scaling. Do before data collection. Symbols: $R = \langle Re, \geq \rangle$, $O = \langle I \times P, \succcurlyeq \rangle$
Classical test theory: Operationalism. Error structure. To describe score’s reliability. Do before & after data collection. Symbols: $T_{ij}$, $X_{ij}$, $E_{ij}$
Latent variable theory: Realism. Explanatory structure. To explain how data are generated. Do after data collection. Symbols: $\alpha_j$, $\beta_j$, $\gamma_j$, $\theta_i$
Measurement paradigm “The classical test theory model is the theory of psychological testing that is most often used in empirical applications. The central concept in classical test theory is the true score. The true scores are related to the observations through the use of the expectation operator: the true score is the expected value of the observed score.” (Borsboom, 2005, p. 3) The true score of any person $i$ on an item $j$ ($t_{ij}$) is the expected value of the observed score: $t_{ij} = \mathcal{E}(X_{ij})$. The difference between the observed score and the true score is the error score: $E_{ij} = X_{ij} - t_{ij}$.
Measurement paradigm “The latent variable model has been proposed as an alternative to classical test theory, and is especially popular in psychometric circles. The central idea of latent variable theory is to conceptualize theoretical attributes as latent variables. Latent variables are viewed as the unobserved determinants of a set of observed scores; specifically, latent variables are considered to be the common cause of the observed variables.” (Borsboom, 2005, p. 4) Depending on an a priori hypothetical model, a true score may be caused by a person’s ability ($\theta_i$), an item difficulty or a precision (an intercept $\beta_j$), an item discrimination or a scale (a slope $\alpha_j$), and a guessing effect ($\gamma_j$).
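One way to see how these four parameters can combine is the three-parameter logistic model from item response theory. The slide does not name a specific model, so the formula below is an assumption chosen for illustration, and the function name `p_correct` is hypothetical.

```python
import math

def p_correct(theta, alpha, beta, gamma=0.0):
    """Probability of a correct response under a three-parameter logistic model.

    theta: person ability, alpha: item discrimination (slope),
    beta: item difficulty (location), gamma: guessing effect (lower bound).
    """
    return gamma + (1.0 - gamma) / (1.0 + math.exp(-alpha * (theta - beta)))

# A person whose ability equals the item difficulty, without and with guessing
no_guessing = p_correct(theta=0.0, alpha=1.0, beta=0.0)
with_guessing = p_correct(theta=0.0, alpha=1.0, beta=0.0, gamma=0.2)
print(no_guessing, with_guessing)
```

With no guessing, a person whose ability equals the item difficulty has an even chance of answering correctly; a non-zero guessing effect raises that floor.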
Measurement paradigm “The representational measurement model—also known as ‘abstract’, ‘axiomatic’, or ‘fundamental’ measurement theory—offers a third line of thinking about psychological measurement. The central concept in representationalism is the scale. A scale is a mathematical representation of empirically observable relations between the people measured.” (Borsboom, 2005, p. 4) $O = \langle I \times P, \succcurlyeq \rangle$ is an empirical relational system which represents an observation of the product from an item and a person which can be used to order items and persons independently. When measurement is additive, $O$ can be mapped into a numerical relational system: $R = \langle Re, \geq \rangle$.
In research ... Measurement development: Fundamental measurement theory. Measurement validation: Latent variable theory. Measurement reliability: Classical test theory.
Breaking down a bit further ... FMT Construct specification Instrument function Assessment method Item generation Item alignment Item examination Quantitative parameter Instrumentation Stimuli creation Pre-test Pilot test Item screening Data collection Data preparation Construct validity Construct reliability CTT LVT CTT
Measurement assembly
Ontological plane: Construct
Theoretical plane: Theories, Models, Scales, Law statements
Empirical plane: x1, x2, x3, x4
Instrumental plane: X1: This course is useful. X2: I learned a lot of things in this course. X3: Content in this course is applicable in real-world situations. X4: All lecturers in this course are hot!
Content validity
Content validity Construct specification Domain What is included What is excluded Facets (substrata) Some researchers refer to facets as dimensions. Dimensions (e.g. rate, duration, magnitude) Modes (e.g. thought, behaviour) Temporal parameters (response interval, duration of time-sampling) Situations Function of instrument (e.g. brief screening, functional analysis, diagnosis)
Content validity Assessment method Item generation Deduction Experience Theory Literature Instrument Content expert Population
Content validity Item alignment: Use a table of the construct to map against items. Generate multiple items per facet. Adjust the number of items relative to the importance of each facet. Item examination: Check the suitability of items for a facet. Check the consistency, accuracy, specificity, and clarity of wording and definitions. Remove redundant items.
Content validity Quantitative parameter: Response formats and scales. Time-stamping parameters. Instrumentation: Create instructions to match the domain and function of the assessment instrument. Clarify and strive for specificity and appropriate grammatical structure.
Content validity Stimuli creation (e.g. social scenarios, audio and video presentations) Pre-test (with experts for steps 1–3 and 5–9) Pilot test (with a sample) Item screening (using the content validation process by Lindell & Brandt (1999)) Ref: Haynes, Richard, & Kubany (1995) and Lewis, Templeton, & Byrd (2005)
Reliability and Error
Ref: Alreck & Settle (2004, p. 58)
Reliability The extent to which your measurement can produce consistent results (i.e. precision). Error is the difference between observed score and true score. Observed Score = True Score + Error
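The decomposition above can be illustrated with a small simulation. This is a minimal sketch under stated assumptions: the true scores and errors are made-up normal draws, errors are independent of true scores, and reliability is taken as the share of observed-score variance due to true scores (the usual classical test theory definition, which the slide does not spell out).

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical true scores for 2,000 people and independent random errors
true_scores = [random.gauss(50, 10) for _ in range(2000)]
errors = [random.gauss(0, 5) for _ in range(2000)]

# Observed score = true score + error
observed = [t + e for t, e in zip(true_scores, errors)]

# Reliability as the ratio of true-score variance to observed-score variance
reliability = statistics.variance(true_scores) / statistics.variance(observed)
print(round(reliability, 2))
```

The larger the error variance relative to the true-score variance, the lower this ratio becomes.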
Error in observation Based on Groves (2004), observation error consists of 4 components: Instrument error is caused by the design of instruments. Interviewer error is caused by the way interviewers administer the instrument. Respondent error arises because different individuals give responses with different amounts of error. Mode error is caused by using different modes of enquiry.
Instrument error Unstated criteria Wrong: How important is it for stores to carry a large variety of different brands of this product? Right: How important is it to you that the store you shop at carries a large variety of different brands? Inapplicable questions Wrong: How long does it take you to find a parking place after you arrive at the plant? Right: If you drive to work, how long does it take you to find a parking place after you arrive at the plant?
Instrument error Example containment Wrong: What small appliances, such as countertop appliances, have you purchased in the past month? Right: Aside from major appliances, what other smaller appliances have you bought in the past month? Over-demanding recall Wrong: How many times did you go out on a date with your spouse before you were married? Right: How many months were you dating your spouse before you were married?
Instrument error Over-generalisations Wrong: When you buy “fast food”, what percentage of the time do you order each of the following types of food? Right: Of the last 10 times you bought “fast food”, how many times did you eat each type of food? Over-specificity Wrong: When you visited the museum, how many times did you read the plaques that explain what the exhibit contained? Right: When you visited the museum, how often did you read the plaques that explain what the exhibit contained? Would you say always, often, sometimes, rarely, or never?
Instrument error Over-emphasis Wrong: Would you favour increasing taxes to cope with the current fiscal crisis? Right: Would you favour increasing taxes to cope with the current fiscal problem? Ambiguity of wording Wrong: About what time do you ordinarily eat dinner? Right: About what time do you ordinarily dine in the evening?
Instrument error Double-barrelled questions Wrong: Do you regularly take vitamins to avoid getting sick? Right: Do you regularly take vitamins? Why or why not? Leading questions Wrong: Don’t you see some danger in the new policy? Right: Do you see any danger in the new policy? Loaded questions Wrong: Do you advocate a lower speed limit to save human lives? Right: Does traffic safety require a lower speed limit?
Respondent error Social desirability: Response based on what is perceived as being socially acceptable or respectable. Acquiescence: Response based on respondent’s perception of what would be desirable to the sponsor. Yea- and nay-saying: Response influenced by the global tendency toward positive or negative answers. Prestige: Response intended to enhance the image of the respondent in the eyes of others.
Respondent error Threat: Response influenced by anxiety or fear instilled by the nature of the question. Hostility: Response arising from feelings of anger or resentment engendered by the response task. Auspices: Response dictated by the image or opinion of the sponsor rather than the actual question.
Respondent error Mental set: Cognitions or perceptions based on previous items influence response to later ones. Order: The sequence in which a series is listed affects the responses to the items. Extremity: Clarity of extremes and ambiguity of mid-range options encourage extreme responses.
Basic attributes of questions Focus: be specific Wrong: When do you usually go to work? Right: What time do you ordinarily leave home for work? Brevity: be succinct Wrong: When was the last time that you went to the doctor for a physical examination on your own or because you had to? Right: How many months ago was your last physical examination? Clarity: be clear Wrong: What do you have to say about the charities that our church contributes to? Right: How much influence do you, yourself, have on which charities your church contributes to?
Expression of questions Vocabulary: be understandable Wrong: Are you cognizant of all the concepts to be elucidated? Right: Do you know about all the ideas that will be explained? Grammar: be simple Wrong: How do you work it out when you want one thing and your spouse wants another and you both feel very strongly about it? Right: How do you settle disagreements with your spouse when you both have strong feelings about it?
Observation error reduction
Instrument error: Use the right wording and ask the correct questions.
Interviewer error: Train interviewers before collecting data.
Respondent error: Use self-administered questionnaires to reduce the social desirability effect. Ask respondents to answer honestly. Use scales that force respondents to think before answering. Use indirect questions for sensitive topics. Develop clear instructions in questionnaires. Use online questionnaires to order items randomly. Use items that capture response bias.
Mode error: Collect data from multiple sources and/or via multiple modes (e.g. mail, online, and telephone).
Ref: Biemer & Lyberg (2003)
Activity: Collaborative learning (10min) From the previous activity: Identify items that may cause confusion and improve them
Scaling
Question types
Open question. Pro: Answers in their own words. Exploratory. Knowledge levels can be tapped. Useful for developing closed questions. Con: Time consuming. Answers must be coded. Requires greater effort to fill in.
Closed question. Pro: Easy to administer and process. Comparability of respondents. Easier to understand questions based on available answers. Con: Difficult to explore new variables. Difficult to develop scales of measurement that do not overlap one another while allowing all possible answers. Questions may be interpreted in different ways.
Levels of measurement Ref: Chrisman (1998, p. 236)
Level: Information required (Example)
Nominal: Definitions of categories (Sex)
Graded membership: Definitions of categories plus degrees of membership or distance from prototype (Socio-economic status)
Ordinal: Definitions of categories plus ordering (Rating scale)
Interval: Unit of measure plus zero point (Degree Celsius)
Log-interval: Exponent to define intervals (Richter magnitude scale)
Extensive ratio: Unit of measure, additive rule applies (Length, mass, time)
Cyclic ratio: Unit of measure plus length of cycle (Angle)
Derived ratio: Units of measure, formula of combination (Density, velocity)
Counts: Definition of objects counted (Number of employees)
Absolute: Type (Probability, proportion)
Number of rating points Increasing numbers of points improve: Transmitted information Measurement properties of continuous scale Reliability Validity Efficiency of statistical tests Common variance Communalities and structure of factors (exploratory factor analysis) Coefficient of determination (R2) And decrease: Sampling error
Figure: Points vs aspects. Ref: Preston & Colman (2000)
Figure: Intensity of agreement. Ref: Matell & Jacoby (1971)
Figure: Intensity of agreement and disagreement. Ref: Rotter (1972)
Figure: Adjectives. Ref: Wildt & Mazis (1978)
Figure: R2 & rating scale. Ref: Jenkins & Taber (1977)
Neutral point Can be used when we know that one variable has no effect on another. Caveat: People answer ‘neutral’ when: Questions lack readability (Velez & Ashworth, 2007). They want to please an interviewer or avoid giving an unacceptable answer (i.e. social desirability bias) (Garland, 1991). They neither know nor feel strongly about the issue (Presser & Schuman, 1980). They do not know or are undecided (Raaijmakers, van Hoof, Hart, Verbogt, & Vollebergh, 2000). Neutral responses decrease when the number of points increases (Matell & Jacoby, 1972). Neutral responses correlate with the Flesch–Kincaid Grade Level index (i.e. readability test), average letters per word, and average number of syllables per word (Velez, 1993).
Neutral point “If the direction in which people are leaning on the issue is the type of information wanted, it is better not to suggest the middle ground … If it is desired to sort out those with more definite convictions on the same issue, then it is better to suggest the middle ground” (Payne, 1951, p. 64).
DK Option DK means Don’t Know. Can be used to filter out respondents to whom questions are inapplicable. Caveat: Risks reducing the usable sample size for multivariate analyses that do not support missing data. Discourages respondents who have low cognitive skills or devote little effort from responding accurately (Krosnick et al., 2002). DK is a non-random phenomenon (Francis & Busch, 1975).
Other Findings Types of anchors (i.e. verbal labels and numbers) do not impact responses (Churchill & Peter, 1984). Respondents attend to anchors even when they are unequal (Lam & Klockars, 1982). When anchors are available only at end points, respondents will treat categories as equal (Lam & Klockars, 1982). Numbers at points depend on a priori knowledge. If we do not know whether a response would be positive or negative, use a balanced scale (e.g. -5 to +5). If we know that responses are positive, use a positive scale (e.g. 0 to +5), and vice versa (Schwarz, Knäuper, Hippler, Noelle-Neumann, & Clark, 1991). A rating scale is considered to be an interval scale when the number of points exceeds 10 (Olson, 2008).
Activity: Cooperative learning (15min) Let us conduct quantitative content validation process (Lindell & Brandt 1999): Consolidate all items into a single measurement Rate each item whether it is: 4: Extremely relevant, 3: Substantially relevant, 2: Moderately relevant, 1: Minimally relevant, or 0: Not relevant. Record the ratings into the provided Excel file
SEM.xlam Link SEM.xlam with Microsoft Excel. You may search on the Internet how to link Add-Ins. Record the responses in the following format: each row is a rater and each column is an item. Click Macros button on Developer ribbon. If you cannot see Developer ribbon, right click on an empty space of a ribbon and choose Customize the ribbon to enable it. In the Macros window, type sem.xlam!qcv.qcv and click Run. Another window will pop up requesting the following information: Agreement data: The agreement responses about content validity of your measurement Min: The lowest value that you use in the scale of the agreement Max: The highest value that you use in the scale of the agreement Alpha: Type I error for testing the agreement Note: SEM.xlam does not work with Microsoft Excel Mac version since it does not share the same set of internal functions.
QCV algorithm The QCV macro adopts the calculation and procedures proposed by Lindell & Brandt (1999). The minimum sample size is 10. Respondents must be content experts. $r^{*}_{WG}$ and $r^{*}_{WG(J)}$ are the measures of agreement between content experts on a particular item and on a measurement respectively. They are not a measure of reliability (Lindell, Brandt, & Whitney 1999).
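The agreement indices can be sketched by hand. The slides do not show the macro's internals, so the following is only an illustration of the commonly cited form of the indices: one minus the observed rating variance divided by the variance expected if experts rated uniformly at random, $(A^2 - 1)/12$ for $A$ response options. The use of the sample variance (n − 1 denominator), the function names, and the ratings are all assumptions.

```python
import statistics

def rwg_star(ratings, n_options):
    """Single-item agreement: 1 - observed variance / uniform-null variance."""
    null_variance = (n_options ** 2 - 1) / 12.0
    return 1.0 - statistics.variance(ratings) / null_variance

def rwg_j_star(rows, n_options):
    """Multi-item agreement: same idea, using the mean of the item variances."""
    null_variance = (n_options ** 2 - 1) / 12.0
    items = list(zip(*rows))  # transpose so that each tuple holds one item
    mean_variance = statistics.mean(statistics.variance(col) for col in items)
    return 1.0 - mean_variance / null_variance

# Hypothetical data: 10 content experts (rows) rate 2 items (columns) from 0 to 4
ratings = [
    [4, 1], [4, 3], [3, 0], [4, 4], [4, 2],
    [3, 1], [4, 3], [4, 0], [4, 4], [4, 2],
]
item_1 = [row[0] for row in ratings]
item_2 = [row[1] for row in ratings]
print(rwg_star(item_1, 5), rwg_star(item_2, 5), rwg_j_star(ratings, 5))
```

Experts largely agree on the first item, so its index is close to 1; ratings of the second item are scattered across the scale, so its index is much lower.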
QCV outputs Item screening is done in 3 steps: Drop items that have mean values below the mid point of the scale. For instance, if you use a 5-point rating scale from 1 to 5, then the mid point is 3. A mean value below the mid point suggests that, on average, content experts do not agree that an item is relevant to a construct. Drop items that have p > 0.05. Having p > 0.05 means that the consensus among experts cannot be distinguished from chance. Drop items that have power < 0.80. Having power < 0.80 means that the consensus is not generalisable. Example: Sud-on, Abareshi, Pittayachawan, & Leo (2013)
Non-observation error
Figure: Structure of error (Groves, 2004, p. 10). Error splits into Bias and Variance. Each contains Observation Errors (Interviewer, Respondent, Instrument, Mode) and Non-observation Errors (Coverage, Non-response, Sampling).
Error Consists of 2 components: Bias is a constant error caused by research design. Variance is a variable error caused by obtaining data from different respondents, using different interviewers, and asking different questions. Both bias and variance consist of 2 components: Observation error is deviation of observed scores from true scores. Non-observation error is caused by failure to include other samples.
Error in non-observation Non-observation error consists of 3 components: Coverage error is caused by failure to include samples in a sampling frame. Non-response error arises when respondents cannot be located or refuse to respond. Sampling error arises because statistics are based on a subset of the population, which may respond differently from other subsets.
Error in non-response Non-response may be caused by: Respondents lack motivation or time Fear of being registered Travelling Unlisted, wrong, or changed contact details Answering machine Telephone number display Illness or impairment Language problems Business staff, owner, or structure changes Too difficult or boring Business policy Low priority Survey is too costly, or lack of time or staff Sensitive or bad questions Ref: Biemer & Lyberg (2003, p. 93)
Non-observation error reduction
Coverage error: Identify samples missing from the sampling frame. Use multiple sampling frames (and remove duplicated samples before using them).
Non-response error: Use theories of participation (Cialdini 1990; Groves & Couper 1998): reciprocation (e.g. incentives), consistency (e.g. data vs research goal), social validation (e.g. participation of similar respondents), authority (e.g. reputation), scarcity (e.g. rare opportunities), and liking (e.g. interviewers are similar to respondents). Use the tailored design method (Dillman, Smyth, & Christian 2009).
Sampling error: Use probability sampling. Weight cases for non-probability sampling with complex survey methods.
Ref: Biemer & Lyberg (2003)
Translation
Translation techniques One-way translation The simplest method Less expensive and time consuming than other methods Information may be lost through translation since there is no comparison between surveys Double translation Back-translation is added to one-way translation Researchers can detect mistranslation, inconsistencies, cultural gaps, and loss of meaning Can be done repeatedly to ensure proper translation Consumes more time and cost
Translation techniques Translation by committee Ask two or more individuals who are familiar with both languages to translate documents, then ask both of them or a third individual to choose the one that most closely captures the meaning of the original version Individuals may hesitate to criticise others’ translated documents Decentering Both original and translated versions are constantly compared and adjusted to fit cultural gaps Establish cultural-and-linguistic-equivalent translation Time and cost consuming Ref: McGorry (2000)
Activity: Cooperative learning (15min) Let 1 of you translate the developed measurement from one language to another Let another of you translate the translated measurement back to the original language Let all of you compare the original and the back-translated measurements, discuss which one, or which part of it, is the most accurate, and how you can improve it further If any, spot words/phrases that are difficult to translate. Discuss how to solve this issue.
Let us recap A measurement, especially a new one, must be thoroughly developed with theoretical justification to minimise different sources of errors. Even if a measurement is valid, different words lead to different responses. Pre-test and pilot test are encouraged.
Introduction to statistics with SPSS
Statistics A tool for composing, decomposing, and converting data to generate information Each tool gives you different types of information Depends on your research questions and assumptions Each tool requires specific types of data and minimum numbers of observations Depends on data availability, ethics, and research design Warning: You must interpret the resultant information yourself to obtain knowledge. Software itself cannot tell you what is right or wrong.
From another aspect ... DIKW Pyramid (Rowley, 2007, p. 176). From bottom to top: Data, Information, Knowledge, Wisdom. Meaning, applicability, transferability, value, human input, and structure run from low (data) to high (wisdom). Computer input and programmability run from high (data) to low (wisdom).
Analysis requirements Before analysing data, or planning to select a specific analysis before data collection, you must understand: Data structure Random variable Measurement Distribution Function
Data structure You must consider the following aspects in your research design: Unit of analysis: Focus of your study (e.g. individuals, pairs, organisations, regions, countries, times) Level Single level (e.g. all variables are individual/organisational data) Multiple level (e.g. variables are a mixture of individual and organisational data) Observation: Accessible data (e.g. individuals, pairs, organisations, regions, countries) Sampling technique: Random, nth, cluster, strata Different analyses assume different data structures.
Data structure Heterogeneity (lack of homogeneity of variance) will cause mixed data. You may either: Split data into homogeneous groups and analyse them separately, or Use mixture models or mixed models. Cluster sampling will cause multilevel data. In that case, you may use multilevel models.
Random variable Random variable is a variable whose value is subject to variations due to chance. In data analysis, you must consider 2 issues: Types of random variables Latent variable, AKA unobserved variable, factor, and construct, is a variable that we do/can not measure. Manifest variable, AKA observed variable, item, and indicator, is a variable that we do/can measure. Number of random variables Different analyses require/support different numbers/types of random variables
Variables There are 2 types of variables in SEM: Manifest variable (observable), represented by a rectangle: Data Latent variable (unobservable), represented by an ellipse: Concept Construct Residual (unexplained variance for a latent variable and a dependent variable) Error (unexplained variance for a manifest variable in a measurement model) When only manifest variables and residuals are involved, it is called path analysis. When manifest and latent variables are involved, it is called SEM.
Distribution A random variable may have one of the following probability distributions: Normal Student t Chi-squared Binomial Poisson Exponential Different analyses assume different distributions.
Function A relationship between one or more random variables. It can be in the following forms: Linear: $y = mx + c$ Non-linear: $y = mx^2 + c$ Moderation/interaction: $y = m_1 x_1 + m_2 x_2 + m_3 x_1 x_2 + c$ Different analyses assume different functions.
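The three forms can be written as small functions to see how they behave. This is a minimal sketch; the function names and coefficient values are made up for illustration.

```python
def linear(x, m, c):
    """y = m*x + c"""
    return m * x + c

def quadratic(x, m, c):
    """y = m*x^2 + c (one example of a non-linear form)"""
    return m * x ** 2 + c

def moderation(x1, x2, m1, m2, m3, c):
    """y = m1*x1 + m2*x2 + m3*x1*x2 + c"""
    return m1 * x1 + m2 * x2 + m3 * x1 * x2 + c

# With an interaction term, the effect of x1 on y depends on the level of x2.
# Change in y for a one-unit increase in x1, at x2 = 0 and at x2 = 4:
effect_at_x2_0 = moderation(1, 0, 2, 1, 0.5, 0) - moderation(0, 0, 2, 1, 0.5, 0)
effect_at_x2_4 = moderation(1, 4, 2, 1, 0.5, 0) - moderation(0, 4, 2, 1, 0.5, 0)
print(effect_at_x2_0, effect_at_x2_4)  # 2.0 4.0
```

In the moderation form, the effect of $x_1$ equals $m_1 + m_3 x_2$, which is why it doubles here when $x_2$ moves from 0 to 4.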
Function More complex forms can be created by combining these forms, but we should develop a model that explains a phenomenon well enough rather than one that explains everything completely. There are shortcomings when you take the latter approach. Complexity: When someone tries to use your model, they have to use more resources (e.g. computers) to predict a phenomenon. Generalisability: When a model that explains everything perfectly in one sample is used to predict another sample, it often fails since no two samples are exactly the same. Parsimony: Each additional complexity should add substantive value to a theory.
Case study A survey study was conducted in Australia, UK, and Thailand via an online survey with university students and academics. Five hundred responses were received. There are 41 variables in the survey. Task: You are a group of researchers, and your boss gives you the data to come up with a model explaining what factors affect trust in online shopping.
Analysis checklist
1. Collate data into software
2. Set up data file
3. Calculate a response rate
4. Evaluate frequency distribution
5. Calculate descriptive statistics
6. Plot graphs
7. Explore data (during steps 4–6 plus exploratory data analysis)
8. Detect outliers (during step 7 plus assumption tests)
9. Conduct missing value analysis
10. Test statistical assumptions
11. Test measurement
12. Test hypotheses (Type I error)
13. Evaluate statistical power (Type II error)
This checklist is by no means a complete one. It is applicable only in a scenario in which you plan to test your hypothesis. The checklist will vastly depend on the aim and design of the research.
Importing Excel data file Start SPSS. Click File > Open > Data. Locate and open Quantitative data.xlsx. Save the data file in the SPSS format (.sav).
What is SPSS? SPSS = Statistical Package for the Social Sciences It is software that supports a number of statistical analyses. After importing data, you need to set up data attributes. First, click Variable View.
Setting up data attributes Then, you will see the following columns: Name: a variable’s name Type: data type Width: maximum data length Decimals: the number of decimals Label: a variable’s description Values: a meaning of each instance in a variable Missing: instances considered to be missing data Columns: the width of a variable shown on spreadsheet Align: data alignment Measure: a level of measurement Role: a variable’s role Set up the attributes according to your understandings/assumptions about the data It is your responsibility to back up your work in this course.
Understanding your data (File: Frequency.sps) Evaluate the frequency distribution of each variable: Click Analyze > Descriptive Statistics > Frequencies. (1) Select all variables, (2) click Charts.
Understanding your data Click Histograms. Click Continue and OK.
Ask yourself What do these results tell me? Is there any pattern? Which instance is the most/least frequent? What is the shape of the distribution? Are the results what I expect? Is there any outlier? Is there anything strange about the results?
Outliers
Type: Description. Solution.
Typographical error: Mistyping data. Correct values.
Response error: Respondents misunderstood the instrument.
Scale error: Different scales produce different distributions. Use appropriate scale (too late!). Use different analysis.
Coverage error: Cases are not from the target population. Delete cases.
Valid cases: Rare cases. Mixed data. Separate data. Use analysis that is robust against outliers. Use mixed model. Use distribution-free analysis.
Valid distribution: Distribution does not match the assumption of the chosen analysis. Transform data.
Summarising your data (File: Descriptive.sps) Calculate descriptive statistics: Click Analyze > Descriptive Statistics > Descriptives. (1) Select all variables, (2) click Options.
Summarising your data Tick options as shown in the figure Click Continue and OK
Descriptive statistics Analyses that summarise characteristics of your sample There are 5 measures: Central tendency Dispersion Position Distribution Association Descriptive statistics is a component that students must learn before adventuring into inferential statistics. Although the former looks simple and does not seem to have many uses, those who want to master statistics must understand it well. Specifically, if you can understand central tendency, variance, covariance, standard deviation, and z-score well, you can understand many sophisticated inferential statistical techniques.
Measures of central tendency Mean: the average value of a variable, $\bar{x} = \frac{\sum x_i}{n}$ Median: the value located in the middle of a variable Mode: the value having the highest frequency in a variable
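As a quick check of these three measures outside SPSS, here is a minimal sketch using Python's standard `statistics` module. The `scores` list is made-up data, not taken from the case study.

```python
import statistics

# Hypothetical responses from 8 people
scores = [2, 3, 3, 4, 5, 5, 5, 7]

mean = statistics.mean(scores)      # sum of values divided by n: 34 / 8
median = statistics.median(scores)  # middle of the sorted values: (4 + 5) / 2
mode = statistics.mode(scores)      # most frequent value

print(mean, median, mode)  # 4.25 4.5 5
```

The three measures differ here because the distribution is not symmetric.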
Measures of dispersion Range: the difference between the highest and lowest values in a variable Variance: the average value of the squared difference between data and the mean, $s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$ Standard deviation: the square root of variance, $s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$ Standard deviation of the sampling distribution of the sample mean is called standard error of the mean (S.E.).
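The dispersion formulas can be worked through step by step. This is a minimal sketch with made-up data; the standard error of the mean is estimated as $s / \sqrt{n}$, the usual estimator, which the slide does not state explicitly.

```python
import math
import statistics

# Hypothetical responses from 8 people
scores = [2, 3, 3, 4, 5, 5, 5, 7]
n = len(scores)
mean = sum(scores) / n

value_range = max(scores) - min(scores)
# Sum of squared deviations from the mean, divided by n - 1
variance = sum((x - mean) ** 2 for x in scores) / (n - 1)
std_dev = math.sqrt(variance)
# Estimated standard error of the mean
std_error = std_dev / math.sqrt(n)

print(value_range, variance, round(std_dev, 3), round(std_error, 3))
```

The hand calculation agrees with `statistics.variance(scores)` and `statistics.stdev(scores)`, which also use the n − 1 denominator.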
Measures of position Quantiles: partitions of data Quartiles: divide data equally into 4 parts Percentiles: divide data equally into 100 parts Interquartile Range (IQR): the difference between the third and first quartile z-score: the relative location of a case in a sample, $z = \frac{x - \bar{x}}{s}$
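The position measures can be sketched as follows, using made-up data. Note that software packages use slightly different conventions for interpolating quartiles, so the values from Python's `statistics.quantiles` (default "exclusive" method) may not match SPSS exactly; treat that as an assumption of this sketch.

```python
import statistics

# Hypothetical responses from 8 people
scores = [2, 3, 3, 4, 5, 5, 5, 7]

# Three cut points that divide the data into 4 parts
q1, q2, q3 = statistics.quantiles(scores, n=4)
iqr = q3 - q1

# z-score: distance of each case from the mean in standard deviation units
mean = statistics.mean(scores)
s = statistics.stdev(scores)
z_scores = [(x - mean) / s for x in scores]

print(q1, q2, q3, iqr)
print([round(z, 2) for z in z_scores])
```

By construction, the z-scores have a mean of 0 and a standard deviation of 1, which makes cases comparable across variables measured on different scales.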
Measures of distribution Skewness: the degree of asymmetry in the distribution of data The skewness value divided by its S.E. gives a z-score. If the absolute value of the z-score is higher than 1.96, data is not univariate normal. Kurtosis: the degree of peakedness in the distribution of data The kurtosis value divided by its S.E. gives a z-score. If the absolute value of the z-score is higher than 1.96, data is not univariate normal. Box plot: visual presentation containing quartiles and outliers Cases located between 1.5 IQR and 3 IQR from the edge of the box are outliers. Cases located more than 3 IQR from the edge of the box are extreme values.
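The skewness check can be sketched in a few lines. The slide does not give the formulas, so this sketch assumes the adjusted sample skewness and the standard error that statistical packages commonly report; the function name `skewness_z` and the data are hypothetical.

```python
import math
import statistics

def skewness_z(data):
    """Return (skewness, its standard error, skewness / standard error)."""
    n = len(data)
    mean = statistics.mean(data)
    s = statistics.stdev(data)
    # Adjusted sample skewness (assumed formula, see lead-in)
    skew = n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in data)
    # Standard error of skewness, which depends only on the sample size
    se = math.sqrt(6.0 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    return skew, se, skew / se

# Hypothetical right-skewed responses
right_skewed = [1, 1, 2, 2, 2, 3, 3, 4, 9, 15]
skew, se, z = skewness_z(right_skewed)
print(round(skew, 2), round(se, 2), round(z, 2))
print("exceeds 1.96" if abs(z) > 1.96 else "within 1.96")
```

The same ratio-to-standard-error logic applies to kurtosis, with its own formulas.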
Measures of association Covariance: the average product of the deviations of two variables from their means, $cov(x, y) = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$ Correlation: covariance divided by the product of the standard deviations of the two variables (Pearson correlation coefficient), $r = \frac{cov(x, y)}{s_x s_y}$
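Both formulas can be computed by hand to see how correlation is simply a rescaled covariance. This is a minimal sketch with made-up paired data.

```python
import statistics

# Hypothetical paired observations
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
mean_x = statistics.mean(x)
mean_y = statistics.mean(y)

# Covariance: sum of cross-products of deviations, divided by n - 1
cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)

# Pearson correlation: covariance rescaled by both standard deviations
r = cov_xy / (statistics.stdev(x) * statistics.stdev(y))

print(cov_xy, round(r, 4))
```

Because of the rescaling, the correlation always lies between -1 and +1, whereas the covariance depends on the units of the two variables.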
Plotting your data Plot graphs by clicking Graphs > Chart Builder, and you will see: (1) Graph types (2) Graph subtypes (3) Graph preview (4) Variables (5) Variables’ instances
Plotting your data (1) Double click Simple Bar. (2) The chart preview will update itself. (3) Drag & drop the variable COU into the chart preview where the arrow points, then click OK. What do you understand from this bar chart? Try experimenting with other variables and different types of graphs.
Box plot A type of graph that integrates mean, median, skewness, kurtosis, and quartiles. It can detect outliers (i.e. values that lie between 1.5 IQR and 3 IQR from the edge of the box) and extreme cases (i.e. values that lie more than 3 IQR from the edge of the box). Figure: annotated box plot marking an outlier, an extreme case, the 1st to 4th QR, the mean, and the median.
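The outlier rule can be sketched without drawing the plot. This is an illustration only: distances are measured from the edges of the box (the first and third quartiles), the quartile convention is Python's default rather than SPSS's, and the function name and data are hypothetical.

```python
import statistics

def classify_cases(data):
    """Label each value as 'typical', 'outlier', or 'extreme'."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    labels = []
    for x in data:
        # Distance outside the box; 0 when the value is inside the box
        distance = max(q1 - x, x - q3, 0)
        if distance > 3 * iqr:
            labels.append((x, "extreme"))
        elif distance > 1.5 * iqr:
            labels.append((x, "outlier"))
        else:
            labels.append((x, "typical"))
    return labels

# Hypothetical data with one outlier (24) and one extreme case (40)
values = [10, 11, 12, 12, 13, 13, 14, 14, 15, 16, 24, 40]
labels = dict(classify_cases(values))
print(labels[24], labels[40])  # outlier extreme
```

Flagged cases still need to be inspected before deciding whether to correct, keep, or remove them.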
Plotting your data (File: Boxplot.sps) Plot graphs by clicking Graphs > Chart Builder: (1) Choose Boxplot. (2) Double click on Simple Boxplot. (3) Drag & drop the variable TRU1 into the chart preview where the arrow points, then click OK. What do you understand from the boxplot? Try to produce boxplots for other variables.
Use of graphs Graphs help laypeople to understand your research. They also help you to understand your data at an early stage of data analysis. Plotting your data requires creativity and imagination about the most appropriate way to illustrate your data visually. It also requires you to understand when each type of graph can be properly used. For example, a bar chart is suitable for categorical data while a histogram is suitable for continuous data. In contrast, although a line chart is also suitable for continuous data, it is mainly used to demonstrate changes over time. Experiment with different variables and different graphs and see for yourself what makes sense and what does not.
Exploring data Techniques such as cluster analysis, cross-tabulation, correlation, regression, exploratory factor analysis, and log-linear modeling are often used for exploring data. They are known as exploratory data analysis.
Cluster analysis A technique used to classify cases or variables into 2 or more groups which share similar characteristics: Hierarchical cluster analysis calculates a distance between a pair of objects and decides whether they should be in the same group. This process is repeated until there is only one group left. K-means cluster analysis is a technique that uses means to differentiate memberships of objects based on the number of clusters specified in advance. This method is preferable when a sample size is larger than 1,000. Two-step cluster analysis supports both categorical and continuous data. It is also preferable when a sample size is very large. The advantage of this method is that it can find an optimal number of clusters within a data set based on a range of specified cluster numbers.
K-means cluster analysis Assumptions Interval data Independent observations Independent variables: Variables should not be correlated, but each of them adds value into categorising cases.
K-means cluster analysis File: Cluster.sps Run the analysis by clicking Analyze > Classify > K-Means Cluster: choose variables TRU1, TRU2, and TRU3; click Save, tick Cluster membership, and click Continue; click Options, tick ANOVA table, click Continue, and click OK.
Inspecting groups of cases Click Data > Split File: click Compare groups; select the variable created by k-means cluster analysis, located at the bottom of the list; click OK. Try to conduct frequency distribution and descriptive statistics and inspect whether these groups share similar or different values. If you plan to conduct other analyses normally, be sure to disable this feature first!
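To see what k-means does behind the menu, here is a minimal sketch of the basic algorithm in Python. It is not SPSS's implementation: the function name `k_means`, the choice of the first k cases as starting centres, and the six made-up cases (standing in for scores on three items such as TRU1 to TRU3) are all assumptions for illustration.

```python
from math import dist


def k_means(points, k, max_iter=100):
    """Basic k-means: assign each case to its nearest centre, then update centres."""
    centroids = [tuple(p) for p in points[:k]]  # assumption: first k cases as starting centres
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: nearest centre by Euclidean distance.
        new_labels = [
            min(range(k), key=lambda c: dist(p, centroids[c])) for p in points
        ]
        # Update step: each centre becomes the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centre
        if new_labels == labels and new_centroids == centroids:
            break  # memberships and centres no longer change
        labels, centroids = new_labels, new_centroids
    return labels, centroids


# Hypothetical cases scored on three items, for illustration only.
cases = [(1, 1, 2), (5, 5, 4), (1, 2, 1), (4, 5, 5), (2, 1, 1), (5, 4, 5)]
labels, centroids = k_means(cases, k=2)
print("Cluster membership:", labels)
print("Cluster centres:", centroids)
```

The returned membership list plays the same role as the cluster membership variable that SPSS saves to the data set.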
Ask yourself Do you think that the result will be the same if you use other variables or combinations of different variables to classify the cases?
Self-directed learning Try to understand what the data is telling you by looking at both the big picture (i.e. the whole data set) and small pictures (i.e. the data set decomposed into smaller chunks, perhaps via demographic variables) by using frequency distribution, graphs, descriptive statistics, and cluster analysis. When you analyse data, start writing down what you have done, and make it a habit. It does not matter that the report will be incomplete or jumpy. You may lose track if you work it out in your head alone.
Extra stuff Try to use hierarchical cluster analysis and two-step cluster analysis to classify the cases and see whether it produces similar or different results You may also use other exploratory data analysis to help you understand data further.
Let us recap Exploring data is an important step which may be iterative. It may help you detect unusual or wrong things in your data. Assumptions about your data may be challenged here so you should keep your mind open and be flexible.
Any question?
29th Sep
Introduction to SEM
Testing causality Experimental research is required in order to test the existence of a relationship between an independent variable and a dependent variable while controlling external variables. However, there are several reasons why it may not be feasible: Too expensive Too time-consuming Randomisation and manipulation are impossible or unethical Phenomenon is currently unobservable
Is there any other way? Pearl (1998) argues that structural equation modeling (SEM) allows us to test our ideas with non-experimental data under the assumption that a causal model is true. The SEM results allow us to infer that, if we had a physical means to manipulate an independent variable in a controlled experiment, changing the independent variable by 1 unit would change the dependent variable by x units.
What is path analysis? “The method of path coefficients does not furnish general formulae for deducing causal relations from knowledge of correlations and has never been claimed to do so. It does, however, within certain limitations, give a method of working out the logical consequences of a hypothesis as to the causal relations in a system of correlated variables.” Wright (1923, p. 254)
What is SEM? SEM is a framework proposed by Karl Gustav Jöreskog, James Ward Keesling, and David E. Wiley in the 1970s to integrate maximum likelihood, a measurement model (i.e. factor analysis), and a structural model (i.e. path analysis). Bentler (1980) calls it the JKW model. LISREL was the first software that implemented this framework. Charles Spearman is credited with factor analysis and Sewall Wright with path analysis.
SEM’s assumptions SEM assumes 2 things: Causal assumptions are true Correlations between variables exist Additional assumptions may be imposed depending on the algorithm used to estimate parameters: Maximum likelihood (ML) assumes data to be interval and to follow a multivariate normal distribution.
SEM software EQS LISREL Mplus Mx Ωnyx R: lavaan, OpenMx, sem2 SAS: CALIS, TCALIS SPSS: AMOS Stata: GLLAMM, SEM STATISTICA: SEPATH
Regression Purpose: To test the relationship between multiple independent variables and a dependent variable. Independent variables (IV1, IV2, IV3): Type: nominal, ordinal, interval, ratio; no multicollinearity. Dependent variable (DV): Type: interval; Samples: independent; Distribution: normal. Residual (R): Variance: homoscedasticity; Distribution: normal.
PLS-PM (PLS path modelling) Purpose: To test the relationship between multiple independent and dependent variables. Type (measurement model): ordinal, interval, ratio. Type (structural model): continuous. [Figure: path diagram showing the measurement model (indicators X1–X4, Y1, and Y2 with errors e1–e6) and the structural model (IV1, IV2, and IV3 with residuals r1 and r2)]
SEM Purpose: To test the structure and measurement of the relationships between multiple independent and dependent variables. Type (measurement model): ordinal, interval, ratio. Type (structural model): continuous. [Figure: path diagram showing the measurement model (indicators X1–X4, Y1, and Y2 with errors e1–e6) and the structural model (IV1, IV2, and IV3 with residuals r1 and r2)]
Aspect: Regression; PLS-PM; SEM
Aim: Maximise R² of a dependent variable; Maximise R² of dependent variables; Replicate a sample variance–covariance matrix
Model: Simple; Complex
Dependent variable: Single; Multiple
Control variable: First block/variable; N/A; All
Multicollinearity: Allowed for some types of regression; Allowed
Data fitting: Just-identified (df = 0); Over-identified (df > 0)
Interpretation: Change in observation; Change from manipulation
Equation: One at a time; Simultaneously
Generalisability: Adjusted R²; Q²; All results
What is over-identification? It is a scenario when you have more data points (i.e. information) than you need to estimate parameters (𝑑𝑓>0). Different combinations of data points produce different solutions. This leads to multiple results. A higher level of over-identification means a higher number of solutions are supported by the conceptual model when the model fits the data. Consequently, this brings about a higher level of generalisability, compared to regression that produces a single result. Imagine that you have 3 equations (i.e. information): 𝑥+𝑦=0, 𝑥−𝑦=1, and 2𝑥+𝑦=5. There are 2 unknown variables. Different sets of 2 equations generate different answers.
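The three-equation example can be checked numerically. The sketch below solves every pair of the equations x + y = 0, x − y = 1, and 2x + y = 5 with Cramer's rule; the helper name `solve_pair` and the coefficient layout are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import combinations

# Each equation a*x + b*y = c is stored as (a, b, c).
equations = [
    (1, 1, 0),    # x + y = 0
    (1, -1, 1),   # x - y = 1
    (2, 1, 5),    # 2x + y = 5
]


def solve_pair(eq1, eq2):
    """Solve two linear equations in x and y using Cramer's rule."""
    a1, b1, c1 = eq1
    a2, b2, c2 = eq2
    det = a1 * b2 - a2 * b1
    x = Fraction(c1 * b2 - c2 * b1, det)
    y = Fraction(a1 * c2 - a2 * c1, det)
    return x, y


solutions = {
    pair: solve_pair(equations[pair[0]], equations[pair[1]])
    for pair in combinations(range(len(equations)), 2)
}
for (i, j), (x, y) in solutions.items():
    print(f"Equations {i + 1} and {j + 1}: x = {x}, y = {y}")
```

Each pair yields a different (x, y), which is the sense in which three pieces of information for two unknowns produce multiple solutions.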
Benefits of over-identification Increased: statistical power, predictive accuracy, generalisability. Decreased: error, sample dependence, capitalisation on chance.
Why more generalisable? A model with parameters that are fully optimised from a specific sample cannot be replicated in another sample because the results capture all the specificity of that sample. Statistics aims at generalisation by discarding specificity and retaining commonality across different samples. Since results from regression are fully optimised on a specific sample, their generalisation can only be inferred. This is because regression uses 100% of the data. This is called just-identified, meaning that there is a single solution in your model. In contrast, path analysis allows us to use less than 100% of the data (depending on your model specification); consequently, the results are over-identified, meaning that there are multiple solutions in your model. Basically, a model which can be applied in multiple solutions is better than one which can be applied in a single solution. If it does not make sense to you, substitute the word “solution” with either “scenario” or “situation”.
Why less error? When you conduct multiple tests and assume that these hypotheses are dependent, you inflate the Type I error. The actual Type I error for multiple tests is called familywise error. It can be calculated as $\text{familywise error} = 1 - (1 - \alpha)^n$, where $n$ is the number of tests. For example, given $\alpha = 5\%$, conducting a series of 4 multiple regression analyses will have a Type I error of 19% rather than 5%. To maintain the same confidence level, we need to decrease $\alpha$ to $5\% / 4 = 1.25\%$ per test based on Bonferroni adjustment.
Familywise error (Test(s): Error) 1: 5%; 2: 10%; 3: 14%; 4: 19%; 5: 23%; 6: 26%; 7: 30%; 8: 34%; 9: 37%; 10: 40%; 11: 43%; 12: 46%; 13: 49%; 14: 51%; 15: 54%; 16: 56%; 17: 58%; 18: 60%; 19: 62%; 20: 64%
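The familywise error figures can be reproduced with a short calculation. The sketch below assumes α = 5% per test, as in the slides; the function names `familywise_error` and `bonferroni_alpha` are chosen here for illustration.

```python
def familywise_error(alpha, n_tests):
    """Probability of at least one Type I error across n_tests tests."""
    return 1 - (1 - alpha) ** n_tests


def bonferroni_alpha(alpha, n_tests):
    """Per-test alpha that keeps the familywise error near alpha."""
    return alpha / n_tests


alpha = 0.05
for n in range(1, 21):
    print(f"{n:2d} test(s): familywise error = {familywise_error(alpha, n):.0%}")

print(f"Bonferroni-adjusted alpha for 4 tests: {bonferroni_alpha(alpha, 4):.2%}")
```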
Why less error? Although Bonferroni adjustment allows us to use simple analysis to test multiple dependent hypotheses, we are at risk of failing to reject null hypotheses because α is too small, especially when we have many hypotheses. Remember that when we decrease α, β will increase unless we increase n.
Benefits of SEM A Type I error is controlled Direct/indirect/total effects Causal inference Complex models Measurement reliability and validity Likert scale is not assumed as interval data Model fit Model misspecification Multiple-group analysis Invariance analysis Generalisability Combined with other advanced techniques (e.g. resampling, Bayesian, Monte Carlo, latent class analysis, item response theory, mixed models, multi-level modelling)
What is tested in SEM?
Aspect: Experiment; Non-experiment
Environment: Controlled; Uncontrolled
Causal assumptions: Not required; Required
Statistical assumptions
What to be tested: It tests the existence of the relationship between an independent variable and a dependent variable; It tests to what extent an independent variable affects a dependent variable under the assumption that a causal model is true.
Interpretation of 𝑦=𝑏𝑥+𝜀 Regression: What would be the difference in the expected value of 𝑌 if we were to observe 𝑋 at level 𝑥+1 instead of level 𝑥? 𝜀 is the deviation of 𝑌 from its conditional expectation. The equality sign is symmetrical. SEM: What would be the change in the expected value of 𝑌 if we were to intervene and change the value of 𝑋 from 𝑥 to 𝑥+1? 𝜀 is the deviation of 𝑌 from its controlled expectation. The equality sign is asymmetrical. Ref: Pearl (1998)
What is 𝜀? 𝜀 (epsilon) is an error variable. It represents the influence of omitted variables, which are background variables outside the conceptual model. Note: go back to Descartes’ drawing to review IN and OUT concepts.
Error in statistics There are 2 components: Systematic error represents something that is not captured in a model and causes you to add free parameters that correlate an error variable with others. Random error, AKA measurement error, represents something that uniquely causes observed scores to deviate from true scores. This type of error is supported by latent variable models. It is a combination of error generated by interviewers, respondents, and instruments. When multitrait–multimethod (MTMM) is used, mode error can also be incorporated.
Four-step approach Step 1: Unrestricted model (exploratory factor analysis, reliability analysis). Step 2: Measurement model (confirmatory factor analysis, construct validity). Step 3: Structural equation model (model testing, model respecification, model selection). Step 4: Prespecified model (fixed parameters, simulation). Ref: Mulaik & Millsap (2000)
First step: Unrestricted model To identify the number of factors
Latent variable models Continuous latent variables: factor analysis (continuous manifest variables) and item response theory (categorical manifest variables). Categorical latent variables: mixture model (continuous manifest variables) and latent class analysis (categorical manifest variables).
Measurement analysis Arranged along an axis from pragmatic orientation to methodological rigour: Principal component analysis: construct an index. Classical test theory: incorporate an error. Generalizability theory: incorporate multiple errors. Exploratory factor analysis: explore latent variables. Confirmatory factor analysis: test a latent variable. Item response theory: explore a scale. Rasch model: test a scale. Adapted from Salzberger (2009, p. 274)
Exploratory factor analysis (EFA) Purpose: To create a psychometric measurement by discovering an underlying pattern and conceptualising a latent variable. Item type: interval. Samples: independent. [Figure: factors F1 and F2 with items X1–X4 and errors e1–e4]
EFA use To partial the measurement error out of the observed scores To explore underlying patterns in the data To determine the number of latent variables To reduce the number of variables To assess the reliability of each item To eliminate multi-dimensional items (i.e. cross-loaded variables)
Dimensionality It is an important concept while developing and testing an instrument. For a latent variable, it is common for an abstract concept to be multi-dimensional although it will increase complexity in testing your model when you plan to measure. If you want to test a second-order factor, you must have 3+ first-order factors. For a manifest variable, it is not unusual to see some indicators to measure more than one latent variable. Nonetheless, some academics attempt to eliminate them to purify a measurement.
Dimensionality [Figure: examples of uni-dimensional and multi-dimensional constructs, factors, and items]
EFA processes Extract Rotate Interpret
Extraction When you assume data to be the population: Principal axis factoring assumes that factors are hypothetical and that they can be estimated from variables. Image factoring assumes that factors are real and that they can be estimated from variables.
Extraction When you assume data to be a sample randomly selected from the population: ULS attempts to minimise the sum of squared differences between estimated and observed correlation matrices, excluding the diagonal. GLS attempts to minimise the sum of squared differences between estimated and observed correlation matrices while accounting for the uniqueness of variables (i.e. the greater the uniqueness of a variable, the less weight it has). ML attempts to estimate parameters which are likely to produce the observed correlation matrix. The estimated correlation matrix is adjusted for the uniqueness of variables. Kaiser’s alpha factoring assumes that variables are randomly sampled from a universe of variables. It attempts to maximise reliability of factors.
Rotation Orthogonal rotation assumes factors to be uncorrelated from one another: Quartimax maximises the sum of variances of loadings in rows of the factor matrix. It attempts to minimise the number of factors needed to explain each variable. This method tends to make a number of variables highly loaded on a single factor. Varimax maximises the sum of variances of loadings in columns of the factor matrix. It attempts to minimise the number of variables highly loaded on each factor. Equamax combines Quartimax and Varimax approaches, but it is reported to behave erratically.
Rotation Oblique rotation assumes factors to be correlated with one another: Direct oblimin allows you to adjust a degree of correlation between factors. Delta value of 0 allows factors to be moderately correlated. On the other hand, delta value of +0.8 allows factors to be more correlated while delta value of -0.8 allows factors to be less correlated. Promax is quicker than direct oblimin. It is useful with a large sample.
Extraction & rotation [Figure: two plots of the factor axes F1 and F2]
Activity: Applied learning (15min) File: Factor.sps Click Analyze > Dimension Reduction > Factor. Select TRU1–WEB19 into Variables. Click Descriptives; tick Coefficients, Significance levels, Determinant, KMO and Bartlett’s test of sphericity, Reproduced, and Anti-image; click Continue. Click Extraction; choose Maximum likelihood for Method; tick Scree plot; click Continue. Click Rotation; choose Direct Oblimin; click Continue. Click Options; tick Sort by size and Suppress small coefficients; type .33; click Continue. Click OK.
Reading EFA Results Correlation matrix is a good place to start looking for bad items: those that do not correlate with others (r < 0.3) and those that correlate highly with others (r > 0.9). These items are candidates for elimination. In addition, a determinant value higher than 10⁻⁵ signifies that there is no multicollinearity issue. Kaiser–Meyer–Olkin (KMO) measure of sampling adequacy tells you whether data is factorable: >0.9 is marvellous; >0.8 is meritorious; >0.7 is middling; >0.6 is mediocre; >0.5 is miserable; <0.5 is unacceptable.
Reading EFA Results Bartlett’s test of sphericity determines whether the observed correlation matrix is different from the identity matrix, meaning that you cannot analyse data with EFA if the correlation matrix is the identity matrix since there is no correlation between variables. The null hypothesis is: H0: The observed correlation matrix and the identity matrix have the same value. Anti-image correlation matrix contains the KMO measure of each variable (i.e. a diagonal element) and negatives of partial correlation among variables (i.e. off-diagonal elements). A KMO measure of less than 0.5 means that a particular variable is a candidate for elimination. In addition, a value of negative partial correlation should be small.
Reading EFA Results Total Variance Explained table shows the proportion of variance explained by factors: Based on Kaiser’s criterion, factors having eigenvalues greater than 1 should be retained. The logic behind this argument is that a factor should explain at least one variable. Generally, this criterion leads to an overfactoring issue. However, this justification is accurate when: The number of variables is less than 30 and extracted communalities are all greater than 0.7. The sample size is more than 200 and extracted communalities are 0.6 or higher. Communalities table shows the percentage of variance of each variable explained by factors. It is item reliability (R²).
Reading EFA Results Reproduced Correlations table displays the predicted correlation matrix. Ideally, it should be the same as the observed correlation matrix. It also shows the residuals, which are the differences between the predicted correlation matrix and the observed correlation matrix. The percentage of non-redundant residuals with an absolute value higher than 0.05 should be less than 50%. Scree Plot depicts eigenvalues gained from an additional factor. The cut-off point should be where the slope changes dramatically. Use of this graph is controversial because it is subjective.
Reading Results Factor Matrix shows the correlation between items and factors (i.e. a factor loading) before rotation takes place. Pattern Matrix shows the correlation between items and factors after rotation takes place. Structure Matrix shows the correlation between items and factors, accounting for relationships between factors. It is the product of the pattern matrix and the factor correlation matrix. Factor Correlation Matrix shows the correlation between factors.
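The relationship between the three matrices can be illustrated with a small calculation: multiplying a pattern matrix by a factor correlation matrix gives the structure matrix. The loadings and the factor correlation of 0.3 below are made-up values, and `mat_mul` is a helper written for this sketch.

```python
def mat_mul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


# Hypothetical pattern matrix: 3 items (rows) by 2 factors (columns).
pattern = [
    [0.8, 0.0],
    [0.7, 0.1],
    [0.0, 0.9],
]
# Hypothetical factor correlation matrix: the two factors correlate at 0.3.
factor_correlation = [
    [1.0, 0.3],
    [0.3, 1.0],
]

structure = mat_mul(pattern, factor_correlation)
for item, row in enumerate(structure, start=1):
    print(f"Item {item}:", [round(value, 2) for value in row])
```

When the factors are uncorrelated, the factor correlation matrix is the identity matrix and the structure matrix equals the pattern matrix.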
How many factors? You should use a combination of: Your measurement Your conceptual model Theories Literature Eigenvalues (Kaiser’s criterion) Pattern matrix Communalities (item reliability) Scree plot Parallel analysis
EFA’s problems EFA procedures are arbitrary. Kaiser’s criterion (eigenvalues > 1) and Cattell’s scree plot often lead to overfactoring and sometimes lead to underfactoring. Bartlett’s test is sensitive to sample size. Ref: Hubbard & Allen (1987), and Zwick & Velicer (1986)
Parallel analysis File: rawpar To determine the maximum number of factors to be extracted by assessing eigenvalues of the data against those of the simulation. Factors to be extracted must have eigenvalues higher than those of the simulated ones (i.e. 95th percentile values). You may use the SPSS script provided by O’Connor (2000) or the online engine provided by Patil, Singh, Mishra, and Donovan (2008).
Activity: Demonstrative learning (10min) File: rawpar.sps Demonstrate how to conduct parallel analysis Examples: Molla, Cooper, & Pittayachawan (2011) Sud-on, Abareshi, Pittayachawan, & Leo (2013)
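The retention rule in parallel analysis can be sketched as a simple comparison. The code below is not O’Connor’s script: it assumes you already have the eigenvalues of your data and the 95th percentile eigenvalues from the simulation, and the function name `count_factors_to_retain` and all numbers are hypothetical.

```python
def count_factors_to_retain(data_eigenvalues, simulated_percentiles):
    """Count leading factors whose eigenvalue exceeds the simulated one."""
    retained = 0
    for observed, simulated in zip(data_eigenvalues, simulated_percentiles):
        if observed <= simulated:
            break  # from this factor onward, eigenvalues could arise by chance
        retained += 1
    return retained


# Hypothetical eigenvalues, for illustration only.
data_eigenvalues = [3.20, 1.60, 0.90, 0.50]
simulated_95th = [1.40, 1.20, 1.05, 0.95]

n_factors = count_factors_to_retain(data_eigenvalues, simulated_95th)
print("Maximum number of factors to extract:", n_factors)
```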
PA results
Root: Data; Mean; Percentile
1: 8.85; 0.66; 0.74
2: 3.79; 0.59; 0.64
3: 2.85; 0.54; 0.58
4: 1.94; 0.49
5: 1.7; 0.45
6: 1.45; 0.42; 0.46
7: 1.08; 0.38; 0.41
8: 0.91; 0.35
9: 0.69; 0.32
10: 0.55; 0.29
11: 0.43; 0.27; 0.3
12: 0.36; 0.24
13: 0.34; 0.21
14: 0.19; 0.22
15: 0.17
16: 0.15; 0.14; 0.16
Simulated eigenvalues show that extracting more than 15 factors leads to over-factoring. Factors from this point onward happen by chance.
PA scree plot
Internal consistency Cronbach’s α is internal consistency based on inter-item correlation. Split-half reliability randomly separates a measurement into two parts in order to calculate the correlation between them. It assumes that the variance in each part is equal. This option produces the Spearman–Brown split-half reliability coefficient and the Guttman split-half reliability coefficient. Guttman’s lower bounds calculates six reliability coefficients: λ1: an intermediate coefficient used to calculate the others; λ2: a coefficient which is more complex than Cronbach’s α; λ3: an equivalent version of Cronbach’s α; λ4: Guttman split-half reliability; λ5: recommended when a single item is highly correlated with others, which lack high correlation among themselves; λ6: recommended when inter-item correlation is low in relation to squared multiple correlation. Parallel model assumes equal variance among items and among their errors. Strict parallel model assumes equal variance among items and among their errors, and that items have equal means.
Cronbach’s α Cronbach’s α, or coefficient α, assumes that the factor loadings of all items are equal (i.e. essentially τ-equivalent model). When this assumption is met, it is reliability (Novick & Lewis, 1967). Its formula can be written as (Bacon, Sauer, & Young 1995; Cronbach 1951): $\alpha = \frac{k}{k-1}\left(1 - \frac{k}{k + \left(\sum_{i=1}^{p}\lambda_i\right)^2 - \sum_{i=1}^{p}\lambda_i^2}\right)$ It should be at least 0.70.
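As a worked example, the loading-based formula can be evaluated in Python. This sketch assumes standardised items (each with a variance of 1) and one loading per item, so the number of items equals the number of loadings; the function name `cronbach_alpha_from_loadings` and the loadings of 0.7 are hypothetical.

```python
def cronbach_alpha_from_loadings(loadings):
    """Cronbach's alpha for standardised items, computed from factor loadings."""
    k = len(loadings)
    sum_of_loadings = sum(loadings)
    sum_of_squared_loadings = sum(value ** 2 for value in loadings)
    # Variance of the summed scale: k item variances plus all item covariances.
    total_variance = k + sum_of_loadings ** 2 - sum_of_squared_loadings
    return (k / (k - 1)) * (1 - k / total_variance)


# Hypothetical loadings for a four-item scale.
alpha = cronbach_alpha_from_loadings([0.7, 0.7, 0.7, 0.7])
print(f"Cronbach's alpha = {alpha:.3f}")
```

With equal loadings the result agrees with the familiar inter-item correlation form, k·r / (1 + (k − 1)·r), where r is the squared loading.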
Activity: Applied learning (5min) Calculate Cronbach’s α (Analyze > Scale > Reliability Analysis) based on the results that you have from EFA. Then try to add any item in a subsequent analysis and observe how its value changes.
Issues of Cronbach’s α It cannot be used to evaluate unidimensionality/homogeneity of items/tests (Green, Lissitz, & Mulaik 1977). It underestimates reliability when the measurement is not essentially τ-equivalent (Cortina 1993). It can overestimate reliability when items are heterogeneous, measurement errors are correlated, and subsets of the items are congeneric (Raykov 1998).
Issues of Cronbach’s α It overestimates reliability when a measurement deviates from unidimensionality (Shevlin, Miles, Davies, & Walker 2000). Jum Nunnally never said that Cronbach’s α > 0.7 is acceptable (Lance, Butts, & Michels 2006). In contrast, he said that (Nunnally & Bernstein 1994, p. 265): “A satisfactory level of reliability depends on how a measure is being used. In the early stages of predictive or construct validation research, time and energy can be saved using instruments that have only modest reliability, e.g. .70 ... In contrast to the standards used to compare groups, a reliability of .80 may not be nearly high enough in making decision about individuals ... If important decisions are made with respect to specific test scores, a reliability of .90 is the bare minimum, and a reliability of .95 should be considered the desirable standard. However, never switch to a less valid measure simply because it is more reliable.”
Issues of Cronbach’s α Using it to delete items can be misleading (Raykov 2007) and decrease criterion validity (Raykov 2008). Use of Cronbach’s α is discouraged since its assumptions are unlikely to hold in practice and its reliability estimates can be either underestimated or overestimated (Green & Young 2008).
Second step: Measurement model To test the measurement
Measurement parameters Mean is precision and difficulty. Factor loading, or slope, is scale and discrimination. Error variance is error and unique variance. Theoretically, it should be random error. However, if there is systematic error, 2+ errors will be correlated.
Type of measurement models For each case i and item j:
Congeneric: $x_{ij} = \alpha_j + \lambda_j T_i + \delta_{ij}$
Essentially τ-equivalent: $x_{ij} = \alpha_j + T_i + \delta_{ij}$
τ-equivalent: $x_{ij} = T_i + \delta_{ij}$
Parallel: $x_{ij} = T_i + \delta_i$
Type of measurement models [Figure: path diagrams of the congeneric, essentially τ-equivalent, τ-equivalent, and parallel models, each with a latent variable ξ, items x1–x4, and errors δ1–δ4] Ref: Graham (2006)
Type of measurement models [Figure: path diagram of the variable-length model with latent variables ξ and η1–η4, disturbances ζ, and items x1–x4] Ref: Jöreskog (1978)
Measurement assumption Local independence: Measurement errors of items must be uncorrelated to ensure that the items measure only one latent variable (i.e. uni-dimensionality).
Confirmatory factor analysis (CFA) Purpose: To test a psychometric measurement by hypothesising an underlying pattern based on a known construct. Item type: interval. Samples: independent. [Figure: factors F1 and F2 with items X1–X4 and errors e1–e4]
CFA To test a specific measurement model (i.e. parallel, τ-equivalent, essentially τ-equivalent, congeneric, and variable-length models) To test a higher-order factor model To test construct validity (i.e. convergent validity, discriminant validity, and factorial validity) To assess to what extent the measurement fits the data To perform multiple-group or invariance analysis To prepare the measurement model for structural equation modeling (SEM)
Measurement validation Convergent validity: To test dimensionality of a factor and items. To assess construct reliability. Method: one-factor model. Discriminant validity: To test that 2 factors represent different things. Method: nested two-factor model (i.e. one model assumes two factors to be the same thing and the other does not), average variance extracted (AVE, AKA $\rho_{vc}$). Factorial validity: To test that all factors in the measurement fit the data. Method: multi-factor model.
Measurement validation Convergent validity Discriminant validity Factorial validity Construct reliability Construct validity
File: Convergent validity.amw
Activity: Applied learning (20min) Draw one-factor model that represents web design by using the results from EFA. You must request AMOS to produce the following results via Output tab in Analysis Properties: Standardized estimates Squared multiple correlations Modification indices Each student tries to test different factors for subsequent analysis
Estimators AMOS: ML, GLS, ULS, SLS, ADF, Bayesian. LISREL: IV, TSLS, WLS, DWLS. Mplus: MLM, MLMV, MLR, MLF, MUML, WLSM, WLSMV, ULSMV.
AMOS results Sample Moments: the sample variance–covariance matrix. Estimates: the estimated results based on the conceptual model. The results include: Unstandardised estimates; Standardised estimates; Variances; Squared multiple correlations (R², or coefficient of determination); Estimated variance–covariance matrix (implied covariances); Estimated correlation matrix (implied correlations); Residual variance–covariance matrix (residual covariances); Standardised residual variance–covariance matrix (standardized residual covariances).
AMOS results Assessment of Normality: univariate and multivariate normality tests. Observations farthest from the centroid: Mahalanobis distance (d²). p1: the probability, under multivariate normality, that the d² of an arbitrary case exceeds the observed dᵢ². A small p1 value means that case i is probably an outlier under the assumption of multivariate normality. p2: the probability, under multivariate normality, that the largest d² exceeds the observed dᵢ². A small p2 value means that there are probably outliers under the assumption of multivariate normality. Basically p1 looks at a specific case while p2 looks at all cases.
What are we testing? In path analysis, software will estimate a sample variance–covariance matrix based on our conceptual model. This matrix is tested against the real sample matrix to test the null hypothesis that: H0: The estimated matrix and the sample matrix are the same. This is done by using the χ² test, which is a non-parametric test equivalent to t-test. The χ² test can be found at CMIN under Model Fit.
Testing a model in SEM [Figure: the data yield the sample matrix and the model yields the estimated matrix; the test asks whether the sample matrix ≈ the estimated matrix]
CMIN CMIN is a minimum discrepancy function. AMOS supports the following functions: Maximum likelihood (ML) Generalized least squares (GLS) Unweighted least squares (ULS) Scale-free least squares (SLS) Asymptotically distribution-free (ADF) ML is generally robust against data which moderately deviates from multivariate normality, thereby being used by default.
Model types in results There are 3 types of models that you will find in AMOS results: Default: the conceptual model This model is your hypothesis. Saturated: the conceptual model with df=0 This model is equivalent to regression. Independence: the conceptual model with maximum df This model assumes no relationship among variables.
Model fit: CMIN NPAR: number of parameters in the model. More detail can be found under Parameter summary. CMIN and P: the χ² test is a discrepancy function between an estimated matrix and a sample matrix. DF: degrees of freedom, df = total data points − free parameters. When df < 0, the model is under-identified, meaning that it cannot be solved because there are not enough data points. CMIN/DF: normed χ² test, $\chi^2 / df$
t-rule To enable an identifiable model, you must ensure that the number of free parameters is not higher than the number of data points. This is a necessary but not sufficient condition for model identification. It can be calculated with the formula below (Bollen, 1989): $t \le \frac{k(k+1)}{2}$ t = the number of free parameters (i.e. parameters that we estimate in a model) k = the number of observed variables
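The t-rule and the degrees of freedom are easy to check by hand or in code. The sketch below uses a hypothetical one-factor model with 4 indicators and 8 free parameters (3 loadings, 4 error variances, and 1 factor variance, with one loading fixed to 1); the function names are chosen for illustration.

```python
def data_points(k):
    """Number of unique variances and covariances among k observed variables."""
    return k * (k + 1) // 2


def passes_t_rule(k, t):
    """Necessary (but not sufficient) condition for identification: t <= k(k+1)/2."""
    return t <= data_points(k)


def degrees_of_freedom(k, t):
    """df = total data points - free parameters."""
    return data_points(k) - t


# Hypothetical one-factor model: 4 observed variables, 8 free parameters.
k, t = 4, 8
df = degrees_of_freedom(k, t)
print("Data points:", data_points(k))
print("Passes t-rule:", passes_t_rule(k, t))
print("Degrees of freedom:", df)
```

A positive df means the model is over-identified, zero means just-identified, and a negative value means the t-rule is violated.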
Model fit: RMR, GFI RMR: root mean square residual is the average difference between the population matrix and the sample matrix. However, in practice, standardized root mean square residual (SRMR) is used. $RMR = \sqrt{\frac{\sum_{i}\sum_{j}\left(s_{ij} - \sigma_{ij}\right)^2}{k}}$ GFI: goodness-of-fit index is the percentage of variances that the model can reproduce. $GFI = 1 - \frac{FMIN_{model}}{FMIN_{null}}$
Model fit: RMR, GFI AGFI: adjusted goodness-of-fit index is the adjusted value of GFI to account for model complexity. $AGFI = 1 - \frac{1 - GFI}{df_{model} / df_{null}}$ PGFI: parsimonious goodness-of-fit index is the adjusted value of GFI to account for model parsimony. $PGFI = GFI \times \frac{df_{model}}{df_{null}}$
Model fit: Baseline comparisons NFI: normed fit index is a rescaled χ² with a range of 0 to 1. It is used to compare a conceptual model with an independence model. $NFI = 1 - \frac{\chi^2_{model}}{\chi^2_{null}}$ RFI: relative fit index, AKA BL86, is the adjusted value of NFI to account for model complexity. $RFI = 1 - \frac{\chi^2_{model} / df_{model}}{\chi^2_{null} / df_{null}}$
Model fit: Baseline comparisons IFI: incremental fit index, AKA BL89, is derived from NFI to account for complexity of an evaluated model. $IFI = \frac{\chi^2_{null} - \chi^2_{model}}{\chi^2_{null} - df_{model}}$ TLI: Tucker–Lewis index, AKA NNFI (nonnormed fit index), is an adjusted value of IFI to account for model complexity. $TLI = \frac{\chi^2_{null} / df_{null} - \chi^2_{model} / df_{model}}{\chi^2_{null} / df_{null} - 1}$
Model fit: Baseline comparisons CFI: comparative fit index is a rescaled χ² that accounts for a noncentrality parameter. $CFI = 1 - \frac{NCP_{model}}{NCP_{null}}$
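The baseline comparison indices can be computed directly from the χ² and df of the conceptual (default) model and the independence (null) model. In this sketch the χ² values and degrees of freedom are hypothetical, and the noncentrality parameter is taken as max(χ² − df, 0), which is the usual estimate but is an assumption here because the slides do not spell it out.

```python
def baseline_fit_indices(chi_model, df_model, chi_null, df_null):
    """Compute NFI, RFI, IFI, TLI, and CFI from model and null chi-square values."""
    ratio_model = chi_model / df_model
    ratio_null = chi_null / df_null
    # Assumption: NCP is estimated as chi-square minus df, floored at zero.
    ncp_model = max(chi_model - df_model, 0)
    ncp_null = max(chi_null - df_null, 0)
    return {
        "NFI": 1 - chi_model / chi_null,
        "RFI": 1 - ratio_model / ratio_null,
        "IFI": (chi_null - chi_model) / (chi_null - df_model),
        "TLI": (ratio_null - ratio_model) / (ratio_null - 1),
        "CFI": 1 - ncp_model / ncp_null,
    }


# Hypothetical AMOS output, for illustration only.
indices = baseline_fit_indices(chi_model=85, df_model=40, chi_null=900, df_null=55)
for name, value in indices.items():
    print(f"{name} = {value:.3f}")
```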
Model fit: Parsimony-adjusted measures PRATIO: parsimony ratio is the ratio of the degrees of freedom of the evaluated model and of the baseline model. $PRATIO = \frac{df_{model}}{df_{null}}$ PNFI: parsimonious normed fit index is the adjusted value of NFI to account for model parsimony. $PNFI = NFI \times PRATIO$
Model fit: Parsimony-adjusted measures PCFI: parsimonious comparative fit index is the adjusted value of CFI to account for model parsimony. $PCFI = CFI \times PRATIO$
Model fit: NCP NCP: noncentrality parameter is an estimate used for calculating CFI. LO 90: Lower limit of 90% confidence interval of NCP. HI 90: Upper limit of 90% confidence interval of NCP.
Model fit: FMIN FMIN: a minimum value of the discrepancy function F, which derives from χ2 to account for the non-centrality parameter F0: a discrepancy function between an estimated matrix and the population matrix. LO 90: Lower limit of 90% confidence interval of F0. HI 90: Upper limit of 90% confidence interval of F0.
Model fit: RMSEA RMSEA: root mean square error of approximation is a discrepancy function between an estimated matrix and the population matrix while accounting for model complexity. $RMSEA = \sqrt{\dfrac{F_0}{df_{model}}}$ LO 90: Lower limit of 90% confidence interval of RMSEA. HI 90: Upper limit of 90% confidence interval of RMSEA. PCLOSE: a significance test of RMSEA against the null hypothesis of close fit, H0: RMSEA ≤ 0.05 (i.e. the estimated matrix and the population matrix are approximately the same).
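RMSEA can be reproduced from χ2, the degrees of freedom, and the sample size. This is a minimal sketch which assumes that F0 is estimated as (χ2 − df) / (N − G), floored at zero, where N is the number of observations and G is the number of groups. The function name and example values are hypothetical.

```python
import math


def rmsea(chi2_model, df_model, n_obs, n_groups=1):
    """Point estimate of RMSEA from chi-square, df, and sample size."""
    # Assumption: F0 is estimated as (chi-square - df) / (N - G), floored at 0.
    f0 = max((chi2_model - df_model) / (n_obs - n_groups), 0.0)
    return math.sqrt(f0 / df_model)


# Hypothetical values for illustration only.
print(round(rmsea(chi2_model=85.0, df_model=50, n_obs=300), 3))
```

When χ2 is smaller than df, the estimate of F0 is floored at zero and so RMSEA is reported as 0.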
Model fit: AIC AIC: Akaike information criterion is a discrepancy function between an estimated sample matrix and an estimated population matrix. It is used for model selection process to penalise a complex model without a substantive improvement. BCC: Browne–Cudeck criterion provides a penalty greater than AIC. CAIC: Consistent Akaike information criterion provides a penalty greater than BCC. BIC: Bayes information criterion provides a penalty greater than CAIC.
Model fit: ECVI ECVI: Expected cross-validation index is equivalent to AIC. MECVI: Modified expected cross-validation index is equivalent to BCC.
Model fit: HOELTER HOELTER: a crude measure of the statistical power of χ2. It is used to accept or reject a model when the number of hypothetical observations is larger or smaller than the number of actual observations. HOELTER ≥ 200G (where G is the number of groups) signifies sufficient statistical power. When a model is accepted and HOELTER < 200, it means that the model would be rejected when the number of observations exceeds HOELTER’s n. When a model is rejected while the number of observations exceeds HOELTER’s n, it means that the model would be accepted when the number of observations is equal to or less than HOELTER’s n.
Index | Value | Meaning | Reference
χ2 | p > 0.01 | The sample matrix and the estimated matrix are the same. | Yu (2002)
SRMR | < 0.07 | The average error in the model is minimal. |
RMSEA | < 0.06 with PCLOSE > 0.05 | The population matrix and the estimated matrix are the same, and the average error in the model is minimal. |
IFI | > 0.95 | The over-identification condition of the model is at an acceptable level. |
TLI | | |
CFI | > 0.96 | |
PCFI | > 0.85 | The estimated parameter is robust against other samples. | Mulaik (1998)
BIC | Smallest | The model is more generalisable. | Pitt and Myung (2002)
What if my model doesn’t fit the data? Then something must be wrong: Data: Go back to your raw data and see what might have gone wrong (e.g. typo). Assumptions: Explore your data further to see whether any assumption is extremely violated. Model: The model is misspecified, your model is wrong, or the theory is wrong.
Modification index Modification index (MI), or Lagrange multiplier (LM), is a measure that indicates how to fit the model better by estimating a new parameter. It provides two pieces of information: χ2: This test tells you how much χ2 would reduce when a parameter is modified. Par Change: This measure tells you how much a parameter value will change when a parameter is modified. A positive value means the current model underestimates a parameter. A negative value means the current model overestimates a parameter. MI provides three tables: covariances, variances, and regression weights. When one or more parameters do not fit the data, they will be listed here.
Activity: Applied learning (15min) Use MI to modify the model. You must modify one parameter at a time. Also, you are encouraged to jot down what you have modified. Re-evaluate the model to see whether it fits the data.
Correlated errors Freeing a correlational parameter between error terms may be a post hoc practice to improve model fit. However, it should be supported by a theoretical explanation. Gerbing & Anderson (1984) explain that one possible cause of correlated errors is multi-dimensionality.
Modified model MI must be used with caution. Mindless model modification leads to the following issues: theoretical nonsense; overfitting; more error; less statistical power; more capitalising on chance; less predictive accuracy; more sample-dependent; reduced generalisability.
What to consider ... An item with p > 0.05 should be deleted. Items that have correlated measurement errors may be deleted or taken out to form a new latent variable. This should be done with theoretical justification. Items that have reliability less than 0.50 may be deleted. When there are only 2 items left, a measurement model may become unstable.
AMOS results: Scalar Unstandardised estimates (regression weights); z-test; p-value (2-tailed); standardised estimates; variances; squared multiple correlations ($R^2$ or coefficient of determination): the percentage of a dependent variable's variance that is explained by other variables.
AMOS results: Matrix Estimated variance–covariance matrix (implied covariances). Estimated correlation matrix (implied correlations). Residual variance–covariance matrix (residual covariances): $\text{Sample matrix} - \text{Estimated matrix}$. Standardised residual variance–covariance matrix (standardized residual covariances): $\dfrac{\text{Sample matrix} - \text{Estimated matrix}}{S.E.}$
Use of estimates Unstandardised: to create a mathematical equation; to use as a priori parameters; to simulate a model; stable across samples. Standardised: to calculate an effect size; to communicate with others; to compare with other studies; unstable across samples since a parameter is standardised using a sample-specific standard deviation.
Model fitting processes [Flowchart] Model evaluation → Fit? No: Model rectification. Yes: Path evaluation.
Construct reliability When a model is not essentially τ-equivalent (i.e. it is a congeneric model), coefficient Ωw (Allen 1973; Bacon, Sauer, & Young 1995), or coefficient H (Hancock & Mueller 2001), is recommended. Ωw has a good theoretical property: its value is never less than the reliability of the most reliable item in the construct. As a result, it is also known as maximal reliability. The formula is $\Omega_w = \dfrac{\sum_{i=1}^{p} \frac{\lambda_i^2}{1 - \lambda_i^2}}{1 + \sum_{i=1}^{p} \frac{\lambda_i^2}{1 - \lambda_i^2}}$ It should be at least 0.70.
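For readers without the spreadsheet add-in, coefficient Ωw can be computed directly from standardised factor loadings. This is a minimal sketch of the formula on this slide; the function name omega_w and the example loadings are hypothetical and are not taken from SEM.xlam.

```python
def omega_w(loadings):
    """Coefficient Omega-w (maximal reliability) from standardised loadings."""
    if any(abs(l) >= 1 for l in loadings):
        raise ValueError("standardised loadings must lie strictly between -1 and 1")
    # Each item contributes lambda^2 / (1 - lambda^2).
    s = sum(l ** 2 / (1 - l ** 2) for l in loadings)
    return s / (1 + s)


# Hypothetical standardised loadings for a three-item factor.
loadings = [0.7, 0.8, 0.9]
print(round(omega_w(loadings), 3))
```

With these loadings the most reliable item has a reliability of 0.9² = 0.81, and the coefficient is larger than that, as the property on the slide requires.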
Activity: Applied learning (10min) Calculate coefficient Ωw using the spreadsheet and factor loadings from CFA. The Ωw function in SEM.xlam is OmegaW(array). It only accepts standardised estimates of factor loadings. Observe the differences in values between Cronbach’s α and coefficient Ωw. The α function in SEM.xlam is alpha(array). It only accepts standardised estimates of factor loadings.
Use of reliability index To report construct reliability. To be used as a priori parameter values (i.e. factor loading and error variance), especially for creating a composite variable (i.e. parcelling). Munck (1979) demonstrates that you may use a single-indicator factor and still make a model identified by calculating a factor loading (λ) and error variance (θ) from the indicator's standard deviation (σx) and reliability (r) with the following formulae: $\lambda = \sigma_x \sqrt{r}$ and $\theta = \sigma_x^2 (1 - r)$
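The fixed values for a single-indicator factor can be worked out as follows. This is a minimal sketch which assumes that σx is the standard deviation of the composite (or single) indicator and r is its reliability. The function name and example values are hypothetical.

```python
import math


def single_indicator_parameters(sd_x, reliability):
    """Fixed loading and error variance for a single-indicator factor."""
    loading = sd_x * math.sqrt(reliability)         # lambda = sigma_x * sqrt(r)
    error_variance = sd_x ** 2 * (1 - reliability)  # theta = sigma_x^2 * (1 - r)
    return loading, error_variance


# Hypothetical composite with standard deviation 1.2 and reliability 0.80.
loading, theta = single_indicator_parameters(sd_x=1.2, reliability=0.80)
print(round(loading, 3), round(theta, 3))
```

A useful check is that the squared loading plus the error variance reproduces the variance of the indicator.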
Average variance extracted Average variance extracted (AVE) or ρvc(η) (Fornell & Larcker 1981) is used to estimate the percentage that a set of items represent a factor. Its formula is: $\rho_{vc(\eta)} = \dfrac{\sum_{i=1}^{p} \lambda_i^2}{\sum_{i=1}^{p} \lambda_i^2 + \sum_{i=1}^{p} \varepsilon_i}$ It should be at least 0.50. Normally, to meet this criterion, each item reliability (squared multiple correlation) should be at least 0.50 as well.
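AVE can also be computed directly from standardised factor loadings. This is a minimal sketch which assumes that, for standardised estimates, each error variance equals 1 − λ². The function name and example loadings are hypothetical and are not taken from SEM.xlam.

```python
def average_variance_extracted(loadings):
    """AVE from standardised factor loadings."""
    explained = sum(l ** 2 for l in loadings)
    # Assumption: with standardised estimates, error variance = 1 - lambda^2.
    errors = sum(1 - l ** 2 for l in loadings)
    return explained / (explained + errors)


# Hypothetical standardised loadings for a three-item factor.
print(round(average_variance_extracted([0.7, 0.8, 0.9]), 3))
```

Under this assumption AVE is simply the mean of the squared loadings, which is why item reliabilities of at least 0.50 are normally needed to reach the 0.50 criterion.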
Activity: Applied learning (5min) Calculate AVE using the spreadsheet and factor loadings from CFA. The AVE function in SEM.xlam is AVE(array). It only accepts standardised estimates of factor loadings.
Discriminant validity File: Discriminant validity.amw
Activity: Applied learning (20min) Draw a two-factor model with any pair of factors of your choice. Try to fit the model.
Correlation–AVE comparison Produce a correlation matrix for every pair of factors. Calculate a squared correlation matrix (γ2). The AVE of each factor must be higher than γ2. Example: Molla, Cooper, & Pittayachawan (2011)
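The comparison can be automated once the AVE of each factor and the correlation between each pair of factors are known. This is a minimal sketch; the function name, the AVE values, and the correlations are all hypothetical and serve only to illustrate the rule.

```python
from itertools import combinations


def discriminant_validity(ave, correlations):
    """For each pair of factors, check that both AVEs exceed gamma squared."""
    report = {}
    for a, b in combinations(sorted(ave), 2):
        gamma_sq = correlations[frozenset((a, b))] ** 2
        report[(a, b)] = ave[a] > gamma_sq and ave[b] > gamma_sq
    return report


# Hypothetical AVEs and inter-factor correlations.
ave = {"Security": 0.62, "Trust": 0.58, "Web design": 0.55}
correlations = {
    frozenset(("Security", "Trust")): 0.71,
    frozenset(("Security", "Web design")): 0.48,
    frozenset(("Trust", "Web design")): 0.80,
}

for pair, holds in discriminant_validity(ave, correlations).items():
    print(pair, "holds" if holds else "fails")
```

In this made-up example the pair Trust and Web design fails because γ2 = 0.64 exceeds both AVEs.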
What to consider ... If one factor of a pair has an AVE less than γ2, you may merge the two factors into one. This should be based on theoretical justification. A cross-loaded item may be deleted. When this is done, you must go back and conduct convergent validation again.
Activity: Applied learning (5min) Do the two factors that you just analysed hold discriminant validity?
File: Factorial validity.amw
Activity: Applied learning (15min) Draw an all-factor model. Try to fit the model.
What to consider ... A cross-loaded item may be deleted. When this is done, you must go back and conduct convergent and discriminant validation again.
Construct validity [Flowchart; nodes: Convergent validity, Fit?, Discriminant validity, Diff?, Factorial validity; actions: Fix/free parameter, Split factor, Drop item, Combine factors] Ref: Molla, Cooper, & Pittayachawan (2011)
Let us recap Construct validation is an iterative process. You need a fair amount of patience. You must do it sequentially: convergent validity, discriminant validity, and factorial validity. You also must calculate construct reliability. At each step, the model must fit the data. Dropping an item may not be the best solution since it affects your conceptual model (i.e. theory) as well as model stability (i.e. analysis). When you create a new parameter, you must justify it on theoretical grounds.
Third step: Structural equation model To test the hypotheses (and may discover new ones in the process)
Up to this point Before conducting SEM, you must ensure that: All factors hold construct validity, reliability, and uni-dimensionality. All items hold uni-dimensionality. Preferably, each factor has local independence (i.e. absence of correlation between errors).
Activity: Applied learning (25min) Create a structural model to test the following hypotheses: Security positively affects trust. Web design positively affects trust. Web design positively affects security. You must request AMOS to produce the following results via Output tab in Analysis Properties: Standardized estimates Squared multiple correlations Modification indices Indirect, direct & total effects Try to fit the model You also may use a model of your choice.
File: SEM.amw
AMOS results: Matrix Total effects (direct effects + indirect effects) Standardised total effects Direct effects Standardised direct effects Indirect effects Standardised indirect effects
Direct effect The direct effect of X on Y is the change of γ units that we expect to see in Y given a one-unit increase in X. [Diagram: X → Y (γ)]
Indirect effect The indirect effect of X on Y is the change of γβ units that we expect to see in Y while leaving X untouched and increasing Z to whatever value Z would attain under a one-unit increase in X. [Diagram: X → Z (γ), Z → Y (β)]
Total effect The total effect of X on Y is the change of γyx + γzxβ units that we expect to see in Y under a one-unit increase in X. [Diagram: X → Y (γyx), X → Z (γzx), Z → Y (β)]
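The three effects can be worked out from the path coefficients of a model with one mediator. This is a minimal sketch; the function name and the coefficient values are hypothetical.

```python
def decompose_effects(gamma_yx, gamma_zx, beta):
    """Direct, indirect, and total effects of X on Y with one mediator Z."""
    direct = gamma_yx           # X -> Y
    indirect = gamma_zx * beta  # X -> Z -> Y
    total = direct + indirect
    return direct, indirect, total


# Hypothetical path coefficients.
direct, indirect, total = decompose_effects(gamma_yx=0.3, gamma_zx=0.5, beta=0.4)
print(f"direct={direct:.2f} indirect={indirect:.2f} total={total:.2f}")
```

Here a one-unit increase in X is expected to raise Y by 0.30 directly and by 0.50 × 0.40 = 0.20 through Z, giving a total effect of 0.50.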
Modes of causal enquiry and techniques Association: bivariate correlation, crosstabulation analysis. Conditional association: partial correlation, regression. Mechanism: PLS-SEM, path analysis, structural equation modeling. All-cause structure.
Modes of causal enquiry Associational analysis: This study is a precondition for subsequent causal analysis as per saying “no causation without correlation”. Conditional associational analysis: This study is to establish correlation between variables while controlling another variable. Mechanism-based analysis: This study is to introduce a mediator lying between independent and dependent variables while controlling spurious relationships. All-cause structural analysis: This study is to include all causes and mediators into the model while controlling spurious relationships. Ref: Morgan & Winship (2007, p. 287)
Bootstrapping It is a resampling method that draws repeated samples, with replacement, from the original data by assuming the original data to be the population. It is useful when data violate one or more assumptions or when we want to evaluate biases in estimates. AMOS uses bootstrapping to estimate p-values for indirect effects. Warning: When Bollen–Stine bootstrap is selected, AMOS cannot produce bootstrapped estimates. You must run bootstrapping twice: once for χ2 and once for estimates. Example: Pittayachawan (2008)
Activity: Applied learning (10min) You must request AMOS to produce the following results via Bootstrap tab in Analysis Properties: Perform bootstrap (and set the value to 500) Bias-corrected confidence intervals (and set the value to 95) Bootstrap ML Bollen–Stine bootstrap
Bootstrapping outputs S.E. is the bootstrap standard error of each parameter (i.e. the standard deviation of its bootstrap estimates). S.E-S.E is the standard error of the standard error. It should be very small. Mean is the mean of the bootstrap estimates of each parameter. Bias is the difference between the mean bootstrap estimate and the original estimate. SE-Bias is the standard error of the bias.
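The logic behind these outputs can be illustrated outside AMOS. This is a minimal sketch that bootstraps an indirect effect in a simple chain X → Z → Y using ordinary least-squares slopes rather than maximum likelihood, and a plain percentile interval rather than the bias-corrected interval that AMOS offers. The data are simulated and all names are hypothetical.

```python
import random
import statistics


def slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def indirect_effect(rows):
    """Indirect effect of X on Y via Z, assuming no direct X -> Y path."""
    x, z, y = zip(*rows)
    return slope(x, z) * slope(z, y)


def bootstrap(rows, statistic, n_boot=500, seed=1):
    """Resample cases with replacement and summarise the statistic."""
    rng = random.Random(seed)
    original = statistic(rows)
    estimates = sorted(
        statistic([rng.choice(rows) for _ in rows]) for _ in range(n_boot)
    )
    mean = statistics.fmean(estimates)
    return {
        "original": original,
        "mean": mean,                         # mean of bootstrap estimates
        "se": statistics.stdev(estimates),    # bootstrap standard error
        "bias": mean - original,              # mean minus original estimate
        "ci95": (estimates[int(0.025 * n_boot)],
                 estimates[int(0.975 * n_boot) - 1]),  # percentile interval
    }


# Simulated data (hypothetical): X -> Z with 0.5, Z -> Y with 0.4.
rng = random.Random(42)
rows = []
for _ in range(200):
    x = rng.gauss(0, 1)
    z = 0.5 * x + rng.gauss(0, 1)
    y = 0.4 * z + rng.gauss(0, 1)
    rows.append((x, z, y))

result = bootstrap(rows, indirect_effect)
for key, value in result.items():
    print(key, value)
```

If the percentile interval excludes zero, the indirect effect would be judged significant at the corresponding level under this simplified procedure.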
Power analysis A posteriori methods (i.e. after data collection): Model (MacCallum, Browne, & Sugawara 1996) Path (Xu 2010)
Activity: Applied learning (5min) Calculate statistical power of the model that you just fit with MacCallum, Browne, & Sugawara’s method using MacCallumPow(α,df,n,RMSEAa,RMSEA0) from SEM.xlam where: α is the Type I error rate. df is the degrees of freedom of your model. n is the sample size. RMSEAa is the value that you want to test against. For instance, in the case that your model fits the data, you may use RMSEAa=0.09. RMSEA0 is the value of your model.
Activity: Applied learning (5min) Calculate statistical power of the model that you just fit with Xu’s method using Xu(α,ρa,ρ0,n) from SEM.xlam where: α is the Type I error rate. n is the sample size. ρa is the standardised estimate of a specific path in your model. ρ0 is the value that you want to test against.
Power analysis A priori methods (i.e. before data collection): Model (Kim 2005) Model complexity (Westland 2010) Path (Westland 2010)
Activity: Applied learning (5min) Calculate the minimum sample size with Kim’s method using KimN(α,df,1-β,RMSEA) from SEM.xlam where: α is the Type I error rate. df is the degrees of freedom of your model. 1-β is the desired statistical power. RMSEA is the value that you want to reject with adequate statistical power. It would make sense to use the cut-off value here (e.g. RMSEA=0.06).
Activity: Applied learning (5min) Calculate the minimum sample size with the Westland’s method focusing on model complexity using WestlandNr(ratio) from SEM.xlam when the input is the ratio of manifest variables and latent variables. For instance, if your model contains 16 manifest variables and 4 latent variables, then the ratio will be 4.
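The ratio-based lower bound can be sketched as follows, assuming the bound given in Westland (2010), n ≥ 50r² − 450r + 1100, where r is the ratio of manifest variables to latent variables. Whether WestlandNr in SEM.xlam computes exactly this expression is not confirmed by these slides, and the function name below is hypothetical.

```python
import math


def westland_lower_bound(n_manifest, n_latent):
    """Lower bound on sample size from the indicator-to-factor ratio."""
    r = n_manifest / n_latent
    # Assumption: bound taken as n >= 50 r^2 - 450 r + 1100.
    return math.ceil(50 * r ** 2 - 450 * r + 1100)


# The example from the slide: 16 manifest variables and 4 latent variables.
print(westland_lower_bound(16, 4))  # ratio 4 gives 100
```

Under this assumption the bound falls as the ratio rises towards 4.5, so models with more indicators per factor need fewer observations.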
Activity: Applied learning (5min) Calculate the minimum sample size with Westland’s method focusing on a specific path, or the minimum value of path coefficients that you expect, using WestlandNl(α,1-β,ρa,ρ0) from SEM.xlam where: α is the Type I error rate. 1-β is the desired statistical power. ρa is the standardised estimate of a specific path, or the minimum value of path coefficients, in your model. ρ0 is the value that you want to test against.
Ideally ... Use a priori power analysis to help you calculate the minimum sample size before data collection to save cost & time. Then use a posteriori power analysis to calculate the actual statistical power after data collection when the obtained sample size is different from the planned one or when values of standardised estimates are different from your expectation.
Let us recap SEM can test direct and indirect effects simultaneously. To test indirect effects, you must use bootstrapping, which is a two-step process. After testing the null hypothesis, you should conduct power analysis to check the probability that you can generalise your result to another sample.
Fourth step: Prespecified model To evaluate generalisability
Validity The best available approximation to the truth or falsity of propositions, including propositions about causation (Cook & Campbell (1979) cited in Ferguson (2004)). There are 2 types of validity (Campbell & Stanley (1963) cited in Ferguson (2004)): Internal validity refers to the confidence with which one can make statements about relationships between variables, based on the forms in which the variables were manipulated or measured (Cook & Campbell (1979) cited in Ferguson (2004)), e.g. content and construct validity. External validity pertains to the generalisability of the treatment effect to other populations, settings, treatment variables, or measurement variables (Campbell & Stanley (1963) cited in Ferguson (2004)).
Generalise to/across Cook and Campbell (1979) and Lynch (1999) stated that: Generalising to populations includes applying findings to the target populations, settings, or times that were represented in the sample Generalising across populations includes applying findings to populations, settings, or times that were not represented in the sample
Generalise to There are two levels of “generalise to” (Lee & Baskerville 2003): Generalise to the population: qualitative research Generalise to another sample randomly drawn from the population: quantitative research
External validity vs Generalisability External validity is a function of the researcher and the design of the research. Generalisability is a function of the researcher and the user. Ref: Ferguson (2004)
Fixed parameters SEM allows us to fix parameters at the values found in previous studies to test whether those values generalise to another study.
Activity: Demonstrative learning (5min) Demonstrate how to fix a parameter’s value.
Bayesian analysis Frequency-based analysis assumes the true parameter’s value to be fixed. In contrast, Bayesian analysis assumes the true parameter’s value to be variable. Therefore, Bayesian analysis is a good technique to assess external validity of the model fitted by frequency-based analysis such as SEM.
Activity: Demonstrative learning (10min) Demonstrate how to conduct Bayesian SEM.
Multiple samples
Multiple samples Whether your data is heterogeneous by design or its nature, you should consider conducting multiple-group analysis since it may confound relationships between variables. Effects of confounders may include: Spurious relationships Reverse signs of estimates Failure of relationship detection Increase of measurement error
How to detect heterogeneity Split data and plot charts based on demographics Check outliers Use exploratory data analysis Check unusual factor scores
Multiple-group analysis SEM allows you to test a model against multiple groups of samples simultaneously. This technique is useful when you know/suspect/hypothesise that samples are heterogeneous. Example: Dang, Pittayachawan, & Nkhoma (2013)
Setting up multiple-group analysis From the menu Analyze, click Manage Group. Rename the current group to Experienced. Click New and rename the new group to Inexperienced. Click the icon Select data file(s). Click the button Grouping Variable. Select the variable EXPm and click OK. Click the button Group Value. Select the group having the value 1 and click OK. For the group Inexperienced, locate the same data file and assign the group having the value 0.
Activity: Applied learning (20min) Try to split data into 2 groups and then fit the model simultaneously.
Results from multiple-group analysis When the model fits the data: What do the results say? What is the similarity/difference between 2 groups?
Invariance analysis SEM allows you to test a parameter to be equal across groups of samples. This technique is useful when, for example, you hypothesise that the effect of one variable on another variable is constant across groups of samples.
Setting up invariance analysis Identify which parameter is very similar across both groups. Open the Object Properties window for that parameter. Name that parameter. Notice that All groups is ticked, which means that this parameter is equal across groups of samples.
Activity: Applied learning (20min) Try to put some constraints into the model from multiple-group analysis and fit the model further.
Results from invariance analysis When the model fits the data: What do the results say?
What types of parameters can be constrained to be invariant? Mean: AMOS automatically centres the means of all parameters in a model at 0. If you want to test latent mean values, first you need to set AMOS to estimate means and intercepts. After that, you must name all mean parameters that you want AMOS to estimate through the Object Properties window. Slope (structural parameters). Residual.
Let us recap When your data are heterogeneous, or your hypothesis is to compare 2 or more groups, use multiple-group analysis. When your hypothesis is to test that the parameter estimates are constant across groups, use invariance analysis. Although these techniques are interesting and add an additional lens to a theory, discussing the results is more complicated.
Factor score
What is a factor score? A factor score, or a latent score, is an estimated value of a factor for a particular case. It is useful to use a factor score when: You are unable to fit the model. Your model is extremely complex. You have a second-order factor but there are only 2 first-order factors (i.e. you need at least 3 first-order factors to make an identifiable model). It should be used after measurement validation; in particular, unidimensionality must be ensured (Kim & Hagtvet 2003; Meade & Kroustalis 2005; Plummer 2001).
Advantages It fits the model better as the items/parcel ratio increases (Nasser & Takahashi 2003; Plummer 2001). It improves accuracy of estimates (Hall, Snell, & Foust 1999). It reduces the effect of non-normality (Hau & Marsh 2004). It has negligible effects on parameter bias and standard errors (Nasser-Abu Alhija & Wisenbaker 2006). Although this bias (i.e. < 15%) is negligible, appropriate estimators must be used (i.e. WLSMV for rating scales) (Bandalos 2008).
How to calculate a factor score Click Data Imputation under Analyze menu and you will have 3 options: Regression imputation: AMOS uses ML values as model parameters to run regression to predict missing values. Stochastic regression imputation: AMOS assumes a conditional distribution of missing values based on observed values to impute missing values randomly with model parameters equal to ML values. Bayesian imputation: This is the same as stochastic regression imputation but model parameters are unknown and only estimated. Choose regression imputation.
Activity: Applied learning (20min) Try to impute factor scores and use them to test your model instead.
Disadvantages It confounds relationships at the item level. If an inappropriate estimator is used (i.e. ML for rating scales), it will increase bias in estimates from 20% to over 130% (Bandalos 2008). Even if an appropriate estimator is used, when data are severely non-normal, it increases Type II error. Misspecified models fit data as well as, if not better than, correctly specified models (Plummer 2001). It increases the chance of running into the problems of non-convergence and Heywood cases (Nasser & Wisenbaker 2003), although the chance is decreased when the items/factors ratio increases (Plummer 2001).
Let us recap Use of item-level analysis or parcel-level analysis entirely depends on the situation. You, however, should always aim to do the former before the latter.
Summary
SEM methodology: Conceptualisation, Instrumentation, Identification, Mensuration, Preparation, Specification, Estimation, Evaluation, Rectification, Alternation, Selection, Explanation
SEM methodology Conceptualisation: This is done during literature review or following a subsequent study. Theories are used to explain a phenomenon. Instrumentation: Concepts are linked to manifest variables to create an instrument and a procedure for data collection. Identification: It must be ensured that the model will be identified. If it is not, then an additional variable must be included. Mensuration: Data are collected while minimising measurement error. Preparation: Data are collated, cleaned, and structured in a format supported by software.
SEM methodology Specification: A model is specified properly in software. The difficulty level depends on which software and analysis are used. Estimation: An appropriate estimator is chosen and used. Different estimators have different statistical assumptions and require different sample sizes. Estimators sometimes are limited by types of analysis. Evaluation: A model is evaluated at both the model and parameter levels to establish whether it fits the data. It is often done in the following order: EFA, CFA, and SEM.
SEM methodology Rectification: In practice, a model often does not fit the data (e.g. $p < .05$) and needs to be respecified. MI is normally used as a tool to identify a cause of misfit. After a model is respecified, it needs to be evaluated again. Alternation: In some disciplines, fitting a model is not a goal. In fact, no one can prove that the fitted model is the true one. Since SEM’s condition is over-identified, countless models can fit data. It is encouraged that a researcher also recognises and identifies an alternative hypothesis to explain a phenomenon. This can be achieved in 3 ways: nested model, equivalent model, and competitive model.
SEM methodology Selection: After original and alternative models have been fitted, one model must be selected to represent a phenomenon. Explanation: A fitted, and selected, model must be explained. This also includes correlation between error terms.
Survey methodology
Any question?
References Al-Saleh, M. F., & Yousif, A. E. (2009). Properties of the standard deviation that are rarely mentioned in classrooms. Austrian Journal of Statistics, 38(3), 193–202. Allen, M. P. (1973). Construction of composite measures by the canonical-factor-regression method. Sociological Methodology, 5, 51–78. Alreck, P. L., & Settle, R. B. (2003). The survey research handbook (3rd ed.). New York, NY, USA: McGraw–Hill/Irwin. Bacon, D. R., Sauer, P. L., & Young, M. (1995). Composite reliability in structural equations modeling. Educational and Psychological Measurement, 55(3), 394–406. Bandalos, D. L. (2008). Is parceling really necessary? A comparison of results from item parceling and categorical variable methodology. Structural Equation Modeling, 15(2), 211–240. Bentler, P. M. (1980). Multivariate analysis with latent variables: Causal modeling. Annual Review of Psychology, 31(1), 419–456. Biemer, P. P., & Lyberg, L. E. (2003). Introduction to survey quality. Hoboken, NJ, USA: John Wiley & Sons, Inc. Bollen, K. A. (1989). Structural equations with latent variables. New York, NY, USA: Wiley. Borsboom, D. (2005). Measuring the mind: Conceptual issues in contemporary psychometrics. Cambridge, UK: Cambridge University Press. Bunge, M. (2009). Causality and modern science (4th ed.). New Brunswick, NJ, USA: Transaction Publishers.
References Chrisman, N. R. (1995). Beyond Stevens: A revised approach to measurement for geographic information. Presented at the 13th International Symposium on Computer-Assisted Cartography, Charlotte, NC, USA. Chrisman, N. R. (1998). Rethinking levels of measurement for cartography. Cartography and Geographic Information Systems, 25(4), 231–242. Churchill, G. A., Jr, & Peter, J. P. (1984). Research design effects on the reliability of rating scales: A meta-analysis. Journal of Marketing Research, 21(4), 360–375. Cialdini, R. B. (1990). Deriving psychological concepts relevant to survey participation from the literatures on compliance, helping, and persuasion. Presented at the 1st Workshop on Household Survey Nonresponse, Stockholm, Sweden. Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ, USA: Lawrence Erlbaum Associates, Inc. Cortina, J. M. (1993). What is coefficient alpha? An examination of theory and applications. Journal of Applied Psychology, 78(1), 98–104. Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16(3), 297–334. Dang, D. P. T., Pittayachawan, S., & Nkhoma, M. Z. (2013). Contextual difference and intention to perform information security behaviours against malware in a BYOD environment: A protection motivation theory approach. Presented at Australasian Conference on Information Systems, Melbourne, Australia. Dillman, D. A., Smyth, J. D., & Christian, L. M. (2009). Internet, mail, and mixed-mode surveys: The tailored design method (3rd ed.). Hoboken, NJ, USA: John Wiley & Sons, Inc.
References Ferguson, L. (2004). External validity, generalizability, and knowledge utilization. Journal of Nursing Scholarship, 36(1), 16–22. Fornell, C., & Larcker, D. F. (1981). Evaluating structural equation models with unobservable variables and measurement error. Journal of Marketing Research, 18, 39–50. Francis, J. D., & Busch, L. (1975). What we now know about “I don't knows.” Public Opinion Quarterly, 39(2), 207–218. Garland, R. (1991). The mid-point on a rating scale: Is it desirable? Marketing Bulletin, 2, 66–70. Gerbing, D. W., & Anderson, J. C. (1984). On the meaning of within-factor correlated measurement errors. The Journal of Consumer Research, 11(1), 572–580. Gigerenzer, G. (2004). Mindless statistics. The Journal of Socio-Economics, 33, 587–606. Graham, J. M. (2006). Congeneric and (essentially) tau-equivalent estimates of score reliability: What they are and how to use them. Educational and Psychological Measurement, 66(6), 930–944. Green, S. B., Lissitz, R. W., & Mulaik, S. A. (1977). Limitations of coefficient alpha as an index of test unidimensionality. Educational and Psychological Measurement, 37(4), 827–838. Green, S. B., & Young, Y. (2008). Commentary on coefficient alpha: A cautionary tale. Psychometrika, 74(1), 121–135. Groves, R. M. (2004). Survey errors and survey costs. Hoboken, NJ, USA: John Wiley & Sons, Inc. Groves, R., & Couper, M. (1998). Nonresponse in household interview surveys. New York, NY, USA: Wiley-Interscience. Hall, R. J, Snell, A. F., & Foust, M. S. (1999). Item parceling strategies in SEM: Investigating the subtle effects of unmodeled secondary constructs. Organizational Research Methods, 2(3), 233–256.
References Hancock, G. R., & Mueller, R. O. (2001). Rethinking construct reliability within latent variable systems. In R. Cudeck, S. du Toit, & D. Sörbom (Eds.), Structural equation modeling: Present and future (pp. 195–216). Lincolnwood, IL, USA: Scientific Software International, Inc. Hau, K.-T., & Marsh, H. W. (2004). The use of item parcels in structural equation modelling: Non-normal data and small sample sizes. British Journal of Mathematical and Statistical Psychology, 57, 327–351. Hayduk, L. A., Robinson, H. P., Cummings, G. G., Boadu, K., Verbeek, E. L., & Perks, T. A. (2007). The weird world, and equally weird measurement models: Reactive indicators and the validity revolution. Structural Equation Modeling, 14(2), 280–310. Haynes, S. N., Richard, D. C., & Kubany, E. S. (1995). Content validity in psychological assessment: A functional approach to concepts and methods. Psychological Assessment, 7(3), 238–247. Hubbard, R., & Allen, S. J. (1987). An empirical comparison of alternative methods for principal component extraction. Journal of Business Research, 25, 173–190. Jaccard, J., & Jacoby, J. (2009). Theory construction and model-building skills: A practical guide for social scientists. New York, NY, USA: The Guilford Press. Jenkins, G. D., Jr., & Taber, T. D. (1977). A Monte Carlo study of factors affecting three indices of composite scale reliability. Journal of Applied Psychology, 62(4), 392–398. Jöreskog, K. G. (1978). Structural analysis of covariance and correlation matrices. Psychometrika, 43(4), 443–477. Kim, K. H. (2005). The relation among fit indexes, power, and sample size in structural equation modeling. Structural Equation Modeling, 12(3), 368–390.
References Kim, S., & Hagtvet, K. A. (2003). The impact of misspecified item parceling on representing latent variables in covariance structure modeling: A simulation study. Structural Equation Modeling, 10(1), 101–127. Krosnick, J. A., Holbrook, A. L., Berent, M. K., Carson, R. T., Hanemann, W. M., Kopp, R. J., Mitchell, R. C., et al. (2002). The impact of “no opinion” response options on data quality: Non-attitude reduction or an invitation to satisfice? Public Opinion Quarterly, 66(3), 371–403. Lam, T. C., & Klockars, A. J. (1982). Anchor point effects on the equivalence of questionnaire items. Journal of Educational Measurement, 19(4), 317–322. Lance, C. E., Butts, M. M., & Michels, L. C. (2006). The sources of four commonly reported cutoff criteria: What did they really say? Organizational Research Methods, 9(2), 202–220. Lee, A. S., & Baskerville, R. L. (2003). Generalizing generalizability in information systems research. Information Systems Research, 14(3), 221–243. Lewis, B. R., Templeton, G. F., & Byrd, T. A. (2005). A methodology for construct development in MIS research. European Journal of Information Systems, 14, 388–400. Lindell, M. K., & Brandt, C. J. (1999). Assessing interrater agreement on the job relevance of a test: A comparison of CVI, T, rWG(J), and r*WG(J) indexes. Journal of Applied Psychology, 84(4), 640. Lindell, M. K., Brandt, C. J., & Whitney, D. J. (1999). A revised index of interrater agreement for multi-item ratings of a single target. Applied Psychological Measurement, 23(2), 127–135. Lynch, J. G., Jr. (1999). Theory and external validity. Journal of the Academy of Marketing Science, 27(3), 367–376.
References MacCallum, R. C., Browne, M. W., & Sugawara, H. M. (1996). Power analysis and determination of sample size for covariance structure modeling. Psychological Methods, 1(2), 130–149. Matell, M. S., & Jacoby, J. (1971). Is there an optimal number of alternatives for Likert scale items? Study I: Reliability and validity. Educational and Psychological Measurement, 31, 657–674. Matell, M. S., & Jacoby, J. (1972). Is there an optimal number of alternatives for Likert-scale items? Effects of testing time and scale properties. Journal of Applied Psychology, 56(6), 506–509. McGorry, S. Y. (2000). Measurement in a cross-cultural environment: Survey translation issues. Qualitative Market Research, 3(2), 74–81. Meade, A. W., & Kroustalis, C. M. (2005). Problems of item parceling with CFA tests of measurement invariance. Presented at the Annual Conference of the Society for Industrial and Organizational Psychology, Los Angeles, CA, USA. Molla, A., Cooper, V., & Pittayachawan, S. (2011). The green IT readiness (g-readiness) of organizations: An exploratory analysis of a construct and instrument. Communications of the Association for Information Systems, 29(1), 67–96. Morgan, S. L., & Winship, C. (2007). Counterfactuals and causal inference: Methods and principles for social research. New York, NY, USA: Cambridge University Press. Mulaik, S. A. (1998). Parsimony and model evaluation. The Journal of Experimental Education, 66(3), 266–273. Mulaik, S. A., & Millsap, R. E. (2000). Doing the four-step right. Structural Equation Modeling, 7(1), 36–73.
References Munck, I. M. E. (1979). Model building in comparative education: Applications of the LISREL method to cross-national survey data. International Association for the Evaluation of Educational Achievement Monograph Series No. 10. Stockholm: Almqvist & Wiksell. Nasser, F., & Takahashi, T. (2003). The effect of using item parcels on ad hoc goodness-of-fit indexes in confirmatory factor analysis: An example using Sarason’s reactions to tests. Applied Measurement in Education, 16(1), 75–97. Nasser, F., & Wisenbaker, J. (2003). A Monte Carlo study investigating the impact of item parceling on measures of fit in confirmatory factor analysis. Educational and Psychological Measurement, 63(5), 729–757. Nasser-Abu Alhija, F., & Wisenbaker, J. (2006). A Monte Carlo study investigating the impact of parceling strategies on parameter estimates and their standard errors in CFA. Structural Equation Modeling, 13(2), 204–228. Novick, M. R., & Lewis, C. (1967). Coefficient alpha and the reliability of composite measurements. Psychometrika, 32(1), 1–13. Nunnally, J. C., & Bernstein, I. H. (1994). Psychometric theory (3rd ed.). McGraw-Hill, Inc. O’Connor, B. P. (2000). SPSS and SAS programs for determining the number of components using parallel analysis and Velicer’s MAP test. Behavior Research Methods, Instruments, & Computers, 32(3), 396–402. Olson, B. F. (2008). Evaluating the error of measurement due to categorical scaling with a measurement invariance approach to confirmatory factor analysis. The Faculty of Graduate Studies, The University of British Columbia.
References Patil, V. H., Singh, S. N., Mishra, S., & Donavan, D. T. (2008). Efficient theory development and factor retention criteria: Abandon the “eigenvalue greater than one” criterion. Journal of Business Research, 61(2), 162–170. Payne, S. L. (1951). The art of asking questions. Princeton: Princeton University Press. Pearl, J. (1998). Graphs, causality, and structural equation models. Sociological Methods & Research, 27(2), 226–284. Pearl, J. (2009). Causality: Models, reasoning, and inference (2nd ed.). New York, NY, USA: Cambridge University Press. Pitt, M. A., & Myung, I. J. (2002). When a good fit can be bad. Trends in Cognitive Sciences, 6(10), 421–425. Pittayachawan, S. (2008). Fostering consumer trust and purchase intention in B2C e-commerce. School of Business Information Technology, Business Portfolio, RMIT University, Melbourne, VIC, Australia. Plummer, B. A. (2001). To parcel or not to parcel: The effects of item parceling in confirmatory factor analysis. The University of Rhode Island, Kingston, RI, USA. Presser, S., & Schuman, H. (1980). The measurement of a middle position in attitude surveys. Public Opinion Quarterly, 44(1), 70–85. Preston, C. C., & Colman, A. M. (2000). Optimal number of response categories in rating scales: Reliability, validity, discriminating power, and respondent preferences. Acta Psychologica, 104, 1–15.
References Raaijmakers, Q. A. W., van Hoof, A., Hart, H., Verbogt, T. F. M. A., & Vollebergh, W. A. M. (2000). Adolescents’ midpoint responses on Likert-type scale items: Neutral or missing values? International Journal of Public Opinion Research, 12(2), 208–216. Raykov, T. (1998). Coefficient alpha and composite reliability with interrelated nonhomogeneous items. Applied Psychological Measurement, 22(4), 375–385. Raykov, T. (2007). Reliability if deleted, not ‘alpha if deleted’: Evaluation of scale reliability following component deletion. British Journal of Mathematical and Statistical Psychology, 60, 201–216. Raykov, T. (2008). Alpha if item deleted: A note on loss of criterion validity in scale development if maximizing coefficient alpha. British Journal of Mathematical and Statistical Psychology, 61, 275–285. Rotter, G. S. (1972). Attitudinal points of agreement and disagreement. The Journal of Social Psychology, 86, 211–218. Rowley, J. (2007). The wisdom hierarchy: Representations of the DIKW hierarchy. Journal of Information Science, 33(2), 163–180. Russell, B. (1903). Principles of mathematics. Cambridge, UK: Cambridge University Press. Salzberger, T. (2009). Measurement in marketing research: An alternative framework. Edward Elgar Pub. Schwarz, N., Knäuper, B., Hippler, H.-J., Noelle-Neumann, E., & Clark, L. (1991). Rating scales: Numeric values may change the meaning of scale labels. Public Opinion Quarterly, 55(4), 570–582. Shaw, M. E., & Costanzo, P. R. (1982). Theories of social psychology (2nd ed.). McGraw-Hill.
References Shevlin, M., Miles, J. N. V., Davies, M. N. O., & Walker, S. (2000). Coefficient alpha: A useful indicator of reliability? Personality and Individual Differences, 28, 229–237. Sud-on, P., Abareshi, A., Pittayachawan, S., & Teo, L. (2013). Manufacturing agility: Construct and instrument development (pp. 754–762). Presented at the International Conference on Supply Chain and Logistics Management, Osaka, Japan. Velez, P. (1993). The neutral response on attitudinal measures: An attribute of the item. San Jose State University. Velez, P., & Ashworth, S. D. (2007). The impact of item readability on the endorsement of the midpoint response in surveys. Survey Research Methods, 1(2), 69–74. Westland, J. C. (2010). Lower bounds on sample size in structural equation modeling. Electronic Commerce Research and Applications, 9, 476–487. Wildt, A. R., & Mazis, M. B. (1978). Determinants of scale response: Label versus position. Journal of Marketing Research, 15, 261–267. Wright, S. (1923). The theory of path coefficients: A reply to Niles's criticism. Genetics, 8(3), 239–255. Xu, W., Hung, Y. S., Niranjan, M., & Shen, M. (2010). Asymptotic mean and variance of Gini correlation for bivariate normal samples. IEEE Transactions on Signal Processing, 58(2), 522–534. Yu, C.-Y. (2002). Evaluating cutoff criteria of model fit indices for latent variable models with binary and continuous outcomes. PhD thesis, University of California, Los Angeles, CA, USA. Zwick, W. R., & Velicer, W. F. (1986). Comparison of five rules for determining the number of components to retain. Psychological Bulletin, 99(3), 432–442.