Scientific Creativity as a Combinatorial Process The Chance Baseline.

Slides:



Advertisements
Similar presentations
Correlational and Differential Research
Advertisements

Aging and Creative Productivity Is There an Age Decrement or Not?
COMM 472: Quantitative Analysis of Financial Decisions
Correlation and regression
Hypothesis Testing Steps in Hypothesis Testing:
Copyright © Allyn & Bacon (2007) Research is a Process of Inquiry Graziano and Raulin Research Methods: Chapter 2 This multimedia product and its contents.
CmpE 104 SOFTWARE STATISTICAL TOOLS & METHODS MEASURING & ESTIMATING SOFTWARE SIZE AND RESOURCE & SCHEDULE ESTIMATING.
1 SSS II Lecture 1: Correlation and Regression Graduate School 2008/2009 Social Science Statistics II Gwilym Pryce
Ch11 Curve Fitting Dr. Deshi Ye
Objectives (BPS chapter 24)
What role should probabilistic sensitivity analysis play in SMC decision making? Andrew Briggs, DPhil University of Oxford.
Multiple Regression [ Cross-Sectional Data ]
Common Factor Analysis “World View” of PC vs. CF Choosing between PC and CF PAF -- most common kind of CF Communality & Communality Estimation Common Factor.
First, Best, and Last: Creative Landmarks Across the Career Course.
Evaluating Hypotheses Chapter 9. Descriptive vs. Inferential Statistics n Descriptive l quantitative descriptions of characteristics.
Chapter 12 Multiple Regression
Chapter 4 Multiple Regression.
Scientific Creativity, Logic, and Chance: The Integration of Product, Person, and Process Research Traditions.
Chapter Topics Types of Regression Models
Creativity in Science: Dispositional and Developmental Correlates.
Economics 20 - Prof. Anderson
Inferences About Process Quality
Chapter 9 - Lecture 2 Computing the analysis of variance for simple experiments (single factor, unrelated groups experiments).
Today Concepts underlying inferential statistics
Autocorrelation Lecture 18 Lecture 18.
Method of Soil Analysis 1. 5 Geostatistics Introduction 1. 5
Slide 1 Testing Multivariate Assumptions The multivariate statistical techniques which we will cover in this class require one or more the following assumptions.
Review for Final Exam Some important themes from Chapters 9-11 Final exam covers these chapters, but implicitly tests the entire course, because we use.
Objectives of Multiple Regression
The Chi-Square Distribution 1. The student will be able to  Perform a Goodness of Fit hypothesis test  Perform a Test of Independence hypothesis test.
This Week: Testing relationships between two metric variables: Correlation Testing relationships between two nominal variables: Chi-Squared.
Section 2: Science as a Process
Inference for regression - Simple linear regression
Regression Method.
© 2002 Prentice-Hall, Inc.Chap 14-1 Introduction to Multiple Regression Model.
The Examination of Residuals. The residuals are defined as the n differences : where is an observation and is the corresponding fitted value obtained.
Fundamentals of Data Analysis Lecture 4 Testing of statistical hypotheses.
BPS - 3rd Ed. Chapter 211 Inference for Regression.
OPIM 303-Lecture #8 Jose M. Cruz Assistant Professor.
The Examination of Residuals. Examination of Residuals The fitting of models to data is done using an iterative approach. The first step is to fit a simple.
Various topics Petter Mostad Overview Epidemiology Study types / data types Econometrics Time series data More about sampling –Estimation.
Lecture 8 Simple Linear Regression (cont.). Section Objectives: Statistical model for linear regression Data for simple linear regression Estimation.
Inference for Regression Simple Linear Regression IPS Chapter 10.1 © 2009 W.H. Freeman and Company.
Lesson Multiple Regression Models. Objectives Obtain the correlation matrix Use technology to find a multiple regression equation Interpret the.
Discussion of time series and panel models
Chapter 14 Repeated Measures and Two Factor Analysis of Variance
Lecture 10 Chapter 23. Inference for regression. Objectives (PSLS Chapter 23) Inference for regression (NHST Regression Inference Award)[B level award]
© 2006 by The McGraw-Hill Companies, Inc. All rights reserved. 1 Chapter 12 Testing for Relationships Tests of linear relationships –Correlation 2 continuous.
Chapter 13 Repeated-Measures and Two-Factor Analysis of Variance
Correlation & Regression Analysis
Outline of Today’s Discussion 1.Introduction to Correlation 2.An Alternative Formula for the Correlation Coefficient 3.Coefficient of Determination.
4 basic analytical tasks in statistics: 1)Comparing scores across groups  look for differences in means 2)Cross-tabulating categoric variables  look.
Chapter 1 continued.  Observation- something noted with one of the five senses.
Jump to first page Inferring Sample Findings to the Population and Testing for Differences.
Chapter 9: Introduction to the t statistic. The t Statistic The t statistic allows researchers to use sample data to test hypotheses about an unknown.
How Psychologists Do Research Chapter 2. How Psychologists Do Research What makes psychological research scientific? Research Methods Descriptive studies.
BPS - 5th Ed. Chapter 231 Inference for Regression.
NURS 306, Nursing Research Lisa Broughton, MSN, RN, CCRN RESEARCH STATISTICS.
Chapter 10: The t Test For Two Independent Samples.
Regression Analysis Part A Basic Linear Regression Analysis and Estimation of Parameters Read Chapters 3, 4 and 5 of Forecasting and Time Series, An Applied.
CWR 6536 Stochastic Subsurface Hydrology Optimal Estimation of Hydrologic Parameters.
Stats Methods at IC Lecture 3: Regression.
The simple linear regression model and parameter estimation
Math 4030 – 10b Inferences Concerning Variances: Hypothesis Testing
Undergraduated Econometrics
I. Statistical Tests: Why do we use them? What do they involve?
Basic Practice of Statistics - 3rd Edition Inference for Regression
Chapter 7: The Normality Assumption and Inference with OLS
Product moment correlation
The Examination of Residuals
Presentation transcript:

Scientific Creativity as a Combinatorial Process The Chance Baseline

Goal n Formulate a theory of scientific creativity –that uses Parsimonious assumptions and Logical derivations –to obtain Comprehensive explanations and Precise predictions –with respect to the most secure empirical results n In other words, getting the most with the least

Argument: Part One n Combinatorial models –currently get the most with the least relative to any alternative theory. That is, such models make the fewest assumptions, and by logical inferences explain the widest range of established facts and make the most precise predictions with respect to those data

Argument: Part Two n Even if combinatorial models are incomplete from the standpoint of one or more criteria, n such models must still provide the baseline for comparing all alternative theories. n That is, rival theories must account for whatever cannot be accounted for by chance alone, or what exceeds the chance baseline (cf. “null” hypothesis; research on the “hot hand” or parapsychology; etc.)

Creativity in Science: Two Critical Research Sites n Scientific Careers: –Publications n Scientific Communities: –Multiples

Publications n Individual Variation n Longitudinal Change

Individual Variation n Skewed Cross-sectional Distribution

Individual Variation n Skewed Cross-sectional Distribution –10%  50% / 50%  15%

Individual Variation n Skewed Cross-sectional Distribution –Lotka’s Law

Individual Variation n Skewed Cross-sectional Distribution –Lotka’s Law: f (T) = k T – 2 or log f (T) = log k - 2 log T

Individual Variation n Skewed Cross-sectional Distribution –Lotka’s Law: f (T) = k T – 2 or log f (T) = log k - 2 log T where T is total lifetime output

Individual Variation n Skewed Cross-sectional Distribution –Lotka’s Law: –Price’s Law: N 1/2 → 50% of total field output

Individual Variation n Skewed Cross-sectional Distribution –Lotka’s Law: –Price’s Law: N 1/2 → 50% of total field output where N is size of field

Individual Variation n Skewed Cross-sectional Distribution n Quantity-Quality Relation

Individual Variation n Skewed Cross-sectional Distribution n Quantity-Quality Relation –Equal-Odds Baseline: H i =  1 T i + u i

Individual Variation n Skewed Cross-sectional Distribution n Quantity-Quality Relation –Equal-Odds Baseline: H i =  1 T i + u i –where  1 is the overall “hit rate” (0 <  1 < 1) for individuals in a given domain

Individual Variation n Skewed Cross-sectional Distribution n Quantity-Quality Relation –Equal-Odds Baseline: H i =  1 T i + u i –where  1 is the overall “hit rate” (0 <  1 < 1) for individuals in a given domain, –H i is the number of “hits” (e.g., high-impact publications) for individual i, and –the random shock 0 ≤ u i ≤ T i (1 -  1 )

Individual Variation n Skewed Cross-sectional Distribution n Quantity-Quality Relation –Equal-Odds Baseline: H i =  1 T i + u i –where  1 is the overall “hit rate” (0 <  1 < 1) for individuals in a given domain, –H i is the number of “hits” (e.g., high-impact publications) for individual I, and –the random shock 0 ≤ u i ≤ T i (1 -  1 ) –N.B.: If  1 were a linear function of T i, then the overall function would be quadratic, not linear

Longitudinal Change n Randomness of Annual Output –No “runs” –Poisson Distribution P (j) =  j e -  / j! e =  and j! = 1  2  3    j

Longitudinal Change n Randomness of Annual Output n Quantity-Quality Relation –Random Fluctuation around a Quality Ratio Baseline –Hence, the Equal-Odds Baseline: –H it =  2 T it + u it (  2 =  1 if estimated from the same cross-sectional sample) –for the ith individual in career year t, – and where 0 ≤ u it ≤ T it (1 -  2 )

Multiples n Distribution of Multiple Grades

Multiples n Distribution of Multiple Grades n Temporal Separation of Multiple Discoveries

Multiples n Distribution of Multiple Grades n Temporal Separation of Multiple Discoveries n Individual Variation in Multiple Participation

Multiples n Distribution of Multiple Grades n Temporal Separation of Multiple Discoveries n Individual Variation in Multiple Participation n Degree of Multiple Identity

Combinatorial Processes n Definitions n Assumptions n Implications n Elaboration n Integration

Definitions n Individual n Domain n Field

Assumptions n Individual Samples from Domain Ideas

Assumptions n Individual Samples from Domain Ideas –Assume samples random or quasi-random

Assumptions n Individual Samples from Domain Ideas n Within-Field Variation in Sample Size

Assumptions n Individual Samples from Domain Ideas n Within-Field Variation in Sample Size –Postulate a normal distribution

Assumptions n Individual Samples from Domain Ideas n Within-Field Variation in Sample Size n Quasi-Random Combination of Ideas

Assumptions n Individual Samples from Domain Ideas n Within-Field Variation in Sample Size n Quasi-Random Combination of Ideas –Variable degrees of constraint depending on nature of domain Scientific revolutionaries vs. normal scientists Paradigmatic vs. nonparadigmatic scientists Scientists vs. artists

Assumptions n Individual Samples from Domain Ideas n Within-Field Variation in Sample Size n Quasi-Random Combination of Ideas n Variation in Quality of Combinations

Assumptions n Individual Samples from Domain Ideas n Within-Field Variation in Sample Size n Quasi-Random Combination of Ideas n Variation in Quality of Combinations –Differential fitness with respect to scientific criteria (facts, logic, etc.) –Small proportion publishable, an even smaller proportion high impact

Assumptions n Individual Samples from Domain Ideas n Within-Field Variation in Sample Size n Quasi-Random Combination of Ideas n Variation in Quality of Combinations n Variation in Size of Fields

Assumptions n Individual Samples from Domain Ideas n Within-Field Variation in Sample Size n Quasi-Random Combination of Ideas n Variation in Quality of Combinations n Variation in Size of Fields n Communication of Ideational Combinations

Assumptions n Individual Samples from Domain Ideas n Within-Field Variation in Sample Size n Quasi-Random Combination of Ideas n Variation in Quality of Combinations n Variation in Size of Fields n Communication of Ideational Combinations –If accepted, then incorporation into the domain pool, completing the cycle

DOMAIN FIELD INDIVIDUAL Sampling Communication Incorporation

Communication-Incorporation: n Rate increases with speed of –Communication practices (journals vs. books; least-publishable units) –Gate-keeping procedures (peer review; editorial policies) –Publication lags (1st- vs. 2nd-tier journals) –Diffusion to secondary sources (introductory texts, popularizations, etc.) n Hence, variation across time and discipline

Implications n Research Publications –Cross-sectional Variation –Longitudinal Change

Implications n Multiple Discoveries –Multiple Grades

Implications n Multiple Discoveries –Multiple Grades Variation across time and discipline –Temporal Separation Variation across time and discipline

Implications n Multiple Discoveries –Multiple Grades –Temporal Separation –Multiples Participation

Implications n Multiple Discoveries –Multiple Grades –Temporal Separation –Multiples Participation Number of ideational combinations Number of overlapping domain samples

Implications n Multiple Discoveries –Multiple Grades –Temporal Separation –Multiples Participation –Multiple Identity

Elaboration n Aggregated Data on Career Output –Aggregated Across Time Units –Aggregated Across Scientists n Cognitive Combinatorial Model –Two-step process Ideation generates combinations Elaboration generates communications –Individual differences in Domain sample Career onset

n p (t) = abm(b – a) -1 (e – at – e – bt ) –where p (t) is ideational output at career age t (in years), –e is the exponential constant (~ 2.718), –a the typical ideation rate for the domain (0 < a < 1), –b the typical elaboration rate for the domain (0 < b < 1), –m the individual’s creative potential (i.e. maximum number of ideational combinations in indefinite lifetime). n If a = b, then p (t) = a 2 mte – at n Number of communications T it is proportional to p. n Individual differences in –Creative potential (m) –Age at career onset (i.e., chronological age at t = 0)

Implications n Typical Career Trajectories

N.B. n The above curve has been shown to correlate in the mid- to upper-.90s for numerous data sets in which output information has been aggregated across many individual careers n Yet even in the case of highly productive individuals, the predicted curve does reasonably well

e.g., the career of Thomas Edison C Edison (t) = 2595(e -.044t - e -.058t ) r =.74

Implications n Typical Career Trajectories n Individual Differences in Trajectories –Fourfold Typology High versus Low Creative Potential Early versus Late Age at Career Onset

Chronological Age Creative Productivity Chronological Age Creative Productivity Chronological Age Creative Productivity Chronological Age Creative Productivity High Creative Early BloomersLow Creative Early Bloomers High Creative Late BloomersLow Creative Late Bloomers

Specific Prediction n Individual differences in output across consecutive age periods (5- or 10-year units) for scientists with same age at career onset yields a specific pattern of correlations across those units, namely one most consistent with –a single-factor model, rather than –an autoregressive (simplex or quasi- simplex) model.

20s30s40s50s 60s 20s30s40s50s60s m Single-Factor Model Autoregressive Model

Former single-factor model already confirmed on distinct data sets (e.g., there is no tendency for the correlations between two age periods to decline as a function of the temporal separation between the two periods; i.e., no decline with distance from matrix diagonal)

Implications n Typical Career Trajectories n Individual Differences in Trajectories n Domain Variation in Trajectories

Peak Age DomainabCareerChrono -logical Half- life Chemists Biologists Geologists Estimates for Three Disciplines

Implications n Typical Career Trajectories n Individual Differences in Trajectories n Domain Variation in Trajectories n Placement of Career Landmarks –Across domains

Implications n Typical Career Trajectories n Individual Differences in Trajectories n Domain Variation in Trajectories n Placement of Career Landmarks –Across domains –Across individuals

Chronological Age Creative Productivity Chronological Age Creative Productivity Chronological Age Creative Productivity Chronological Age Creative Productivity High Creative Early BloomersLow Creative Early Bloomers High Creative Late BloomersLow Creative Late Bloomers F B L

Specific Predictions n Given the above, it is possible to derive predictions regarding the pattern of correlations among –the ages of the three career landmarks (F, B, L), –the age at maximum output rate (x), –final lifetime productivity (T), –the maximum output rate (X), and –the time lapse or delay (d) between career onset and first career landmark (i.e., preparation period) n In particular …

Specific Predictions n 1A: Total lifetime productivity correlates –negatively with the chronological age of the first contribution (r TF < 0) and –positively with the chronological age of the last contribution (r TL > 0).

Specific Predictions n 1B: Maximum output rate correlates –negatively with the chronological age of the first contribution (r XF < 0) and –positively with the chronological age of the last contribution (r XL > 0).

Specific Predictions n 2A: Total lifetime productivity correlates –zero with the chronological age at the maximum output rate (r Tx = 0) and –zero with the chronological age at the best contribution (r TB = 0).

Specific Predictions n 2B: Maximum output rate correlates –zero with the chronological age at the maximum output rate (r Xx = 0) and –zero with the chronological age at the best contribution (r XB = 0).

Specific Predictions n 3A: The chronological age at the maximum output rate correlates positively with both –the chronological age at the first contribution (r xF > 0) and –the chronological age at the last contribution (r xL > 0).

Specific Predictions n 3B: The chronological age of the best contribution correlates positively with both –the chronological age at the first contribution (r FB > 0) and –the chronological age at the last contribution (r BL > 0).

Specific Predictions n 4: The first-order partial correlation between the ages of first and last contribution is negative after partialling out either –the chronological age at the best contribution (r FL.B = r FL – r FB r LB < 0) or –the chronological age at the maximum output rate (r FL.x = r FL – r Fx r Lx < 0)

Specific Predictions n 5: The time interval between the chronological age at career onset and the chronological age at first contribution is negatively correlated with both –total lifetime productivity (r Td < 0) and –the maximum output rate (r Xd < 0).

Discussion n Foregoing predictions unique to the combinatorial model –That is, they cannot be generated by alternative theories (e.g., cumulative advantage, human capital) n Furthermore, all predictions have been confirmed on several independent data sets

Discussion n Moreover, if we assume that eminence (E) is highly correlated with lifetime productivity (r ET >> 0), then we obtain additional predictions: n Eminence correlates –negatively with the age of the first contribution (r EF < 0), –positively with the age of the last contribution (r EL > 0), –zero with the age at the maximum output rate (r Ex = 0), –zero with the age at the best contribution (r EB = 0), and –negatively with the time interval between the age at career onset and the age at first contribution (r Ed < 0) n These predictions also empirically confirmed

Integration: Combinatorial Process Emerges from … n Creative Scientists n Research Programs n Research Collaborations n Peer Review n Professional Activities n Individual-Field-Domain Effects –dI/dt =  IN

Conclusion n Because combinatorial models work so well with respect to scientific creativity n (and because they have been extended successfully to non-scientific creativity), n they seem to provide a valid baseline for gauging other explanations. n Hence the next question: What other processes or variables add an increment to the variance already explained by combinatorial models?