mkmss A manuscript copying simulation

Underlying model This copying simulation is based on a geographical model where “Places” (= population centres) create demand for texts. There are four places in the current version of the simulation: Rome, Ephesus, Antioch, and Alexandria. The simulation runs through a number of cycles (called “generations”). Each cycle consists of a number of steps: (1) import copies from other places; (2) make copies from local copies; (3) edit local copies according to local preferences; (4) lose copies; (5) grow demand for copies (using logistic growth). Some copies are recovered at the end. Recovered copies are taken from ones that survive until the end of the cycles (extant copies) and ones lost during the cycles (lost copies).
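
A counts-only sketch of one Place's generation cycle may help make the sequence of steps concrete. All parameter names and values below are illustrative, and the real program tracks individual texts rather than counts; this is only a minimal toy of the cycle described above.

```r
# Toy version of one Place's generation cycle, tracking only counts (illustrative values).
set.seed(42)
n_copies <- 1; demand <- 2
p_import <- 0.1; p_loss <- 0.3; r <- 1; K <- 50

for (gen in 1:10) {
  shortfall <- max(demand - n_copies, 0)
  n_copies  <- n_copies + rbinom(1, shortfall, p_import)        # (1) import copies
  n_copies  <- max(n_copies, demand)                            # (2) copy locally to meet demand
                                                                # (3) editing alters states, not counts
  n_copies  <- max(n_copies - rbinom(1, n_copies, p_loss), 1)   # (4) lose copies (keep at least one)
  demand    <- round(demand + demand * r * (1 - demand / K))    # (5) logistic growth in demand
}
c(copies = n_copies, demand = demand)
```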

A model data set Examples of real data sets can be found here: We will aim to find a combination of simulator settings that produces analysis results like those obtained from the UBS4 apparatus data for the Gospel of Mark: CMDS: UBS4.15.SMD.gif; DC: UBS4.15.SMD.png; NJ: UBS4.15.SMD.png

Start the simulation The simulation can be temporarily accessed here: Another way to access it is to install RStudio's Shiny package on your machine and then download the files located here: mss/
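
If you go the local route, launching a downloaded Shiny app typically looks like the following; the directory name "mkmss" is an assumption about where you saved the files, not the actual layout of the distributed files.

```r
# One-off install of the Shiny package, then run the app from its folder (folder name is hypothetical).
install.packages("shiny")
shiny::runApp("mkmss")   # the folder containing the app's ui.R/server.R (or app.R)
```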

Characters/text A character is a place where the text varies. (In NT textual research, a character is called a “variant phrase”.) A state is one of the textual manifestations (of zero or more words) encountered at a character. (In NT textual research, a state is called a “reading”.) A character can have two or more states. This simulation uses a negative binomial distribution to decide how many states are contained in each character. (The distribution is calibrated to behave similarly to the UBS4 apparatus with respect to the number of states per character.) Experiment with this slider to see the effect of different numbers of characters per artificial text. As with a number of other sliders, larger values make the simulation work harder and therefore take longer to complete.
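
A hedged sketch of drawing the number of states per character: the "+ 2" shift (so every character has at least two states) and the negative binomial parameters are assumptions for illustration, not the program's calibrated values.

```r
# Draw a state count for each character from a shifted negative binomial (illustrative parameters).
set.seed(7)
n_chars  <- 140                                       # roughly the UBS4 character count for Mark
n_states <- 2 + rnbinom(n_chars, size = 1, mu = 0.5)  # guarantees at least two states per character
table(n_states)                                       # how many characters have 2, 3, 4, ... states
```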

Generations/simulation How many cycles are completed. With a growth factor of r = 1, it takes about eight generations for the population of copies to get near its maximum size. Larger values make the simulation work much harder.

P(import) / unit (of demand) For every copy required by a Place, what is the chance of importing one from another Place? Copies are more likely to be imported from nearer Places, according to a Zipf distribution (P = k / rank^s) with s = 1. Increasing P(import) produces more “cross-talk” between Places. (Look for the word “taken” under the “Recovered texts” tab.)
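
A sketch of a Zipf-weighted choice of source Place, assuming the other Places have already been ranked by distance (nearest first); the ranking shown is only illustrative.

```r
# Choose where an imported copy comes from, favouring nearer Places (Zipf with s = 1).
by_distance <- c("Ephesus", "Antioch", "Alexandria")  # other Places ranked by distance (illustrative order)
s <- 1
weights <- 1 / seq_along(by_distance)^s               # 1, 1/2, 1/3
sample(by_distance, 1, prob = weights)                # sample() normalises the weights
```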

P(change) / character For every character copied, what is the chance that it will change to another one of the possible states? (Remember that a number of states is assigned to each character at the start of the simulation.) What do you think is a realistic proportion in this situation? That is, what is the chance that a scribe would change the state of a UBS-like character in the course of making one copy? The UBS apparatus has about 140 characters (i.e. variant phrases) for the Gospel of Mark, about nine per chapter. Try different values to see the effect. Large values produce the characteristic pattern seen with unrelated texts, i.e. texts which behave the same way with respect to each other as texts composed of randomly chosen states. In an NJ diagram, unrelated texts produce a diagram that looks like the spokes of a bicycle wheel.
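
A sketch of the per-character change step for a single act of copying; drawing the replacement uniformly from the other available states is an assumption, and the toy text is illustrative.

```r
# Copy a text, changing each character's state with probability p_change.
set.seed(3)
p_change <- 0.01
n_states <- c(2, 3, 2, 4, 3)        # states available at each character (toy values)
exemplar <- c(1, 1, 2, 3, 1)        # the text being copied
copy <- exemplar
for (i in seq_along(copy)) {
  if (runif(1) < p_change) {
    others  <- setdiff(seq_len(n_states[i]), copy[i])
    copy[i] <- others[sample.int(length(others), 1)]  # switch to a different state
  }
}
copy
```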

P(correction) / generation What is the chance that a copy will be corrected against another copy from the same Place? Copies to use as the exemplar (i.e. used as a source of states for the copy being corrected) are chosen according to their rank in the Place's extant (i.e. not lost) collection. A Zipf distribution (with s=1) is used here too. New additions (whether imported or copied) are added to the end of the extant collection so they are less likely to be used as exemplars until the ones ranked above them are lost. What is a reasonable value for this? (Just about every real copy was corrected at least once.)
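
A sketch of a single correction event: the exemplar is chosen by rank with Zipf weights (s = 1), so older entries in the extant collection are favoured. Adopting the exemplar's states wholesale is a simplifying assumption, and the collection shown is illustrative.

```r
# With probability p_correction per generation, correct a copy against a rank-weighted exemplar.
set.seed(9)
p_correction <- 0.6
extant  <- list(c(1, 2, 1), c(1, 1, 1), c(2, 2, 1))   # Place's extant copies, oldest first
weights <- 1 / seq_along(extant)                      # Zipf weights (s = 1): older copies favoured
copy <- c(2, 1, 1)
if (runif(1) < p_correction) {
  copy <- extant[[sample.int(length(extant), 1, prob = weights)]]  # adopt the exemplar's states
}
copy
```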

P(edition) / generation This introduces the idea of local preference for particular states. Were some readings preferred more than others in a place? At the start of the simulation, each place is assigned its own list of preferred states for each character. The preferred states are a permutation of the list of states: e.g., for a character with three states, (1, 2, 3) and (3, 1, 2) are two possible permutations. As with just about everything else in this simulation, the permutations are randomly generated. What kind of effect would you expect local preferences to have? (Try the slider and see.)
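
A sketch of how each Place could be assigned its random preference orderings; the state counts below are toy values, not the program's.

```r
# Give every Place a random permutation of the states at each character.
set.seed(11)
places   <- c("Rome", "Ephesus", "Antioch", "Alexandria")
n_states <- c(3, 2, 4)                               # states at three toy characters
prefs <- lapply(setNames(places, places), function(p) lapply(n_states, sample))
prefs$Rome                                           # sample(n) returns a random permutation of 1:n
```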

Trend (toward preferred text) The strength of preference for the higher-ranked states is determined by the trend slider. This sets the exponent of the Zipf distribution used to choose from the list of preferences. A value of one gives relative probabilities of 1, 0.5, 0.33, 0.25, … whereas a value of five gives relative probabilities of 1, 0.031, 0.0041, 0.00098, … The higher the value, the more pronounced the preference for the local flavour of a text. (A value of five or more could be described as pathological parochialism.)
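
The quoted relative probabilities are just Zipf weights 1/rank^trend; a quick check, with the trend value as the only input:

```r
# Relative preference weights for the first four ranked states at a given trend value.
zipf_weights <- function(trend, ranks = 1:4) round(1 / ranks^trend, 5)
zipf_weights(1)   # 1.00000 0.50000 0.33333 0.25000
zipf_weights(5)   # 1.00000 0.03125 0.00412 0.00098
```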

P(loss) / generation What is the chance that a copy will be lost in a generation? A value of 0.5 means that only one copy in 2^n survives n generations; e.g., for 10 generations, the chance of survival would be one in 1024. What would you guess is the per-generation chance of survival for a real New Testament manuscript? (Assume that one generation is 25 years.) This slider acts as a reference for the other “per generation” sliders. E.g. if P(correction) is twice P(loss) then each copy has an expectation value of two correction events in its lifetime.
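
A quick check of the survival and correction arithmetic under a constant per-generation loss probability:

```r
# Chance a copy survives n generations, and expected corrections over its lifetime.
p_loss <- 0.5; n <- 10
(1 - p_loss)^n          # about 0.00098, i.e. one in 1024
p_correction <- 1.0
p_correction / p_loss   # expected correction events per copy (here, two)
```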

Growth / generation (logistic) The growth is calculated as follows: ΔN = N * r * (1 - N/K). This slider sets r. (K is set inside the program.) A value of r = 1 makes the demand double every generation at the beginning. The rate of increase decreases to zero as N/K approaches one.
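
A sketch of the demand trajectory under this update rule; the value of K is an arbitrary stand-in since the real one is fixed inside the program.

```r
# Logistic growth of demand: roughly doubles at first, then levels off near K.
r <- 1; K <- 100
N <- numeric(12); N[1] <- 1
for (g in 2:12) N[g] <- N[g - 1] + N[g - 1] * r * (1 - N[g - 1] / K)
round(N)   # demand approaches K and the increments shrink toward zero
```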

Lost / extant (for recovery) What is the relative probability that the recovery phase will retrieve lost copies instead of extant ones? Lost copies tend to be older, but there are few really old ones. The simulation is set to recover 60 / 1200 copies: only 5%. What do you think is the real ratio of lost (e.g. ones found in rubbish heaps, burials, caves) to extant copies (e.g. ones that have survived in monasteries, libraries or museums) for New Testament manuscripts?
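
A sketch of one recovery draw, reading the slider as the relative weight of a lost copy versus an extant one; the pool sizes and weight here are illustrative, not the program's.

```r
# Recover 60 of 1200 copies, weighting lost copies by the slider's relative probability.
set.seed(5)
status    <- c(rep("extant", 300), rep("lost", 900))   # illustrative pool of 1200 copies
lost_wt   <- 0.2                                       # relative probability for a lost copy
recovered <- sample(seq_along(status), 60, prob = ifelse(status == "lost", lost_wt, 1))
table(status[recovered])
```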

Seed (for random numbers) This lets you try the same slider settings with a different set of random numbers. The program uses (pseudo-)random processes extensively. Setting a seed lets one reproduce the same sequence of pseudo-random numbers. Try different seed values to see the effect.
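
In R terms the seed works like set.seed(): the same seed with the same slider settings should reproduce the same simulated tradition.

```r
set.seed(123); runif(3)   # a reproducible draw
set.seed(123); runif(3)   # the same three numbers again
```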

Mission Your mission is to find a combination of slider settings that produces analysis results which “look like” those obtained from a real data set (the UBS4 data for Mark).