Teaching Statistics using the Computer Emlyn Williams CSIRO Australia
Statistics students in Australia Decline in statistics student numbers in Australia During 2003, 3000 PhDs in Australia Only 186 PhDs in Mathematical and Statistical Sciences Why?
Possible reasons for the decline Popularity of Computing Science Reduced capacity of our Universities to train Maths / Stats professionals Type of Statistics being taught in Secondary Schools –Distribution theory –Probability –Equations / formulas
Some possible directions Data mining –Microarrays –Normalization –Multivariate Significance testing –Model selection –Resampling –Permutation tests Computer-based techniques
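As one illustration of the computer-based techniques listed above, a permutation test for a difference between two group means might be sketched as follows. This is a minimal sketch, not a method taken from the talk: the function name `permutation_test`, the number of permutations and the example data are all hypothetical.

```python
import random


def mean(values):
    return sum(values) / len(values)


def permutation_test(group_a, group_b, n_perm=10_000, seed=1):
    """Two-sided Monte Carlo permutation test for a difference in means.

    Returns the proportion of random relabellings whose absolute mean
    difference is at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random reassignment of the group labels
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed - 1e-12:  # small tolerance for rounding error
            extreme += 1
    # Count the observed arrangement itself, so the p-value is never zero
    return (extreme + 1) / (n_perm + 1)


# Hypothetical measurements, for illustration only
control = [4.1, 5.0, 4.7, 5.3, 4.4, 4.9]
treated = [5.6, 6.1, 5.2, 6.4, 5.9, 5.5]
print("permutation p-value:", permutation_test(control, treated))
```

No distributional formula is needed: the reference distribution is built by the computer from the data themselves.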
Understanding variation “..the central problem in management and leadership… is failure to understand the information in variation” Dr W. Edwards Deming Concept can be grasped without emphasizing mathematics or formulas Hands-on experiments Book “Statistical Thinking for Managers” by J.A. John, D. Whitaker and D.G. Johnson
Classroom experiments Beads experiment –Many white and red beads – majority white –Samples of 50 taken –Plot the number of red beads over time Quincunx experiment –Simulates a process to produce tubing with 50mm diameter –The process involves several steps –An operator is employed to monitor the process
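The beads experiment can also be imitated on the computer. The sketch below assumes a box of 800 white and 200 red beads and 25 successive samples; the slide says only that white beads are in the majority and that samples of 50 are taken, so these counts and the name `bead_samples` are illustrative assumptions.

```python
import random


def bead_samples(n_red=200, n_white=800, sample_size=50, n_samples=25, seed=1):
    """Number of red beads in each of a series of samples from one box."""
    rng = random.Random(seed)
    box = ["red"] * n_red + ["white"] * n_white  # assumed box contents
    counts = []
    for _ in range(n_samples):
        # Each sample is drawn without replacement from the full box,
        # i.e. the beads are returned before the next sample is taken.
        paddle = rng.sample(box, sample_size)
        counts.append(paddle.count("red"))
    return counts


counts = bead_samples()
# Crude text version of the run chart: one row per sample, in time order
for t, c in enumerate(counts, start=1):
    print(f"sample {t:2d} | {'*' * c} ({c})")
```

Nothing about the process changes between samples, so all of the movement in the chart is ordinary sampling variation.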
Quincunx board
One sequence of 25 balls mean=49.6 sd=1.5
Tampering Method 2 – Process adjustment. The operator tries to compensate for the results of the previous sample Method 3 – Variability reduction. The operator adjusts to try and achieve the same result as the previous sample
Means of 50 balls for 30 sequences: Methods 1 and 3 (Method 2=50.0)
Standard deviations of 50 balls for 30 sequences: Methods 1,2 and 3
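The tampering comparison can be reproduced approximately by simulation. The sketch below rests on stated assumptions: Method 1 is not defined in the surviving slide text and is taken here to mean "make no adjustment"; the quincunx is replaced by normal noise with standard deviation 1.5 (the value reported for one sequence of 25 balls); and the names `run_sequence` and `summarise` are hypothetical.

```python
import random
import statistics

TARGET = 50.0  # nominal tube diameter in mm


def run_sequence(method, n_balls=50, sd=1.5, rng=None):
    """Simulate one sequence of results under a given operator rule."""
    if rng is None:
        rng = random.Random()
    aim = TARGET
    results = []
    for _ in range(n_balls):
        result = aim + rng.gauss(0.0, sd)
        results.append(result)
        if method == 2:
            # Process adjustment: shift the aim to cancel the last deviation
            aim -= result - TARGET
        elif method == 3:
            # Try to repeat the previous result: aim at where it landed
            aim = result
        # Method 1 (assumed): leave the process alone
    return results


def summarise(method, n_sequences=30, seed=1):
    """Average of the sequence means and of the sequence standard deviations."""
    rng = random.Random(seed)
    means, sds = [], []
    for _ in range(n_sequences):
        seq = run_sequence(method, rng=rng)
        means.append(statistics.mean(seq))
        sds.append(statistics.stdev(seq))
    return {"mean": statistics.mean(means), "sd": statistics.mean(sds)}


for method in (1, 2, 3):
    print("Method", method, summarise(method))
```

Under these assumptions Method 2 keeps the mean on target but inflates the spread, because each result carries two noise terms instead of one, while Method 3 behaves like a random walk and drifts away from the target.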
Analysis of Designed Experiments
Replicate  Seedlot  Tree
1          4        X X X X X X X X
1          3        X X X X X X X X
1          5        X X X X X X X X
1          2        X X X X X X X X
1          1        X X X X X X X X
2          4        X X X X X X X X
2          1        X X X X X X X X
2          3        X X X X X X X X
2          5        X X X X X X X X
2          2        X X X X X X X X
Seedlots: 1 Acacia, 2 Angophora, 3 Casuarina, 4 Melaleuca, 5 Petalostigma
Analysis of Variance Table Source of variation d.f. s.s. m.s. v.r. F pr. repl seedlot <.001 Residual Total ***** Tables of means ***** Grand mean 6.12 seedlot Acacia Angophora Casuarina Melaleuca Petalostigma
Correct Analysis of Variance Table Source of variation d.f. s.s. m.s. v.r. F pr. repl stratum repl.plot stratum seedlot Residual repl.plot.tree stratum Total ***** Tables of means ***** Grand mean 6.12 seedlot Acacia Angophora Casuarina Melaleuca Petalostigma
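The difference between the two tables can be sketched numerically. The code below assumes the layout shown earlier (2 replicates, 5 seedlot plots per replicate, 8 trees per plot) and uses simulated data with no true seedlot differences; the variance components and the function name `strata_anova` are illustrative assumptions, not values from the talk.

```python
import numpy as np


def strata_anova(y):
    """Split the total sum of squares into strata.

    y has shape (replicates, seedlots, trees): one plot per seedlot in
    each replicate, with several trees measured in every plot.
    """
    r, s, t = y.shape
    grand = y.mean()
    plot_means = y.mean(axis=2)          # one mean per plot
    rep_means = plot_means.mean(axis=1)
    lot_means = plot_means.mean(axis=0)

    ss_rep = s * t * np.sum((rep_means - grand) ** 2)
    ss_lot = r * t * np.sum((lot_means - grand) ** 2)
    fitted = rep_means[:, None] + lot_means[None, :] - grand
    ss_plot_resid = t * np.sum((plot_means - fitted) ** 2)
    ss_tree = np.sum((y - plot_means[:, :, None]) ** 2)

    df_lot = s - 1
    df_plot_resid = (r - 1) * (s - 1)
    df_tree = r * s * (t - 1)
    ms_lot = ss_lot / df_lot

    return {
        "ss_total": np.sum((y - grand) ** 2),
        "ss_parts": ss_rep + ss_lot + ss_plot_resid + ss_tree,
        "df_naive_residual": df_plot_resid + df_tree,
        "df_plot_residual": df_plot_resid,
        "df_tree": df_tree,
        # Naive analysis: trees treated as independent replicates
        "f_naive": ms_lot / ((ss_plot_resid + ss_tree) / (df_plot_resid + df_tree)),
        # Stratified analysis: seedlots tested against plot-to-plot variation
        "f_correct": ms_lot / (ss_plot_resid / df_plot_resid),
    }


rng = np.random.default_rng(1)
plot_effects = rng.normal(0.0, 1.0, size=(2, 5, 1))  # assumed plot-level sd
tree_noise = rng.normal(0.0, 0.5, size=(2, 5, 8))    # assumed tree-level sd
heights = 6.0 + plot_effects + tree_noise            # no seedlot effect
table = strata_anova(heights)
for name, value in table.items():
    print(f"{name:18s} {value:.3f}")
```

Seedlots are applied to whole plots, so the seedlot mean square should be compared with the residual in the repl.plot stratum. Pooling the trees into a single residual gives far more degrees of freedom than the design supports.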
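[Figure: design diagram for treatments A and B, with levels Treatment, Technical Replicate, Dye and Array]
[Figure: design diagram for treatments A and B, with levels Treatment, Biological Replicate, Technical Replicate, Dye and Array]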
Opening screen of DataPlus Disk drive Working directory displayed in the status bar Top bar menu Experimental Title which must be filled Path of working directory. Directory structure File display area File selection type Button to go to the next screen Status bar To create new sub-directory
Step-by-step instruction Choose your experiment design from the list Click the Next button
Step-by-step instruction Type in the numbers of replicates, plots and trees Click the Next button
Treatment screen Note: plots stratum Treatment Levels: to Input treatment names Treatment Layout: to Input the treatment layout
Measurement screen New Spreadsheet: for entering your measurements using Microsoft Excel Open Data File: for opening an existing data file Derived variate: for declaring new variates (not measured in the field), e.g. volume, basal area Note: trees stratum
Output Summary screen Output: for generating summary file View: to view summary data file using Notepad Select Tree: for selecting trees in inner plot Modify: to modify data file GenStat or SAS: to go to GenStat or SAS screen Note: plots stratum
Design of Experiments Designs mainly used to be constructed using combinatorics or group theory The class of Partially Balanced Incomplete Block designs was defined and developed These designs did not always focus on quantities of importance to practitioners We need to maximize the amount of treatment information in the lowest stratum (where we have most precision) The average efficiency factor does this and can be used as an objective function in a computer search algorithm
Two possible arrangements for an incomplete block design with 9 treatments Replicate 1 Replicate 2 Block ____________ ___________ Replicate 1 Replicate 2 Block ___________ ___________
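The average efficiency factor of a candidate arrangement can be computed directly, which is what makes it usable as a search objective. The sketch below uses the standard definition for an equireplicate design with equal block sizes: the harmonic mean of the non-trivial eigenvalues of I - NN'/(rk), where N is the treatment-by-block incidence matrix. The two arrangements are hypothetical examples for 9 treatments in 2 replicates of 3 blocks of size 3; they are not necessarily the ones shown on the slide, whose layouts did not survive extraction.

```python
import numpy as np


def average_efficiency_factor(blocks, n_treatments):
    """Harmonic mean of the canonical efficiency factors of a block design.

    Assumes equal block sizes and equal replication. Treatments are
    labelled 1..n_treatments. Returns 0.0 for a disconnected design.
    """
    k = len(blocks[0])
    incidence = np.zeros((n_treatments, len(blocks)))
    for j, block in enumerate(blocks):
        if len(block) != k:
            raise ValueError("all blocks must have the same size")
        for treatment in block:
            incidence[treatment - 1, j] += 1
    replication = incidence.sum(axis=1)
    if not np.all(replication == replication[0]):
        raise ValueError("all treatments must be equally replicated")
    r = replication[0]

    scaled_info = np.eye(n_treatments) - incidence @ incidence.T / (r * k)
    # The smallest eigenvalue is the zero belonging to the overall mean
    factors = np.sort(np.linalg.eigvalsh(scaled_info))[1:]
    if factors.min() < 1e-10:
        return 0.0
    return len(factors) / np.sum(1.0 / factors)


# Hypothetical arrangement 1: rows, then columns, of a 3 x 3 array
lattice = [(1, 2, 3), (4, 5, 6), (7, 8, 9),
           (1, 4, 7), (2, 5, 8), (3, 6, 9)]
# Hypothetical arrangement 2: some pairs of treatments meet twice
alternative = [(1, 2, 3), (4, 5, 6), (7, 8, 9),
               (1, 2, 4), (3, 5, 7), (6, 8, 9)]

e_lattice = average_efficiency_factor(lattice, 9)
e_alternative = average_efficiency_factor(alternative, 9)
print("lattice arrangement:    ", round(e_lattice, 4))
print("alternative arrangement:", round(e_alternative, 4))
```

A search algorithm would repeatedly propose interchanges of treatments between blocks and keep those that increase this value.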
Software - CycDesigN Windows 95 to XP Visual C++ Resolvable / non-resolvable Block / row-column One / two stage Cyclic / alpha / other Factorial / nested treatments t-Latinized / partially-latinized Unequal block sizes
Latinized row-column design for 20 treatments
Summary In Australia (and probably elsewhere) we need to change some of the teaching practice and content in secondary schools and universities in order to prevent a continuing decline in the number of statistics students Development of and education using computer-based statistical techniques may provide an attractive addition to existing curricula