What are Statistical E-Books? Professor William Browne Centre for Multilevel Modelling, University of Bristol.

1 What are Statistical E-Books? Professor William Browne Centre for Multilevel Modelling, University of Bristol

2 Acknowledgements Jon Rasbash for his vision on which Stat-JR was built. All the team behind programming the Stat-JR software – Chris, Danius, Luc, ZZ, Richard, Camille, Bruce, Huanjia and Alex and collaborators on the e-Stat project. Harvey Goldstein, Fiona Steele, George Leckie, Chris Charlton, Kelvyn Jones and all my other CMM colleagues past and present. ESRC for much funding over the years and the British Academy for funding the current Teaching eBooks grant.

3 What will we cover? eBooks and Statistical eBooks STAT-JR eBooks Workflows Example eBooks and workflows Other software – iPython/Jupyter and R-Shiny

4 eBooks An electronic book is a book publication in digital form. In the US more books are published online than distributed in hard copy in book shops.

5 Advantages of the eBook format For younger members of the audience it will seem strange that, a few decades ago, academics had to go to the library to access journal articles – often photocopying them or even having to fill in inter-library loan forms! Nowadays one can (subject to subscription) have thousands of articles and books available at one’s fingertips, often as pdf files. Accessing documents within a web browser allows for enhanced documents with additional features, e.g. sound files or videos embedded within the document. The medium is no longer simply paper!

6 Statistical (and Mathematical) eBooks The question is: can we incorporate statistical content into an eBook? Of course, a statistical textbook is no different on paper from any other document when it comes to creating a pdf file (aside from maybe more equations!). The difference is in what ‘enhancements’ we can add, and so the idea here is to combine the textbook with the statistics package, i.e. interactive examples, allowing the user to include their own dataset, etc.

7 Mathematica and SAGEMath Early adopters of eBook technology in their work – much work on mathematical examples but some coverage of statistics. Idea of an interactive log book. Use of sliders as shown below. Influenced our first grant application for eBook work.

8 Stat-JR A statistical package developed by the team at the Centre for Multilevel Modelling with colleagues at Southampton. Contains its own (MCMC-based) estimation engine. The system is based on the idea of a suite of templates, where each template performs a specific operation. It also allows interoperability with other software packages, so for example there might be a regression template that fits regressions using various software packages. The initial TREE interface runs in a web browser. There are also newer eBook and workflow interfaces. Several ESRC grants have enabled Stat-JR to be written.

9 An example of STAT-JR – setting up a model

10 Example of STAT-JR – setting up a model

11 Example TREE interface screen shot 2 All objects created are available from one pull-down list and can be popped out to separate tabs in the browser.

12 Output from the E-STAT engine Estimates and the DIC diagnostic can be viewed for the model fitted.

13 Using the eBook interface in Stat-JR

14 Different forms of STAT-JR and E-books TREE (Template Reading and Execution Environment) - the format we have demonstrated up to now. Allows user to investigate 1 template and 1 dataset. A dataset can be output from 1 template and then used by the next. DEEP (Documents with Embedded Execution and Provenance) – mixing up templates with textboxes to make executable books. LEAF (Logging and Execution of Analysis Flows) – another workflow based interface to allow the joining together of sets of templates (see later).

15 Flow diagram: Template + Dataset → Stat-JR prompts user for input → Stat-JR writes commands, etc., to perform requested function. E.g… Equations; Macros; Scripts (myModel<- glm(normexam~ summary(myModel) plot(myModel,1)); Point & click instructions (Select Open Worksheet; Select datafile.dta; Select Equations from Fi) → Function performed: (if applicable) external software opened, run, then closed, with results returned to Stat-JR → Results of function produced: Results (Model: DIC: 9766.506; Parameters: Beta1: 0.594); Charts; Results tables → (if applicable) results outputted as dataset to be fed back in…

16 …so we first import one… No eBooks loaded yet…

22 Navigate through pages of eBook Hierarchical table of contents (can be expanded / collapsed at each node)

29 Behind the scenes… The eBook author (Richard) has specified which Stat-JR template to associate with this region of the eBook… …and has chosen one which creates plots via R (“PlotsViaR”). Templates require input, from a user, before they can go ahead & perform the function appropriately… …the eBook author can pre-specify inputs (by writing them into the eBook code); any that are not pre-specified are then left to the eBook reader to complete.
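
The split between author-specified and reader-specified inputs can be sketched in a few lines of Python. This is only an illustration of the idea, not Stat-JR's actual code: the function `split_inputs` and the input names `x_variable`, `y_variable` and `plot_type` are hypothetical, and `normexam` is borrowed from the script fragment shown on the flow-diagram slide.

```python
def split_inputs(required, prespecified):
    """Separate a template's inputs into those fixed by the eBook author
    and those still to be asked of the eBook reader.

    required     -- ordered list of input names the template needs
    prespecified -- dict of input name -> value written into the eBook
    """
    unknown = set(prespecified) - set(required)
    if unknown:
        raise ValueError(f"inputs not used by this template: {sorted(unknown)}")
    fixed = {name: prespecified[name] for name in required if name in prespecified}
    to_ask = [name for name in required if name not in prespecified]
    return fixed, to_ask


# Hypothetical inputs for a plotting template.
required = ["x_variable", "y_variable", "plot_type"]
author_choices = {"plot_type": "scatter", "y_variable": "normexam"}

fixed, to_ask = split_inputs(required, author_choices)
print("Fixed by author:", fixed)
print("Left to reader:", to_ask)
```

Pre-specifying every input would make the region run without any prompt; pre-specifying none would leave the reader to fill in the whole form.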

30 Stat-JR: to re-cap… (the flow diagram from slide 15 is shown again.)

37 Behind the scenes… …the eBook author has associated relevant model-fitting Stat-JR templates with this region of the eBook… …and has pre-specified all of the inputs, bar the explanatory variables, which are therefore the only ones left to the eBook reader to specify. The author has also specified what / where / when the output resulting from a template’s execution will be presented in the eBook…

38 Stat-JR: to re-cap… (the flow diagram from slide 15 is shown again.)

43 Content of text returned is conditional on value of results
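
As a rough illustration of text that depends on the value of a result, the Python sketch below picks a sentence according to whether an interval around an estimate excludes zero. It is not how Stat-JR generates its text: the function `describe_effect` is hypothetical, the interval is a simple normal approximation (an assumption; an MCMC engine would more naturally report a credible interval), and the standard error used in the example call is invented for illustration.

```python
def describe_effect(name, estimate, std_error, z_crit=1.96):
    """Return a sentence whose wording depends on the fitted result."""
    lower = estimate - z_crit * std_error
    upper = estimate + z_crit * std_error
    if lower > 0:
        verdict = "a positive association"
    elif upper < 0:
        verdict = "a negative association"
    else:
        verdict = "no clear association (the interval includes zero)"
    return (
        f"The estimate for {name} is {estimate:.3f} "
        f"(approximate 95% interval {lower:.3f} to {upper:.3f}), "
        f"suggesting {verdict}."
    )


# The standard error here is made up purely to exercise the function.
print(describe_effect("Beta1", 0.594, 0.1))
```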

44 Stat-JR’s DEEP system: Summary of features Built on Stat-JR’s powerful & flexible data-analytical engine. Embeds inputs and outputs of Stat-JR’s executable statistical functions within contextual information. Tailoring & specificity: e.g. associating carefully chosen templates; pre-specifying inputs. Log / recording tool: behind-the-scenes, a comprehensive record is kept of each execution.

45 Stat-JR’s DEEP system: Benefits for the researcher? Teaching of quantitative research methods (including inter-operating software). Communicating principles / theories / inviting exploration of quantitative research topics. Reports: transparency (e.g. access to embedded dataset / analytical methods, etc); facilitates multi-authored preparation. Tailored analytical techniques: pre-specifications allow user to ‘cut to the chase’ and/or circumvent software-specific learning curve.

46 Current ESRC grant Grant funds Richard Parker in Bristol and Danius Michaelides in Southampton for 3 years. Contains 5 work packages: 1. Capturing discipline-specific research in eBooks 2. Capturing methodological decisions in eBooks 3. The statistical analysis assistant - SAA 4. Reproducible Research and the enhanced journal article 5. The use of eBooks in research training and an online eBook repository

47 Work Package 3 – The SAA We will adapt our eBook system to allow workflows that describe how the steps in a statistical analysis fit together. There may be many SAAs adapted to different researchers’ approaches - for example one might want to answer a research question/analyse a dataset as a specific expert might do it. Opinion is divided on how far one can take the idea – from nowhere to complete automation, i.e. pour in the dataset at the top and let the computer sort it out. The probable end point will be somewhere in between, or in fact a series of SAAs that lie on this continuum.

48 A statistical analysis assistant we are all happy with!

49 One Step further

53 ‘The Warlock of Firetop Mountain’ approach The first of a genre of interactive books published in 1982 and lapped up by 10-year-old boys like myself! A combination of book and flowchart. It worked something like: ‘The goblin advances towards you, shouting words that you can’t understand, do you try to make conversation (turn to page 231), run past the goblin (turn to page 176) or draw your sword and fight (turn to page 134)’ Underpinning the book was effectively a flowchart, disguised by random page movements, with a variety of endings (99% of them involved you dying), possible loops, etc.

54 The use of Flowcharts in Statistics The equivalent exists in (at least) basic statistical analysis, and a variety of books have flowcharts to guide the uninitiated to the appropriate test. The branching rules are usually things like – how many variables do you have?, what type are they?, is a normality assumption appropriate? The example flowcharts usually then say you need a t-test / Mann-Whitney test / ANOVA etc. One could expand this idea to include branches where we haven’t written material – i.e. the equivalent of ending up dead would be the default ‘go and ask a statistician’ end point – possibly taking your answers to the flowchart with you.
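
A test-selection flowchart of this kind is easy to sketch as code. The Python below is a deliberately tiny, hypothetical example: the function `suggest_test`, its three questions and its branches are illustrative assumptions, not a recommended decision procedure. Any branch that has not been written falls through to the ‘go and ask a statistician’ end point, and the answers given are returned so they can be taken along.

```python
FALLBACK = "go and ask a statistician"


def suggest_test(n_groups, outcome_type, normality_ok):
    """Walk a small flowchart and return (suggestion, answers given)."""
    answers = {
        "n_groups": n_groups,
        "outcome_type": outcome_type,
        "normality_ok": normality_ok,
    }
    if outcome_type == "continuous" and n_groups == 2:
        test = "t-test" if normality_ok else "Mann-Whitney test"
    elif outcome_type == "continuous" and n_groups > 2 and normality_ok:
        test = "ANOVA"
    else:
        # Branch not written yet: the equivalent of 'ending up dead'.
        test = FALLBACK
    return test, answers


for case in [(2, "continuous", True), (2, "continuous", False),
             (3, "continuous", True), (2, "categorical", True)]:
    suggestion, answers = suggest_test(*case)
    print(answers, "->", suggestion)
```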

55 Where might this go? The flowchart idea is appealing as it may to some degree mimic a statistical consultation. If the system is flexible enough then each statistician can tune the SAA to their own approach to analysis and to how much they feel can be comfortably automated. Where there is uncertainty / options in what one should do, this could be incorporated. E-books can contain hyperlinks so that further background on proposed statistical methods or examples can be easily found.

56 Stat-JR’s LEAF workflow system Based around a new front end written using the Blockly system. Allows the user to link up templates themselves in a user-friendly visual way. Workflows can be included in eBooks. A few examples of simple workflows follow.

57 Blockly

58 Histogram workflow (hist.xml) Here is a log-style workflow.

59 Histogram workflow (hist.xml) Removing one of the inputs would then make this a question that the user needs to answer.

60 Histogram Output

61 More complex example

62 Output

63 Asking for inputs Here red blocks are variables. Here the blocks have been collapsed.

64 Conditional operations To the left is part of a workflow showing some conditional operations. Depending on the distribution, the prediction we wish to plot will be converted to the probability scale for Binomial responses, to the counts scale for Poisson responses, or left as is for Normal responses.
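
The conditional conversion can be written out as a short Python sketch. It assumes the usual canonical links, which the slide does not state: a logit link for Binomial responses and a log link for Poisson responses. The function name `to_response_scale` is hypothetical and is not part of Stat-JR.

```python
import math


def to_response_scale(linear_predictor, distribution):
    """Convert a prediction on the linear-predictor scale to the scale
    on which it is plotted, depending on the response distribution."""
    if distribution == "Binomial":
        # Inverse logit (assumed link), written to avoid overflow.
        if linear_predictor >= 0:
            return 1.0 / (1.0 + math.exp(-linear_predictor))
        e = math.exp(linear_predictor)
        return e / (1.0 + e)
    if distribution == "Poisson":
        # Inverse of the log link (assumed): counts scale.
        return math.exp(linear_predictor)
    if distribution == "Normal":
        # Identity link: left as is.
        return linear_predictor
    raise ValueError(f"unsupported distribution: {distribution!r}")


for dist in ("Binomial", "Poisson", "Normal"):
    print(dist, to_response_scale(0.0, dist))
```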

65 Output Here you can see prediction lines for a model on the probability scale. There is 1 line for each district, and the probability (of using contraception) changes with age.

66 Other Systems - Shiny by RStudio

67 Communicating Uncertainty work Project webpage by colleagues using R-Shiny – see https://www.cmm.bris.ac.uk/interactive/uncertainty/

68 Communicating Uncertainty work Note the use of the sliders embedded in the webpage

69 Other systems 2 – iPython Jupyter notebooks

70 New release of Stat-JR The latest release of the Stat-JR software, with a first beta release of the LEAF workflow interface, is available at http://www.bristol.ac.uk/cmm/software/statjr/downloads/ Any questions?

