Workflow Analysis of Student Data
John R. Gilbert and Viral Shah

Goals and Caveats

Our goal is to develop methods for using measured data to evaluate hypotheses about workflow. These are preliminary experiments using data from one pilot classroom study (UCSB CS240A, spring 2004).

Don't put too much trust in this data! (We are still learning what data to gather and how.) Therefore, this talk won't defend any particular conclusions about workflows. Rather, we aim to:
- show that a data-based analytical approach is promising for further development
- inform data-gathering in upcoming studies

Background

- The system asked the student for a reason for each compile
- We didn't trust the answers...
- But we captured full source etc. at each compile & run
- So, we completely re-ran the student experience:
  - here, one assignment from one class
  - 17 student histories
  - about one day of 32-processor cluster time
- Used heuristics to assign reasons for compiles...

Scripted questionnaire

What is the reason for this compile/run?
- Learn / experiment with compiler
- Adding serial functionality
- Parallelizing
- Performance tuning
- Fixing compile-time error
- Fixing run-time error

Heuristics to deduce answers

What is the reason for this compile/run?
- Learn / experiment with compiler: few MPI calls, LOC ~unchanged
- Adding serial functionality: few MPI calls, LOC changes
- Parallelizing: number of MPI calls changes
- Performance tuning: correct, run time changes
- Fixing compile-time error: previous compile failed
- Fixing run-time error: previous run failed on random test case

More than one reason may match.
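
These rules are straightforward to mechanize. The Python sketch below is a minimal illustration of the heuristics; the snapshot field names (mpi_calls, loc, compiled, ran_ok, correct, runtime) and the numeric thresholds are hypothetical, since the slide does not give the study's actual instrumentation or cutoffs.

```python
def classify(prev, curr):
    """Assign candidate reasons for a compile/run by comparing the previous
    and current revision snapshots. More than one reason may match."""
    reasons = []
    mpi_changed = curr["mpi_calls"] != prev["mpi_calls"]
    loc_changed = abs(curr["loc"] - prev["loc"]) >= 5       # threshold is a guess

    if curr["mpi_calls"] < 3 and not loc_changed:            # few MPI calls, LOC ~unchanged
        reasons.append("learn / experiment with compiler")
    if curr["mpi_calls"] < 3 and loc_changed:                # few MPI calls, LOC changes
        reasons.append("adding serial functionality")
    if mpi_changed:                                          # number of MPI calls changes
        reasons.append("parallelizing")
    if curr["correct"] and curr["runtime"] != prev["runtime"]:  # correct, run time changes
        reasons.append("performance tuning")
    if not prev["compiled"]:                                 # previous compile failed
        reasons.append("fixing compile-time error")
    if prev["compiled"] and not prev["ran_ok"]:              # previous run failed
        reasons.append("fixing run-time error")
    return reasons

prev = {"mpi_calls": 0, "loc": 120, "compiled": False, "ran_ok": False,
        "correct": False, "runtime": None}
curr = {"mpi_calls": 0, "loc": 121, "compiled": True, "ran_ok": True,
        "correct": False, "runtime": 3.2}
print(classify(prev, curr))
# -> ['learn / experiment with compiler', 'fixing compile-time error']
```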

Example of student trace

Student #2: 70 runs in 7 days, 12 hr, 0 sec

rev 1 after 0s: [ seq ] [ runerr ]
rev 2 after 4d 4h 27m 7s: [ seq par ] [ cplerr ]
rev 3 after 41s: [ seq ] [ cplerr ]
rev 4 after 19s: [ ] [ cplerr ]
rev 5 after 21s: [ ] [ cplerr ]
rev 14 after 21m 49s: [ ] [ cplerr ]
rev 15 after 6m 32s: [ seq ] [ cplerr ]
rev 16 after 38s: [ seq ] [ run ], time=
rev 17 after 57s: [ seq ] [ run ], time=
rev 18 after 2m 25s: [ ] [ run ], time=
rev 19 after 1m 14s: [ ] [ run ], time=
rev 64 after 2m 22s: [ par ] [ runerr ]
rev 65 after 5m 0s: [ par ] [ crash ]
rev 66 after 6m 43s: [ ] [ run ], time=
rev 67 after 2m 30s: [ ] [ run ], time=
rev 68 after 55s: [ ] [ run ], time=
rev 69 after 29s: [ ] [ run ], time=
rev 70 after 46m 2s: [ ] [ run ], time=
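
A small parser makes traces like this machine-readable. The sketch below assumes the stored lines look exactly like the slide's format; the output field names are hypothetical, and the elided time= values are simply ignored.

```python
import re

# Matches lines of the form "rev 2 after 4d 4h 27m 7s: [ seq par ] [ cplerr ]".
LINE = re.compile(
    r"rev\s+(?P<rev>\d+)\s+after\s+(?P<delay>[\dhms dwy]+?):\s*"
    r"\[\s*(?P<tags>[^\]]*)\]\s*\[\s*(?P<result>[^\]]*)\]"
)

def parse_line(line):
    m = LINE.search(line)
    if not m:
        return None
    return {
        "rev": int(m.group("rev")),
        "delay": m.group("delay").strip(),    # e.g. "4d 4h 27m 7s"
        "tags": m.group("tags").split(),      # e.g. ["seq", "par"]
        "result": m.group("result").strip(),  # e.g. "cplerr", "run", "crash"
    }

print(parse_line("rev 2 after 4d 4h 27m 7s: [ seq par ] [ cplerr ]"))
# -> {'rev': 2, 'delay': '4d 4h 27m 7s', 'tags': ['seq', 'par'], 'result': 'cplerr'}
```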

Summary of student experience

Student # 1: 324 runs in 22h 41m 37s, best =
Student # 2: 70 runs in 8h 35m 11s, best =
Student # 3: 36 runs in 4h 59m 54s, best =
Student # 4: 216 runs in 11h 42m 54s, best =
Student # 5: 173 runs in 14h 57m 35s, best =
Student # 6: 122 runs in 8h 27m 10s, best =
Student # 7: 174 runs in 18h 17m 8s, best =
Student # 8: 536 runs in 27h 10m 16s, best =
Student # 9: 72 runs in 17h 23m 32s, best =
Student #10: 110 runs in 11h 43m 29s, best =
Student #11: 325 runs in 41h 18m 12s, best =
Student #12: 188 runs in 18h 24m 39s, best = 4.428

Compiles, Runs, Correct runs

The distribution varies significantly by programmer.

LOC profiles

All kinds of workflows are observed in the class.

Timed Markov processes

- A timed Markov process is a Markov process with associated state dwell times.
- The dwell times depend on how the state is exited:
  - prob(B | A) is the probability that the next state is B, given that the current state is A
  - time(A | B) is the dwell time spent in state A, given that the next state is B
  - The dwell time is a random variable in general.

[Diagram: State A -> State B, edge labeled prob(B | A) / time(A | B)]

From Burton Smith and David Mizell, CRAY
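
A minimal simulator makes the definition concrete. The sketch below walks a timed Markov process given a transition table of (probability, dwell time) pairs; the two-state example is hypothetical, and fixed mean dwell times stand in for the general random-variable case.

```python
import random

# Transition table format: {state: {next_state: (prob(next|state), time(state|next))}}
tmp = {
    "A": {"B": (0.7, 30.0), "A": (0.3, 5.0)},
    "B": {"done": (1.0, 10.0)},
}

def simulate(tmp, state="A"):
    """Walk the process until it reaches an absorbing state;
    return the visited path and the total elapsed time."""
    path, total = [state], 0.0
    while state in tmp:
        nxt_states = list(tmp[state])
        probs = [tmp[state][s][0] for s in nxt_states]
        nxt = random.choices(nxt_states, weights=probs)[0]
        total += tmp[state][nxt][1]   # dwell time depends on how the state is exited
        state = nxt
        path.append(state)
    return path, total

print(simulate(tmp))
```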

General model of researcher workflow

1. express new scientific theory in informal notation
2. express new theory in VHLL (Matlab, Perl, OS360 JCL, etc.)
3. debug code on small data sets
4. test new theory on small, medium data sets -- compare against known results, previous models
5. redesign program for HPC system
6. write program for HPC system
7. compile for debug
8. test HPC code
9. debug HPC code
10. select medium-to-large data set for performance testing/optimization
11. optimize HPC code for performance
12. do performance test run for HPC code
13. select, obtain large-scale data set
14. structure data set for large-scale computation
15. test large-scale data set arrangement for correctness
16. test for expected performance
17. design/implement visualization approach for larger-scale problems
18. run performance-tuned version against large-scale data set
19. visualize results
20. revise control parameters
21. select new data set

Items in red: new system could shorten time -- we'll try to model these.
Items in orange: could also be sped up by the new system, but not yet part of the model.

From Burton Smith and David Mizell, CRAY

Researcher workflow model

[State diagram: states Formulate, Program, Compile (debug loop), Test, Debug, Run, Optimize, and Compile (optimize loop); edges are labeled with prob / time pairs: 1/T_f, 1/T_p, p_d/T_t, 1/T_d, 1/T_c, q_d p_p/T_t, q_d q_p/T_t, 1/T_o, p_o/T_r, q_o/T_r]

From Burton Smith and David Mizell, CRAY
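
Under the topology suggested by the slide (partly inferred, since the diagram itself is lost in this transcript), the model can be written in the transition-table form used by the simulator sketch above. All numeric parameter values below are placeholders, not fitted data.

```python
# Hypothetical encoding of the workflow model: {state: {next: (prob, dwell)}}.
# Parameters mirror the slide's labels; the numbers are placeholders.
Tf, Tp, Tt, Td, Tc, To, Tr = 60.0, 268.0, 5.0, 380.0, 49.0, 4.0, 30.0
pd = 0.7   # prob a test leads to debugging (q_d = 1 - pd)
pp = 0.9   # prob a passing test leads on to performance work (q_p = 1 - pp)
po = 0.7   # prob a run leads to more optimization (q_o = 1 - po)

model = {
    "formulate":   {"program": (1.0, Tf)},
    "program":     {"compile_dbg": (1.0, Tp)},
    "compile_dbg": {"test": (1.0, Tc)},
    "test":        {"debug": (pd, Tt),
                    "run": ((1 - pd) * pp, Tt),
                    "done": ((1 - pd) * (1 - pp), Tt)},
    "debug":       {"compile_dbg": (1.0, Td)},
    "run":         {"optimize": (po, Tr), "done": (1 - po, Tr)},
    "optimize":    {"compile_opt": (1.0, To)},
    "compile_opt": {"run": (1.0, Tc)},
}
# path, total = simulate(model, "formulate")   # using simulate() from the earlier sketch
```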

Fitting data to Cray model

Average dwell times, time(A | B) (row = next state B, column = current state A):

             program  compile1  test   debug   run   optimize  compile2
  program       -        -       -      5s      -       -         -
  compile1    4m 28s     -       -     6m 20s   -       -         -
  test          -       49s      -       -      -       -         -
  debug         -        -      5s       -     5s       -         -
  run           -        -      9s       -      -       -        30s
  optimize      -        -       -       -     4s       -         -
  compile2      -        -       -       -      -     10m 29s     -
  done          -        -      5s       -     3s       -         -

Transition probabilities, prob(B | A):

             program  compile1  test   debug   run   optimize  compile2
  program       -        -       -    23.7%     -       -         -
  compile1    100.0%     -       -    100.0%    -       -         -
  test          -      100.0%    -       -      -       -         -
  debug         -        -     71.3%     -    26.6%     -         -
  run           -        -      4.8%     -      -       -       100.0%
  optimize      -        -       -       -    69.9%     -         -
  compile2      -        -       -       -      -     100.0%      -
  done          -        -      0.2%     -     3.5%     -         -
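
Tables like these can be fitted directly from event traces by counting transitions and averaging dwell times. The sketch below assumes each trace is a hypothetical list of (state, dwell_seconds) pairs in visit order; the example numbers are illustrative only.

```python
from collections import defaultdict

def fit(traces):
    """Estimate transition probabilities prob(B|A) and mean dwell times
    time(A|B) from traces of (state, dwell_seconds) pairs."""
    counts = defaultdict(lambda: defaultdict(int))
    dwells = defaultdict(lambda: defaultdict(list))
    for trace in traces:
        for (a, dwell), (b, _) in zip(trace, trace[1:]):
            counts[a][b] += 1
            dwells[a][b].append(dwell)   # time(A | B): dwell in A given next is B
    probs = {a: {b: n / sum(cs.values()) for b, n in cs.items()}
             for a, cs in counts.items()}
    times = {a: {b: sum(ds) / len(ds) for b, ds in bs.items()}
             for a, bs in dwells.items()}
    return probs, times

traces = [[("program", 268), ("compile", 49), ("test", 5), ("debug", 380),
           ("compile", 49), ("test", 3), ("done", 0)]]
probs, times = fit(traces)
print(probs["test"])   # -> {'debug': 0.5, 'done': 0.5}
```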

Fitting data to Cray model

[State diagram: the researcher workflow model above, annotated with the fitted prob / dwell-time pairs on its edges: 1.0 / 268s, .713 / 5s, 1.0 / 380s, 1.0 / 49s, 1.0 / 30s, .048 / 9s, 1.0 / 629s, .699 / 4s, .035 / 3s, .002 / 5s, .266 / 5s, .237 / 5s]

Conclusions

- Remember all the caveats! This is very preliminary; no conclusions about workflows yet.
- Fitting measured data to hypothesized workflows looks promising.
- We're getting better at knowing what to measure and how:
  - See Vic's talk yesterday
  - Formulation time, programming vs. debugging, ... ?
  - Measure automatically when possible
  - Lots more data coming soon from classroom experiments
  - Should be able to do this with professional data too
- Want to compare different {languages / apps / ...}
- Want to use a principled approach to estimating statistics of dwell times, evaluating competing state models, etc.