VIPER Optimization What is optimization? How does viper’s Station and Time Period optimization work? How to interpret results? What to avoid? Tom Pagano.

Presentation transcript:


As used in a statistical forecasting context

Target: Tomichi at Gunnison Apr-July streamflow

Data types (3): Snowpack, Precipitation, Temperature
Predictor stations (7): Porphyry Creek, Slumgullion, Cochetopa Pass, Monarch Offshoot, Butte, Taylor Park, St Elmo
Months (4): October, November, December, January

What is the optimal combination of variables to get the best prediction?

Too many combinations to evaluate using “brute force” (at least 24,000).
Instead we could “search”, trying some combinations and following up promising avenues.
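To get a feel for why brute force is impractical, it helps to count subsets. The following is a minimal sketch, assuming each candidate predictor can simply be switched on or off, so that n candidates give 2^n - 1 non-empty combinations. The function name and the candidate counts are illustrative assumptions; in particular, 84 candidates assumes every station reports every data type in every month, which the slide does not state. The sketch is not meant to reproduce the slide's figure of 24,000, only to show how quickly the count grows.

```python
def count_combinations(n_candidates: int) -> int:
    """Number of non-empty subsets of n on/off candidate predictors."""
    return 2 ** n_candidates - 1


# Hypothetical candidate counts; 84 = 3 data types x 7 stations x 4 months.
for n in (4, 10, 15, 84):
    print(n, count_combinations(n))
```

Even 15 candidates already exceed 24,000 combinations under this assumption, which is why a guided search is attractive.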

Station optimization search algorithm, simple example

Target: Tomichi at Gunnison Apr-July streamflow

Predictors:
A  Avalanche Oct-Mar Precip
B  Butte Apr 1 SWE
C  Cochetopa Pass Apr 1 SWE
D  Datother Station Apr 1 SWE

Station optimization search algorithm, simple example

Build all 1-variable combinations:
1. A
2. B
3. C
4. D

Evaluate regression vs flow, find jackknife standard error:
1. A - 38
2. B - …
3. C - 33
4. D - 30

Sort equations, best to worst:
1. D - 30 (best)
2. C - 33
3. B - …
4. A - 38 (worst)

Build 2-variable equations, regress, find jackknife standard error:
A,D - 32
B,D - 34
C,D - 29
A,C - 37
B,C - 35
A,B - 39
Note: A,B is the same as B,A
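The "jackknife standard error" used to score each equation can be sketched as a leave-one-out calculation: drop one year, refit the regression on the remaining years, predict the dropped year, and collect the errors. The sketch below is a minimal illustration under stated assumptions, not VIPER's implementation: it uses an ordinary least-squares line with a single predictor (the slides say VIPER's station optimization runs on principal components), the function names are hypothetical, and the divisor n - 2 is an assumption because the slides do not give the formula.

```python
from math import sqrt


def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def jackknife_standard_error(xs, ys):
    """Leave-one-out error: each year is predicted by a fit that never saw it."""
    n = len(xs)
    squared_errors = 0.0
    for i in range(n):
        slope, intercept = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        error = ys[i] - (slope * xs[i] + intercept)
        squared_errors += error * error
    # Assumed divisor: n - 2 degrees of freedom for a one-predictor line.
    return sqrt(squared_errors / (n - 2))


# Hypothetical data: predictor (e.g. Apr 1 SWE) and seasonal flow for six years.
swe = [10.0, 14.0, 8.0, 20.0, 16.0, 12.0]
flow = [52.0, 70.0, 41.0, 95.0, 83.0, 60.0]
print(round(jackknife_standard_error(swe, flow), 2))
```

Because each year is predicted without its own data, this score is never smaller than the ordinary fitted standard error computed with the same divisor, which makes it a more honest basis for ranking equations.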

Station optimization search algorithm, simple example

Re-sort results… “prune” branches, keeping top 5:
1. C,D - 29
2. D - 30
3. A,D - 32
4. C - 33
5. B,D - 34

Build all 3-variable combinations, growing from top 5 list, evaluate:
A,C,D - 28
B,C,D - …
A,B,D - …
Note: A,B,C not evaluated. No “trunk” leads us there.

Re-sort list, evaluate 4-variable combinations:
1. A,C,D - 28 (grows to A,B,C,D - …)
2. C,D - 29
3. D - 30
4. A,D - 32
5. C - 33

Station optimization search algorithm, simple example

Re-sort list… Optimization finished when top 5 list doesn’t change:
1. A,C,D - 28
2. C,D - 29
3. D - 30
4. A,B,C,D - …
5. A,D - 32

Questions to ask:
Which variables are “popular”?
What would maintain consistency from month to month?
Does the combination make physical sense?
Is some data harder to get than others?

Optimization is not a substitute for thinking!
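The grow-and-prune procedure walked through in this example can be sketched in a few lines. This is an illustrative reading of the slides, not VIPER's code: the function and variable names are hypothetical, and the scoring function is just a lookup table. The scores that survive in the transcript are used as given; four scores that did not survive (B; B,C,D; A,B,D; A,B,C,D) are filled with placeholder values chosen only to be consistent with the rankings shown, and are marked as assumptions in the code.

```python
def station_search(variables, score, keep=5):
    """Grow-and-prune search. Returns (top list, dict of every combination evaluated)."""
    evaluated = {}

    def rank_key(combo):
        # Cache scores so no combination is evaluated twice (A,B is the same as B,A).
        if combo not in evaluated:
            evaluated[combo] = score(combo)
        return (evaluated[combo], sorted(combo))

    # Start from all 1-variable equations, best (lowest error) first.
    top = sorted((frozenset([v]) for v in variables), key=rank_key)[:keep]
    while True:
        # Grow every equation on the current list by one extra variable.
        grown = {c | {v} for c in top for v in variables if v not in c}
        new_top = sorted(set(top) | grown, key=rank_key)[:keep]
        if new_top == top:  # finished when the top list doesn't change
            return [("".join(sorted(c)), evaluated[c]) for c in top], evaluated
        top = new_top


def combo(letters):
    return frozenset(letters)


# Jackknife standard errors that survive in the slide transcript.
SLIDE_SCORES = {
    combo("A"): 38, combo("C"): 33, combo("D"): 30,
    combo("AD"): 32, combo("BD"): 34, combo("CD"): 29,
    combo("AC"): 37, combo("BC"): 35, combo("AB"): 39,
    combo("ACD"): 28,
}
# ASSUMED placeholders: these values are missing from the transcript.
PLACEHOLDER_SCORES = {
    combo("B"): 36, combo("BCD"): 35, combo("ABD"): 36, combo("ABCD"): 31,
}
scores = {**SLIDE_SCORES, **PLACEHOLDER_SCORES}

top, evaluated = station_search("ABCD", scores.__getitem__)
for name, error in top:
    print(name, error)
```

With these inputs the search ends on the same top five as the slide (A,C,D; C,D; D; A,B,C,D; A,D), and the combination A,B,C is never scored because none of its two-variable "trunks" survives pruning. The lookup table deliberately has no entry for A,B,C, so the run would fail if the search ever tried to evaluate it.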

Viper station optimization

Output

Input data: table with columns Year, Y1, X (values not preserved)

DETAILED OUTPUT

RANKED REGRESSION EQUATIONS:
***** RANK 1
Columns: coef, var
X9 701,OCT-MAY,PRCP,SNTL,CO,AWDB,Porphyry Creek
Constant
#obs = 22   #pc = 1
Statistics listed (values not preserved): Jr, JStdErr, JStdErrSS, r, StdErr, StdErrSS
Yearly table columns: YEAR, OBSERVED, JCK REG COMPUTED, JCK REG ERROR, STD REG COMPUTED, STD REG ERROR

SUMMARY OUTPUT

Name of this file: _optimization.txt
Number of combinations evaluated = 1
Created on 6/13/2007 3:36:56 PM by tpagano
Transformation type: Cubert
Analysis type: Principal Components

VARIABLES:
Y ,APR-JUL,SRVO,USGS,CO,AWDB,Tomichi Creek At Gunnison, Co
X9 701,OCT-MAY,PRCP,SNTL,CO,AWDB,Porphyry Creek

EQUATION SUMMARY:
Columns: RANK, VARIABLES, JACKKNIFE STANDARD ERROR, JACK. CORR. COEF., NUM. OBS. USED, NUM PC'S
(one ranked equation; values not preserved)

Interpreting variable tables

VARIABLES:
Y ,APR-JUL,SRVO,USGS,CO,AWDB,Tomichi Creek At Gunnison, Co
X1 06L03,MAY,WTEQ,SNOW,CO,AWDB,Porphyry Creek
X2 762,MAY,WTEQ,SNTL,CO,AWDB,Slumgullion
X3 06L06,MAY,WTEQ,SNOW,CO,AWDB,Cochetopa Pass
X4 06L09,MAY,WTEQ,SNOW,CO,AWDB,Monarch Offshoot
X5 06L04,MAY,WTEQ,SNOW,CO,AWDB,Monarch Pass
X6 701,MAY,WTEQ,SNTL,CO,AWDB,Porphyry Creek
X7 701,OCT-MAY,PRCP,SNTL,CO,AWDB,Porphyry Creek
X8 762,OCT-MAY,PRCP,SNTL,CO,AWDB,Slumgullion
X12 380,OCT-MAY,PRCP,SNTL,CO,AWDB,Butte

EQUATION SUMMARY:
Columns: RANK, VARIABLES, JACKKNIFE STANDARD ERROR, JACK. CORR. COEF., NUM. OBS. USED, NUM PC'S
(ranked table with an X marking each variable used in each equation; entries not preserved)

Recognize popular variables
Be aware of num. obs
Meaningful interpretations

Viper time period optimization

Maybe you know “Slumgullion precip” should be a predictor, but don’t know whether it should be Oct-Mar or Nov-Mar or something else.

Time optimization varies stations within groups one at a time, looking for the optimal combination. It does not have a fancy search algorithm; it tries all combinations.

Time optimization is available for Z-score or PCA. Station optimization is only available in PCA for now.

Interpretation
1. Adjust only SnotelPRCP and NRCSStrm. Leave others alone.
2. Evaluate all combinations for SnotelPrcp between Oct and Mar that end with March (e.g. Oct-Mar, Nov-Mar, Dec-Mar, Jan-Mar, Feb-Mar, Mar-Mar).
3. Evaluate all combinations for NRCSStrm between Jul and Mar (e.g. Aug-Feb is a valid combo; Jun-Mar is not).
4. Return best combination to interface.
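The candidate time periods in steps 2 and 3 can be enumerated directly, which shows how many equations an exhaustive time period optimization has to try. This is a minimal sketch under assumptions: the function names and the `fixed_end` flag are hypothetical, and it only generates the period labels; scoring each pairing would be done by the regression itself.

```python
from itertools import product

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def window(first, last):
    """Months from first to last inclusive, wrapping past December if needed."""
    start = MONTHS.index(first)
    length = (MONTHS.index(last) - start) % 12 + 1
    return [MONTHS[(start + k) % 12] for k in range(length)]


def periods(first, last, fixed_end=False):
    """All start-end periods inside the window; optionally only those ending on `last`."""
    months = window(first, last)
    result = []
    for a in range(len(months)):
        for b in range(a, len(months)):
            if fixed_end and b != len(months) - 1:
                continue
            result.append(f"{months[a]}-{months[b]}")
    return result


prcp_periods = periods("Oct", "Mar", fixed_end=True)  # step 2
strm_periods = periods("Jul", "Mar")                   # step 3
pairings = list(product(prcp_periods, strm_periods))

print(prcp_periods)
print(len(strm_periods), len(pairings))
```

Under this reading there are 6 precipitation periods and 45 streamflow periods, so trying every pairing means 270 regressions, which is small enough to evaluate exhaustively.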

Recent additions

Don’t allow the “best” month for a group to be one that eliminates that group.
e.g. Oct-Jan precipitation + August SWE = “best”. But there is no August SWE. Really it just wants precip alone.

Optimize Groups in Order
Optimize Groups Dependently
Optimize Groups Independently

When optimizing one group, do you leave the others on or off? Do you want each group to try to do its best on its own, or let groups cooperate?

A word of caution about optimization!

A station is not graded on the exams that it skips.

[Figure: streamflow forecast vs. observed over time, with station availability shown for stations #1 and #2 and a high error period marked. Accompanying table: standard error for #1 and #2 over each station’s period of record and over the overlapping period; values not preserved.]

Which station is really “best”?
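The point of this slide can be illustrated with a small calculation. The sketch below uses entirely hypothetical forecast errors (the slide's numbers are not preserved): station #1 has a long record that includes a high error period, while station #2 only starts reporting afterwards. Comparing each station over its own period of record and then over the years they share shows how the ranking can flip.

```python
from math import sqrt


def rms(errors):
    """Root-mean-square of a list of forecast errors."""
    errors = list(errors)
    return sqrt(sum(e * e for e in errors) / len(errors))


# HYPOTHETICAL forecast errors by year (forecast minus observed).
station_1 = {1990: 9, 1991: -10, 1992: 11, 1993: 2, 1994: -2, 1995: 1, 1996: -1}
station_2 = {1993: 3, 1994: -3, 1995: 2, 1996: -2}  # skipped the high error period

common_years = sorted(station_1.keys() & station_2.keys())

own_1 = rms(station_1.values())
own_2 = rms(station_2.values())
overlap_1 = rms(station_1[y] for y in common_years)
overlap_2 = rms(station_2[y] for y in common_years)

print("period of record:", round(own_1, 2), round(own_2, 2))
print("overlapping period:", round(overlap_1, 2), round(overlap_2, 2))
```

In this made-up case station #2 looks better on its own record only because it was absent during the difficult years; over the overlapping period station #1 has the lower error.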