Fast reverse-mode automatic differentiation using expression templates in C++ Robin Hogan University of Reading.


Overview
- Spaceborne radar and lidar
- Adjoint coding
- Automatic differentiation
- New approach
- Testing with lidar multiple-scattering forward models

Spaceborne radar, lidar and radiometers
The A-Train (NASA, 700-km orbit)
- CloudSat 94-GHz radar (launch 2006)
- Calipso 532/1064-nm depol. lidar
- MODIS multi-wavelength radiometer
- CERES broad-band radiometer
- AMSR-E microwave radiometer
EarthCARE: launch 2015(?) (ESA+JAXA)
- 400-km orbit: more sensitive
- 94-GHz Doppler radar
- 355-nm HSRL/depol. lidar
- Multispectral imager
- Broad-band radiometer
- Heart-warming name

What do CloudSat and Calipso see?
- Radar: $\sim D^6$, detects whole profile, surface echo provides integral constraint
- Lidar: $\sim D^2$, more sensitive to thin cirrus and liquid but attenuated
- Radar-lidar ratio provides size $D$
[Figure: CloudSat radar, CALIPSO lidar and target classification panels. Classification categories: Insects; Aerosol; Rain; Supercooled liquid cloud; Warm liquid cloud; Ice and supercooled liquid; Ice; Clear; No ice/rain but possibly liquid; Ground. Delanoe and Hogan (2008, 2010)]

Unified retrieval
1. New ray of data: define state vector
   Use classification to specify variables describing each species at each gate
   - Ice: extinction coefficient, N0’, lidar extinction-to-backscatter ratio
   - Liquid: extinction coefficient and number concentration
   - Rain: rain rate, drop diameter and melting ice
   - Aerosol: extinction coefficient, particle size and lidar ratio
2. Convert state vector to radar-lidar resolution
   Often the state vector will contain a low resolution description of the profile
3. Forward model
   3a. Radar model: including surface return and multiple scattering
   3b. Lidar model: including HSRL channels and multiple scattering
   3c. Radiance model: solar and IR channels
4. Compare to observations: check for convergence (branches: Not converged / Converged)
6. Iteration method: derive a new state vector
   - Adjoint of full forward model
   - Quasi-Newton scheme
7. Calculate retrieval error: error covariances and averaging kernel
Proceed to next ray of data
Legend: Ingredients developed; Implement previous work; Not yet developed

Unified retrieval: Forward model
From state vector x to forward modelled observations H(x)...
- State vector x: ice & snow, liquid cloud, rain, aerosol
- Lookup tables to obtain profiles of extinction, scattering & backscatter coefficients, asymmetry factor, for each pair: Ice/radar, Liquid/radar, Rain/radar; Ice/lidar, Liquid/lidar, Rain/lidar, Aerosol/lidar; Ice/radiometer, Liquid/radiometer, Rain/radiometer, Aerosol/radiometer
- Sum the contributions from each constituent: radar scattering profile, lidar scattering profile, radiometer scattering profile
- Radiative transfer models: radar forward modelled obs, lidar forward modelled obs, radiometer fwd modelled obs, together forming H(x)
...and back again through the adjoint
- Adjoint of radiative transfer models, starting from $\nabla_y J = \mathbf{R}^{-1}[\mathbf{y} - H(\mathbf{x})]$
- Adjoint of radar model (vector), adjoint of lidar model (vector), adjoint of radiometer model
- Gradient of cost function (vector): $\nabla_x J = \mathbf{H}^T \mathbf{R}^{-1}[\mathbf{y} - H(\mathbf{x})]$
- Vector-matrix multiplications: around the same cost as the original forward operations

Radiative transfer models
Observation | Model | Speed | Status
- Radar reflectivity factor | Multiscatter: single scattering option | N | OK
- Radar reflectivity factor in deep convection | Multiscatter: single scattering plus TDTS MS model (Hogan and Battaglia 2008) | N^2 |
- Radar Doppler velocity | Single scattering OK if no NUBF; fast MS model with Doppler does not exist | | Not available for MS
- HSRL lidar in ice and aerosol | Multiscatter: PVC model (Hogan 2008) | |
- HSRL lidar in liquid cloud | Multiscatter: PVC plus TDTS models | |
- Lidar depolarization | Multiscatter: under development | | In progress
- Infrared radiances | Delanoe and Hogan (2008) two-stream source function method | | No adjoint
- Infrared radiances | RTTOV (EUMETSAT license) | | Disappointing accuracy for clouds
- Solar radiances | LIDORT (permissive license) | | Testing
After much pain have hand-coded adjoint for multiscatter model (in C) but still need adjoint for all the rest of the algorithm (in C++)

Adjoint and Jacobian coding
Variational retrieval methods are posed as: “find the vector x that minimises the cost function J(x)”
Two common minimization methods:
- The quasi-Newton method requires the “adjoint code” to compute the gradient ∂J/∂x for any x
- The Gauss-Newton method writes the observational part of the cost function as the sum of the squared deviation of the observations from their forward modelled counterparts y, and requires a code to compute the Jacobian matrix H = ∂y/∂x
Since J(x) is complicated (containing all of our radiative transfer models), the code to generate ∂J/∂x or ∂y/∂x is even more complicated
Can it be generated automatically?

Approaches to adjoint coding
Do it by hand (e.g. ECMWF)
- Painful and time consuming to debug
- Generates the most efficient code
Do it numerically: perturb each element of x one by one
- Inefficient and infeasible for large x
- Subject to round-off error
- What I’m using at the moment with Unified Algorithm
Automatic differentiation 1: Use a source-to-source compiler
- E.g. TAPENADE/TAF/TAC++ generate an adjoint source file from the algorithm file: generates quite efficient code
- Commercial: 5k/year for TAF/TAC++ academic license, and permission is needed to distribute generated source code
- TAPENADE requires the file to be uploaded to a server
- Limited support for C++ classes and no support for C++ templates
Automatic differentiation 2: Use an operator overloading technique
- E.g. CppAD, ADOL-C; in principle can work with any language features
- Typically 25 times slower than hand-coded adjoint!
Can we do better?

Simple example
Consider simple algorithm y(x0, x1) contrived for didactic purposes: $y = 4\sin(2x_0 + 3x_1^2)$
Implemented in C or Fortran90 as:

    double algorithm(const double x[2]) {
      double y = 4.0;
      double s = 2.0*x[0] + 3.0*x[1]*x[1];
      y *= sin(s);
      return y;
    }

    function algorithm(x) result(y)
      implicit none
      real, intent(in) :: x(2)
      real :: y
      real :: s
      y = 4.0
      s = 2.0*x(1) + 3.0*x(2)*x(2)
      y = y * sin(s)
      return
    endfunction

Task: given ∂J/∂y, we want to compute ∂J/∂x0 and ∂J/∂x1

Creating the adjoint code 1
- Differentiate the algorithm:
- Write each statement in matrix form:
- Transpose the matrix to get equivalent adjoint statement:
Consider dy as the derivative of y with respect to something
Consider d*y as dJ/dy
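The equations for this slide did not survive in the transcript. The following is a reconstruction derived only from the example algorithm, not a copy of the original slide. It writes $\delta y$ for the differential ("dy") and $\delta^* y$ for the adjoint ("d*y"), and $y$ on the right-hand side denotes the value before the statement is executed. The matrix form is shown for the last statement only.

```latex
% Differential statements, in the order the algorithm executes them
\begin{align*}
\delta y &= 0 \\
\delta s &= 2\,\delta x_0 + 6 x_1\,\delta x_1 \\
\delta y &= \sin(s)\,\delta y + y\cos(s)\,\delta s
\end{align*}

% The last statement in matrix form
\begin{equation*}
\begin{pmatrix} \delta y \\ \delta s \end{pmatrix}
=
\begin{pmatrix} \sin(s) & y\cos(s) \\ 0 & 1 \end{pmatrix}
\begin{pmatrix} \delta y \\ \delta s \end{pmatrix}
\end{equation*}

% Transposing the matrix gives the equivalent adjoint statement
\begin{equation*}
\begin{pmatrix} \delta^* y \\ \delta^* s \end{pmatrix}
=
\begin{pmatrix} \sin(s) & 0 \\ y\cos(s) & 1 \end{pmatrix}
\begin{pmatrix} \delta^* y \\ \delta^* s \end{pmatrix}
\end{equation*}
```

Read row by row, the transposed form says: add $y\cos(s)\,\delta^* y$ to $\delta^* s$, then replace $\delta^* y$ by $\sin(s)\,\delta^* y$.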

Creating the adjoint code 2
Apply adjoint statements in reverse order:
Forward mode:
Reverse mode:

    double algorithm_AD(const double x[2], double y_AD[1], double x_AD[2]) {
      double y = 4.0;
      double s = 2.0*x[0] + 3.0*x[1]*x[1];
      double y_prev = y;   /* value of y before the update, needed below */
      y *= sin(s);
      /* Adjoint part: */
      double s_AD = 0.0;
      s_AD += y_prev * cos(s) * y_AD[0];
      y_AD[0] = sin(s) * y_AD[0];
      x_AD[0] += 2.0 * s_AD;
      x_AD[1] += 6.0 * x[1] * s_AD;
      s_AD = 0.0;
      y_AD[0] = 0.0;
      return y;
    }

Note: need to store intermediate values for the reverse pass
Hand-coding is time-consuming and error prone for large codes
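One way to gain confidence in a hand-coded adjoint is to compare it with the numerical approach mentioned earlier (perturbing each element of x in turn). The sketch below does this for the didactic algorithm. The names `algorithm_adjoint` and `finite_difference`, the step size and the central-difference scheme are illustrative choices, not taken from the talk.

```cpp
#include <cmath>

// The didactic algorithm: y = 4 sin(2 x0 + 3 x1^2)
double algorithm(const double x[2]) {
  double y = 4.0;
  double s = 2.0*x[0] + 3.0*x[1]*x[1];
  y *= std::sin(s);
  return y;
}

// Hand-written adjoint (hypothetical helper): given y_AD = dJ/dy,
// accumulate dJ/dx0 and dJ/dx1 into x_AD
void algorithm_adjoint(const double x[2], double y_AD, double x_AD[2]) {
  double y_prev = 4.0;                          // y before "y *= sin(s)"
  double s = 2.0*x[0] + 3.0*x[1]*x[1];          // recompute intermediate value
  double s_AD = y_prev * std::cos(s) * y_AD;    // adjoint of y = y*sin(s)
  x_AD[0] += 2.0 * s_AD;                        // adjoint of s = 2*x0 + 3*x1*x1
  x_AD[1] += 6.0 * x[1] * s_AD;
}

// Central finite difference of the algorithm with respect to x[i]
// (hypothetical helper; h is an assumed step size)
double finite_difference(const double x[2], int i, double h = 1.0e-6) {
  double xp[2] = {x[0], x[1]};
  double xm[2] = {x[0], x[1]};
  xp[i] += h;
  xm[i] -= h;
  return (algorithm(xp) - algorithm(xm)) / (2.0*h);
}
```

Calling `algorithm_adjoint(x, 1.0, x_AD)` with `x_AD` initialised to zero should give values that agree with `finite_difference(x, 0)` and `finite_difference(x, 1)` to within the truncation and round-off error of the finite difference.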

Automatic differentiation
We want something like this (now in C++):

    adouble algorithm(const adouble x[2]) {
      adouble y = 4.0;
      adouble s = 2.0*x[0] + 3.0*x[1]*x[1];
      y *= sin(s);
      return y;
    }

Simple change: label “active” variables as a new type

    // Main code
    Stack stack;                      // Object where info will be stored
    adouble x[2] = {…, …};            // Set algorithm inputs
    adouble y = algorithm(x);         // Run algorithm and store info in stack
    y.set_gradient(y_AD);             // Set dJ/dy
    stack.reverse();                  // Run adjoint code from stored info
    x_AD[0] = x[0].get_gradient();    // Save resulting values of dJ/dx0
    x_AD[1] = x[1].get_gradient();    // ... and dJ/dx1

- Operators (e.g. +–*/) and functions (e.g. sin, exp, log) applying to adouble objects are overloaded not only to return the result of the operation, but also to store the gradient information in a stack
- Libraries CppAD, SACADO and ADOL-C do this but the result is around 25 times slower than hand-coded adjoints… why?

Minimum necessary storage
- What is the minimum necessary storage to store these statements?
- If we label each gradient by an integer (since they’re unknown in forward pass) then we need two stacks that can be added to as the algorithm progresses:

Statement stack
Index to LHS gradient | Index to first operation
3 (ds)                | 0
2 (dy)                | 2
…

Operation stack
# | Multiplier | Index to RHS gradient
0 | 2.0        | 0 (dx0)
1 | 6.0x1      | 1 (dx1)
2 | sin(s)     | 2 (dy)
3 | y cos(s)   | 3 (ds)
4 | …

- Can then run backwards through stack to compute adjoints

Adjoint algorithm is simple Need to cope with three different types of differential statement: Reverse mode: Forward mode: Equivalent adjoint statements: General differential statement: for i = 0 to n:
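The equations on this slide are missing from the transcript. As a hedged reconstruction, consistent with the two stacks described earlier and with the five coding steps listed on the following slide, the general case can be written as follows. The symbols $m_i$ (the stored multipliers) and $a$ (a temporary) are notation assumed here.

```latex
% General differential statement (forward mode)
\begin{equation*}
\delta y = \sum_{i=0}^{n} m_i\,\delta x_i
\end{equation*}

% Equivalent adjoint statements (reverse mode)
\begin{align*}
a &= \delta^* y \\
\delta^* y &= 0 \\
\delta^* x_i &= \delta^* x_i + m_i\,a \qquad \text{for } i = 0 \text{ to } n
\end{align*}
```

Saving $a$ and zeroing $\delta^* y$ before the loop is what makes the case where $y$ itself appears among the $x_i$ come out correctly.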

…which can be coded as follows
1. Loop over derivative statements in reverse order
2. Save gradient
3. Skip if gradient equals 0 (big optimization)
4. Loop over operations
5. Update an adjoint
This does the right thing in our three cases:
- Zero on RHS
- One or more gradients on RHS
- Same gradient on LHS and RHS
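The code from this slide is not in the transcript, so the following is only a sketch of how the five steps might look, using the statement-stack and operation-stack layout described earlier. The struct names, `reverse_pass` and `example_gradient` are hypothetical, and the real library may store its stacks differently.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// One entry per differential statement
struct Statement {
  std::size_t lhs_index;        // index of the gradient on the left-hand side
  std::size_t first_operation;  // index of this statement's first operation
};

// One entry per right-hand-side term: multiplier * (gradient rhs_index)
struct Operation {
  double multiplier;
  std::size_t rhs_index;
};

// Run backwards through the stacks, applying the adjoint of each statement
void reverse_pass(const std::vector<Statement>& statements,
                  const std::vector<Operation>& operations,
                  std::vector<double>& gradient) {
  std::size_t end_op = operations.size();
  for (std::size_t i = statements.size(); i-- > 0; ) {  // 1. reverse order
    const Statement& st = statements[i];
    double a = gradient[st.lhs_index];                  // 2. save gradient
    gradient[st.lhs_index] = 0.0;
    if (a != 0.0) {                                     // 3. skip if zero
      for (std::size_t j = st.first_operation; j < end_op; ++j) {  // 4.
        gradient[operations[j].rhs_index]
            += operations[j].multiplier * a;            // 5. update an adjoint
      }
    }
    end_op = st.first_operation;  // operations of the previous statement end here
  }
}

// Fill the stacks by hand for y = 4 sin(2 x0 + 3 x1^2) and run the reverse pass.
// Gradient indices: 0 = dx0, 1 = dx1, 2 = dy, 3 = ds
std::vector<double> example_gradient(double x0, double x1, double y_AD) {
  double y = 4.0;
  double s = 2.0*x0 + 3.0*x1*x1;
  std::vector<Statement> statements;
  std::vector<Operation> operations;
  statements.push_back({3, operations.size()});  // ds = 2 dx0 + 6 x1 dx1
  operations.push_back({2.0, 0});
  operations.push_back({6.0*x1, 1});
  statements.push_back({2, operations.size()});  // dy = sin(s) dy + y cos(s) ds
  operations.push_back({std::sin(s), 2});
  operations.push_back({y*std::cos(s), 3});
  std::vector<double> gradient(4, 0.0);
  gradient[2] = y_AD;                            // set dJ/dy
  reverse_pass(statements, operations, gradient);
  return gradient;
}
```

After the call, elements 0 and 1 of the returned vector hold dJ/dx0 and dJ/dx1. Saving the gradient and zeroing it before the inner loop is what handles the "same gradient on LHS and RHS" case.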

“Dual numbers” approach
- How can these stacks be created?
- Consider what happens when compiler sees this line:

      y = y * sin(s)

- Compiler splits this up into two parts with temporary t:

      adouble t = sin(s)
      y = operator*(y, t)

- We could define adouble as “dual number” [x, dx] (invented by Clifford 1873) and then overload sin and operator*:

      [sin(s), cos(s)*ds] = sin([s, ds])
      [y*t, t*dy+y*dt] = [y, dy] * [t, dt]

- This would correctly apply the chain rule, but only if the gradient terms on the right-hand side are known!
- This is not useful for the reverse mode (adjoint), where we want to store a symbolic representation of the gradient on the forward sweep, which is then filled in on the reverse sweep
- Dual numbers are used in some forward-mode-only (tangent linear) automatic differentiation tools
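To make the dual-number idea concrete, here is a minimal forward-mode sketch applied to the didactic algorithm. The `Dual` struct and its member names are assumptions made for illustration and are not the `adouble` type of any particular library.

```cpp
#include <cmath>

// Dual number [value, derivative]; names are illustrative
struct Dual {
  double value;
  double deriv;
};

// [a+b, da+db]
inline Dual operator+(const Dual& a, const Dual& b) {
  return {a.value + b.value, a.deriv + b.deriv};
}

// [a*b, b*da + a*db]
inline Dual operator*(const Dual& a, const Dual& b) {
  return {a.value*b.value, b.value*a.deriv + a.value*b.deriv};
}

// Multiplication by a constant: [c*a, c*da]
inline Dual operator*(double c, const Dual& a) {
  return {c*a.value, c*a.deriv};
}

// [sin(s), cos(s)*ds]
inline Dual sin(const Dual& s) {
  return {std::sin(s.value), std::cos(s.value)*s.deriv};
}

// The didactic algorithm, with every active variable a dual number
inline Dual algorithm(const Dual x[2]) {
  Dual y = {4.0, 0.0};
  Dual s = 2.0*x[0] + 3.0*x[1]*x[1];
  y = y * sin(s);
  return y;
}
```

Seeding `x[0]` with derivative 1 and `x[1]` with derivative 0 makes the `deriv` field of the result equal to ∂y/∂x0. One run is needed per input, which is why this forward-mode form does not give the adjoint directly.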

So how do CppAD & ADOL-C work?
- In the forward pass they store the whole algorithm symbolically, not just the derivative form!
- This means every operator and function needs to be stored symbolically (e.g. 0 for plus, 1 for minus, 42 for atan etc.)
- The stored algorithm can then be analysed to generate an adjoint function
- This all happens behind the scenes, so it is easy to use, but it is not surprising that it is 25 times slower than a hand-coded adjoint

Computational graphs
The basic problem is that standard operator overloading can only pass information from the most nested operation outwards
[Figure: computational graph of y*sin(s). The sin node passes the value of sin(s) up to operator*, which passes y sin(s) to be the new y]

Implementing the chain rule
- Differentiate multiply operator
- Differentiate sine function
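The two derivative rules named on this slide were images and are not in the transcript. Written out for the running example $y \sin(s)$, with $t$ as the temporary for $\sin(s)$ (notation assumed here), they would read:

```latex
% Differentiate the multiply operator
\begin{equation*}
\delta(y\,t) = t\,\delta y + y\,\delta t
\end{equation*}

% Differentiate the sine function
\begin{equation*}
\delta t = \delta \sin(s) = \cos(s)\,\delta s
\end{equation*}

% Combined by the chain rule
\begin{equation*}
\delta\bigl(y \sin(s)\bigr) = \sin(s)\,\delta y + y\cos(s)\,\delta s
\end{equation*}
```

The two multipliers in the last line, $\sin(s)$ and $y\cos(s)$, are exactly the entries the operation stack holds for this statement.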

Computational graph 2
- Clearly differentiation most naturally involves passing information in the opposite sense
- Each node representing arbitrary function or operator y(a) needs to be able to take a real number w and pass w dy/da down the chain
- Binary function or operator y(a,b) would pass w dy/da to one argument and w dy/db to other
- At the end of the chain, store the result on the stack
- But how do we implement this?
[Figure: computational graph of y*sin(s) traversed downwards. operator* passes sin(s) to y and passes y to sin; sin passes y cos(s) to s; the leaves add sin(s)dy and y cos(s)ds to the stack]

What is a template?
- Templates are a key ingredient to generic programming in C++
- Imagine we have a function like this:

      double cube(const double x) {
        double y = x*x*x;
        return y;
      }

- We want it to work with any numerical type (single precision, complex numbers etc) but don’t want to laboriously define a new overloaded function for each possible type
- Can use a function template:

      template <typename Type>
      Type cube(Type x) {
        Type y = x*x*x;
        return y;
      }

      double a = 1.0;
      double b = cube(a);              // compiler creates function cube<double>
      complex<double> c(1.0, 2.0);     // c = 1 + 2i
      complex<double> d = cube(c);     // compiler creates function cube<complex<double> >

What is an expression template?
- C++ also supports class templates
- Veldhuizen (1995) used this feature to introduce the idea of Expression Templates to optimize array operations and make C++ as fast as Fortran-90 for array-wise operations
- We use it as a way to pass information in both directions through the expression tree:
  - sin(A) for an argument of arbitrary type A is overloaded to return an object of type Sin<A>
  - operator*(A,B) for arguments of arbitrary type A and B is overloaded to return an object of type Multiply<A,B>
- Now when we compile the statement “y=y*sin(s)”:
  - The right-hand-side resolves to an object “RHS” of type Multiply<adouble,Sin<adouble> >
  - The overloaded assignment operator first calls RHS.value() to get y
  - It then calls RHS.calc_gradient(), to add entries to operation stack
  - Multiply and Sin are defined with member functions so that they can correctly pass information up and down the expression tree

New approach
- Each function and operator y(a) implements a function calc_gradient that takes a real number w and passes w dy/da down the chain:
  [Figure: computational graph of y*sin(s). operator* passes sin(s) to y and passes y to sin; sin passes y cos(s) to s; the leaves add sin(s)dy and y cos(s)ds to the stack]
- The following types are passed up the chain at compile time:
  [Figure: y (adouble) and sin of s (Sin<adouble>, with s an adouble) combine through operator* into Multiply<adouble,Sin<adouble> >]

Implementation of Sin<A>

    // Definition of Sin class
    template <class A>
    class Sin : public Expression<Sin<A> > {
    public:
      // Member functions
      // Constructor: store reference to a and its numerical value
      Sin(const Expression<A>& a)
        : a_(static_cast<const A&>(a)), a_value_(a.value()) { }
      // Return the value
      double value() const { return sin(a_value_); }
      // Compute derivative and pass to a
      void calc_gradient(Stack& stack, double multiplier) const {
        a_.calc_gradient(stack, cos(a_value_)*multiplier);
      }
    private:
      // Data members
      const A& a_;       // A reference to the object
      double a_value_;   // The numerical value of object
    };

    // Overload the sin function: it returns a Sin<A> object
    template <class A>
    inline Sin<A> sin(const Expression<A>& a) { return Sin<A>(a); }

…Adept library has done this for all operators and functions
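The pieces described on the last few slides can be put together in a small self-contained sketch that handles only `y = y * sin(s)`. This is not the Adept source. The `Stack` members (`new_gradient`, `push_operation`, `push_statement`, `gradient`), the explicit stack pointer held by each `adouble`, and the choice to store the end of each statement's operations rather than the start are all assumptions made to keep the example short.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Statement and operation stacks plus the gradient array (hypothetical API)
class Stack {
public:
  std::size_t new_gradient() { gradient_.push_back(0.0); return gradient_.size() - 1; }
  void push_operation(double multiplier, std::size_t rhs_index) {
    multiplier_.push_back(multiplier);
    rhs_index_.push_back(rhs_index);
  }
  void push_statement(std::size_t lhs_index) {
    lhs_index_.push_back(lhs_index);
    end_operation_.push_back(multiplier_.size());
  }
  double& gradient(std::size_t i) { return gradient_[i]; }
  // Run backwards through the stored statements to compute adjoints
  void reverse() {
    for (std::size_t i = lhs_index_.size(); i-- > 0; ) {
      double a = gradient_[lhs_index_[i]];
      gradient_[lhs_index_[i]] = 0.0;
      if (a == 0.0) continue;
      std::size_t begin = (i == 0) ? 0 : end_operation_[i-1];
      for (std::size_t j = begin; j < end_operation_[i]; ++j)
        gradient_[rhs_index_[j]] += multiplier_[j] * a;
    }
  }
private:
  std::vector<double> gradient_, multiplier_;
  std::vector<std::size_t> lhs_index_, end_operation_, rhs_index_;
};

// Base class: lets functions accept "any expression" while keeping its exact type
template <class A>
struct Expression {
  const A& cast() const { return static_cast<const A&>(*this); }
  double value() const { return cast().value(); }
  void calc_gradient(Stack& stack, double multiplier) const {
    cast().calc_gradient(stack, multiplier);
  }
};

// Active scalar: a leaf of the expression tree
class adouble : public Expression<adouble> {
public:
  adouble(Stack& stack, double value)
    : stack_(&stack), value_(value), index_(stack.new_gradient()) { }
  // Assignment from an expression: get the value, then record the gradient
  template <class R>
  adouble& operator=(const Expression<R>& rhs) {
    double new_value = rhs.value();
    rhs.calc_gradient(*stack_, 1.0);   // adds entries to the operation stack
    stack_->push_statement(index_);
    value_ = new_value;
    return *this;
  }
  double value() const { return value_; }
  // End of the chain: store the accumulated multiplier on the stack
  void calc_gradient(Stack& stack, double multiplier) const {
    stack.push_operation(multiplier, index_);
  }
  void set_gradient(double g) { stack_->gradient(index_) = g; }
  double get_gradient() const { return stack_->gradient(index_); }
private:
  Stack* stack_;
  double value_;
  std::size_t index_;
};

template <class A>
class Sin : public Expression<Sin<A> > {
public:
  Sin(const Expression<A>& a) : a_(a.cast()), a_value_(a.value()) { }
  double value() const { return std::sin(a_value_); }
  void calc_gradient(Stack& stack, double multiplier) const {
    a_.calc_gradient(stack, std::cos(a_value_) * multiplier);
  }
private:
  const A& a_;
  double a_value_;
};

template <class A, class B>
class Multiply : public Expression<Multiply<A,B> > {
public:
  Multiply(const Expression<A>& a, const Expression<B>& b)
    : a_(a.cast()), b_(b.cast()), a_value_(a.value()), b_value_(b.value()) { }
  double value() const { return a_value_ * b_value_; }
  void calc_gradient(Stack& stack, double multiplier) const {
    a_.calc_gradient(stack, b_value_ * multiplier);   // w * d(ab)/da = w*b
    b_.calc_gradient(stack, a_value_ * multiplier);   // w * d(ab)/db = w*a
  }
private:
  const A& a_;
  const B& b_;
  double a_value_, b_value_;
};

template <class A>
inline Sin<A> sin(const Expression<A>& a) { return Sin<A>(a); }

template <class A, class B>
inline Multiply<A,B> operator*(const Expression<A>& a, const Expression<B>& b) {
  return Multiply<A,B>(a, b);
}
```

With `Stack stack; adouble y(stack, 4.0); adouble s(stack, 2.07);`, the statement `y = y * sin(s);` records the multipliers sin(s) and y cos(s). After `y.set_gradient(1.0); stack.reverse();`, `s.get_gradient()` returns the derivative of the new y with respect to s. The temporaries hold references, so they are only valid within the full expression, which is when the assignment operator consumes them.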

Optimizations
Why are expression templates fast?
- Compound types representing complex expressions are known at compile time
- C++ automatically inlines function calls between objects in an expression, leaving little more than the operations you would put in a hand-coded application of the chain rule
Further optimizations:
- Stack object keeps memory allocated between calls to avoid time spent allocating incrementally more memory
- If the Jacobian is computed it is done in strips to exploit vectorization (SSE/SSE2 on Intel) and loop unrolling
- The current stack is accessed by a global but thread-local variable, rather than storing a link to the stack in every adouble object (as in CppAD and ADOL-C)

Testing using lidar multiple scattering models
Photon Variance-Covariance method for small-angle multiple scattering, Hogan (JAS 2008)
- Somewhat similar to a monochromatic radiance model
- Four coupled ODEs are integrated forward in space
- Several variables at N gates give N output signals
- Computational cost proportional to $N$
Time-dependent two-stream method for wide-angle multiple scattering, Hogan and Battaglia (JAS 2008)
- Similar to a time-dependent 1D advection model
- Four coupled PDEs are integrated forward in time
- Several variables at N gates give N output signals
- Computational cost proportional to $N^2$

Simulation of 3D photon transport
- Animation of scalar flux ($I^+ + I^-$)
- Colour scale is logarithmic: represents 5 orders of magnitude
- Domain properties: 500-m thick, 2-km wide, optical depth of 20, no absorption
- In this simulation the lateral distribution is Gaussian at each height and each time

Benchmark results
Time relative to original code, gcc-4.4, Pentium 2.5 GHz, 2 MB cache

Adjoint                  | PVC N=50      | TDTS N=50
Hand-coded adjoint       | 3.0 (1.0+2.0) | 3.6 (1.0+2.6)
New C++ library: Adept   | 3.5 (2.7+0.8) | 3.8 (2.6+1.2)
ADOL-C                   | 25 (18+7)     | 20 (15+5)
CppAD                    | 29 (15+7+7)   | 34 (17+8+9)

- Only 5-20% slower than hand-coded adjoint
- 5-9 times faster than leading libraries providing same functionality

Full Jacobian (50x350), for PVC N=50 and TDTS N=50:
- New C++ library: Adept: 20
- ADOL-C: 83, 69
- CppAD: 352, 470

- 4-20 times faster for 50x350 Jacobian

Outlook
- New library Adept (Automatic Differentiation using Expression Templates) produces adjoint with minimum difficulty for user
  - No knowledge of templates required by user at all
  - Simple and efficient to compute Jacobian matrix as well
  - Freely available at http://www.met.reading.ac.uk/clouds/adept/
- Typically 5-20% slower than hand-coded adjoints
  - But immeasurably faster in terms of programmer time
- Code is complete for applying to any C code with real numbers
- Further development desirable:
  - Complex numbers
  - Use within C++ matrix/vector libraries, particularly those that already use Expression Templates (like the one I use for the Unified Algorithm)
  - Easily facilitate checkpointing so large codes don’t exhaust memory
  - Automatically compute higher-order derivatives (e.g. Hessian matrix)
- Potential for student projects to get small data assimilation systems up and running and efficient quickly
- Impossible to apply in Fortran: no template capability!

Minimizing the cost function
Gradient of cost function (a vector) and 2nd derivative (the Hessian matrix):
Gauss-Newton method
- Rapid convergence (instant for linear problems)
- Get solution error covariance “for free” at the end
- Levenberg-Marquardt is a small modification to ensure convergence
- Need the Jacobian matrix H of every forward model: can be expensive for larger problems as forward model may need to be rerun with each element of the state vector perturbed
Gradient Descent methods
- Fast adjoint method to calculate $\nabla_x J$ means don’t need to calculate Jacobian
- Disadvantage: more iterations needed since we don’t know curvature of J(x)
- Quasi-Newton method to get the search direction (e.g. L-BFGS used by ECMWF): builds up an approximate inverse Hessian A for improved convergence
- Scales well for large x
- Poorer estimate of the error at the end

Time-dependent 2-stream approx.
- Describe diffuse flux in terms of outgoing stream $I^+$ and incoming stream $I^-$, and numerically integrate the following coupled PDEs:
- These can be discretized quite simply in time and space (no implicit methods or matrix inversion required)
Terms of the PDEs:
- Time derivative: remove this and we have the time-independent two-stream approximation
- Spatial derivative: transport of radiation from upstream
- Loss by absorption or scattering: some of lost radiation will enter the other stream
- Gain by scattering: radiation scattered from the other stream
- Source: scattering from the quasi-direct beam into each of the streams
Hogan and Battaglia (2008, J. Atmos. Sci.)