NSF Medium ITR Real-Time Mining of Integrated Weather Information Setup meeting (Aug. 30, 2002)
Goals Develop dynamic data mining applications (wherein information is extracted and provided to forecasters in real-time). Develop applications of radar data to identify severe weather signatures in a probabilistic manner. Build a prototype system so that these applications can be developed and tested on real-time and on archived data sets.
Tasks Projects: Dual-polarization algorithms; Clustering and Prediction; Vortex Identification. Areas of IT research: SVMs (identification & prediction), multivariate feature identification techniques, probabilistic feature extraction, high performance issues. I will talk about the tasks; we can decide the applicable areas as a group.
Funding Funded at $300K for the first year. May get $650K over the next two years. We need to show results at the end of the year, so it is good to know what the reviewers liked and did not like about our proposal.
Negative Reviews (NSF) Unfocused. No high-performance computing or numerical simulations. Real-time not explicitly defined. Budget way too high. No human-factors expertise. No details of how the approach could solve the problem.
Reviewers liked these Develop sensor compensation techniques for faulty sensors. Strong application focus on a complex domain. Experience with disseminating systems and WSR-88D algorithms. We seem to have been funded based on what we have done before, rather than on the merits of this particular proposal.
From the 6th reviewer Extend their previous working system (WDSS) with the following features: integrating multiple sources of data; learning in real-time, thus improving the prediction capabilities; using statistics-based instead of heuristics-based decisions. Use of these methodologies for teaching purposes, as well as the dissemination of this software to other research laboratories and the creation of a common research tool.
Also from the 6th reviewer Could have been improved: the proposal seems to be an enumeration of different techniques, without any justification of why these methods have been chosen instead of other ones. Detailed explanations are sometimes missing. My recommendation is to fund this proposal, but at a lower level than the one proposed by the investigators.
Tasks Three tasks: Vortex Detection; Clustering and Prediction; Polarimetric Radar.
Real-Time Classical: data periodicity (keep up with data). Hard to define for multi-sensor applications. If you have a 3-radar domain, with a new elevation scan every 30 seconds from each radar, you get a new updated virtual volume on average every 10 seconds. Is periodicity 10 seconds? Lightning strikes are essentially asynchronous. Proposed: based on required lead-time. Example: average lead-time for a tornado warning is 11 minutes. We could set as a goal predicting tornadoes 20 minutes into the future. If we can do it with data from 30 minutes ago, then we have 10 minutes to process data. Keep in mind that the forecasts have to be continuous. We have to make runs once every 10 minutes.
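The two definitions of real-time on this slide come down to simple arithmetic. The sketch below reproduces the slide's numbers. The function names are made up for illustration, and the second function reflects one reading of the lead-time example: a technique that is skillful 30 minutes ahead of its input data, paired with a 20-minute target lead.

```python
def merged_update_period_s(n_radars: int, scan_period_s: float) -> float:
    """Average gap between new elevation scans in the merged domain when each
    of n_radars delivers one scan every scan_period_s seconds."""
    return scan_period_s / n_radars


def processing_budget_min(skillful_horizon_min: float, target_lead_min: float) -> float:
    """Minutes left for processing if the forecast must still be valid
    target_lead_min after it is issued (hypothetical helper)."""
    return skillful_horizon_min - target_lead_min


update_s = merged_update_period_s(n_radars=3, scan_period_s=30.0)
budget_min = processing_budget_min(skillful_horizon_min=30.0, target_lead_min=20.0)
print(update_s, budget_min)
```

Under this reading, the processing budget also sets the longest allowable interval between runs if the forecasts are to be continuous.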
Task 1: Vortex Detection At the end of this year, aim to have a vortex identification and prediction technique that: uses data from multiple sensors; uses some novel data (more on this follows); accommodates faulty information; is capable of better skill than MDA/TDA; is capable of providing more lead-time to a forecaster. Decision Support System: provide forecaster with rationale for all suggested decisions.
Current MDA/TDA Mesocyclone detection technique: find 2D detections by analyzing azimuthal shear; associate them based on rank and time into 3D circulation features if they meet some strength thresholds; 3D circulations that meet depth, base and strength criteria are classified as mesocyclones.
Problems with current vortex algorithms Defined on radial velocity field. Single radar. Simple use of radar reflectivity (>0 dBZ). Mesocyclone spatial extent based on radial velocity values, which are noisy. How can we improve it?
Use of LLSD One promising source of data is a linear least-squares fit of radial velocity in the neighborhood of a gate. The size of the neighborhood depends on the range from the radar. Fit to a linear combination of azimuth and range. Coefficient for azimuth is an estimate of azimuthal shear. Coefficient for range is an estimate of the divergence.
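A minimal sketch of the fit described on this slide, assuming a polar velocity array indexed as [azimuth, range]. The function name, the fixed index half-widths, and the use of the center gate's range to convert azimuth offsets into distance are all illustrative assumptions, not the WDSS-II implementation; edge gates and azimuth wrap-around are not handled.

```python
import numpy as np


def llsd_at_gate(vel, azimuths_rad, ranges_m, i_az, i_rng, half_az=1, half_rng=2):
    """Fit v ~ c0 + c_az * x + c_rng * y around one gate by least squares.

    x is distance across the beam (m), y is distance along the beam (m).
    Returns (c_az, c_rng): azimuthal shear and divergence estimates in 1/s.
    """
    az_sl = slice(i_az - half_az, i_az + half_az + 1)
    rng_sl = slice(i_rng - half_rng, i_rng + half_rng + 1)
    r0, az0 = ranges_m[i_rng], azimuths_rad[i_az]

    AZ, RG = np.meshgrid(azimuths_rad[az_sl], ranges_m[rng_sl], indexing="ij")
    x = r0 * (AZ - az0)   # arc length, approximated with the center range
    y = RG - r0
    A = np.column_stack([np.ones(x.size), x.ravel(), y.ravel()])
    coef, *_ = np.linalg.lstsq(A, vel[az_sl, rng_sl].ravel(), rcond=None)
    return coef[1], coef[2]


# Synthetic check: a field that is exactly linear around the chosen gate.
azimuths = np.deg2rad(np.arange(0.0, 10.0, 1.0))
ranges = 20000.0 + 250.0 * np.arange(40)
ia, ir = 5, 20
AZ_all, RG_all = np.meshgrid(azimuths, ranges, indexing="ij")
vel = 3.0 + 0.004 * ranges[ir] * (AZ_all - azimuths[ia]) - 0.001 * (RG_all - ranges[ir])

shear, div = llsd_at_gate(vel, azimuths, ranges, ia, ir)
print(shear, div)
```

With a fixed number of azimuths the physical width of the neighborhood grows with range; holding the physical size constant would mean choosing `half_az` per gate from the range.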
LLSD usage Azimuthal shear field
Boundaries Tornadoes frequently happen at the boundaries between air masses. A boundary is not a necessary condition, though. Image shows dry-line boundary. Image processing for boundaries to detect gust-fronts would be useful.
Input Sources The LLSD has never been used in vortex detection. Unlike the raw radial velocity, it can be combined from multiple radars. Also have satellite data over the spatial domain. Have national/regional lightning data. The Near Storm Environment (RUC model). Still need to assimilate LLSD and reflectivity data from multiple radars in a fault-tolerant manner. (Can now do fault-tolerant time-based merges).
Learning Add a learning component. Incorporate warnings issued by the forecaster into the learning by the algorithm. Warnings can be faulty. Different forecasters have different skills. Therefore, this has to be achieved by the algorithm learning on the fly. Validate the algorithm against storm reports. The verification data is noisy. Have to come up with robust ways of doing this verification.
Data: status The WDSS-II system already ingests radar data from multiple radars and national/regional lightning data. Work is underway to ingest satellite data in real-time (archived cases can be done already). We have archived warnings and RUC data since April of this year. Currently testing process to compute LLSD at different scales. RUC model data needs to be ingested.
Discussion What kinds of techniques are appropriate for vortex detection? Multiple-sensor reflectivity, LLSD. RUC model data (in Lambert projection). Multivariate analysis. Gust-front detection.
Task 2: Clustering and Prediction Currently there are two ways to identify storms: a heuristic threshold-based technique that operates on radial reflectivity field; a texture segmentation method. Once identified, the storms are predicted by: matching centroids of storms identified and linear extrapolation; or finding a motion estimate by minimizing mean absolute error on actual field, then forecasting.
SCIT / kmeans The centroid and threshold-based technique called SCIT (storm cell identification and tracking) is used on the WSR-88D. The texture segmentation and error-field minimization technique is being worked on. I will show the results from the second technique because the first technique predicts only centroid location. (We want to do field forecasts).
Kmeans
These clusters are actually found at different scales. The clusters are used as the domain within which the error minimization is done (the kernel that is moved around in the previous frame). And using these, a motion estimate (“wind field”) is obtained at different scales.
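The error-minimization step can be sketched as an exhaustive search over displacements, scoring each by mean absolute error inside one cluster. This is a simplified illustration under stated assumptions: a single cluster, integer pixel shifts, one scale, and `np.roll` (which wraps around the grid edges) standing in for a proper shift. The function name and the `max_shift` parameter are hypothetical.

```python
import numpy as np


def estimate_motion(prev, curr, cluster_mask, max_shift=5):
    """Return the (dy, dx) pixel shift per frame that minimizes the mean
    absolute error between the current frame inside cluster_mask and the
    previous frame displaced by that shift."""
    best_shift, best_err = (0, 0), np.inf
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            shifted = np.roll(prev, (dy, dx), axis=(0, 1))
            err = np.abs(curr[cluster_mask] - shifted[cluster_mask]).mean()
            if err < best_err:
                best_shift, best_err = (dy, dx), err
    return best_shift


# Synthetic storm that moves 2 rows down and 3 columns right between frames.
prev = np.zeros((40, 40))
prev[10:15, 12:18] = 30.0 + np.arange(6)
curr = np.roll(prev, (2, 3), axis=(0, 1))
cluster = curr > 0

motion = estimate_motion(prev, curr, cluster)
forecast = np.roll(curr, motion, axis=(0, 1))  # one-frame-ahead field forecast
print(motion)
```

Repeating the search per cluster and per scale would give the multi-scale motion field the slide refers to.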
Motion, Prediction
Performance Compared to a persistence forecast. Skill at predicting the location of 30 dBZ or higher values. Clutter at the end of sequence. (Random data are assigned motion estimates.)
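One way to make such a comparison concrete is to score both forecasts on where they place values of 30 dBZ or higher. The slide does not say which skill measure was used; the sketch below assumes the critical success index (CSI) purely for illustration, and the synthetic fields are made up.

```python
import numpy as np


def csi(forecast, observed, threshold=30.0):
    """Critical success index for exceedance of a reflectivity threshold:
    hits / (hits + misses + false alarms)."""
    f = forecast >= threshold
    o = observed >= threshold
    hits = np.sum(f & o)
    misses = np.sum(~f & o)
    false_alarms = np.sum(f & ~o)
    denom = hits + misses + false_alarms
    return hits / denom if denom else float("nan")


# Synthetic case: a storm six columns wide moves three columns to the right.
now = np.zeros((20, 40))
now[5:10, 10:16] = 45.0
observed = np.roll(now, 3, axis=1)

persistence = now                    # assumes nothing moves
advected = np.roll(now, 3, axis=1)   # assumes the motion estimate is exact

csi_persistence = csi(persistence, observed)
csi_advected = csi(advected, observed)
print(csi_persistence, csi_advected)
```

Here half of the persistence forecast overlaps the observed storm, so its CSI is 1/3, while the perfectly advected forecast scores 1.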
Ideas for future work Drawbacks with current approach: operates on radar or on satellite, not on both; cannot handle faulty data (as with clutter). Use multiple inputs in deriving motion estimates: storm core movement (as the technique does); dual-Doppler wind field retrieval (?); wind-field estimates from mesoscale model (RUC).
Discussion Why go through wind-field estimate (and not directly to a forecast)? To allow forecast of fields other than the input. Physically reasonable assimilation. Better ways of identifying storms. Better ways of predicting location and values (field forecast).
Task 3: Polarimetric Radar Algorithms Essentially open field for research. Currently only one AI algorithm: a hydrometeor classification algorithm. Low-hanging fruit: a hail-size estimation technique.
Hail Size Estimation Currently done on Doppler radar (algorithm to compute field of hail size estimates in WDSS-II already). High reflectivity data aloft are assumed to produce hail. Polarimetric radar provides way of identifying hail near the surface (via aspect ratio). Come up with way to estimate hail size.
Learning Train the technique on actual hail reports (which are noisy). Problems with polarimetric radar include calibration errors. Techniques have to account for this. Use the polarimetric hail-size estimation technique to improve the predicted hail-size from the Doppler-based method.
Contacts People at CIMMS/NSSL who can advise on each of these tasks: Vortex Detection: Greg Stumpf, Travis Smith; Clustering/Prediction: V Lakshmanan, Bob Rabin; Polarimetric Radar: Terry Schuur.