Modeling Evolution in Spatial Datasets
Paul Amalaman
2/17/2012
Data Mining and Machine Learning Lab Team Members: Dr Eick Christoph, Nouhad Rizk, Zechun Cao, Sujing Wang, Anirup Dutta, Swati Goyal, Tarikul Islam, Paul Amalaman
Outline
I-Background
II-Research Goals
III-Case Study
IV-Summary
I-Background
Machine Learning Techniques are mostly used where:
- modeling implicit trends is possible (Regression)
- stable patterns exist in the dataset (Classification)
Simulation Systems are used when:
- a model is hard to establish
- there is a great degree of randomness in the attribute values
- there are many interactions between objects
- attributes have to be predicted recursively over many steps
Example Applications of Simulation Systems: Traffic Modeling, Weather Forecasting, Social Networks, Urban Modeling
I-Background continued(3)
Spatial Simulation Systems
- Cellular Automata (CA) (Cell centered approach)
- Continuous Agent Space or Multi Agent System (MAS) (Agent centered approach) ABM
I-Background continued(3)
Modeling with Cellular Automata
Concept of neighborhood: Moore Neighborhood and von Neumann Neighborhood

Moore Neighborhood (the 8 cells D surrounding the cell P):
D(x-1,y-1)  D(x,y-1)  D(x+1,y-1)
D(x-1,y)    P(x,y)    D(x+1,y)
D(x-1,y+1)  D(x,y+1)  D(x+1,y+1)

von Neumann Neighborhood (the 4 cells D sharing an edge with the cell P):
            D(x,y-1)
D(x-1,y)    P(x,y)    D(x+1,y)
            D(x,y+1)
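The two neighborhoods can be written as coordinate offsets around the cell P(x,y). The following is a minimal sketch, not part of the original slides: the names `MOORE_OFFSETS`, `VON_NEUMANN_OFFSETS`, and `neighbors` are hypothetical, and dropping cells that fall outside the grid is an assumed boundary treatment (a wrap-around grid would be an equally valid choice).

```python
from typing import List, Tuple

Cell = Tuple[int, int]

# Offsets (dx, dy) relative to the center cell P(x, y).
# von Neumann: the 4 cells sharing an edge with P.
VON_NEUMANN_OFFSETS: List[Cell] = [(0, -1), (-1, 0), (1, 0), (0, 1)]
# Moore: the 8 cells surrounding P (edges and corners).
MOORE_OFFSETS: List[Cell] = [
    (dx, dy) for dy in (-1, 0, 1) for dx in (-1, 0, 1) if (dx, dy) != (0, 0)
]


def neighbors(x: int, y: int, width: int, height: int,
              offsets: List[Cell]) -> List[Cell]:
    """Return the in-bounds neighbors of (x, y).

    Assumption: cells outside the grid are simply dropped.
    """
    result = []
    for dx, dy in offsets:
        nx, ny = x + dx, y + dy
        if 0 <= nx < width and 0 <= ny < height:
            result.append((nx, ny))
    return result
```

For an interior cell the function returns 8 cells with the Moore offsets and 4 with the von Neumann offsets; corner and edge cells return fewer under the assumed boundary treatment.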
I-Background continued(4)
Modeling with Cellular Automata
Cellular Automata provide the programmer with a cell-centered programming style where:
- the set of cells represents computing units that are regularly organized
- good efficiency can be achieved on parallel architectures
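A minimal sketch of the cell-centered style, assuming a synchronous update over a von Neumann neighborhood: every cell computes its next state from the current grid only, so the per-cell updates are independent of each other, which is what makes the approach a natural fit for parallel hardware. The names `step` and `spread_rule` are hypothetical, and the rule itself is only a toy example.

```python
from typing import Callable, List

Grid = List[List[int]]
Rule = Callable[[int, List[int]], int]


def step(grid: Grid, rule: Rule) -> Grid:
    """Apply one synchronous update: each cell reads only the old grid."""
    height, width = len(grid), len(grid[0])
    new_grid = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # In-bounds von Neumann neighbors (assumed boundary: drop outside cells).
            neighbor_states = [
                grid[y + dy][x + dx]
                for dx, dy in ((0, -1), (-1, 0), (1, 0), (0, 1))
                if 0 <= x + dx < width and 0 <= y + dy < height
            ]
            new_grid[y][x] = rule(grid[y][x], neighbor_states)
    return new_grid


def spread_rule(state: int, neighbor_states: List[int]) -> int:
    """Toy rule: a cell becomes active if it or any neighbor is active."""
    return 1 if state == 1 or any(s == 1 for s in neighbor_states) else 0


grid = [[0, 0, 0],
        [0, 1, 0],
        [0, 0, 0]]
next_grid = step(grid, spread_rule)
```

Because `step` writes into a fresh grid and never reads from it, the two nested loops could be distributed over workers without changing the result.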
II-Research Goals
Using Data Mining and Machine Learning Techniques to Enhance Simulation Systems
New approach = Machine Learning Techniques + Spatial Simulation Systems
Goal1: Grid-based Models for Progression in Spatial Datasets
Goal2: Development of Cluster-based Bias Removal Methods
II-Research Goal continued (1)
Goal1: Grid-based Models for Progression in Spatial Datasets
At time t: X1(t), X2(t), ..., Xn(t), Y(t) are known.
At time t+Δt: X1(t+Δt) = ?, X2(t+Δt) = ?, ..., Xn(t+Δt) = ?, Y(t+Δt) = ?
Given that at t we know all the attribute values, including the output variable Y, can we predict all attribute values at t+1?
$y_{i,j,t+1} = f_{ij}(x_{1,1,1,t}, \dots, x_{1,n,n,t}, \dots, x_{m,1,1,t}, \dots, x_{m,n,n,t}, y_{1,1,t}, \dots, y_{n,n,t})$
Challenges:
1. Many target variables to predict; different variables have to be predicted at different locations
2. Target variables are not independent of each other (e.g. some are auto-correlated)
3. Models have to be used over multiple steps
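The third challenge, applying the model over multiple steps, can be illustrated with a short rollout loop in which each predicted grid becomes the input for the next step. This is a sketch under simplifying assumptions, not the method of the presentation: only the output variable y is tracked (the x attributes are omitted), and `neighborhood_mean` is a hypothetical stand-in for a learned per-cell model f_ij.

```python
from typing import Callable, List

Grid = List[List[float]]
CellModel = Callable[[Grid, int, int], float]


def neighborhood_mean(y: Grid, i: int, j: int) -> float:
    """Hypothetical stand-in for a learned f_ij.

    Returns the mean of cell (i, j) and its in-bounds von Neumann neighbors.
    """
    rows, cols = len(y), len(y[0])
    values = [y[i][j]]
    for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        ni, nj = i + di, j + dj
        if 0 <= ni < rows and 0 <= nj < cols:
            values.append(y[ni][nj])
    return sum(values) / len(values)


def predict_next(y: Grid, model: CellModel) -> Grid:
    """Predict every cell at t+1 from the full grid at t."""
    return [[model(y, i, j) for j in range(len(y[0]))] for i in range(len(y))]


def rollout(y0: Grid, model: CellModel, steps: int) -> List[Grid]:
    """Apply the model recursively; each prediction feeds the next step."""
    states = [y0]
    for _ in range(steps):
        states.append(predict_next(states[-1], model))
    return states


y0 = [[0.0, 0.0],
      [0.0, 4.0]]
states = rollout(y0, neighborhood_mean, steps=2)
```

Since step t+2 is computed from the prediction for t+1 rather than from observations, any error made by the per-cell model is carried forward into later steps.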
II-Research Goal continued (2)
Goal2: Development of Cluster-based Bias Removal Methods
EPA prediction models are meteorological and chemical transport models. Those models are derived from solving differential equations. Over time, the model bias grows larger.
Existing model: Input x -> Model -> Output + bias b(x)
Bias removal based on weather pattern recognition:
- Input x -> Model -> b(x)
- Input x -> Weather pattern recognition -> group(x)
- b(x), group(x) -> Output Correction (bias removal) -> Output h(b(x), group(x))
Our model h learns from group(x) and b(x) to make better predictions.
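One simple way to realize a correction of the form h(b(x), group(x)) is to subtract the average historical error of the model within each weather-pattern group. The sketch below is only an illustration under stated assumptions: the slides do not specify how h or group(x) are learned, so the nearest-centroid grouping, the per-group mean bias, and all function names are hypothetical, and the numbers in the usage example are invented for illustration rather than taken from real ozone data.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Features = Sequence[float]
Record = Tuple[Features, float, float]  # (x, model output b(x), observed value)


def assign_group(x: Features, centroids: List[Features]) -> int:
    """Assumed group(x): index of the nearest centroid (squared Euclidean)."""
    return min(
        range(len(centroids)),
        key=lambda k: sum((a - c) ** 2 for a, c in zip(x, centroids[k])),
    )


def fit_group_bias(history: List[Record],
                   centroids: List[Features]) -> Dict[int, float]:
    """Mean of (model output - observation) within each group."""
    sums: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for x, model_output, observed in history:
        g = assign_group(x, centroids)
        sums[g] += model_output - observed
        counts[g] += 1
    return {g: sums[g] / counts[g] for g in sums}


def corrected_output(x: Features, model_output: float,
                     centroids: List[Features],
                     group_bias: Dict[int, float]) -> float:
    """Assumed h(b(x), group(x)): subtract the learned bias of x's group."""
    g = assign_group(x, centroids)
    return model_output - group_bias.get(g, 0.0)


# Illustrative values only (not real measurements).
centroids = [(0.0,), (10.0,)]
history = [((1.0,), 50.0, 40.0), ((0.5,), 60.0, 54.0), ((9.0,), 30.0, 35.0)]
group_bias = fit_group_bias(history, centroids)
fixed = corrected_output((0.2,), 48.0, centroids, group_bias)
```

Groups with no historical records fall back to a zero correction, so the raw model output is returned unchanged for them.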
III-Case Study
Improving Ozone Forecasting For Houston-Galveston Area
Goal1: Development of a Grid-based Prediction Framework
Goal2: Development of Cluster-based Bias Removal Methods
In Collaboration with UH-IMAQS, Institute for Multidimensional Air Quality Studies (UH Department of Earth and Atmospheric Science)
-Dr Rappenglueck, Bernhard
-Dr Li, Xiangshang
III-Case Study Continued(1)
Ozone Prediction
Goal 1: Improving Prediction for Spatial Progression
Given what happened at t, can we predict what happens at t+Δ, t+2Δ, ...?
III-Case Study Continued(2)
Ozone Prediction
Goal 2: Improving Forecast Accuracy
III-Case Study Continued(2)
Status of Dissertation
- Methods to collect ozone data and to capture it in a relational database have been developed.
- The necessary knowledge for simulation-based prediction systems in general, and ozone prediction in particular, has been obtained.
- Work has started on different modeling approaches for grid-based prediction.
IV-SUMMARY
Thank you!