Stefan Falke, Center for Air Pollution Impact and Trend Analysis, Washington University, St. Louis, Missouri
Brooke Hemming, US EPA/National Center for Environmental Assessment, Research Triangle Park, North Carolina

Networked Environmental Information System for Global Emissions Inventories (NEISGEI)
Applying 21st Century Advances in IT Technology to the Conversion of Air Quality Data into Scientific-, Management- and Policy-Relevant Knowledge
Networked Environmental Information System for Global Emissions Inventories (NEISGEI)
A conceptual framework for the development of a fully integrated, distributed emissions inventory:
- Tie together data at all spatial (and temporal) scales
- Allow merging and manipulation of any and all web-based, along with your own, air quality-relevant data
- Serve the entire air quality community: scientists, regulators, policy analysts and the public
- A tool for strategic planning of air quality environmental management capacity building projects.
- A fully populated network will be a resource for identifying important missing datasets in regional, hemispheric and global scale studies.
Find/Discover Data
Diagram: User <-> Middleware (Mediator) <-> AQ data from many distributed and heterogeneous sources, described by metadata.
1) Manager asks for data for a specific time range and location range: "I need summer, 2002, air quality data for California."
2) The mediator interprets (translates) the query, identifies the relevant datasets, and sends the results as a list of datasets: "These data sets meet your criteria."
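The find/discover steps above can be sketched as a metadata lookup. This is a minimal illustration, not the NEISGEI mediator itself: the `DatasetMeta` record, the catalog entries, and the bounding-box coordinates are all hypothetical, and a real mediator would also have to interpret free-form requests.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DatasetMeta:
    """Hypothetical metadata record: time coverage plus a bounding box."""
    name: str
    start: date
    end: date
    bbox: tuple  # (west, south, east, north) in degrees


def overlaps_time(meta, start, end):
    # Two closed intervals overlap when each starts before the other ends.
    return meta.start <= end and start <= meta.end


def overlaps_bbox(a, b):
    # Same interval test, applied to longitude and latitude separately.
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def find_datasets(catalog, start, end, bbox):
    """Return the names of datasets whose metadata meets the criteria."""
    return [
        m.name
        for m in catalog
        if overlaps_time(m, start, end) and overlaps_bbox(m.bbox, bbox)
    ]


# Illustrative catalog; none of these entries describe real datasets.
CATALOG = [
    DatasetMeta("dataset_1", date(2000, 1, 1), date(2003, 12, 31), (-125.0, 32.0, -114.0, 42.0)),
    DatasetMeta("dataset_2", date(1995, 1, 1), date(1999, 12, 31), (-125.0, 32.0, -114.0, 42.0)),
    DatasetMeta("dataset_3", date(2002, 1, 1), date(2002, 12, 31), (-180.0, -90.0, 180.0, 90.0)),
    DatasetMeta("dataset_4", date(2002, 1, 1), date(2002, 12, 31), (-10.0, 35.0, 30.0, 60.0)),
]

# "Summer 2002, California", expressed as an approximate box and date range.
CALIFORNIA_BBOX = (-124.5, 32.5, -114.1, 42.0)
matches = find_datasets(CATALOG, date(2002, 6, 1), date(2002, 8, 31), CALIFORNIA_BBOX)
print(matches)
```

Keeping the query logic on metadata alone means the mediator can answer "which datasets qualify" without touching the data sources themselves.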
Access Data
Diagram: User <-> Middleware (Mediator, Wrapper) <-> Data.
3) Manager selects the subset of interest: "Great! Data sets 1 & 3 are what I need. I would like to view that data."
4) The mediator translates the request, and wrappers translate the data into a standard format: "Here is the data in a format compatible with your software."
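The wrapper idea in step 4 can be illustrated with two small adapters that turn differently shaped sources into one record layout. The source formats, column names, and the standard fields (`site`, `date`, `parameter`, `value`) are assumptions made for this sketch, not the formats used by the project.

```python
import csv
import io
import json


def wrap_csv_source(text):
    """Wrapper for hypothetical source A: a CSV file with its own column names."""
    return [
        {
            "site": row["station_id"],
            "date": row["obs_date"],
            "parameter": "PM2.5",  # source A is assumed to hold a single parameter
            "value": float(row["pm25_ugm3"]),
        }
        for row in csv.DictReader(io.StringIO(text))
    ]


def wrap_json_source(text):
    """Wrapper for hypothetical source B: a JSON list with different keys."""
    return [
        {
            "site": r["loc"],
            "date": r["day"],
            "parameter": r["species"],
            "value": float(r["conc"]),
        }
        for r in json.loads(text)
    ]


# Made-up sample payloads, one per source.
SOURCE_A = "station_id,obs_date,pm25_ugm3\nCA-001,2002-07-01,18.5\n"
SOURCE_B = '[{"loc": "CA-002", "day": "2002-07-01", "species": "PM2.5", "conc": "12"}]'

# After wrapping, records from both sources share one layout and can be merged.
records = wrap_csv_source(SOURCE_A) + wrap_json_source(SOURCE_B)
for record in records:
    print(record)
```

Each wrapper knows only its own source, so adding a new data provider means writing one more adapter rather than changing the viewers or the mediator.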
View Data
Diagram: Data -> Middleware (Time View, Map View) -> User; viewed in browser.
5) Manager would like to see the data in maps and charts. The Time View displays the data as time series, and the Map View displays the data as maps.
Map Fields in Data Sets
6) Manager uses the data mapper to automatically create relationships among heterogeneous dataset field names.
7) Using the generated field map, manager calculates the ratio between parameter x from data set 2 and parameter y from data set 3.
Components:
- Auto Data Mapper: maps fields from multiple datasets (input: mapper settings; output: mapped data fields)
- Ratio Operator: calculates the ratio (input: ratio op. settings; output: ratio 2:3 data set)
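Steps 6 and 7 can be sketched with a simple name-based field matcher followed by a keyed ratio. This is only a stand-in for the automatic data mapper: it matches field names by string similarity using Python's `difflib`, and the field names, records, and the 0.8 cutoff are assumptions chosen for illustration.

```python
import difflib
import re


def normalize(name):
    """Lower-case a field name and drop punctuation, e.g. 'Site_ID' -> 'siteid'."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def map_fields(fields_a, fields_b, cutoff=0.8):
    """Pair each field in A with the most similar field name in B, if any."""
    normalized_b = {normalize(f): f for f in fields_b}
    mapping = {}
    for field in fields_a:
        best = difflib.get_close_matches(
            normalize(field), list(normalized_b), n=1, cutoff=cutoff
        )
        if best:
            mapping[field] = normalized_b[best[0]]
    return mapping


def ratio(records_a, records_b, field_map, x, y):
    """Join the two datasets on the mapped fields and compute x / y per key."""
    keys_a = list(field_map)
    keys_b = [field_map[k] for k in keys_a]
    index_b = {tuple(r[k] for k in keys_b): r for r in records_b}
    result = {}
    for r in records_a:
        key = tuple(r[k] for k in keys_a)
        other = index_b.get(key)
        if other is not None and other[y] != 0:  # skip unmatched keys and zero divisors
            result[key] = r[x] / other[y]
    return result


# Hypothetical records standing in for "data set 2" and "data set 3".
data_set_2 = [
    {"SiteID": "A", "ObsDate": "2002-07-01", "x": 10.0},
    {"SiteID": "B", "ObsDate": "2002-07-01", "x": 6.0},
]
data_set_3 = [
    {"site_id": "A", "obs_date": "2002-07-01", "y": 4.0},
    {"site_id": "B", "obs_date": "2002-07-01", "y": 0.0},
]

field_map = map_fields(["SiteID", "ObsDate", "x"], ["site_id", "obs_date", "y"])
ratios = ratio(data_set_2, data_set_3, field_map, "x", "y")
print(field_map)
print(ratios)
```

Name similarity alone cannot relate fields such as `x` and `y` that share no spelling, which is why the ratio step still takes the two parameter names explicitly.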
Create Summary Report
8) Manager generates summary reports based on calculated values.
Component: Report Generator (inputs: report gen. settings, 2:3 ratio, other data; output: report)
NEISGEI: Networked Environmental Information System for Global Emissions Inventories
Network concept introduced at the NSF Digital Government Research Conference 2002 NSF/EPA Workshop
Issues identified:
- Finding data
- Integrating data
- Quantifying data uncertainty
Resulting projects:
- CAREN: A CARB-California AQMDs Network (NSF/CARB/EPA): data wrapping and integration of same-type data
- Integrated North America Emissions Inventory (CEC)
- Fire, Smoke and Air Quality Network (NSF/EPA/USDA): data wrapping and integration of heterogeneous data, plus tools to support environmental management
CAREN: The California Air Resources Network
Automating the integration of heterogeneous databases
Eduard Hovy, Jose-Luis Ambite, Andrew Philpot, USC Information Sciences Institute
Government information should be timely, thorough, and accurate. But government agencies often do not share data effectively with each other or the public.
- Barrier: Technological incompatibilities
- Barrier: Regulatory, organizational and financial barriers
- Barrier: Fear of litigation due to inappropriate disclosure
Diagram labels: US EPA, RPOs, AQMDs, Municipalities, Tribes, States, ???
Previous Work: Energy Data Collection (NSF Digital Government Program, 1999-present)
- Employ as reference the Omega ontology: a 120,000-term general-purpose concept hierarchy
- Augment Omega with domain-specific metadata describing energy data series and source characteristics
- Use an artificial intelligence query planner to provide uniform access to relational and web-based information sources
- Successful in incorporating 50,000 data series from six heterogeneous data sources from three different agencies, using semi-automated mappings
- Significant manual effort required
Conclusion: More general methods needed!
Diagram labels: BLS, CEC, EIA
Machine Translation-inspired Induction for Data Mapping
Recent advances in Machine Translation (MT) have allowed the automatic induction of cross-natural-language correspondences from large multilingual corpora.
Parallel French/English sentences (e.g., from Canadian Parliamentary Records):
FR: Il y a un crayon jaune sur une grande chaise.
EN: There is a yellow pencil on a big chair.
Our system will use these techniques to learn cross-database correspondences.
Two databases denoting correlating records:
DB1: Smith, John, 2000 High St, Columbus, OH
DB2: Ohio, Franklin, Smith, 43201, 1108
The correspondences are learned from features such as:
- declared or detected metadata: e.g., field names, database schema, table headers, footnotes
- learned data patterns: e.g., domain, range, formats, orthography
- topological relationships: e.g., foreign key/subset discovery
- terminological reference via ontology or thesaurus
Emissions inventory databases from the municipal up to hemispheric scales will be integrated into the network automatically using this new technology.
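One of the simplest data-driven features for cross-database correspondence is value overlap between columns of correlating records. The sketch below uses Jaccard similarity as a stand-in; it is not the MT-style induction the slide describes, and every row other than the Smith example is made up for illustration.

```python
def jaccard(a, b):
    """Share of distinct values that two columns have in common."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def match_columns(table1, table2, threshold=0.5):
    """Map column indexes of table1 to the best-overlapping column of table2."""
    cols1 = list(zip(*table1))  # transpose rows into columns
    cols2 = list(zip(*table2))
    matches = {}
    for i, col1 in enumerate(cols1):
        scores = [jaccard(col1, col2) for col2 in cols2]
        j = max(range(len(cols2)), key=scores.__getitem__)
        if scores[j] >= threshold:
            matches[i] = j
    return matches


# First row of each table follows the slide; the other rows are hypothetical.
DB1 = [
    ("Smith", "John", "2000 High St", "Columbus", "OH"),
    ("Jones", "Mary", "15 Elm Ave", "Dayton", "OH"),
    ("Lee", "Ann", "7 Oak Rd", "Akron", "OH"),
]
DB2 = [
    ("Ohio", "Franklin", "Smith", "43201", "1108"),
    ("Ohio", "Montgomery", "Jones", "45402", "2210"),
    ("Ohio", "Summit", "Lee", "44308", "3312"),
]

column_matches = match_columns(DB1, DB2)
print(column_matches)
```

Value overlap finds the surname columns but misses "OH" versus "Ohio", which is the kind of gap a thesaurus or ontology lookup is meant to close.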
Integrated N. American Emission Inventory
Co-investigator: Greg Stella, Alpine Geophysics
Air pollutant emission inventories for the US, Canada, and Mexico are compiled and stored using different methods.
The Commission for Environmental Cooperation (CEC) and the US EPA are supporting a project to develop a prototype web tool for enabling uniform access to distributed emissions data from North American electricity generating power plants.
The prototype tool will help:
- Assess data gaps
- Identify future IT tools that can aid collaborative emissions inventory projects
Fire, Smoke and Air Quality Network
Co-investigator: Rudolf Husar, Washington U.
The US Environmental Protection Agency and USDA-Forest Service are partnering agencies.
The management of fire, smoke, and air quality is tasked to multiple agencies at federal, state, and local levels. The diversity in data collection methods, data reporting requirements, data formatting schemes, data analysis methods, and data presentation creates a daunting challenge for the integration of these data. However, integration of these heterogeneous datasets is precisely what is called for by federal and regional organizations in order to derive a more comprehensive understanding of fire, smoke, and air quality.
Fire, Smoke and Air Quality Network
The fire, smoke, and air quality network will consist of web-based data access and analysis facilities that are flexible and adaptive in meeting the diverse end use requirements of wildland and prescribed fire managers and air quality planners. The network will provide:
- uniform access to and cataloging of distributed fire-related data and tools
- easy-to-use interfaces for exploring fire-related resources
- powerful tools that contribute to fire-related data analysis and modeling
- a framework that encourages community-wide contributions
Fire, Smoke, and Air Quality Network
Integration of multiple sources of fire-related data aids in planning, management, and post-fire analysis.
The map and time views are linked so that changing the focus in one automatically updates the other. For example, clicking on a PM2.5 monitor in the map displays the time series at that monitor.
Screenshot labels: Map View, Time View, Control Panel, Generic Map Server
Data sources shown: CIRA ColoState-VIEWS, European Space Agency, NASA SeaWiFS Project
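The linking between the map and time views can be pictured as two views subscribed to one shared cursor. This is a hypothetical sketch of that pattern; the class names, the monitor identifiers, and the update mechanism are assumptions, not the network's actual implementation.

```python
class Cursor:
    """Shared focus (site and time) that notifies every subscribed view."""

    def __init__(self, site, time):
        self.site = site
        self.time = time
        self._listeners = []

    def subscribe(self, callback):
        self._listeners.append(callback)

    def update(self, site=None, time=None):
        if site is not None:
            self.site = site
        if time is not None:
            self.time = time
        for callback in self._listeners:
            callback(self)


class TimeView:
    """Shows the time series at the cursor's current site."""

    def __init__(self, cursor):
        self.cursor = cursor
        self.shown_site = cursor.site
        cursor.subscribe(self.refresh)

    def refresh(self, cursor):
        self.shown_site = cursor.site

    def click(self, time):
        # Picking a time in the chart moves the shared cursor.
        self.cursor.update(time=time)


class MapView:
    """Shows the spatial pattern at the cursor's current time."""

    def __init__(self, cursor):
        self.cursor = cursor
        self.shown_time = cursor.time
        cursor.subscribe(self.refresh)

    def refresh(self, cursor):
        self.shown_time = cursor.time

    def click(self, site):
        # Clicking a monitor on the map moves the shared cursor.
        self.cursor.update(site=site)


cursor = Cursor(site="monitor_1", time="2002-07-01")
map_view = MapView(cursor)
time_view = TimeView(cursor)

map_view.click("monitor_7")      # time view now follows monitor_7
time_view.click("2002-07-15")    # map view now follows the new date
print(time_view.shown_site, map_view.shown_time)
```

Because neither view refers to the other directly, further views can be added by subscribing them to the same cursor.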
Spatio-Temporal Data Browser
Queries yield slices along the spatial, temporal and parameter dimensions of multidimensional data cubes.
[Architecture diagram. Labels: Client Browser, Cursor/Controller, Spatial Portrayal, Spatial Overlay, Time Portrayal, Time Overlay, OGC-Compliant GIS Services, Time-Series Services, Find/Bind Data, Spatial Slice, Time Slice, Portray, Overlay, Render, Mediator, Homogenizer, Catalog, Wrapper, Maintain Data, Data Sources (GIS Data, Vector, XDim Data, SQL Table, OLAP, Satellite Images), Data Cube]
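Slicing a data cube along its dimensions can be sketched with a small in-memory structure. This is a minimal illustration that assumes a cube keyed by (parameter, time, site); the class, method names, and sample values are hypothetical and unrelated to the services in the diagram.

```python
class DataCube:
    """Tiny data cube keyed by (parameter, time, site)."""

    def __init__(self):
        self._cells = {}

    def add(self, parameter, time, site, value):
        self._cells[(parameter, time, site)] = value

    def spatial_slice(self, parameter, time):
        """Fix parameter and time; return values across all sites (a map)."""
        return {
            s: v
            for (p, t, s), v in self._cells.items()
            if p == parameter and t == time
        }

    def time_slice(self, parameter, site):
        """Fix parameter and site; return values across all times (a series)."""
        return {
            t: v
            for (p, t, s), v in self._cells.items()
            if p == parameter and s == site
        }


# Made-up sample values.
cube = DataCube()
for parameter, time, site, value in [
    ("PM2.5", "2002-07-01", "A", 18.5),
    ("PM2.5", "2002-07-01", "B", 12.0),
    ("PM2.5", "2002-07-02", "A", 20.1),
    ("SO2", "2002-07-01", "A", 3.2),
]:
    cube.add(parameter, time, site, value)

map_data = cube.spatial_slice("PM2.5", "2002-07-01")
series_data = cube.time_slice("PM2.5", "A")
print(map_data)
print(series_data)
```

A spatial slice feeds a map view and a time slice feeds a time-series view, so both views can be served from the same cube.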
What’s possible with NEISGEI?
Networked Assessment Evn
In the news: “The Bush administration is to hold an Earth Observation Summit in Washington this summer to which it hopes the G8 group of industrialised countries will send cabinet-level representatives. The US is to urge the world's governments to set up an "integrated Earth observation system" to "take the pulse of the planet". It would combine satellite and ground-based observations of weather, climate, vegetation and other environmental indicators.” (Financial Times (London), Friday Jun)
NEISGEI will make possible:
- Simplified, multi-party, cross-border collaboration in air quality management
- Simplified development of environmental indicators, with the inclusion of data available on other environmental media