US-EU Research Cooperation
Interagency/International Cooperation on Ecoinformatics
September 2004
Bruce Bargmeyer
+1 (510)
Brussels, Belgium
2 Ecoinformatics
Information science and information technology for the environment
- Sound information as the basis for environmental policy, decisions, and action
- Information technology that supports and enables development of sound information
- Facilitate interaction with the information
  - Human-Computer
  - Computer-Computer
3 Ecoinformatics
- What are the key elements needed for an ecoinformatics marketplace?
- What actions should this Interagency/International Cooperation on Ecoinformatics (I/ICE) take?
- How can the I/ICE contribute to and draw on the R&D programs of NSF and the EU DGs?
4 Past, Present, … Future?
[Diagram: lots of users, lots of information systems (EEA, DOE, DoD, EPA, others...), and lots of data sources. Each system holds its own data and text on the same topics: environ, agriculture, climate, human health, industry, tourism, soil, water, air. Some sources label the same topics in Spanish: ambiente, agricultura, tiempo, salud humano, industria, turismo, tierra, agua, aero.]
5 Actions
- Much is already being done on environmental and health information
  - Billions are being spent on data, systems, and analysis
  - Millions are being spent on information technology
  - Millions are being spent on standards
  - Millions are being spent on semantics development and data harmonization
- We can have great influence in bringing coherence to these expenditures and efforts with a tiny fraction of these funds.
6 Data Standards
- Avoid a combinatorial explosion of data content, description, and metadata arrangements for information access and exchange. Data standards and metadata registries can help.
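One way to see the combinatorial explosion is to count translators. The sketch below is illustrative only and assumes a simplified model that is not in the slides: every system has its own format, a translator works in one direction, and a shared data standard acts as a hub. The function names are hypothetical.

```python
def pairwise_translators(n_formats: int) -> int:
    """One-way translators needed when every format maps directly to every other."""
    return n_formats * (n_formats - 1)


def standard_translators(n_formats: int) -> int:
    """One-way translators needed when each format maps to and from one shared standard."""
    return 2 * n_formats


if __name__ == "__main__":
    for n in (4, 10, 50):
        print(f"{n} formats: {pairwise_translators(n)} pairwise, "
              f"{standard_translators(n)} via a shared standard")
```

Under this model the pairwise count grows quadratically while the hub count grows linearly, which is the argument for agreeing on standards and registering metadata once.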
7 [Diagram, June 1996: separate environmental media legislation (CAA, CWA, RCRA, TSCA, state laws) leads to separate regulations and procedures (Fed Air Reg, Fed Water Reg, Fed RCRA Reg, Fed TSCA Reg, State Regs) and to separate data repositories for each regulated facility. Then there is one point of access to our environmental data resources: a complete warehouse repository. Remaining labels: Regulated Facility, Public, Regulators, Environmental Community.]
8 Data and Semantics Management
[Diagram labels: Dictionary, Keyword, Ontology, Terms, Data Elements, Thesaurus, DBMS/XML/Documents, Semantic Web, Concepts]
9 Possible Actions
- Identify and develop ecoinformatics key elements
  - Lead semantics development efforts and provide semantic services
  - Lead standards efforts
    - E.g., for Reportnet, the Exchange Network, GBIF
- Lead adaptation/adoption of emerging technology
  - Environmental semantic grid
  - Environmental data grid
  - Environmental computational grid
    - Hardware cycles
    - Software/models
10 [Diagram, September 2004. Users: Companies, Universities, Agencies, Others. Environmental Data Grid; Environmental Computer Grid (High Performance, cluster, Personal); Environmental Semantics Grid. Other labels: Metadata Registries, Data Services, Data Standards, Semantic Services, Terminology, Thesaurus, Ontology, Taxonomy, Structured Metadata, Computation Services, Software: Models, Visualization, Analysis, Agent systems, Semantic Based Computing.]
12 A Possible Collaborative Project
- Initiate an interconnected EU-US:
  - Environmental Data Grid
  - Environmental Computation Grid
  - Environmental Semantics Grid
- Organize key infrastructure components for demonstration
  - E.g., EDR, EPA supercomputer, models
  - DOE/LBNL supercomputer (under DOE-EPA MOU)
  - XMDR Semantics Server
  - Interagency semantic and data resources, e.g., GBIF
- Hold a competition for innovative use of the Grids
- Organize conference(s)
- Funding: $50m over three years
- Seed money for organizing: $300k each in the US and in the EU
- Possible funding for a workshop
13 US-EU Collaboration
- US
  - NSF
  - EPA and I/ICE partners
- EU
  - DG Environment, DG Research, DG Information Society
  - EEA
- R&D Lead and Project Central
  - US: LBNL
  - EU: JRC
- Private firms
14 Application Areas
- Biodiversity
- Climate
- Genomics
- Computational toxicology
- Spatial data: GEO/GEOS & GMES
- Ecological modeling
- Security, ecological risk management
- Invasive species: industrial costs
15 Related Efforts
- Semantic Environment for Ecological Knowledge (SEEK)
- Knowledge Network for Biocomplexity (KNB)
- DataGrid Project (FP5)
- Environmental Cyber Infrastructure
- DOE Science Grid
- Cancer Biomedical Informatics Grid (caBIG)
16 Example Tasks
- Production Grid Services
  - Develop the components necessary to use Grids in production. Define Grid-capable Web services standards, develop concrete implementations of Grid services, participate in interoperability testing between different Grid service implementations, and implement production Grids (see: DOE Production Science Grid).
- Grid Services Architecture
  - Develop an architecture which supports dynamic, programmatic access to local and remote data sources and metadata repositories without sacrificing the high-performance requirement. E.g., address this problem through a peer-to-peer service infrastructure which supports queries across autonomous data repositories, including dynamic and heterogeneous information sources.
- Security/Authentication/Access Control
  - Develop secure grid technologies, e.g., authorization services for heterogeneous, widely distributed resources that require a combination of local and centralized access control. Apply these principles of policy-based access control both to group query protocols and to individual peer-to-peer data sharing responses.
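The combination of local and centralized access control on a query across autonomous repositories can be sketched as follows. This is a toy illustration, not a Grid implementation: the `Repository` class, the `federated_query` function, the group names, and the sample records are all hypothetical. It assumes group membership is held centrally while each repository decides locally which groups it serves.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass
class Repository:
    """An autonomous data source with its own local access rule (hypothetical)."""
    name: str
    records: List[dict]
    local_groups: Set[str]  # groups this repository itself is willing to serve

    def query(self, keyword: str, caller_groups: Set[str]) -> List[dict]:
        # Local control: the repository refuses callers outside its own groups.
        if not (self.local_groups & caller_groups):
            return []
        return [r for r in self.records if keyword in r["keywords"]]


def federated_query(repos: List[Repository],
                    central_groups: Dict[str, Set[str]],
                    user: str,
                    keyword: str) -> List[Tuple[str, str]]:
    """Fan one keyword query out to every repository and merge the answers."""
    # Central control: group membership comes from one shared directory.
    groups = central_groups.get(user, set())
    hits = []
    for repo in repos:
        for rec in repo.query(keyword, groups):
            hits.append((repo.name, rec["id"]))
    return hits


REPOS = [
    Repository("water_agency",
               [{"id": "w1", "keywords": {"water", "nitrate"}},
                {"id": "w2", "keywords": {"soil"}}],
               {"regulators", "researchers"}),
    Repository("facility_reports",
               [{"id": "f1", "keywords": {"water"}}],
               {"regulators"}),
]
CENTRAL_GROUPS = {"alice": {"researchers"}, "bob": {"regulators"}}

if __name__ == "__main__":
    print(federated_query(REPOS, CENTRAL_GROUPS, "alice", "water"))
    print(federated_query(REPOS, CENTRAL_GROUPS, "bob", "water"))
```

The same query returns different results for different users because each repository applies its own rule to the centrally issued group memberships. A real deployment would replace the in-memory lists with remote services and authenticated credentials.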
17 Example Tasks (Cont)
- Grid Workflow
  - Many scientific projects have developed workflows which are regularly used in their research. Develop a graphical user interface for composition and monitoring of Grid-based workflows. E.g., enable scientists to design workflow networks by “dragging and dropping” existing Grid service components and monitoring the resulting workflow execution visually. Enable end users to submit complex queries through a friendly Web interface, while also allowing the production Grid managers to execute and monitor regularly scheduled complex, compute- and data-intensive production workflows.
- Collaboration Technologies
  - Distributed scientific collaborations need tools to support development of topical communities of researchers working together regardless of actual physical location. E.g., Pervasive Collaborative Computing Environment (PCCE).
- Semantic Grids
  - A Semantic Grid adapts the Semantic Web and metadata registries to the service-oriented architecture being used to develop next-generation Grid services.
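A workflow network of the kind described under Grid Workflow can be modeled as a dependency graph whose steps run in order while a log records progress. The sketch below is a minimal local illustration using Python's standard `graphlib` module. The step names, the sample values, and the `run_workflow` function are hypothetical, and plain functions stand in for Grid service components.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Tuple


def run_workflow(steps: Dict[str, Callable],
                 depends_on: Dict[str, List[str]]) -> Tuple[dict, List[str]]:
    """Run each step after its dependencies and record the execution order."""
    outputs = {}
    log = []
    for name in TopologicalSorter(depends_on).static_order():
        # Each step receives the outputs of the steps it depends on.
        inputs = [outputs[dep] for dep in depends_on[name]]
        log.append(name)  # a monitoring view would read this log
        outputs[name] = steps[name](*inputs)
    return outputs, log


# Hypothetical three-step workflow: fetch readings, drop gaps, average them.
STEPS = {
    "fetch": lambda: [3.0, None, 4.0, 5.0],
    "clean": lambda rows: [x for x in rows if x is not None],
    "model": lambda rows: sum(rows) / len(rows),
}
DEPENDS_ON = {"fetch": [], "clean": ["fetch"], "model": ["clean"]}

if __name__ == "__main__":
    outputs, log = run_workflow(STEPS, DEPENDS_ON)
    print(log, outputs["model"])
```

Keeping the graph as plain data is what would let a drag-and-drop editor produce it and a scheduler consume it. In a Grid setting each step would be a remote service call rather than a local function.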
18 Some Precedents
- Cancer Biomedical Informatics Grid (caBIG): expected funding of $20m each year for three years
  - Includes the caDSR (an ISO/IEC metadata registry like the EDR)
- DOE competition for supercomputer time
19 Organizational Meeting
- Berkeley, California
- January 18 & 19, 2005 (Tue & Wed)
- Host: LBNL
- Location: UC Berkeley campus
- Attendance: 25 or fewer
20 Workshop: Statement of Purpose
E.g.,
- Environmental science involves many collaborators at multiple institutions. The leading edge of science depends critically on an infrastructure that supports widely distributed computing, data, and instrument resources. An Environmental Science Grid is being developed and deployed across environmental organizations, providing infrastructure services for advanced scientific applications and problem-solving frameworks. By reducing barriers to the use of remote resources, it is deploying the cyber infrastructure required for the next generation of science.
- The goal of the Environmental Science Grid project is to provide this advanced cyber infrastructure as persistent, scalable, community-standards-based Grid services to support environmental science projects. Grid services provide security, resource discovery, resource access, monitoring, data access, and tools for advanced scientific applications and problem-solving frameworks. These services reduce barriers to the use of remote resources and facilitate large-scale collaboration.