Databases and Global Environmental Change: Information Technology for Sustainable Development Gilberto Câmara INPE, Instituto Nacional de Pesquisas Espaciais.


1 Databases and Global Environmental Change: Information Technology for Sustainable Development Gilberto Câmara INPE, Instituto Nacional de Pesquisas Espaciais Brazilian Academy of Sciences, Annual Meeting, May 2012

2 The fundamental question of our time: How is the Earth's environment changing, and what are the consequences for human civilization? (source: IGBP)

3 Global Change Where are changes taking place? How much change is happening? Who is being impacted by the change?

4 Limits for Models (source: John Barrow, after David Ruelle). Chart axes: complexity of the phenomenon vs. uncertainty on basic equations. Fields plotted: Solar System Dynamics, Meteorology, Chemical Reactions, Hydrological Models, Particle Physics, Quantum Gravity, Living Systems, Global Change, Social and Economic Systems.

5 Limits for Models (source: John Barrow, after David Ruelle). Chart axes: complexity of the phenomenon vs. uncertainty on basic equations. Fields plotted: Solar System Dynamics, Meteorology, Chemical Reactions, Hydrological Models, Particle Physics, Quantum Gravity, Living Systems, Global Change, Social and Economic Systems. Added label: e-science.

6 Collaborative e-science: Territory (Geography), Money (Economy), Culture (Anthropology), Modelling (IT). Connect expertise from different fields. Make the different conceptions explicit.

7 Deforestation in Amazonia. Amazonia (4,000,000 km² = size of Europe). Map legend: up to 10%, 10-20%, 20-30%, 30-40%, 40-50%, 50-60%, 60-70%, 70-80%, 80-90%, 90-100%.

8 Data (we need a lot of it). Deforestation in Brazilian Amazonia (1988-2011) dropped from 27,000 km² to 6,200 km².
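The size of that drop can be checked with simple arithmetic. A minimal sketch, assuming the two figures on the slide are directly comparable area values; the function name `percent_reduction` is illustrative and not from the presentation:

```python
def percent_reduction(start_km2: float, end_km2: float) -> float:
    """Relative reduction from start to end, as a percentage of the start value."""
    return (start_km2 - end_km2) / start_km2 * 100.0


# Figures quoted on the slide (km²); their exact reference years are not stated there.
reduction = percent_reduction(27_000, 6_200)
print(f"Reduction: {reduction:.1f}%")
```

Under that assumption the reduction is a little over three quarters of the starting value.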

9 Real-time Deforestation Monitoring: daily warnings of newly deforested large areas

10 How much does it take to survey Amazonia? 30 TB of data; 500,000 lines of code; 150 man-years of software development; 200 man-years of interpreters. (Map labels: 166-112, 116-113, 116-112)

11 TerraAmazon: open source software for large-scale land change monitoring. Spatial database (PostgreSQL with vectors and images). 2004-2008: 5 million polygons, 500 GB of images. (Map labels: 166-112, 116-113, 116-112)

12 Welcome to the Age of Data-intensive Science! Figure labels: Vantage Points; Capabilities; Terrestrial; Airborne; Near-Space (LEO/MEO commercial satellites and manned spacecraft); Far-Space (L1/HEO/GEO, TDRSS and commercial satellites); Deployable; Permanent; Forecasts and Predictions; Aircraft/Balloon Event Tracking and Campaigns; User Community.

13 Weather and climate (source: WMO): 11,000 land stations (3,000 automated); 900 radiosondes; 3,000 aircraft; 6,000 ships; 1,300 buoys; 5 polar and 6 geostationary satellites

14 ARGOS Data Collection System (16,000 platforms): 650,000 messages processed daily

15 Argo buoy network

16 Data chain in Earth System Science (source: NASA)

17 Data-intensive Science = principles and applications of information technology for handling very large data sets

18 Conjectures: IT concepts are essential to global change researchers (but most of them don't know it). Global change challenges will motivate new research in IT (but most of us are not looking there).

19 Challenges for data-intensive science: Which data is out there? How to organize big data? How to get the data I need? How to model big data? How to access and use big data?

20 Stage 1: A scientist's personal database. Diagram: local database; user interface; database creation; analysis; database access.

21 Stage 1: A scientist's personal database. Diagram: local database; user interface; database creation; analysis; database access. The good: data is close to you (or so you think). The bad: no long-term data preservation; no data sharing.

22 Stage 2: A scientific lab database. Diagram: corporate database; user interface; database creation; analysis; database access.

23 Stage 2: A scientific lab database. Diagram: corporate database; user interface; database creation; analysis; database access. The good: long-term data preservation; data sharing inside the lab; reusable corporate software. The bad: substantial costs on data admin; little outside data sharing.

24 Metview (ECMWF Metview, MOPTC, June 2004)

25 Field plotting

26 Stage 3: A scientific lab database in the cloud. Diagram: corporate database; user interface; database creation; analysis; database access.

27 Stage 3: A scientific lab database in the cloud. Diagram: corporate database; user interface; database creation; analysis; database access. The good: long-term data preservation; shared costs on data admin. The bad: rewrite software for cloud processing; outside data sharing still not solved.

28 Risk Analysis

29 On-line data feed. DCP: rain total; fixed-time and irregular (alert) reporting; point data; one file per DCP. Satellite/Radar: 4 km grid; total rain over 1 h, total rain over 24 h, current rate (mm/h); binary file. Models: ETA at 40, 20 and 5 km, ensemble at 40 km; total rain over 72 h; 72 files; ASCII grid file.

30 TerraMA2: Natural Disasters Monitoring and Alert System

31 Stage 4: Multidatabase access. Diagram: three data sources; modelling; data discovery; data access; analysis; remote analysis.

32 Stage 4: Multidatabase access. Diagram: three data sources; modelling; data discovery; data access; analysis; remote analysis. The good: long-term data preservation; shared costs on data admin; access to large external databases. The bad: rewrite software for cloud processing; finding data is a major problem.

33 Data Access Hitting a Wall Current science practice based on data download How do you download a petabyte?

34 Data Access Hitting a Wall Current science practice based on data download How do you download a petabyte? You don’t! Move the software to the archive
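A back-of-the-envelope calculation shows why downloading stops being practical at this scale. A minimal sketch, assuming a decimal petabyte (10^15 bytes) and a sustained 1 Gbit/s link with no protocol overhead; both figures and the function name `download_days` are illustrative assumptions, not values from the presentation:

```python
PETABYTE_BYTES = 10**15      # decimal petabyte (assumption)
SECONDS_PER_DAY = 86_400


def download_days(size_bytes: int, link_bits_per_second: float) -> float:
    """Days needed to transfer size_bytes over a link of the given sustained speed."""
    seconds = size_bytes * 8 / link_bits_per_second
    return seconds / SECONDS_PER_DAY


# Hypothetical sustained 1 Gbit/s connection.
days = download_days(PETABYTE_BYTES, 1e9)
print(f"Downloading 1 PB at 1 Gbit/s takes about {days:.0f} days")
```

With these assumptions the transfer takes roughly three months, which is the practical argument for moving the software to the archive instead.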

35 Scientific Data Management in the Coming Decade (Jim Gray, 2005): Next-generation science instruments and simulations will produce peta-scale datasets. Such peta-scale datasets will be housed by science centers that provide substantial storage and processing for scientists who access the data via smart notebooks. The procedural stream-of-bytes-file-centric approach to data analysis is both too cumbersome and too serial for such large datasets. Database systems will be judged by their support of common metadata standards and by their ability to manage and access peta-scale datasets.

36 Virtual Observatory: if data is online, the internet is the world's best telescope. Scientific Data Management in the Coming Decade (Jim Gray)

37 Where are scientific databases going?

38 From tables to arrays. Relation (table), e.g. with columns name, CPF, position: relational algebra (selection, projection, join); SQL language: SELECT * FROM images WHERE date = "today". Scientific data as arrays: array algebra (spatial queries, math operations); AQL language: SELECT Mean(A.B) FROM Array A.
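The contrast between the two query styles can be sketched in a few lines. This is only an illustration under stated assumptions: it uses Python's built-in `sqlite3` for the relational side and a plain nested list to stand in for an array attribute; the table layout, the sample values and the helper `array_mean` are hypothetical, and no real array database or AQL engine is involved.

```python
import sqlite3

# Relational side: rows in a table, filtered by a selection predicate.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE images (id INTEGER PRIMARY KEY, date TEXT, sensor TEXT)")
conn.executemany(
    "INSERT INTO images (date, sensor) VALUES (?, ?)",
    [("2012-05-01", "A"), ("2012-05-02", "B"), ("2012-05-02", "C")],
)
rows = conn.execute(
    "SELECT * FROM images WHERE date = ?", ("2012-05-02",)
).fetchall()
print(len(rows), "images match the date")

# Array side: a 2-D grid of cell values for one attribute (band B),
# reduced with a math operation, in the spirit of SELECT Mean(A.B) FROM Array A.
band_b = [
    [10, 20],
    [30, 40],
]


def array_mean(grid):
    """Mean over every cell of a 2-D grid given as a list of rows."""
    values = [value for row in grid for value in row]
    return sum(values) / len(values)


mean_b = array_mean(band_b)
print("mean of band B:", mean_b)
```

The relational query picks out whole records by a predicate, while the array query reduces the cells of a grid. That difference in the unit of work is what the slide's move from relational algebra to array algebra points at.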

39 Communicating concepts is hard: vulnerability? climate change? poverty? (image source: WMO)

40 Communicating concepts is hard. We're bad at representing meaning: deforestation? degradation? disturbance?

41 Communicating change is very hard: When did the Aral Sea reach the tipping point?

42 Describing events and processes is very hard: When did the flood occur?

43 Conclusions: Earth System Science data management poses a major challenge for the database community. We need new techniques and architectures to deal with scientific data.

