Big data: What’s GAW got to do with it …
Jörg Klausen, Chair ET-WDC
CAS EPAC SSC Meeting, 15-17 March 2016, WMO, Geneva
(Title-slide material pasted from http://www.mckinsey.com/mgi/our-research)

J. Klausen | Big data: What’s GAW got to do with it … 16 March 2016 | CAS EPAC SSC Meeting | WMO, Geneva

Outline
- «Big data»
- «GAW data»
- «Science as a service»
- Conclusion

BIG DATA

Evolution of the term «big data»
Fig. 1. Frequency distribution of documents containing the term “big data” in ProQuest Research Library.
Source: Amir Gandomi, Murtaza Haider, Beyond the hype: Big data concepts, methods, and analytics, International Journal of Information Management, Volume 35, Issue 2, 2015, 137–144, http://dx.doi.org/10.1016/j.ijinfomgt.2014.10.007

Facts about «big data»
- Google manages >1 million PB and processes >24 PB of data every day (far more than all the printed material in the U.S. Library of Congress).
- >1 billion Google searches are conducted every day.
- >250 billion emails are sent every day.
- YouTube has >1 billion unique visitors per month.
- >6 billion hours of video are watched on YouTube every month (~1 hour for every person on Earth, and 50% more than in 2014).
- 90% of the data in the world today were created in the past 2 years.
- Data volumes are forecast to double every 2 years until 2020.
- In 2020, the amount of digital data produced will exceed 40 zettabytes (5,200 GB for every Homo sapiens on Earth).
Adapted from
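The per-person figure on this slide can be checked with simple arithmetic. A minimal sketch, assuming SI prefixes (1 ZB = 10^21 bytes, 1 GB = 10^9 bytes), a world population of about 7.7 billion around 2020, and 2016 (the year of this talk) as the base year for the doubling forecast; none of these assumptions comes from the slide itself.

```python
# Back-of-envelope check of the 2020 figures quoted on the slide.
# Assumptions (not from the slide): SI prefixes, ~7.7 billion people in 2020,
# and 2016 as the base year for the "double every 2 years" forecast.
ZETTABYTE = 10**21  # bytes
GIGABYTE = 10**9    # bytes

total_2020_bytes = 40 * ZETTABYTE
world_population = 7.7e9  # assumed

# Data volume per person, expressed in gigabytes.
per_person_gb = total_2020_bytes / world_population / GIGABYTE

# Doubling every 2 years over 4 years gives a factor of 2**2.
base_year, target_year, doubling_period = 2016, 2020, 2
growth_factor = 2 ** ((target_year - base_year) / doubling_period)

print(f"{per_person_gb:,.0f} GB per person")                      # → 5,195 GB per person
print(f"growth factor {base_year}-{target_year}: {growth_factor:.0f}x")  # → growth factor 2016-2020: 4x
```

The result, roughly 5,200 GB per person, is consistent with the figure on the slide.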

What is «big data»?
- Big data are high-volume, high-velocity, high-variability, low-veracity (reliability), high-value data (e.g., Dumbill (2012), https://www.oreilly.com/ideas/what-is-big-data).
- Big data are largely «unstructured» data.

Handling «big data»
- Facial recognition technologies (for customer profiling)
- «Clickstream» analysis (for web sites)
- Data mining from mobile devices
Fig. 3. Processes for extracting insights from big data.
Source: Gandomi & Haider (2015), http://dx.doi.org/10.1016/j.ijinfomgt.2014.10.007

Techniques for analyzing «big data»
- Text analytics (text mining)
  - Information extraction, text summarization
  - Question answering («Siri», «Watson»)
  - Sentiment analysis (opinion mining)
- Audio, video and social media analytics
  - Facial recognition technologies (for customer profiling)
  - «Clickstream» analysis (for web sites)
  - Image and pattern recognition
- Predictive analytics
  - Uncover patterns and capture relationships in data
Source: Gandomi & Haider (2015), http://dx.doi.org/10.1016/j.ijinfomgt.2014.10.007

GAW DATA

Traditional sources
- Land-based in-situ and remote-sensing observations
- Balloon-borne in-situ and aircraft observations
- Satellite observations

New sources
- Potentially (many) more land-based in-situ and remote-sensing observations
  - «GAW Local» stations
  - «Citizen scientists» operating private stations
- Mobile devices
  - Sensors mounted on vehicles
  - «Personal health»-related sensors
- UAV networks
- Managed telecom balloon networks («Google Loon»)?
- Facebook statements, Tweets, WhatsApp messages, Instagram photos, … ?

New sources: examples

Governance and data curation
Past and present
- Under the auspices of NMHSs, other governmental organizations or academic institutions
- Curation of data by the GAW World Data Centers and by other international, national or program-specific data centres
Future
- Under the auspices of EPAs, local governments, private companies, academic institutions
- Curation of data by national or program-specific data centres, private companies, non-profit organizations, “Google”

Are «GAW data» «big data»?
- GAW data show some aspects of «big data» (volume, velocity, veracity).
- We still expect the data to be well structured (low variability).
- Existing approaches to data management and analysis need to be strengthened, but the concepts remain viable.

Aspect | Traditional sources | New sources
Volume | Relatively small (except satellite data) | Growing, (potentially) huge
Velocity | Most data have high latency | Most data in n.r.t. (near real time)
Variability | Well-structured data | Well-structured data
Veracity (reliability) | (Normally) high | (Often) unknown

«SCIENCE AS A SERVICE»

NextGAW
- Standardize observing techniques
- Provide data quality objectives
- Standardize metadata formats
- Standardize data formats
- Understand observations
- Combine observations & models
- Develop products
- Standardize discovery, access and retrieval
- Standardize exchange formats

Earth observation initiatives
- Environmental Virtual Observatory pilot (EVOp) project, funded by the UK Natural Environment Research Council
- EarthCube initiative of the US National Science Foundation
- Global Earth Observation System of Systems (GEOSS)
- «NextGEOSS» proposal

“GAW” data: current situation (still …)
6 WDCs, 6 different (meta)data formats, 1 data policy:
- WOUDC (ozone/UV)
- WDCPC (precipitation chemistry)
- WRDC (radiation)
- WDCA (aerosols)
- WDC-RSAT (satellites)
- WDCGG (gases)
>15 other archives, many different (meta)data formats, several different data policies:
- AERONET, AGAGE, BSRN, CapMon, CDIAC, EANET, EBAS (NILU), GALION (Earlinet, …), NADP, NOAA/ESRL/GMD, RAMCES, SHADOZ, SKYNET, TCCON (CalTech), [one for each satellite], …
Partial integration through GAWSIS
>20 ways to submit data
How can we serve our users better?

Federated GAW data architecture (I)
- Providers of “GAW” data supply: Data + Metadata
- Users of “GAW” data receive: Data + Products + Metadata

Federated GAW data architecture (II)
Diagram components: instruments (Inst) run by operators; several WDCs and GAW CDCs; GAWSIS; users.
Flows labelled in the diagram:
- (1) n.r.t. data submission and (2) delayed-mode data submission to the WDCs/CDCs
- Query for data (GAWSIS)
- Web service request
- Serve metadata
- Retrieve data + metadata
Open question: modeling data?
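The federated pattern can be illustrated with a small in-memory sketch: a catalogue standing in for GAWSIS knows only which data centre holds a given station/variable series and serves that metadata, while the data stay at the WDCs or CDCs and are retrieved from there. All class and function names, the station code STN1, and the values are hypothetical placeholders; a real system would go through the GAWSIS API and the WDC/CDC web services, whose interfaces are not specified on this slide.

```python
from dataclasses import dataclass, field


@dataclass
class DataCentre:
    """Hypothetical stand-in for a WDC or CDC web service."""
    name: str
    holdings: dict = field(default_factory=dict)  # (station, variable) -> list of values

    def retrieve(self, station, variable):
        # "Retrieve data + metadata": the data centre returns both together.
        return {
            "metadata": {"station": station, "variable": variable, "source": self.name},
            "data": self.holdings[(station, variable)],
        }


@dataclass
class Catalogue:
    """Hypothetical stand-in for GAWSIS: knows where data are, not the data themselves."""
    centres: dict = field(default_factory=dict)  # centre name -> DataCentre
    index: dict = field(default_factory=dict)    # (station, variable) -> centre name

    def register(self, centre):
        self.centres[centre.name] = centre
        for key in centre.holdings:
            self.index[key] = centre.name

    def query(self, station, variable):
        # "Serve metadata": tell the user which data centre holds the series.
        return {"station": station, "variable": variable,
                "centre": self.index.get((station, variable))}


def fetch(catalogue, station, variable):
    """Ask the catalogue first, then retrieve from the data centre it points to."""
    meta = catalogue.query(station, variable)
    if meta["centre"] is None:
        raise LookupError(f"no data centre holds {variable} at {station}")
    return catalogue.centres[meta["centre"]].retrieve(station, variable)


if __name__ == "__main__":
    gawsis = Catalogue()
    gawsis.register(DataCentre("WDCGG", {("STN1", "CO2"): [1.0, 2.0]}))  # dummy values
    gawsis.register(DataCentre("WDCA", {("STN1", "aerosol_scattering"): [0.1]}))  # dummy values
    print(fetch(gawsis, "STN1", "CO2")["metadata"])
```

Keeping the catalogue free of the data themselves mirrors the federated idea: each centre stays the authoritative holder of its series, and the catalogue only has to be updated when holdings change.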

Tentative road map for metadata
1. Migrate GAWSIS from Empa to MeteoSwiss and integrate it with OSCAR/Surface, by mid-April 2016
2. Extend the GAWSIS API to comply with the WIGOS Metadata Standard (WMDS)
   - Draft specification for an OGC-compliant XML schema, by mid-April 2016
   - Review by WMO expert teams, by mid-May 2016
3. Test the API in the context of OSCAR/Surface
   - Pilot projects with DWD, MeteoSwiss, BoM?, UKMO?
4. Connect the GAW WDCs
   - First, using existing sources, by mid-July 2016
   - Later, using the GAWSIS API (requires changes at the WDCs), by mid-2017
5. Connect the GAW Contributing Data Centers (GAW CDCs), by end of 2017

Tentative road map for data
1. ET-WDC and GAW CDC managers to agree on a data exchange specification, by mid-2017
2. Implement web services at the WDCs and CDCs to make data available in this format (among others); alternatively, implement a central harvester and pre-processor web service
   - Test beds at WDCA, WDC-RSAT and WOUDC, by mid-2018
   - Adoption by WDCGG, WRDC and the CDCs as soon as possible
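The “central harvester and pre-processor” alternative can be sketched as a set of per-source parsers that map each data centre's native format onto one common record layout. The two input formats, their field names, the common record fields, and the sample values are all hypothetical placeholders; the actual data exchange specification is what ET-WDC and the GAW CDC managers are still to agree on.

```python
import csv
import io
import json

# Hypothetical common exchange record (the real specification is still to be agreed).
COMMON_FIELDS = ("station", "variable", "time", "value", "unit")


def from_csv_source(text):
    """Parse a hypothetical CSV format with columns site,param,datetime,val,units."""
    for row in csv.DictReader(io.StringIO(text)):
        yield {
            "station": row["site"],
            "variable": row["param"],
            "time": row["datetime"],
            "value": float(row["val"]),
            "unit": row["units"],
        }


def from_json_source(text):
    """Parse a hypothetical JSON format: series-level metadata plus a list of observations."""
    doc = json.loads(text)
    for obs in doc["observations"]:
        yield {
            "station": doc["station_id"],
            "variable": doc["parameter"],
            "time": obs["t"],
            "value": float(obs["v"]),
            "unit": doc["unit"],
        }


def harvest(sources):
    """Apply each (parser, payload) pair and return all records in the common layout."""
    records = [record for parser, payload in sources for record in parser(payload)]
    return sorted(records, key=lambda r: (r["station"], r["variable"], r["time"]))


if __name__ == "__main__":
    # Dummy payloads in the two hypothetical native formats.
    csv_payload = "site,param,datetime,val,units\nSTN1,O3,2016-03-16T00:00Z,30.5,ppb\n"
    json_payload = json.dumps({
        "station_id": "STN2", "parameter": "CO2", "unit": "ppm",
        "observations": [{"t": "2016-03-16T00:00Z", "v": 400.1}],
    })
    for rec in harvest([(from_csv_source, csv_payload), (from_json_source, json_payload)]):
        print(rec)
```

Keeping source-specific knowledge in small parser functions means that connecting another data centre requires one new parser rather than changes for every consumer; the same mappings could instead run at each WDC/CDC, which is the other option on the slide.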

Realization?
- Co-funding through NextGEOSS? Additional requirements from GEOSS; decision expected by July
- Co-funding through the WMO resource mobilization department?
- Co-funding through an OSCAR/Surface extension?
- Co-funding through the WDCs and CDCs?
- Co-funding through GAW?

Conclusions
- «Big data» may be coming to GAW, but «GAW data» are not «big data» according to the 4V definition.
- «Big data» is coming, but:
  - GAW won’t be the owner of these «big data»
  - GAW will not necessarily benefit or be harmed
  - GAW can serve as a reference network
- The vision of a «federated GAW data infrastructure» was agreed at the Zurich workshop in 2015.
- The potential of a federated GAW data infrastructure is probably larger than the potential of «big data» in the foreseeable future.
- The SSC needs to endorse and defend the strategy and help mobilize resources.