Published by Stuart Hubbard. Modified over 9 years ago.
1
Data Science for RDA Climate Change Data Challenge and Meetup
Dr. Brand Niemann
Director and Senior Data Scientist/Data Journalist
Semantic Community
Data Science
Data Science for RDA Climate Change Data Challenge
September 28, 2015
2
NSF Graduate Data Science Workshop & Community Building, August 5-7, Seattle
This NSF-sponsored 2.5-day workshop, held August 5-7 on the University of Washington campus in Seattle, will bring together 100 graduate students from diverse domain sciences and engineering with data scientists from industry and academia to discuss and collaborate on Big Data / Data Science challenges. In addition to keynote presentations from high-profile speakers, participants will present posters covering their own research and work collaboratively to begin solving some of the Grand Challenge problems facing Data Enabled Science & Engineering disciplines. After the workshop, the output from the collaborative teams will be published in an open access environment. Through the shared work at the workshop and beyond, participants will form lasting, collaborative relationships with their peers, senior academic partners, and industry participants, including those from Amazon, Google, and Microsoft. The workshop Grand Challenge topics will be selected from the highest-scoring white paper submissions. During the workshop, attendees will form teams to work on the Grand Challenges. The authors of the highest-scoring white papers will be invited to give lightning talks of a few slides during the plenary session to describe their challenges or methods.
http://depts.washington.edu/dswkshp/
3
Purpose
I think we will hold a meetup (or a series of meetups like this one) to support the NSF Data Science / Big Data Community, and use the data sets I am preparing from the RDA Climate Change Data Challenge, climate.data.gov, and the U.S. Climate Resilience Toolkit to jump-start our meetup members and other data science meetup participants.
Data Sets:
RDA Climate Data Challenge: only 17 of 64 could be used so far.
NTRD: 36 shapefiles (problem reading the largest file).
Climate.data.gov: 16 of 38 used so far.
U.S. Climate Resilience Toolkit: 63 data sets used in 80 case studies. "Using Climate Data, Satellite Imagery, and Local Knowledge to Prevent Famine" uses 6 data sets (the maximum for any case study), so it is the best candidate for integrating multiple data sets.
National Climate Assessment: 2,377 data sets, in addition to the 36 data tables I extracted from the report itself.
See: Spreadsheet
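Counts such as "17 of 64 could be used so far" come from a data audit. A minimal sketch of how such an audit could be scripted, assuming the downloaded files are CSVs and that "usable" means the file decodes as UTF-8 and has a header plus at least one data row. The function name `audit_csv_files` and these criteria are hypothetical; the actual audit criteria are not stated in the slides.

```python
import csv


def audit_csv_files(paths):
    """Try to read each file as CSV and sort it into usable vs. problem files.

    Assumed criterion: the file opens, decodes as UTF-8, parses as CSV,
    and has a header row plus at least one data row.
    Returns (list of usable paths, {problem path: reason}).
    """
    usable, problems = [], {}
    for path in paths:
        try:
            with open(path, newline="", encoding="utf-8") as f:
                rows = list(csv.reader(f))
        except (OSError, UnicodeDecodeError, csv.Error) as exc:
            # Record the exception type as the reason the file was rejected.
            problems[str(path)] = type(exc).__name__
            continue
        if len(rows) < 2:
            problems[str(path)] = "no data rows"
        else:
            usable.append(str(path))
    return usable, problems
```

For example, `usable, problems = audit_csv_files(sorted(pathlib.Path("rda_downloads").glob("*.csv")))` followed by `print(f"{len(usable)} of {len(usable) + len(problems)} could be used")` would produce a tally in the same form as the slide; the folder name is a placeholder.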
4
Data Science for RDA Climate Change Data Challenge and Meetup
Goal 1: Digital Catalog - Done
Goal 2: Data Audit - Done
Goal 3: Individual Data Sets in Spotfire - Done (RDA and NTRD)
Goal 4: Integration/Applications - IN PROCESS (see right box)
Goal 5: Meetups/Data Science Publication/MOOCs - IN PROCESS (see right box)
An additional goal is to integrate climate.data.gov and the U.S. Climate Resilience Toolkit into one "seamless" system, which we will call "a Data Science Data Publication." This will be my challenge submission and Experimentation Day demo for the 6th RDA Plenary in Paris on September 23-25, and it will support the NSF Meetup of Data Science Meetups on November 6-7 in Washington, DC. Our own Meetup of Data Science Meetups, in preparation for the November 6-7 Meetup, is tentatively planned for September 28.
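One possible way to start integrating two catalogs into a single "seamless" index is to key every record on a normalized URL. The sketch below assumes each catalog is a list of dicts with "title" and "url" fields; the field names, the URL-matching rule, and the function names are all hypothetical, and real catalogs would likely need fuzzier matching.

```python
from urllib.parse import urlsplit


def normalize_url(url):
    """Reduce a URL to host + path so http/https and www. variants match."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[len("www."):]
    return host + parts.path.rstrip("/")


def merge_catalogs(catalogs):
    """Merge {source_name: [record, ...]} into one index keyed by normalized URL.

    Each record is assumed to be a dict with "title" and "url" keys. The first
    title seen for a URL is kept, and "sources" lists every catalog that has it.
    """
    index = {}
    for source, records in catalogs.items():
        for record in records:
            key = normalize_url(record["url"])
            entry = index.setdefault(
                key, {"title": record["title"], "url": record["url"], "sources": []}
            )
            if source not in entry["sources"]:
                entry["sources"].append(source)
    return index
```

Entries whose "sources" list names both catalogs are the overlap between them; entries with a single source show where one catalog fills gaps in the other.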
5
NSF Big Data Hubs and Data Science Meetups
Initial Schedule:
Data Science Call I: June 12, 2015
Data Science Call II: June 18, 2015
In-person Meetup Workshop: Washington, DC, November 6-7, 2015
Big Data Regional Innovation Hubs (Accelerating the Big Data Innovation Ecosystem): Midwest, Northeast, South, West
Initial Ideas:
Data Science YouTube channel or podcast
Angie's List for Data Scientists
Gathering groups working in the same domain, i.e., connecting people working on different global climate challenges
Groups Participating:
Group | Location | Type
Bayes Impact | San Francisco, CA | Non-profit
Big Data Utah | Salt Lake City, Utah | Collaboration
Boston Predictive Analytics | Boston, MA | Meetup
Data Community DC | Washington, DC | Meetup
Data Science ATL | Atlanta, GA | Meetup
Data Science for Social Good | Chicago, IL | Fellowship Program
DataKind | New York, NY | Nonprofit
District Data Lab | Washington, DC | Meetup
NYC Data Science | New York, NY | Meetup
SF Data Mining | San Francisco, CA | Meetup
Data Science Chicago | Chicago, IL | Meetup
Data Science MD | Baltimore, MD | Meetup
U.S. Ignite | Nation-wide Communities | Non-profit
Analytics Club | Boston, MA | Meetup
Data Science for Social Good Atlanta | Atlanta, GA | Fellowship Program
https://bdhub.info/
http://data-science.meetup.com/
6
My Note: Start joining these groups and invite them to the September 28th Meetup.
7
Data Mining Data.gov and U.S. Climate Resilience Toolkit
Data.gov: Themes | Data | Resources | Challenges | FAQ | Contact
Climate | Other?
U.S. Climate Resilience Toolkit: Get Started | Taking Action | Tools | Topics | Expertise | About | Contact | Funding Opportunities | FAQ
http://www.data.gov/climate/
http://toolkit.climate.gov/
8
http://www.data.gov/climate/
9
Spreadsheet
My Note: I requested and received a spreadsheet of the 547 data sets and one of all 100,000+ data sets, so I can integrate the catalog with the actual data sets.
10
Spreadsheet
My Note: The next slide shows it imported and filtered in Spotfire.
11
My Note: The first example is in the next tab (in process).
12
http://toolkit.climate.gov/
13
http://toolkit.climate.gov/help/partners
14
Expertise
My Note: From map popups to MindTouch to a spreadsheet to Spotfire.
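The workflow on this slide ends with a flat table that a spreadsheet or Spotfire can open. A minimal sketch of that last step, assuming the popup fields have already been captured as Python dicts; it skips MindTouch entirely, and the function name and the field names in the usage line are hypothetical.

```python
import csv
import io


def records_to_csv(records):
    """Write a list of dicts (e.g. fields copied from map popups) to CSV text.

    Popups rarely share exactly the same fields, so the header is the union
    of all keys in first-seen order; missing values are left blank.
    """
    fieldnames = []
    for record in records:
        for key in record:
            if key not in fieldnames:
                fieldnames.append(key)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()
```

For example, `records_to_csv([{"Name": "A", "Org": "X"}, {"Name": "B", "Topic": "Water"}])` yields a three-column table (Name, Org, Topic) with blanks where a popup lacked a field. Taking the union of keys rather than a fixed header keeps every popup field without having to know the schema in advance.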
15
Spreadsheet
16
http://toolkit.climate.gov/training-courses
17
Spreadsheet
My Note: These can be filtered in the spreadsheet and in Spotfire.
18
My Note: Filter by Type of Training and/or Difficulty Scale.
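The same "and/or" filter can be applied outside a spreadsheet or Spotfire. A minimal sketch, assuming the training-course list has been exported as CSV with columns named "Type of Training" and "Difficulty Scale" (the exact column spelling in the real export, the function name, and the example values are assumptions).

```python
import csv
import io


def filter_courses(csv_text, training_type=None, difficulty=None):
    """Return rows matching Type of Training and/or Difficulty Scale.

    Passing None for a criterion leaves it unfiltered, so either filter
    can be used alone or both together.
    """
    rows = csv.DictReader(io.StringIO(csv_text))
    return [
        row
        for row in rows
        if (training_type is None or row["Type of Training"] == training_type)
        and (difficulty is None or row["Difficulty Scale"] == difficulty)
    ]
```

Calling `filter_courses(text, training_type="Webinar")` filters on one column; adding `difficulty="Beginner"` narrows it to rows that match both.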
19
Climate Explorer
Climate Explorer: Visualizing Climate Data in Maps and Graphs
Climate Explorer is a research application built to support the U.S. Climate Resilience Toolkit. The tool offers interactive visualizations for exploring maps and data related to the toolkit's Taking Action case studies. Map layers in the tool represent geographic information available through climate.data.gov. Each layer's source and metadata can be accessed through its information icon.
Climate Explorer graphs display 1981-2010 U.S. Climate Normals for temperature and precipitation, overlain with daily observations from the Global Historical Climatology Network-Daily (GHCN-D) database. Please note that GHCN-D data have been checked for obvious inaccuracies, but they have not been adjusted to account for the influences of historical changes in instrumentation and observing practices. GHCN-D data are useful for comparing weather and climate, but for long-term climate change analyses, we recommend the National Climatic Data Center's Climate at a Glance.
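To reproduce these graphs from the raw daily observations, the GHCN-D station files have to be parsed. A minimal sketch, assuming the fixed-width ".dly" layout as I understand it from the GHCN-D readme (station ID in columns 1-11, year 12-15, month 16-17, element 18-21, then 31 eight-character day groups of value, measurement flag, quality flag, source flag); check it against the readme distributed with the data. The "checked for obvious inaccuracies" note corresponds to the quality flag, so this sketch drops any value whose quality flag is set.

```python
DAY_FIELDS_START = 21  # zero-based offset of the first day group
DAY_GROUP_WIDTH = 8    # value (5) + MFLAG (1) + QFLAG (1) + SFLAG (1)


def parse_dly_line(line):
    """Parse one line of a GHCN-Daily .dly file into a dict.

    Missing days (-9999) and values with a non-blank quality flag (i.e. that
    failed a quality check) are skipped. Values stay in the file's native
    units, e.g. tenths of a degree C for TMAX/TMIN, tenths of mm for PRCP.
    """
    # Pad in case trailing blanks were stripped from the last day group.
    line = line.rstrip("\r\n").ljust(DAY_FIELDS_START + 31 * DAY_GROUP_WIDTH)
    values = {}
    for day in range(1, 32):
        start = DAY_FIELDS_START + (day - 1) * DAY_GROUP_WIDTH
        raw = line[start:start + 5]
        qflag = line[start + 6]
        if raw.strip() in ("", "-9999") or qflag != " ":
            continue
        values[day] = int(raw)
    return {
        "station": line[0:11],
        "year": int(line[11:15]),
        "month": int(line[15:17]),
        "element": line[17:21],
        "values": values,
    }
```

Each parsed month of TMAX, TMIN, or PRCP values can then be compared day by day against the 1981-2010 Normals for the same station, which is the overlay Climate Explorer draws.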
20
http://toolkit.climate.gov/climate-explorer/
My Note: This is like the Spotfire analysis of the NTRD I just did! I can reproduce these in Spotfire.
21
http://toolkit.climate.gov/crt-search
22
http://toolkit.climate.gov/crt-search?query=*&resource=18
My Note: The search finds the word "datasets" but not the data itself! Spreadsheets and Spotfire show you the data (e.g., CSV).
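The search URL on this slide is an ordinary query string with two parameters, `query` and `resource`, so other searches can be generated the same way. A minimal sketch that rebuilds it; the function name is hypothetical, the meaning of `resource=18` (presumably a resource-type filter) is an assumption, and the site may have changed since 2015.

```python
from urllib.parse import urlencode

SEARCH_BASE = "http://toolkit.climate.gov/crt-search"


def crt_search_url(query="*", resource=None):
    """Build a Toolkit search URL from a query and an optional resource filter."""
    params = {"query": query}
    if resource is not None:
        params["resource"] = resource
    # safe="*" keeps the wildcard literal instead of encoding it as %2A.
    return SEARCH_BASE + "?" + urlencode(params, safe="*")
```

With `resource=18` this reproduces the URL above; spaces in a query are encoded as `+`, e.g. `crt_search_url("sea level")`.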
23
24
Conclusions and Recommendations
In support of the NSF Data Science / Big Data Community and the Research Data Alliance (RDA), Semantic Community has prepared four multi-data-set collections, from the RDA Climate Change Data Challenge, the U.S. National Transportation Atlas Database (NTRD), climate.data.gov, and the U.S. Climate Resilience Toolkit, to jump-start the Federal Big Data Working Group Meetup and other data science meetup participants for our September 28th Meetup of Data Science Meetups, in preparation for the NSF Meetup of Data Science Meetups on November 6-7, 2015. All of the information is provided as a Data FAIRPort (Findable, Accessible, Interoperable, and Reusable) in a Data Science Commons or Hub as a community service. Suggestions and feedback are welcome.