Data Science for EPA & USGS Fracturing & Fracking Data Dr. Brand Niemann Director and Senior Data Scientist/Data Journalist Semantic Community Data.

Presentation transcript:

Data Science for EPA & USGS Fracturing & Fracking Data Dr. Brand Niemann Director and Senior Data Scientist/Data Journalist Semantic Community Data Science EPA Fracturing Data Data Science for USGS Produced Waters October 5,

2 Agenda
Get a Preview of National Data Science Organizers Workshop on November 5-6, 2015, and the Focus on National Data Science Challenges and Hackathons
6:30 p.m. Welcome and Introduction (New Tutorial and Mentoring). Slides: Data Science for USGS Produced Waters. See previous: EPA Fracturing Data and TIBCO Webinar
7:15 p.m. Brief Member Introductions
7:30 p.m. Invited Presentation: Dr. Sophia Liu and USGS Staff. See USGS Hydrologic Fracking and List of Hackers
8:15 p.m. Open Discussion
8:45 p.m. Networking
9:00 p.m. Depart

3 Background
Dr. Sophia B. Liu is currently a Mendenhall Postdoctoral Research Fellow at the U.S. Geological Survey investigating crowdsourced geographic information around earthquakes.
July 13th Meetup: Data Science for USGS Minerals Big Data Slides and USGS Civic Hacking Challenges on Hackpad and Slack
Brief Comments from Subject Matter Experts in the USGS Energy, Minerals, and Environmental Health Programs
Dr. Liu was at the EPA and White House for Crowdsourcing and Citizen Science Meetings last week and she will provide a report at our Meetup.
See next slides for more background and an EPA and Crowdsourcing Citizen Science example for the EPA Nutrient Indicators Dataset by Dr. Niemann.

4 Federal Community of Practice on Crowdsourcing and Citizen Science
Lea Shanley, Presidential Innovation Fellow, NASA, and Jay Benforado, Deputy Chief Innovation Officer, EPA. See Slides

6 Recording To Be Posted

9 Specific Indicators
Documented Nutrient Pollution:
  Nutrient loads and yields: Download the loadsdatatable.xlsx (2 pp, 26 K)
  Fertilizer: Download the Fertilizer nitrogen data table (excel) (2 pp, 19 K) and the Fertilizer phosphorus data table (excel) (2 pp, 28 K)
  Manure: Download the manuredata.xlsx (2 pp, 44 K)
Documented Impacts:
  Hypoxia: Download the hypoxiadata.xlsx (2 pp, 13 K)
  Harmful algal toxins: Download the toxinsdata.xlsx (2 pp, 14 K)
  Groundwater nitrate: Download the groundwaterdata.xlsx (2 pp, 15 K)
  Assessed and impaired waters: Download the impairedrivers.xlsx (2 pp, 17 K), the impairedlakes.xlsx (2 pp, 17 K), and the impairedestuaries.xlsx (2 pp, 15 K)
State Actions Underway:
  Limiting loads: Download the npdesdata.xlsx (2 pp, 25 K)
  Adoption of standards: My Note: Data Table Missing. Sent Message. Corrected Problem: Criteria Progress Nutrient Policy Data US EPA.xlsx

10 The table on this page is missing: http://www2.epa.gov/nutrient-policy-data/progress-towards-adopting-total-nitrogen-and-total-phosphorus-numeric-water#list
Response: Our webmaster has addressed the issue hindering the access to full table contents for the link of your interest. I encourage you to re-visit our site.

12 My Note: These need to be reformatted for Spotfire and merged by state. See Next Slides.
Nutrient loads and yields: Download the loadsdatatable.xlsx (2 pp, 26 K)

13 Spotfire is great for very wide tables, just like when I helped Dr. Ben Shneiderman many years ago test it with very wide Toxics Release Inventory tables!
EPANutrientIndicatorsDataSet.xlsx

14 EPANutrientIndicatorsDataSet.xlsx Metadata

15 Web Player
My Note: Compare these nutrient data to those from the IPNI NuGIS used in my Big Data Science for Precision Farming Business Online Course. I would have to merge these two datasets by state, as I am going to show next.

16 Web Player

17 Semantic Community Data Science Big Data Science for Precision Farming Business Week 4 Modeling

18 Web Player

19 Tamr Co-Founder/CTO Stonebraker Wins 2014 Turing Award

20 Tamr Catalog and Tamr Platform
Tamr Catalog: See all your data
  What does Catalog do for you?: Data Discovery; Data Organization; Data Understanding
  All of your organization’s hidden data. All your data, in one place
  What you have and where
  Catalog everything
  Supercharge Catalog with more Tamr
Tamr Platform: Focuses on solving the core problems associated with integrating many disparate datasets across the enterprise in a rapid and scalable manner.
  Data Unification; Connect; Consume
  Specifically, Tamr enables users to:
    Register any data source, regardless of source format or location
    Define the desired schema of the integrated dataset
    Cluster or merge records

21 Tamr Catalog ZIP File: Spreadsheets, Executable, and Readme

23 Tamr Catalog Views: Add Sources and Explore Table
Click here to add sources to your Catalog.
Each tile represents a source you cataloged.
Click here to explore a table of your sources.

24 Tamr and TIBCO Spotfire
TIBCO Spotfire is both a Catalog and a Platform and an Analytics and Visualization Tool:
  Originally I thought that Tamr did something more than TIBCO Spotfire, but until they actually had a product to test, I could not be sure.
  There may still be something in the Tamr Platform that uses an ontological approach to fuzzy matching of the columns that I read about in their early white paper.
I was able to integrate the 11 EPA Nutrient Indicator Datasets readily in a spreadsheet because they all have a common key field, state:
  I could have imported each separately into Spotfire and used the Manage Relation Function to automatically merge them, but I would first need to reformat the 11 individual spreadsheets to clean up their headers!
Next I want to do everything and more than the Tamr Catalog and Tamr Platform for the EPA & USGS Fracturing & Fracking Data:
  Do each individually
  Merge them!
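
The merge on a common state key described above can be sketched in plain Python. This is only an illustration under stated assumptions, not how Spotfire or Tamr works internally: the function names (normalize_header, merge_by_state), the column names, and the toy rows are all hypothetical and are not taken from the EPA spreadsheets.

```python
def normalize_header(name):
    """Collapse stray whitespace and lowercase a header so keys line up."""
    return " ".join(name.split()).lower()


def merge_by_state(tables):
    """Outer-join tables (each a list of row dicts) on their 'state' column."""
    merged = {}
    for rows in tables:
        for row in rows:
            clean = {normalize_header(key): value for key, value in row.items()}
            state = clean.pop("state")
            # Create the state's record on first sight, then add this table's columns.
            merged.setdefault(state, {"state": state}).update(clean)
    return [merged[state] for state in sorted(merged)]


# Hypothetical toy rows with messy headers (not real EPA values).
manure = [{"State": "IA", "Manure  N": 1.0}, {"State": "OH", "Manure  N": 2.0}]
hypoxia = [{"state ": "IA", "Hypoxia Sites": 3}]

combined = merge_by_state([manure, hypoxia])
for record in combined:
    print(record)
```

A state that is missing from one table simply lacks that table's columns in the result, which mirrors an outer join; cleaning the headers first is what lets the differently labeled state columns line up.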

25 Linked Data Visualizations Data Table Metadata Table Data Columns Classified by: Numbers Time Location Categories with Filters Filters Details-on-Demand Web Player

26 TIBCO Spotfire Data Table and Data Column Properties: EPA Nutrient Dataset
Data Table Properties
Data Column Properties
See Next Slide For Details

27 Data Column Properties Exported to Spreadsheet

Pairs of Statistical Relationships! N(N-1)/2 where N=54 Web Player
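
As a quick check on the count above: N(N-1)/2 is the number of distinct unordered pairs among N items, so N=54 gives 1,431 pairs. The slide does not say what the 54 items are (presumably the columns being compared), so the labels in this minimal Python sketch are hypothetical placeholders.

```python
from itertools import combinations
from math import comb


def pair_count(n):
    """Number of distinct unordered pairs among n items: n(n-1)/2."""
    return n * (n - 1) // 2


n_columns = 54
total_pairs = pair_count(n_columns)  # 54 * 53 / 2 = 1431

# Hypothetical placeholder labels; enumerating the pairs gives the same count.
labels = [f"col_{i}" for i in range(n_columns)]
all_pairs = list(combinations(labels, 2))

print(total_pairs, len(all_pairs), comb(n_columns, 2))
```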

29 Web Player EPA Fracturing Data

30 TIBCO Spotfire Data Table and Data Column Properties: EPA Fracturing Data
Data Table Properties
Data Column Properties
See Next Slide For Details

31 EPA Fracturing Data: additive_ingredients_final_030515_3

32 USGS Produced Waters Web Player

33 TIBCO Spotfire Data Table and Data Column Properties: USGS Produced Waters
Data Table Properties
Data Column Properties
See Next Slide For Details

34 USGS Produced Waters: USGSPWDB_v2.1

35 Add Relation for EPA Fracturing Data: additive_ingredients_final_030515_3 and USGS Produced Waters: USGSPWDB_v2.1
Step 1
Step 2
Step 3: This is the State Relation

36 TIBCO Spotfire provides all the column name matches that are possible and these can become relations if the column names are semantically the same in all the data sets (left). TIBCO Spotfire can also create calculated columns (right).
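
The idea of proposing relations from matching column names can be illustrated with a short Python sketch. This is not Spotfire's actual matching logic, which the slides do not describe; the function name candidate_relations and the column lists are hypothetical, and the comparison here is a simple case-insensitive name match.

```python
def candidate_relations(columns_a, columns_b):
    """Pair up column names that match after trimming and lowercasing."""
    lookup_b = {name.strip().lower(): name for name in columns_b}
    matches = []
    for name in columns_a:
        key = name.strip().lower()
        if key in lookup_b:
            matches.append((name, lookup_b[key]))
    return matches


# Hypothetical column names, not the real headers of either dataset.
epa_columns = ["State", "County", "Ingredient Name"]
usgs_columns = ["STATE", "COUNTY", "TDS"]

matches = candidate_relations(epa_columns, usgs_columns)
print(matches)
```

A name match is only a candidate: as the slide notes, it becomes a usable relation only when the two columns mean the same thing, so each proposed pair still needs a human check.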

37 Add USGS Produced Waters: USGSPWDB_v2.1 to EPA Fracturing Data: additive_ingredients_final_030515_3, because EPA Fracturing has 25 datasets and USGS Produced Waters is just one dataset.
EPA Fracturing Data: Number of Disclosures by State
USGS Produced Waters: TDS by State
Web Player
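
The two by-state views named above can be sketched as an aggregate-then-join step in Python. The slide does not say how TDS (total dissolved solids) is summarized per state, so the mean used here is an assumption, and the field names ("state", "tds") and toy rows are hypothetical rather than taken from either dataset.

```python
from collections import Counter, defaultdict
from statistics import mean


def disclosures_by_state(rows):
    """Count disclosure rows per state."""
    return Counter(row["state"] for row in rows)


def mean_tds_by_state(rows):
    """Average the TDS values per state (the choice of mean is an assumption)."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["state"]].append(row["tds"])
    return {state: mean(values) for state, values in groups.items()}


def join_on_state(counts, tds):
    """Outer-join the two per-state summaries on the state name."""
    states = sorted(set(counts) | set(tds))
    return {
        state: {"disclosures": counts.get(state, 0), "mean_tds": tds.get(state)}
        for state in states
    }


# Hypothetical toy rows, not real EPA or USGS records.
epa_rows = [{"state": "Texas"}, {"state": "Texas"}, {"state": "Ohio"}]
usgs_rows = [
    {"state": "Texas", "tds": 100.0},
    {"state": "Texas", "tds": 300.0},
    {"state": "Utah", "tds": 50.0},
]

summary = join_on_state(disclosures_by_state(epa_rows), mean_tds_by_state(usgs_rows))
for state, values in summary.items():
    print(state, values)
```

States present in only one source are kept, with a zero count or a missing TDS value, so gaps in coverage stay visible instead of being silently dropped.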

38 October 19th Meetup: Sensing Our Air: The Quest for Big Data About Our Air Quality
Get Another Preview of National Data Science Organizers Workshop on November 5-6, 2015, and the Focus on National Data Science Challenges and Hackathons
6:30 p.m. Welcome and Introduction. Slides: Data Science for EPA EnviroAtlas Part II. Also see Earth Insights from Big Data
6:45 p.m. Invited Presentation EPA Staff: Robin Thottungal (invited)
7:15 p.m. Brief Member Introductions
7:30 p.m. Invited Presentation EPA Staff (continued): EPA Engineer Dr. Gayle Hagler
8:15 p.m. Open Discussion
8:45 p.m. Networking
9:00 p.m. Depart

39 New EPA Chief Data Scientist
Robin Thottungal (invited) will be joining as the division director for the Environmental Analysis Division (EAD) within the Office of Information Analysis and Access, and as the chief data scientist.
According to a communication from EPA CIO Ann Dunkin, which Federal News Radio obtained, Thottungal starts later this month after spending most of his career in the private sector.
Most recently, Thottungal worked at Deloitte Consulting, where he focused on large-scale analytics projects for public sector and commercial clients. He also led the global big data community of practice for Deloitte, developing analytical frameworks and go-to-market strategy for big data and analytics solutions.
Additionally, Thottungal is the vice-chairman for the Institute of Electrical and Electronics Engineers (IEEE) Washington D.C. section as well as the chapter chairman for the IEEE Computational Intelligence Society.

40 Rescheduled From June 29th
Meet EPA Engineer Gayle Hagler, Ph.D.
Q & A from EPA Presentation: Sensor Technology State of the Science July 8, Air Sensors
Welcome to the Village Green Project: a research effort to discover new ways of measuring air quality and weather conditions in community environments