EPA Big Data Analytics: EnviroAtlas Data Dr. Brand Niemann Director and Senior Data Scientist/Data Journalist Semantic Community

EPA Big Data Analytics: EnviroAtlas Data Dr. Brand Niemann Director and Senior Data Scientist/Data Journalist Semantic Community April 17,

Overview 2
EPA EnviroAtlas Data: Web Page; Description; Maps; Scales – National and Community
Geodatabases-to-Shape Files: FME Workbench Results
Data Science Data Publication: MindTouch Knowledge Bases; Spreadsheet Knowledge Base; Indices and Tables
Spotfire Analytics and Visualizations: Cover Page – Knowledge Base Content Analytics; IRM Strategic Plan Tables; EnviroAtlas Inventories; Selected National Metrics

EPA EnviroAtlas Data: Web Page 3

EPA EnviroAtlas Data: Description 4
EnviroAtlas national and community data are available to download below as geodatabases. Due to technical limitations which we are working to overcome, not all of the EnviroAtlas data (e.g., 1-meter land cover data, supplemental data) are available for download. As of February 2015, EnviroAtlas is transitioning to a more recent version of the 12-digit HUCs; data aggregated to these new boundaries will be available soon. All available EnviroAtlas data for each community, except the land cover, are included in the individual geodatabase files below.
Durham, NC metric tables in Esri File Geodatabase format (compressed [36 MB])
Fresno, CA metric tables in Esri File Geodatabase format (compressed [7 MB])
Green Bay, WI metric tables in Esri File Geodatabase format (compressed [9 MB])
Milwaukee, WI metric tables in Esri File Geodatabase format (compressed [31 MB])
New Bedford, MA metric tables in Esri File Geodatabase format (compressed [4 MB])
Phoenix, AZ metric tables in Esri File Geodatabase format (compressed [74 MB])
Pittsburgh, PA metric tables in Esri File Geodatabase format (compressed [31 MB])
Portland, ME metric tables in Esri File Geodatabase format (compressed [22 MB])
Tampa, FL metric tables in Esri File Geodatabase format (compressed [53 MB])
Woodbine, IA metric tables in Esri File Geodatabase format (compressed [658 KB])
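The download list is effectively a small table of file sizes, and ranking it (in the spirit of the "Sort by Size" note on the Geodatabases-to-Shape Files slide) takes only a few lines. A minimal Python sketch, using the sizes exactly as listed and assuming 1 MB = 1024 KB so the Woodbine file can share a unit with the others:

```python
"""Rank the EnviroAtlas community geodatabase downloads by size."""

# Sizes as listed on the Description slide.
# Assumption: 1 MB = 1024 KB when converting everything to KB.
DOWNLOADS = {
    "Durham, NC": (36, "MB"),
    "Fresno, CA": (7, "MB"),
    "Green Bay, WI": (9, "MB"),
    "Milwaukee, WI": (31, "MB"),
    "New Bedford, MA": (4, "MB"),
    "Phoenix, AZ": (74, "MB"),
    "Pittsburgh, PA": (31, "MB"),
    "Portland, ME": (22, "MB"),
    "Tampa, FL": (53, "MB"),
    "Woodbine, IA": (658, "KB"),
}
UNIT_TO_KB = {"KB": 1, "MB": 1024}


def sort_by_size(downloads):
    """Return (community, size_kb) pairs, largest download first.

    sorted() is stable, so ties (Milwaukee and Pittsburgh) keep list order.
    """
    pairs = [(name, size * UNIT_TO_KB[unit])
             for name, (size, unit) in downloads.items()]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    ranked = sort_by_size(DOWNLOADS)
    for name, kb in ranked:
        print(f"{name:16} {kb / 1024:6.1f} MB")
    print(f"Total: {sum(kb for _, kb in ranked) / 1024:.1f} MB")
```

Run as a script, this lists Phoenix first and Woodbine last, with a total of about 267.6 MB for the ten community files.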

EPA EnviroAtlas Data: Maps 5

EPA EnviroAtlas Data: National 6
Maps at the national extent provide wall-to-wall data coverage for the conterminous U.S. These data layers are summarized by 12-digit hydrologic units (12-digit HUCs), which provide approximately 90,000 similarly sized spatial units. A list of the currently available data is accessible as a .pdf, an .xls file, or a tab-delimited text file (National file). This file shows the benefit categories under which each layer can be found.
Supplemental maps for the nation provide context and additional data for exploring ecosystem services and the built environment. These data are not summarized by a specific spatial unit. Instead, these supplemental maps represent features in the landscape such as rivers and wetlands, as well as other contextual landmarks such as state boundaries. Details on each supplemental map can be found in the data fact sheets.
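Because the national layers are keyed by 12-digit HUC, coarser summaries can be built from the code alone: in the Watershed Boundary Dataset hierarchy each level adds two digits (2-digit region down to 12-digit subwatershed), so the first eight digits of a HUC12 identify its HUC8 subbasin. The sketch below rolls an additive metric up to a coarser level. The codes and values are hypothetical, and HUC codes are kept as strings because codes in regions 01 to 09 start with a zero that numeric parsing would drop. Percentages or densities should be area-weighted rather than summed.

```python
from collections import defaultdict

# Watershed Boundary Dataset levels: each level adds two digits to the code.
HUC_LEVELS = {2: "region", 4: "subregion", 6: "basin",
              8: "subbasin", 10: "watershed", 12: "subwatershed"}


def parent_huc(huc12: str, digits: int) -> str:
    """Return the enclosing HUC code at the given level (2, 4, ..., 12)."""
    if len(huc12) != 12 or not huc12.isdigit():
        raise ValueError(f"not a 12-digit HUC: {huc12!r}")
    if digits not in HUC_LEVELS:
        raise ValueError(f"unsupported HUC level: {digits}")
    return huc12[:digits]


def roll_up(metric_by_huc12: dict[str, float], digits: int) -> dict[str, float]:
    """Sum an additive HUC12 metric (e.g. acres or counts) to a coarser level."""
    totals: dict[str, float] = defaultdict(float)
    for huc12, value in metric_by_huc12.items():
        totals[parent_huc(huc12, digits)] += value
    return dict(totals)


if __name__ == "__main__":
    # Hypothetical HUC12 codes and values, for illustration only.
    sample = {"030101010101": 120.0, "030101010102": 80.0, "030102020301": 50.0}
    print(roll_up(sample, 8))  # {'03010101': 200.0, '03010202': 50.0}
```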

EPA EnviroAtlas Data: Community 7
Community-level information in EnviroAtlas draws from fine-scale land cover data, census data, and models to estimate ecosystem services and their benefits within the community area. EnviroAtlas community data are consistent for each available community and are mostly summarized by census block groups. EnviroAtlas is building datasets for 50 communities in the United States; each community area boundary is based on selected block groups within the 2010 US Census Urban Area boundary. See a list of the available and upcoming communities. Learn more in the Community Fact Sheet (pp, 997K) or download a list of all the EnviroAtlas data available for each community as a .pdf, an .xls file, or a tab-delimited text file (Community file). This file shows the benefit categories under which each layer can be found.
Supplemental maps for each community provide context and additional data for exploring ecosystem services and the built environment. These data are not summarized by a specific spatial unit and include the 1-meter resolution land cover data for each community. Details on each supplemental map can be found in the data fact sheets.
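Community metrics are mostly summarized by census block group, and a block group's 12-character GEOID concatenates a 2-digit state FIPS code, a 3-digit county code, a 6-digit tract code, and a 1-digit block group number. That makes it possible to roll block group metrics up to counties from the ID alone. A hedged sketch for a percentage metric, which must be area-weighted rather than summed; the GEOIDs, areas, and percentages are hypothetical:

```python
from collections import defaultdict


def split_bg_geoid(geoid: str) -> dict[str, str]:
    """Split a 12-character census block group GEOID into its parts."""
    if len(geoid) != 12 or not geoid.isdigit():
        raise ValueError(f"not a block group GEOID: {geoid!r}")
    return {
        "state": geoid[:2],
        "county": geoid[2:5],
        "tract": geoid[5:11],
        "block_group": geoid[11],
    }


def area_weighted_by_county(rows):
    """Area-weighted mean of a percentage per county (state+county FIPS).

    rows: iterable of (bg_geoid, area, percent). Percentages are not
    additive, so each block group's value is weighted by its area.
    """
    weighted = defaultdict(float)
    area = defaultdict(float)
    for geoid, bg_area, pct in rows:
        parts = split_bg_geoid(geoid)
        county = parts["state"] + parts["county"]
        weighted[county] += bg_area * pct
        area[county] += bg_area
    # Skip counties with zero total area to avoid division by zero.
    return {c: weighted[c] / area[c] for c in weighted if area[c] > 0}


if __name__ == "__main__":
    # Hypothetical block groups: (GEOID, area, percent of some cover type).
    rows = [
        ("370630001011", 2.0, 40.0),
        ("370630001012", 1.0, 70.0),
        ("370630002001", 1.0, 50.0),
        ("371350107011", 3.0, 20.0),
    ]
    print(area_weighted_by_county(rows))  # {'37063': 50.0, '37135': 20.0}
```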

EPA EnviroAtlas Data: Map of Communities 8

Geodatabases-to-Shape Files 9
My Note: Sort by Size
My Note: 0.5 GB HUC 12 Being Updated

FME Workbench: National Metrics Log File 10
Starting translation...
FME ( Build WIN64)
FME_HOME is 'C:\Program Files\FME\'
FME Database Edition (node locked-crc)
Serial Number: 0
Temporary License: 31 days left.
Machine host name is: BrandNiemann-PC
LOTS MORE DETAILS…..
Total Features Written 2,607,688
Translation was SUCCESSFUL with 8 warning(s) ( feature(s) output)
FME Session Duration: 6 minutes 18.3 seconds. (CPU: 326.0s user, 47.7s system)
END - ProcessID: 6016, peak process memory usage: kB, current process memory usage: kB
Translation was SUCCESSFUL
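The headline numbers in an FME log like this one can be extracted mechanically, which helps when comparing many translations. A minimal sketch with Python's re module; the regular expressions are written against the lines shown on this slide only (assumption: other FME versions may word them differently), and values that are elided on the slide simply come back as None:

```python
import re

# Summary lines copied from the National Metrics log on this slide.
LOG = """\
Total Features Written 2,607,688
Translation was SUCCESSFUL with 8 warning(s) ( feature(s) output)
FME Session Duration: 6 minutes 18.3 seconds. (CPU: 326.0s user, 47.7s system)
END - ProcessID: 6016, peak process memory usage: kB, current process memory usage: kB
Translation was SUCCESSFUL
"""

FEATURES_RE = re.compile(r"Total Features Written:?\s+(\d[\d,]*)")
WARNINGS_RE = re.compile(r"with (\d+) warning\(s\)")
DURATION_RE = re.compile(r"Session Duration:\s+(?:(\d+) minutes?\s+)?([\d.]+) seconds")
CPU_RE = re.compile(r"CPU:\s+([\d.]+)s user,\s+([\d.]+)s system")


def summarize_fme_log(text: str) -> dict:
    """Pull headline numbers out of an FME translation log (None if absent)."""
    summary = {"succeeded": "Translation was SUCCESSFUL" in text}
    m = FEATURES_RE.search(text)
    summary["features_written"] = int(m.group(1).replace(",", "")) if m else None
    m = WARNINGS_RE.search(text)
    summary["warnings"] = int(m.group(1)) if m else None
    m = DURATION_RE.search(text)
    # Wall-clock seconds: optional minutes part plus the seconds part.
    summary["wall_seconds"] = (int(m.group(1) or 0) * 60 + float(m.group(2))) if m else None
    m = CPU_RE.search(text)
    # Total CPU seconds: user time plus system time.
    summary["cpu_seconds"] = (float(m.group(1)) + float(m.group(2))) if m else None
    return summary


if __name__ == "__main__":
    print(summarize_fme_log(LOG))
```

For this log the sketch reports 2,607,688 features, 8 warnings, and about 378 seconds of wall time against about 374 seconds of CPU time.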

FME Workbench: National Metrics GDB-to-SHP 11
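One constraint any GDB-to-SHP conversion has to handle is that shapefile attributes live in a dBASE (.dbf) table, whose field names are limited to 10 characters, so longer geodatabase field names must be shortened without colliding. FME applies its own renaming rules; the sketch below is only a hypothetical scheme (truncate, then replace the tail with a counter on collision), and the field names are made up:

```python
MAX_DBF_NAME = 10  # dBASE (.dbf) field names are limited to 10 characters


def shapefile_field_names(names: list[str]) -> dict[str, str]:
    """Map long geodatabase field names to unique names of at most 10 chars.

    Hypothetical scheme: truncate to 10 characters; on a collision, keep
    a shorter prefix and append a counter (e.g. 'Percent_W1').
    Names are compared case-insensitively as a precaution.
    """
    used: set[str] = set()
    mapping: dict[str, str] = {}
    for name in names:
        candidate = name[:MAX_DBF_NAME]
        counter = 1
        while candidate.upper() in used:
            suffix = str(counter)
            candidate = name[:MAX_DBF_NAME - len(suffix)] + suffix
            counter += 1
        used.add(candidate.upper())
        mapping[name] = candidate
    return mapping


if __name__ == "__main__":
    fields = ["Percent_Wetland", "Percent_Wetland_Forested", "HUC_12"]
    for long_name, short_name in shapefile_field_names(fields).items():
        print(f"{long_name:26} -> {short_name}")
```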

Data Science Data Publication: MindTouch Knowledge Base 12
Data Science for EPA Big Data Analytics
My Note: Use Google Chrome Find

Data Science Data Publication: Spreadsheet Knowledge Base 13 EPABigDataAnalytics.xlsx

EPA EnviroAtlas National & Community Inventory 14 xlscurrentdata.xls

Data Science Data Publication: Spotfire Cover Page 15 Content Analytics Web Player

Data Science Data Publication: IRM Strategic Plan 16 Content Analytics Web Player

Data Science Data Publication: IRM Strategic Plan Tables 17 PDF to Tables Enterprise Data Dictionary Web Player

Data Science Data Publication: EnviroAtlas Inventory National 18 National Layer Counts Web Player

Data Science Data Publication: EnviroAtlas Inventory Community 19 Community Layer Counts Web Player
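The National and Community data lists are also offered as tab-delimited text files that show the benefit category for each layer, so per-category layer counts can be tallied outside Spotfire as well. A hedged sketch using Python's csv module; the column names and rows below are placeholders, not the actual file layout:

```python
import csv
import io
from collections import Counter

# Placeholder rows; the real National/Community files' columns may differ.
SAMPLE_TSV = (
    "Layer\tBenefit Category\n"
    "Layer A\tClean Air\n"
    "Layer B\tClean and Plentiful Water\n"
    "Layer C\tClean and Plentiful Water\n"
    "Layer D\tNatural Hazard Mitigation\n"
)


def layer_counts(lines, category_col="Benefit Category"):
    """Count inventory rows per benefit category in tab-delimited text.

    `lines` can be any iterable of text lines, such as an open file object.
    Rows with an empty category are skipped.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    return Counter(
        row[category_col].strip() for row in reader if row.get(category_col)
    )


if __name__ == "__main__":
    counts = layer_counts(io.StringIO(SAMPLE_TSV))
    for category, n in counts.most_common():
        print(f"{n:3}  {category}")
```

For a downloaded file, pass an open handle instead, e.g. `open(path, newline="", encoding="utf-8")`, where `path` and the encoding are assumptions about the local copy.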

Data Science Data Publication: EnviroAtlas Inventory NatureServe 20 SHAPE Length Versus SHAPE Area Acres per State SHAPE Area per State Web Player

Data Science Data Publication: EnviroAtlas Inventory Land Cover 21 Percent Wetland Versus PAGP Percent Wetland by HUC 12 Web Player

Conclusions and Recommendations 22
The EPA EnviroAtlas data are the most integrated databases EPA has for national and community ecosystems.
The use of Esri's proprietary GDB format limits the reuse of these data in open government data applications.
The Safe Software FME Workbench was used to convert selected national and community files from GDB to SHP format.
A Data Science Data Publication of EPA Big Data Analytics was produced as an example of the new EPA Big Data Analytics Service in the EPA 5-year IRM Strategic Plan.
EnviroAtlas data for 50 communities are coming, and many other EPA geospatial data sets could be used for Big Data Analytics in Data Science Data Publications.

Exploratory Data Science on Even Bigger Data 23
Process:
Unzipped and converted all National Metrics GDB-to-SHP with Safe FME Workbench (70 MB to 282 MB in 102 files, of which 34 were SHP).
Imported all 34 SHP (30 MB) at once into one Spotfire file that was 84 MB.
Did exploratory data analysis on them! Geometry is missing, but it was not needed initially because the tables have HUC codes.
Found the current HUC 12 geometry at the USDA Geospatial Data Gateway (700 MB GDB ZIP), unzipped it to 744 MB, and converted it GDB-to-SHP, producing 4.0 GB of SHP!
Imported that into Spotfire, producing a file of only 1.8 GB!
Safe FME Workbench log file:
Total Features Written:
Translation was SUCCESSFUL with 0 warning(s) ( feature(s) output)
FME Session Duration: 4 minutes 12.1 seconds. (CPU: 230.1s user, 6.3s system)
END - ProcessID: 10120, peak process memory usage: kB, current process memory usage: kB
Translation was SUCCESSFUL
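The sizes on this slide imply quite different growth and compression factors at each step. A quick arithmetic check, assuming 1 GB = 1000 MB (the slide does not state a convention) and treating the 70 MB figure as the input side of the national conversion:

```python
# Sizes reported on the slide, in MB. Assumption: 1 GB = 1000 MB.
SIZES_MB = {
    "national_gdb": 70,
    "national_converted": 282,  # all 102 output files
    "national_shp": 30,         # the 34 shapefiles imported into Spotfire
    "national_spotfire": 84,
    "huc12_gdb": 744,           # HUC 12 geodatabase after unzipping
    "huc12_shp": 4000,          # 4.0 GB of shapefile output
    "huc12_spotfire": 1800,     # 1.8 GB Spotfire file
}


def factor(before: str, after: str) -> float:
    """Size of `after` relative to `before` (>1 means the data grew)."""
    return SIZES_MB[after] / SIZES_MB[before]


if __name__ == "__main__":
    steps = [
        ("national_gdb", "national_converted"),
        ("national_shp", "national_spotfire"),
        ("huc12_gdb", "huc12_shp"),
        ("huc12_shp", "huc12_spotfire"),
    ]
    for before, after in steps:
        print(f"{before} -> {after}: {factor(before, after):.2f}x")
```

Under these assumptions the national conversion roughly quadruples the data (4.03x), the HUC 12 conversion grows it about 5.38x, and the Spotfire file holds the HUC 12 shapefiles in 0.45 of their size.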

Spotfire Data Tables and Relations 24
My Note: 35 data tables, with all their many columns of numbers, locations, and categories, including BioMass (83,029 rows by 10 columns) joined to HUC12 (100,493 rows by 27 columns), all in memory!
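A Spotfire relation such as BioMass joined to HUC12 is, conceptually, a keyed join on the HUC code. Outside Spotfire the same idea can be sketched as a left join that also counts unmatched rows. The column names (HUC_12, BIOMASS, AREA_ACRES) and rows are hypothetical, and keys are normalized to zero-padded 12-character strings in case one table stores HUC codes as numbers:

```python
def huc_key(value) -> str:
    """Normalize a HUC code (str or int) to a zero-padded 12-character string."""
    return str(value).strip().zfill(12)


def left_join(left_rows, right_rows, key):
    """Left-join two lists of dicts on `key`.

    Returns (joined_rows, unmatched_count). Right-side keys are assumed
    unique (one row per HUC12); otherwise the first occurrence wins.
    """
    index = {}
    for row in right_rows:
        index.setdefault(huc_key(row[key]), row)
    joined, unmatched = [], 0
    for row in left_rows:
        k = huc_key(row[key])
        match = index.get(k)
        if match is None:
            unmatched += 1
            match = {}
        # Left-row values win on name clashes; the key is stored normalized.
        joined.append({**match, **row, key: k})
    return joined, unmatched


if __name__ == "__main__":
    # Hypothetical rows, for illustration only.
    huc12 = [{"HUC_12": "030101010101", "AREA_ACRES": 20000.0},
             {"HUC_12": "030101010102", "AREA_ACRES": 15000.0}]
    biomass = [{"HUC_12": 30101010101, "BIOMASS": 5.0},
               {"HUC_12": "999999999999", "BIOMASS": 1.0}]
    rows, missing = left_join(biomass, huc12, "HUC_12")
    print(rows)
    print("unmatched:", missing)
```

Counting unmatched keys is a cheap sanity check when two tables of different sizes are related, as with the 83,029 BioMass rows and the 100,493 HUC12 rows.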

Exploratory Data Science: BioMass by HUC 25

Exploratory Data Science: BioMass by HUC 26

Exploratory Data Science: Florida BenMap 27

Exploratory Data Science: Florida BG_Pop 28