Data Science for Agency Initiatives 2015 Dr. Brand Niemann Director and Senior Data Scientist/Data Journalist Semantic Community Data Science Data Science.


Data Science for Agency Initiatives 2015 Dr. Brand Niemann Director and Senior Data Scientist/Data Journalist Semantic Community Data Science Data Science for Agency Initiatives 2015 August 3,

Activities
906 Members on July 28th and 13 New Members on July 22nd: a Daily Record!
Member Mary Galvin had John Patrick Junior this month.
Dr. Tom Rindflesch, NIH/NLM Semantic Medline, on August 17th on Glucan.
Data Science for EPA Hydraulic Fracturing Webinar, September 1st.
OSTP/NSF Data Science Meetup of Meetups, November 6th, Ballston, VA.
Steve Hanmer, Mission Source, is co-planning the Data Science for Data Act Datathon Meetup. He attended the Data Act Datathon and Forum this week and will report.
Jonathan Hines, ORNL science writer, is doing a story on Semantic Medline and the ORNL CADES (Compute and Data Environment for Science).
Dr. David Booth, Yosemite Project (Semantic Interoperability of EHRs), Cambridge Semantic Web Meetup Founder, accepted to speak, with date TBD.
Attended Algorithms for Geospatial Data Analysis and Data Owls Meetups.
2

Algorithms for Geospatial Data Analysis and Data Owls Meetups
I am not able to help with a blog for the Wednesday Meetup because there is not enough information to write one. My slide 3 (that I posted to your Meetup) shows the information I need for a blog, which I collect beforehand for my Meetup blogs. In this case, my research since the Meetup shows that both authors could have accessed and used the actual data from the EIA. An example of what I mean is my data science blog for our Monday, August 3rd Meetup.
Listen to CFPB Data Manager, get Consumer Complaint Database, and see Data Science on that data set!
3

Data Mining - Data Science - Data Publication Process
Data Mining Process: Business Understanding, Data Understanding, Data Preparation, Modeling, Evaluation, Deployment
Data Science Process: Data Preparation, Data Ecosystem, Data Story
Data Science Questions: How was the data collected? Where is the data stored? What are the data results? Why should we believe the data results?
Data Science Data Publication: Knowledge Base, Spreadsheet Index, Web & PDF Tables to Spreadsheet, Data Browser, Dynamically Linked Adjacent Visualizations
4

Data Science Data Publication: Data Browser 5

Data Science Data Publication: Dynamically Linked Adjacent Visualizations 6

USGS geochem.csv Data Problem 1

Sophia,
In Brand Niemann's presentation to the Big Data group, he mentioned trouble with geographic coordinates in the file geochem.csv located at http://mrdata.usgs.gov/geochem/geochem.csv. I've examined this file in Microsoft Excel 2010, plotting the latitude against longitude, and I don't see any anomalies. If there is any other information that might help to clarify the problem Brand had, I'd be happy to investigate further, but with the available evidence it looks like a software problem with the tools he was using.
Peter

My Note: I also did a scatter plot in Spotfire when the Map Tool did not work.

Peter (cc Brand),
Thanks for following up on this. I have included Brand so that he can reply with a more thorough response. I was also very interested to know why there was a discrepancy with the geographic coordinates. It would be helpful to know the source of the issue.
Thanks,
Sophia
7

USGS geochem.csv Data Problem 1

Peter,
The problem is that the geochem.csv treats Latitude and Longitude as Categorical Data and not Numerical Data as does, say, the MRDS.csv, etc. A sophisticated program like Spotfire is sensitive to that important difference.
Brand

Brand,
Your statement makes no sense. CSV files are plain text, with the rows specified as lines, and the columns delimited by commas. There is no type information, no category information, nothing at all to which a program reading these data can be "sensitive" other than the actual values in the field. Instead, it is the obligation of the person operating the software to understand the information in the data file, and apply that understanding in the use of software. That includes substantive knowledge of the meaning of the fields as well as the simple technical observations that one can make by examining the values contained in each field. That's why we have documentation. So first of all, when you had trouble, you should have investigated further with other software (Excel, for example), then you should have contacted me if you continued to have trouble using the data. It was irresponsible for you to claim that the problem you encountered is in the data.
Peter
8

USGS geochem.csv Data Problem 2

Peter,
Please download a free trial of Spotfire and import the two csv files, geochem.csv and MRDS.csv, and you will see what I am talking about. I can come to the USGS and show you this if you would like. This is data science.
Brand

Brand,
You have to understand the data, and you have to use the data responsibly. It is not up to the software to do that work for you. My suspicion is that your program treated the coordinates differently than numbers because some of the rows have no coordinates: they're the geochemical analyses of materials standards used to ensure that the sample measurements are correct, and are used by knowledgeable specialists to assess the accuracy and precision of the data values. But you didn't look at the data, otherwise you would have seen this. That's not science of any sort. A scientist examines the evidence with which he or she works, and tries to understand what the evidence is, where it came from, and what it means.
Peter

Peter,
I did look at multiple USGS data sets with the premier data science tool (IMHO) and reported what I found. I am telling you how you could verify my results and learn something about data science. The choice is up to you.
Brand
9
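Both sides of the exchange above can be illustrated with a small sketch. A CSV file carries no type information, so any "numeric" or "categorical" label comes from the reading program's inference over the values; if blank cells are counted as ordinary values, a single row without coordinates is enough to make a coordinate column look like text. The sample rows, the column names (`sample_id`, `latitude`, `longitude`), and the two inference modes below are hypothetical and chosen for illustration. They are not taken from the real geochem.csv and are not a description of how Spotfire or Excel actually infer types.

```python
import csv
import io

# Hypothetical sample rows; the real geochem.csv contents are not reproduced here.
# The last row stands in for a reference-standard analysis with no coordinates.
SAMPLE = """\
sample_id,latitude,longitude
A1,38.95,-77.37
A2,44.05,-121.31
STD-1,,
"""


def is_number(text):
    """Return True if text parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def infer_column_types(csv_text, blanks_are_missing):
    """Label each column 'numeric' or 'text' by inspecting its values.

    When blanks_are_missing is False, an empty cell counts as a non-numeric
    value, so one blank makes the whole column 'text'.
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    types = {}
    for name in rows[0].keys():
        values = [row[name] for row in rows]
        if blanks_are_missing:
            values = [v for v in values if v.strip() != ""]
        numeric = bool(values) and all(is_number(v) for v in values)
        types[name] = "numeric" if numeric else "text"
    return types


def rows_without_coordinates(csv_text):
    """Return the sample_id of every row whose latitude or longitude is blank."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        row["sample_id"]
        for row in reader
        if not row["latitude"].strip() or not row["longitude"].strip()
    ]


strict = infer_column_types(SAMPLE, blanks_are_missing=False)
lenient = infer_column_types(SAMPLE, blanks_are_missing=True)
missing = rows_without_coordinates(SAMPLE)

print("strict: ", strict)
print("lenient:", lenient)
print("rows without coordinates:", missing)
```

Under these assumptions, the strict mode labels the coordinate columns as text while the lenient mode labels them as numeric, and listing the rows with blank coordinates shows which records cause the difference. Running a check like this on the actual file would be one way to test Peter's suspicion before attributing the behavior to either the data or the tool.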

Data Science Data Curation for Sustainable Data Science Meetups of Meetups
I just finished four data science ecosystems:
RDA Climate Data Challenge (July 15): ata_Challenge
RDA Information Week 2016 (Ebola Response and Nepal Earthquake) (July 17): _Data
USDA Microsoft Innovation Challenge (July 27): Business#Story
US Data Act (July 28):
10

Collaboration for Data Science Win-Wins
USDA Open Government Data Training, Innovation Competition, and Online Course in Data-Driven Farming: _Farming_Business#Story
Many Curated Government Data Sets and Data Science Products:
Pick an Agency and/or a Data Set and Look for a Meetup on That:
Mentor Startups Partnership with Eastern Foundry: Group/events/ /
11

USDA Collaboration Chronology
March 16th: USDA CIO and ACDO on Open Data Plan and Roundtable Meetup
March 25th: Government Technology & Innovation Incubator for Big Data Analytics II Meetup at Eastern Foundry
May 18th: USDA Data Science MOOC Meetup
May 21st: USDA Open Data Quarterly Submission to OMB on USDA Data Usage provided (USDA Data Science MOOC)
July 21st: Data-Driven Farming Online Course Announced by HeatSpring and Semantic Community
July 27th: USDA Microsoft Innovation Challenge Submission on Farm Data Dashboards
July 29th: Partnerships Sought for Data-Driven Farming Online Course
September 17th: Big Data Science for Precision Farming Business Online Course Meetup and Commercial Examples: Farmers Business Network, FarmLogs, etc.
October 26th-December 18th: Data-Driven Farming Online Course with Partners
12

13

Agenda
6:30 p.m. Welcome and Introduction (New Tutorial and Mentoring). Slides: Data Science for Agency Initiatives 2015
7:15 p.m. Brief Member Introductions
7:30 p.m. Chad Tompkins, Section Chief, Data Section, Office of Consumer Response (suggested by Linda F. Powell, Chief Data Officer, Consumer Financial Protection Bureau). Consumer Complaint Database Slides (not cleared for public release)
8:15 p.m. Open Discussion
8:45 p.m. Networking
9:00 p.m. Depart
Listen to CFPB Data Manager, get Consumer Complaint Database, and see Data Science on that data set!
14