Semantic Data Science for the US Census Bureau Dr. Brand Niemann Director and Senior Data Scientist Semantic Community

Semantic Data Science for the US Census Bureau
Dr. Brand Niemann, Director and Senior Data Scientist, Semantic Community, November 14

Slide 2: Google Search Display: Census Bureau

Slide 3: Google Search Result: Census Bureau
– Home Page: First source for current population data and the latest Economic Indicators
– State and County QuickFacts: USA QuickFacts
– American FactFinder: Your source for population, housing
– Census: Redistricting Data - What is the Census?
– Population Estimates: The Census Bureau's Population Estimates Program
– Easy Stats: Easy Stats gives you quick and easy access
– Data Access Tools: The Census Bureau data tools provide on-line access

Slide 4: Data Access Tools
Interactive Internet Data Tools:
– Data Visualization Gallery: A weekly exploration of Census data used to promote visualization and make data accessible to a broader audience.
– DataFerrett is a tool and data librarian that searches and retrieves data across federal, state, and local surveys, executes customized variable recoding, and creates complex tabulations and business graphics. Its sources include the Current Population Survey, Survey of Income and Program Participation, American Community Survey, American Housing Survey, Small Area Income and Poverty Estimates, Population Estimates, Economic Census Areawide Statistics, National Center for Health Statistics data, Centers for Disease Control data, and more.
– DataFerrett’s newest tool, the Community Economic Development HotReport, provides community and business leaders speedy access to information on counties and on the Employment & Training Administration’s Workforce Innovation in Regional Economic Development (WIRED) areas across the U.S.
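To make "customized variable recoding" and "complex tabulations" concrete, here is a minimal sketch in plain Python. It is not DataFerrett's implementation or interface. The records, field names, and age cutoffs are hypothetical. The sketch recodes a continuous variable into categories and then cross-tabulates it against another variable.

```python
from collections import Counter

# Hypothetical microdata records: the fields and values are invented for
# illustration and do not come from any Census survey.
records = [
    {"state": "MD", "age": 17},
    {"state": "MD", "age": 34},
    {"state": "MD", "age": 50},
    {"state": "VA", "age": 70},
    {"state": "VA", "age": 45},
    {"state": "VA", "age": 12},
]


def recode_age(age):
    """Recode a continuous age into an age-group category (cutoffs are assumptions)."""
    if age < 18:
        return "under 18"
    if age < 65:
        return "18-64"
    return "65 and over"


def crosstab(rows, row_var, col_var):
    """Count records for each (row_var, col_var) combination."""
    return Counter((r[row_var], r[col_var]) for r in rows)


# Add the recoded variable to copies of the records, leaving the originals intact.
recoded = [dict(r, age_group=recode_age(r["age"])) for r in records]
table = crosstab(recoded, "state", "age_group")

for (state, group), count in sorted(table.items()):
    print(f"{state}  {group:<12} {count}")
```

The recode works on copies of the records so that the original variable stays available for other tabulations.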

Slide 5: Data Visualization Gallery

Slide 6: Census Data Visualization Gallery As Data For the Digital Government Strategy
My Note: Structured and unstructured information is all turned into a knowledge base of data for relational and graph database processing.
My Note: The entire platform and the entire knowledge base page can be searched.

Slide 7: Census Data Visualization Gallery: Spotfire
Spotfire Web Player
My Note: This is a federation of diverse data sources used to find, facet-filter, visualize, and discover new facts.
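The idea of federating diverse sources and then facet-filtering them can be sketched without Spotfire. This is a minimal illustration, not how the Spotfire Web Player works internally. The two source layouts and the topic tags are hypothetical, and the dataset names are taken from the slides. Each source is mapped onto one shared schema. The facet functions then count and filter across the combined catalog.

```python
from collections import Counter

# Two hypothetical sources that describe datasets with different field names.
# The dataset names appear in the slides; the topic tags are illustrative only.
source_a = [
    {"name": "Population Estimates", "topic": "Population"},
    {"name": "Economic Census Areawide Statistics", "topic": "Business"},
]
source_b = [
    {"title": "Current Population Survey", "subject": "Population"},
    {"title": "American Housing Survey", "subject": "Housing"},
]


def federate():
    """Map each source onto one shared schema and tag every record with its origin."""
    items = [{"title": r["name"], "topic": r["topic"], "source": "A"} for r in source_a]
    items += [{"title": r["title"], "topic": r["subject"], "source": "B"} for r in source_b]
    return items


def facet_counts(items, facet):
    """Count how many items carry each value of a facet."""
    return Counter(item[facet] for item in items)


def facet_filter(items, **selected):
    """Keep only the items whose facet values match every selection."""
    return [i for i in items if all(i[k] == v for k, v in selected.items())]


catalog = federate()
print(facet_counts(catalog, "topic"))
print([i["title"] for i in facet_filter(catalog, topic="Population")])
```

Mapping to a shared schema first is what lets a single facet span both sources. Without that step, a facet would only ever see one source's field names.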

Slide 8: TheDataWeb: DataFerrett

Slide 9: DataFerrett Description
DataFerrett is a data analysis and extraction tool to customize federal, state, and local data to suit your requirements. Using DataFerrett, you can develop an unlimited array of customized spreadsheets that are as versatile and complex as your usage demands, then turn those spreadsheets into graphs and maps without any additional software.
My Comment: This is what I use Spotfire for with Open Government Data for the Digital Government Strategy.

Slide 10: Community Economic Development HotReport Description
This site, the Community Economic Development HotReport, provides access for users seeking economic indicators for individual counties. For areas that experience economic disruptions due to natural disasters, plant closings, base closings, and other economic changes, such as abrupt increases in employment, the HotReport shows pertinent economic indicators in unified online reports drawn from many data sources.
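A "unified report from many data sources" amounts to a lookup across several indicator tables that share a geographic key. Here is a minimal sketch. It assumes each source is keyed by the same county identifier. The identifiers and values are hypothetical placeholders, and nothing here describes how the HotReport itself is built. A missing value is reported as `None` rather than dropped, so gaps in coverage stay visible.

```python
# Hypothetical indicator tables from separate sources, each keyed by a
# placeholder county identifier. The values are invented, not real statistics.
unemployment_rate = {"county-001": 5.2, "county-002": 7.9}
population = {"county-001": 120_000, "county-002": 45_000, "county-003": 8_000}
establishments = {"county-002": 1_100, "county-003": 210}


def unified_report(county, **sources):
    """Collect one county's value from every source; None marks a missing value."""
    return {name: table.get(county) for name, table in sources.items()}


report = unified_report(
    "county-002",
    unemployment_rate=unemployment_rate,
    population=population,
    establishments=establishments,
)
print(report)
```

In practice the shared key would need to be a standard county code. Without one, the sources could not be joined reliably.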

Slide 11: Community Economic Development HotReport Web Site

Slide 12: White House Big Data Event: Data to Knowledge to Action
Making the Most of Big Data
“Just wanted to say how helpful it is that you take notes and share so broadly at these types of events. Thanks for your ongoing contributions to all the communities of which you are a part.”

Slide 13: Semantic Data Science Team Attends White House Big Data Event
Our work is an example of the bold new collaboration theme “Harnessing the Potential of Data Scientists and Big Data for Scientific Discovery,” which shows “Data Innovation Across Sectors” and includes the following breakout session topics:
– Education and Workforce Development (George Mason University and Johns Hopkins University; see below)
– Research and Development (NIH and YarcData)
– Innovation (DC Data Science Community and Semantic Community)
My Note: Census is one of 9 agencies involved in this NITRD effort.

Slide 14: NITRD Supplement to the FY14 President’s Budget: Current and Planned Coordination Activities
We have worked to support the NITRD Current and Planned Coordination Activities as follows:
– Working with two of the six agencies (NSF and NIH) and trying to work with the other four (DoD, DARPA, DOE, and USGS);
– Following the work in the NSF-NIH Solicitation, Core Techniques and Technologies for Advancing Big Data Science & Engineering, for datasets and results that can be reused;
– Helping ensure a trained workforce to capitalize on big data resources by working with GMU Data Science as part of our team and preparing a graduate course on data science using the applications and data sets mentioned above and below;
– Providing examples of applications that use multiagency big datasets and the core technology needed to turn heterogeneous data into more homogeneous, interoperable data;
– Providing big data infrastructure development for domain science with Spotfire and the YarcData Graph Appliance; and
– Attending the second National Big Data R&D Initiative event.
My Note: We would like to work with Census on any or all of these!

Slide 15: Demos
Spotfire 6:
– Web Link
Semantic Medline with YarcData Graph Appliance Pilot:
– Wiki
– YarcData Videos
– Schizo (7 minutes)
– Cancer (21 minutes)

Slide 16: Contact Information
– Brand Niemann, Semantic Community
– N. Fredrik Salvesen, SBK LLC, Alliance Partner, YarcData

Slide 17: Some Next Steps
After about 10 years of development and the recent work of our Semantic Data Science Team, we think we have the best US Federal Government semantic knowledge base (NIH Semantic Medline) running on one of the best graph computers (YarcData) for the OSTP/NITRD Federal Big Data Senior Steering WG. Our goal is to produce the “Killer Semantic Web Application for the US Federal Government,” and we still have a way to go.
Now we need to help other agencies do the same by applying semantic data science to their data and metadata to develop their own semantic knowledge bases for piloting on the best graph computers. The following is a pilot example that begins to develop a semantic knowledge base for US Census. It shows the steps for preparing legacy US Census data sources and for collecting new US Census data sources so they are stored directly in a semantic knowledge base.
– A historical note: This is like when I led the E-Forms for E-Government Pilot for OMB and the Federal CIO Council. I selected the US Census Economic Census e-forms solution by Rick Fenestra as the best practice for getting the roughly 15 e-forms solutions in use across the US Federal Government to adopt a common e-Grant XML Schema. That way all 15 could become semantically interoperable, and agencies would not have to “rip and replace” their solutions. This approach could make agency semantic knowledge bases interoperable so they can be federated, and we would have a “killer semantic web application” on top of “individual killer semantic web applications”!

Slide 18: Data Access Tools
QuickFacts, American FactFinder, Easy Stats, My Congressional District, Population Finder, American Community Survey, 2010 Census, Economic Census, Interactive Maps, Data Visualizations, Training & Workshops, Data Tools, Catalogs, Publications

Slide 19: Census Semantic Knowledge Base
US Census data is available in the following ways:
– Data Access Tools: Making It Easier to Use the Data Than Just Direct File Access Below (Start Here)
– Research Data Centers: Access to Confidential Data (Defer This Until a Later Stage)
– Software to Download: More Tools to Use (This is More About Data Than Software)
– Direct File Access: Public (Include This) and Private (Not Applicable Here)
– Access Tools at Other Sites: Is There a Better Place to Build This Semantic Knowledge Base? (That University of Minnesota Web Site Looks Pretty Good!)
My Note: This defines how to start and the scope of the semantic knowledge base.

Slide 20: Semantic Knowledge Base
Initially we need at least a taxonomy and a vocabulary; eventually, we would like an ontology and a thesaurus. We need to build a data and metadata ecosystem with relational and graph data sets. The pilot will build a knowledge base in MindTouch, spreadsheets in Excel, a dashboard in Spotfire, and a business process for data collection in Be Informed. The pilot will then be scaled up to create an RDF triple store for the YarcData Graph Appliance. In essence, I am going to build a “SemanticData.gov”-type application for US Census data.
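Here is one way a taxonomy and a vocabulary could be expressed as RDF triples ready for a triple store. This is a minimal sketch. It assumes the W3C SKOS vocabulary (`skos:broader` for the taxonomy, `skos:definition` for the vocabulary), which the slides do not name. The base URI is hypothetical. The output uses N-Triples, a simple line-based RDF serialization. This is an illustration of the triple shape, not the pilot's actual MindTouch, Excel, or YarcData pipeline.

```python
# Hypothetical base namespace for Census concept URIs (not a real Census URI scheme).
BASE = "http://example.org/census/def/"
# W3C SKOS core namespace.
SKOS = "http://www.w3.org/2004/02/skos/core#"

# Taxonomy: each narrower concept points to its broader concept.
taxonomy = {
    "DataFerrett": "InteractiveInternetDataTools",
    "DataVisualizationGallery": "InteractiveInternetDataTools",
}
# Vocabulary: term -> definition (wording taken from the DataFerrett description).
vocabulary = {
    "DataFerrett": "A data analysis and extraction tool.",
}


def iri(value):
    """Wrap an absolute IRI in angle brackets for N-Triples."""
    return f"<{value}>"


def literal(text):
    """Quote a plain string literal for N-Triples, escaping backslash, quote, and newline."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def to_triples():
    """Emit skos:broader triples for the taxonomy and skos:definition triples for the vocabulary."""
    triples = []
    for child, parent in taxonomy.items():
        triples.append((iri(BASE + child), iri(SKOS + "broader"), iri(BASE + parent)))
    for term, definition in vocabulary.items():
        triples.append((iri(BASE + term), iri(SKOS + "definition"), literal(definition)))
    return triples


def to_ntriples(triples):
    """Serialize triples one per line, each terminated by ' .'."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


print(to_ntriples(to_triples()))
```

With this layout, an ontology or thesaurus can be added later as further triples about the same URIs, without reshaping what already exists.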

Slide 21: Data Access Tools
– Data Visualization Gallery: Recall Slide 6 (Knowledge Base) and Slide 7 (Spotfire)
– 2010 Census Interactive Population Map
– The American FactFinder
– QuickFacts
– Easy Stats
– County Business & Demographics Map
– Economic Database Search and Trend Charts
– Glossary: See the Slide 26 (Excel) and Slide 27 (Spotfire) Knowledge Bases
– Censtats
– Online Mapping Tools
– US Gazetteer
– Business Dynamics Statistics
– DataFerrett: Recall Slides 8-9
– Community Economic Development HotReport: Recall Slides
– QWI Online
– OnTheMap
– Industry Focus
– Census 2000 EEO Data Tool
My Note: This is another taxonomy!

Slide 22: Data Access Tools: Knowledge Base Spreadsheet
My Note: This is a taxonomy in Semantic Web Linked Open Data format.

Slide 23: Direct File Access: Public
My Note: This is a taxonomy of how Census organizes its data files; it needs to become a searchable index in a spreadsheet.
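Turning a hierarchy of data files into "a searchable index in a spreadsheet" comes down to flattening the tree into rows. Here is a minimal sketch in plain Python. The directory and file names are hypothetical placeholders, not the actual Census file layout. The nested taxonomy becomes one CSV row per file. A keyword search runs over the resulting rows.

```python
import csv
import io

# Hypothetical nested layout of public data directories; the names are placeholders,
# not the actual Census file structure.
tree = {
    "population": {"2010": ["estimates.csv", "housing_units.csv"]},
    "economic": {"2007": ["areawide.csv"]},
}


def flatten(node, path=()):
    """Yield one (path, file) row for every file in the nested taxonomy."""
    if isinstance(node, dict):
        for name, child in node.items():
            yield from flatten(child, path + (name,))
    else:
        for filename in node:
            yield "/".join(path), filename


def to_csv(rows):
    """Write the flattened index as CSV text that a spreadsheet can open."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["path", "file"])
    writer.writerows(rows)
    return buf.getvalue()


def search(rows, keyword):
    """Case-insensitive keyword search over both columns of the index."""
    k = keyword.lower()
    return [row for row in rows if k in row[0].lower() or k in row[1].lower()]


rows = list(flatten(tree))
print(to_csv(rows))
print(search(rows, "housing"))
```

Keeping the full path in its own column preserves the taxonomy, so a spreadsheet can still sort or filter by any level of the hierarchy.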

Slide 24: Direct File Access (Public): Knowledge Base Spreadsheet
My Note: This is in both relational and graph (subject, predicate, object) database formats.
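A spreadsheet can be "both relational and graph" because every cell can be read as a triple. The row key is the subject, the column name is the predicate, and the cell value is the object. Here is a minimal sketch of that round trip. The table contents and the choice of `id` as the key column are hypothetical.

```python
# Hypothetical relational table: one row per glossary term, keyed by "id".
# The terms are illustrative examples, not rows copied from the Census glossary.
rows = [
    {"id": "term1", "label": "Household", "source": "Glossary"},
    {"id": "term2", "label": "Housing unit", "source": "Glossary"},
]


def rows_to_triples(rows, key="id"):
    """Read each non-key cell as a (subject, predicate, object) triple."""
    triples = []
    for row in rows:
        subject = row[key]
        for column, value in row.items():
            if column != key:
                triples.append((subject, column, value))
    return triples


def triples_to_rows(triples):
    """Rebuild the relational view by grouping triples on their subject."""
    table = {}
    for subject, predicate, obj in triples:
        table.setdefault(subject, {})[predicate] = obj
    return table


triples = rows_to_triples(rows)
for t in triples:
    print(t)
print(triples_to_rows(triples))
```

Because the conversion is lossless in both directions, the same spreadsheet can feed a relational tool such as Spotfire or a graph store.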

Slide 25: Census Taxonomy and Vocabulary: MindTouch Matrix
My Note: The entire page and platform can be searched.

Slide 26: Census Semantic Knowledge Base: Excel Glossary
My Note: All of these spreadsheets can be searched.
My Note: The Semantic Community approach is consistent with the EU ISA Recommended URI Design and Management Principles.
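The EU ISA persistent-URI guidance, as we understand it, recommends URIs of the form `http://{domain}/{type}/{concept}/{reference}`. Here is a minimal sketch of minting such URIs for glossary terms. The domain is hypothetical, and using `def` as the type for glossary terms is an assumption. Deterministic slugs mean the same term always yields the same URI when the spreadsheet is regenerated.

```python
import re
import unicodedata

# Hypothetical domain. The {domain}/{type}/{concept}/{reference} layout follows the
# pattern in the ISA persistent-URI guidance as we understand it; using "def" as the
# {type} for glossary terms is an assumption.
DOMAIN = "http://example.org"


def slug(text):
    """Lowercase, drop accents, and collapse runs of other characters into hyphens."""
    ascii_text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")


def mint_uri(uri_type, concept, reference):
    """Build a stable URI from a type, a concept name, and a reference label."""
    return f"{DOMAIN}/{uri_type}/{slug(concept)}/{slug(reference)}"


print(mint_uri("def", "Glossary", "Housing Unit"))
```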

Slide 27: Census Semantic Knowledge Base: Spotfire Glossary

Slide 28: Census Semantic Knowledge Base: Spotfire Taxonomy

Slide 29: Conclusions and Recommendations
– A taxonomy (Interactive Internet Data Tools) and a vocabulary (Glossary) from Census were used to pilot a semantic knowledge base.
– Agile development of the semantic knowledge base is possible when the data dictionary and data are readily available in a spreadsheet or at the download site, so one can focus on doing the data science and analytics.
– The Census "Building Deep Links into American FactFinder" can be made into Semantic Web Linked Open Data (see the 2012 Statistical Abstract as a Semantic Knowledge Base in the next slide).
– The Semantic Community Platform can produce a Census data science ecosystem and products in an interoperability interface with semantic interoperability.
– Next is piloting Be Informed for Census survey data collection, and then YarcData on the triple stores that are created.

Slide 30: Statistical Abstract 2012: Spotfire Knowledge Base