Data Science for Data Act Data Harmonization Dr. Brand Niemann Director and Senior Data Scientist/Data Journalist Semantic Community Data Science Data.


Data Science for Data Act Data Harmonization
Dr. Brand Niemann
Director and Senior Data Scientist/Data Journalist
Semantic Community Data Science
Data Science for the DataAct Datathon, August 7,

First Meetup: Data Science for the Data Act at Treasury, December 15, 2014, CGI Federal
DATA Act Requirements Thoughts and Open Discussion, Art Nicewick, Executive Consultant, CGI Federal (Slides)
Data Science for the Data Act at Treasury, Brand Niemann (Slides)
Web Sites:
Questions: At this time, we are asking for comments in response to the following questions:
Which data elements are most crucial to your current reporting and/or analysis?
In setting standards, what industry standards should Treasury and OMB be considering?
What considerations should Treasury and OMB take into account when establishing data standards?

Congressional Testimony
Try as it might, the federal government doesn't have the best track record on publicly reporting spending data, Gene Dodaro, comptroller general of the Government Accountability Office, told lawmakers on December 3. USASpending.gov's success thus far could serve as a cautionary tale for the implementation of the Digital Accountability and Transparency Act, or DATA Act, said Dodaro during a hearing of the House Oversight and Government Reform Committee. "Our recent report on USASpending.gov really illustrates the challenge, here," said Dodaro. GAO's recent report found that, five years after USASpending.gov launched, much of the information remains incomplete and inaccurate, with 324 programs not recorded in the database, $619 billion omitted, and many of the data elements required for reporting missing, said Dodaro.
Source: FierceGov

Blogs
In a speech at the 2014 Financial Stability Conference last week in Washington, the Director of the Office of Financial Research at Treasury, Dick Berner, called for universal adoption of Legal Entity Identifiers (LEI) throughout the federal government.
Source: Treasury.gov Web Site
OMB's Mark Reger compared the DATA Act to the Full Employment Act, noting, "there is a ton of work to be done." Reger said that input from data transparency consultants, contractors, and data specialists is needed to tell the implementing federal executives which data are most important and to help with analysis.
Source: Data Coalition Blog

Data Transparency Breakfasts
December 8, 2014: Federal Financial Management and the DATA Act
The fourth Data Transparency Breakfast, presented by PwC, will explore the transformation of the U.S. government's spending information from disconnected documents into standardized data, as required by the DATA Act of 2014, from the perspective of federal financial managers. Join the financial officers who will be responsible for applying government-wide DATA Act data standards to make federal financial reports fully searchable, interoperable, and open to all. Our panel will explore the challenges and opportunities of the DATA Act transformation.
My Note: I attended the Data Transparency Breakfast this morning in preparation for our December 15th Meetup. Please see the additions to the agenda above, especially the slides, Web Site links, and Questions we will be discussing to provide feedback to Mark Reger, Deputy Controller, OMB, at his request to me at the breakfast.
Source: Data Coalition

Government Technology & Innovation Incubator for Big Data Analytics II Meetup, March 25th, Eastern Foundry
6:30 p.m. Welcome and Introduction (Preview of Proposed DATA Act Elements, Standardized Formulas, and Agency Implementation Challenges)
6:45 p.m. Brief Member Introductions
7:00 p.m. Chris Garner, Paxata, Inc., Presentation and Demo (Slides)
7:20 p.m. Steve Hanmer, Gov PATH Solution, Presentation and Demo
7:40 p.m. Open Discussion
8:00 p.m. Government Technology & Innovation Incubator: Eastern Foundry Tour, Geoff Orazem
8:30 p.m. Networking
9:00 p.m. Depart

Newly Appointed U.S. CIO Tony Scott Speaks
U.S. Chief Information Officer Tony Scott, on his first day of public appearances after his appointment by President Obama last month, described the President's 2013 Open Data Policy. Although the Open Data Policy is not mandatory for independent regulatory agencies, including most financial regulators, Scott said financial regulators can bring benefits to investors, their own operations, and the financial industry by voluntarily following it.
View slideshow presentation here: content/uploads/2015/03/Open-data-and-financial-regulationv2.pdf

Financial Regulation Summit Highlights
Over 300 public and private sector open data leaders gathered at Union Market in Washington, D.C. on Tuesday for the Coalition's Financial Regulation Summit, aimed at building a consensus for the transformation of U.S. financial regulatory reporting from disconnected documents into open, standardized data. Participants included Members of Congress; U.S. Chief Information Officer Tony Scott; Treasury Office of Financial Research Director Dick Berner; and representatives of nearly every major financial regulator. The Financial Regulation Summit was presented by RR Donnelley, with additional sponsorship by Workiva, Booz Allen Hamilton, PwC, RDG Filings, and Socrata. In coming weeks, the Coalition will publish video of all Summit presentations and a full analysis of the MADOFF Transparency Act.

Parties Interested in the DATA Act (1)
You are invited to participate in a webinar hosted by the DATA Act Section 5 Pilot Team to discuss the Digital Accountability and Transparency Act (DATA Act) Section 5 Pilot. This online event is being held on April 1, 2015 from 1:00PM to 2:00PM EDT. The Chief Acquisition Officers Council, General Services Administration, and the Department of Health and Human Services are sponsoring a dialogue and pilot to identify clear recommendations for (1) standardizing grant and contractor awardee reporting, (2) eliminating duplicative and/or unnecessary reporting, and (3) reducing awardee compliance costs. The open dialogue, which will launch in spring of 2015, is iterative: it will first ask interested parties to weigh in on these ideas, then apply those ideas in a pilot, and finally ask participants to weigh in again on the next iteration of ideas. Participation in the dialogue will provide federal contract and grant recipient organizations a unique opportunity to guide the future of the government-wide implementation of the DATA Act.

Parties Interested in the DATA Act (2)
Attendees will learn the background and goals of the DATA Act Section 5 Pilot, expected outcomes, and participant opportunities and requirements. The event will also address commonly asked questions about the pilot. DATA Act Section 5 Pilot Grants Lead Lora Kutkat and DATA Act Program Management Office Communications Lead Christopher Zeleznik will lead the discussion, which will include ample time for questions and answers. A recording and documentation from the event will be posted to the Outreach section of following the event. Please send any questions regarding the DATA Act Section 5 Pilot Webinar to Emily Gartland at

National Webcast on Implementation of the Data Act
On March 27th at 3:30pm EDT, please join us for a national webcast about implementation of the Digital Accountability and Transparency Act (DATA Act). Sponsored by a number of national organizations representing a broad cross-section of DATA Act stakeholders, the webcast will feature Federal leaders responsible for the Act's implementation. Hear from OMB Controller Dave Mader and Treasury Fiscal Assistant Secretary David Lebryk about plans for implementing this important legislation, which will have an impact on Federal agencies and all those who receive Federal funds. In particular, learn about the Federal government's approach to setting the required data element definition standards. There is no cost for participating in the webcast.
Source: Postponed
See: GitHub Site for National Dialogue


Art Nicewick, Executive Consultant, CGI Federal:
I have been talking with Mike Wood about pulling something together for the Data Act demo day in June. I have some ideas, but no time. I'm still unclear on the goals of the Act. From what I see, it's a five-headed monster with many goals, many of which are divergent. Everybody has a lot of ideas on what it can be, and all the ideas are good. However, partitioning the problem into actionable components, defining the costs and benefits of the components, and then setting the priorities is a challenge. I'd love to hear your thoughts.
Art, thanks, and hopefully we can discuss this at the Meetup on Wednesday.


So Many Activities About Financial Data, But Not with Financial Data!
But See: Data Science for Financial Data by Dr. Brand Niemann
Published by AOL Government in : Recovery.gov: A Good Start But Show Us All the Missing Data, By Brand Niemann, on September 08, 2011 at 3:00 PM
But See: Semantic Community showed A USASpending.gov Dashboard with All the USA Spending Data in A USASpending.gov Dashboard, December 18,
But See: Semantic Community showed for the 2014 Data Transparency Summit that the Federal Digital Government Strategy accomplishes the Data Act. Hudson Hollister, Executive Director, Data Transparency Coalition, agreed.
Data Science for Financial Data Transparency (with Ontologies)
But See: Data Science for the Data Act at Treasury
Data Act at US Department of Treasury


Data Science uses the Data Mining Ontology (suggested by Dr. Barry Smith) and the Cross-Industry Standard Process for Data Mining (CRISP-DM) to structure the content into a knowledge base using semantic web standards for big data.
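As a minimal sketch of what "structuring content into a knowledge base using semantic web standards" could look like, the Python below tags content items with one of the six CRISP-DM phases and writes them as RDF N-Triples. The namespace http://example.org/kb/, the property names, and the item identifiers are hypothetical; they are not Data Mining Ontology URIs, and this is not the author's actual knowledge base.

```python
# Minimal sketch: tag content items with a CRISP-DM phase and emit N-Triples.
# BASE is a hypothetical namespace, not an official Data Mining Ontology URI.
BASE = "http://example.org/kb/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# The six CRISP-DM phases, written as URI-safe local names.
CRISP_DM_PHASES = (
    "BusinessUnderstanding",
    "DataUnderstanding",
    "DataPreparation",
    "Modeling",
    "Evaluation",
    "Deployment",
)


def literal(text: str) -> str:
    """Quote a string as an N-Triples literal, escaping backslash, quote, and newline."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def to_ntriples(items):
    """Convert (item_id, title, phase) tuples into a list of N-Triples lines.

    item_id is assumed to be URI-safe (letters, digits, hyphens).
    """
    lines = []
    for item_id, title, phase in items:
        if phase not in CRISP_DM_PHASES:
            raise ValueError(f"unknown CRISP-DM phase: {phase}")
        subject = f"<{BASE}item/{item_id}>"
        lines.append(f"{subject} <{RDF_TYPE}> <{BASE}ContentItem> .")
        lines.append(f"{subject} <{RDFS_LABEL}> {literal(title)} .")
        lines.append(f"{subject} <{BASE}crispdmPhase> <{BASE}phase/{phase}> .")
    return lines


# Hypothetical content items from this deck, each tagged with a phase.
items = [
    ("usaspending-dictionary", "USASpending Data Dictionary", "DataUnderstanding"),
    ("spotfire-relationships", "Spotfire Data Relationships mapping", "DataPreparation"),
]
for line in to_ntriples(items):
    print(line)
```

N-Triples is used here only because it is the simplest RDF serialization to produce without extra libraries; a real knowledge base would more likely use an RDF library and the ontology's own terms.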


Data Science for the Data Act at Treasury
My Questions for the Fourth Data Transparency Breakfast Panel:
My EPA Experience: Why not have a Federal Chief Data Officer and Agency Chief Data Officers with Data Scientists Mining Agency Data Assets?
Federal Spending Data Elements: Will they support more than just reporting, such as data analysis and even predictive analytics?
Some highlights of the results are:
There are 59 data elements in the Data Act and 46 in the USASpending Data Dictionary.
The USASpending data set, with 149,110 rows and 46 columns, was geocoded by Spotfire using the PlaceofPerformanceCity column. Other location columns, such as Congressional District, ZIP Code, and County, were also available.
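Comparing the 59 Data Act elements with the 46 USASpending columns is, at its simplest, a set comparison on names. The Python sketch below assumes that names differing only in case, spaces, or punctuation refer to the same element; the element and column names are illustrative, not the official lists.

```python
import re

# Minimal sketch: compare standard element names with one data set's column
# names after normalizing case, spaces, and punctuation.
# The names below are illustrative, not the official Data Act or USASpending lists.


def normalize(name: str) -> str:
    """Lower-case a name and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def coverage(standard, columns):
    """Return (standard elements with no matching column, columns with no matching element)."""
    std = {normalize(s): s for s in standard}
    cols = {normalize(c): c for c in columns}
    missing = [name for key, name in std.items() if key not in cols]
    extra = [name for key, name in cols.items() if key not in std]
    return missing, extra


standard = ["Award ID", "Place of Performance City", "Awardee Name", "Obligation Amount"]
columns = ["award_id", "PlaceofPerformanceCity", "RecipientName"]
missing, extra = coverage(standard, columns)
print("No column for:", missing)
print("No standard element for:", extra)
```

Exact matching after normalization finds "PlaceofPerformanceCity" but cannot tell that "RecipientName" and "Awardee Name" might mean the same thing; that kind of match still needs the data dictionaries.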

Data Science for the DataAct Datathon
Finally, a Data Act Activity with Actual Financial Data Where a Data Scientist Can Actually Get Ready Access to the Data!
Just by happenstance, I discovered the DATA Act Forum Datathon Call for Participants, DATA Act Forum-The Art of the Possible, and the DATA Act Forum Data Zoo Technology Showcase Application, on July 27-28 and July 29, respectively. The three events (July 27-29) will be summarized for our future meetup (Data Science for the Data Act at Treasury?), and this Data Science for the Data Act Datathon will be extended by our Data Act Data Science team to make recommendations to OMB and other agencies.
The next step is to render the data dictionaries and the OMB Standard Data Act Data Elements in spreadsheet form so we can begin the semantic harmonization and mediation process in Spotfire.
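Rendering several data dictionaries "in spreadsheet form" can be as simple as flattening them into one CSV table with a column recording where each entry came from. This Python sketch assumes four columns (source, element_name, definition, data_type); the column layout and the entries are hypothetical placeholders, not text from the published dictionaries.

```python
import csv
import io

# Minimal sketch: flatten data dictionary entries from several sources into one
# CSV table that a spreadsheet or Spotfire can open. The column layout is an assumption.
FIELDS = ["source", "element_name", "definition", "data_type"]


def dictionary_to_csv(entries):
    """Write entries (dicts keyed by FIELDS) as CSV text, sorted by source and element."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for entry in sorted(entries, key=lambda e: (e["source"], e["element_name"])):
        writer.writerow(entry)  # raises ValueError if an entry has unexpected keys
    return buf.getvalue()


# Hypothetical entries; real definitions would come from the published dictionaries.
entries = [
    {"source": "USASpending", "element_name": "PlaceofPerformanceCity",
     "definition": "(definition from the USASpending Data Dictionary)", "data_type": "text"},
    {"source": "OMB standard", "element_name": "Primary Place of Performance",
     "definition": "(definition from the OMB standard)", "data_type": "text"},
]
print(dictionary_to_csv(entries))
```

To save the table for Excel or Spotfire, write the returned string to a file opened with `newline=""` so the csv module's line endings are preserved.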

My Conclusions and Recommendations
The Federal Big Data Working Group Meetup Data Mining – Data Science Process Was Applied to the DataAct Datathon Data Sets.
A Data Ecosystem Was Built by Downloading 19 Files from the IAC/ACT Datathon Socrata Catalog and Using Spotfire to Inventory Their Characteristics in an Excel Spreadsheet.
There are many duplicate files in the IAC/ACT Datathon Socrata Catalog. The 14 unique files were imported into 3 Spotfire files for analytics and visualizations.
Screen Capture Samples Are Shown to Help the Datathon Participants and in Preparation for Another Federal Big Data Working Group/Virginia Big Data Meetup on the Data Act.
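The inventory step (row and column counts per file, plus spotting the duplicate downloads) can be sketched in Python with a content hash. This sketch only catches byte-identical copies, and the file names and contents are made up; it illustrates the idea rather than reproducing the Spotfire inventory.

```python
import csv
import hashlib
import io

# Minimal sketch: inventory downloaded CSV files (row and column counts) and flag
# byte-identical duplicates by SHA-256 hash. File names and contents are made up.


def inventory(files):
    """files maps file name -> CSV text; returns one summary dict per file."""
    first_seen = {}  # content hash -> first file name seen with that content
    summary = []
    for name, text in files.items():
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        records = list(csv.reader(io.StringIO(text)))
        header = records[0] if records else []
        summary.append({
            "file": name,
            "rows": max(len(records) - 1, 0),  # data rows, excluding the header
            "columns": len(header),
            "duplicate_of": first_seen.get(digest),
        })
        first_seen.setdefault(digest, name)
    return summary


files = {
    "awards_2014.csv": "award_id,amount\n1,10\n2,20\n",
    "awards_2014_copy.csv": "award_id,amount\n1,10\n2,20\n",
    "agencies.csv": "agency_code,agency_name\n001,Example Agency\n",
}
report = inventory(files)
unique = [row["file"] for row in report if row["duplicate_of"] is None]
print(unique)
```

Files that hold the same data but were exported with different column order or formatting would not be flagged; catching those requires comparing the parsed records rather than the raw bytes.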


My Suggested Harmonization Process (1)
What I am suggesting is the opposite of the usual case, where you have an Access or MySQL database with multiple tables and key fields to join them, and you issue a SQL command to extract the subset of joined table data you want to analyze. We have the reverse problem: trying to make 20 or so Datathon data sets, and ultimately multiple tables of financial data for every agency, into an integrated database so we can do the same thing with queries as above. I showed this in a recent Meetup for multiple Harmful Algal Bloom data sets that had been purposely designed with key fields.
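For contrast, the "usual case" described above, where tables were designed with a shared key field, comes down to a single SQL join. The Python sketch below uses the standard-library sqlite3 module with an in-memory database; the sites and samples tables and their values are hypothetical toy data, loosely modeled on monitoring sites and samples, not the Harmful Algal Bloom data sets themselves.

```python
import sqlite3

# Minimal sketch of the "usual case": two tables designed with a shared key field
# (site_id) joined by one SQL query. Table names and values are hypothetical toy data.


def joined_summary():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE sites (site_id INTEGER PRIMARY KEY, name TEXT, state TEXT);
        CREATE TABLE samples (
            sample_id INTEGER PRIMARY KEY,
            site_id INTEGER REFERENCES sites(site_id),
            cell_count INTEGER
        );
        INSERT INTO sites VALUES (1, 'Site A', 'MD'), (2, 'Site B', 'VA');
        INSERT INTO samples VALUES (10, 1, 500), (11, 1, 750), (12, 2, 120);
    """)
    # Because both tables share site_id, extracting the joined subset is one query.
    rows = conn.execute("""
        SELECT s.name, s.state, COUNT(*) AS n_samples, SUM(m.cell_count) AS total
        FROM samples AS m
        JOIN sites AS s ON m.site_id = s.site_id
        GROUP BY s.site_id
        ORDER BY s.site_id
    """).fetchall()
    conn.close()
    return rows


print(joined_summary())
```

The harmonization problem is the reverse: the Datathon data sets do not come with an agreed site_id-style key, so the key relationships have to be reconstructed before a query like this becomes possible.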

My Suggested Harmonization Process (2)
But what if the data sets have not been purposely designed with well-defined key fields, or it is very difficult to match the "key fields" because of a lack of data dictionaries, slightly different wording, etc.? This is what I call a semantic interoperability problem. I, or a team, can solve it by hand using the data dictionaries and the data sets in Spotfire, and/or with a tool like TAMR that we had demonstrated recently in a Meetup. First, you match as many of the data elements as possible to the new OMB standard data elements (57, as I recall from work in our earlier Data Act for Treasury Meetup), and then you implement those matches in the Spotfire Tools, Data Relationships feature so you can then "query" (without any SQL) a new merged, semantically harmonized table or tables.
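The two steps above, matching data elements to the standard elements and then merging on those matches, can be sketched outside Spotfire. The Python below applies manual overrides first (standing in for matches made by hand from the data dictionaries), then falls back to fuzzy name similarity with the standard-library difflib module, and finally renames each data set's columns so the rows can be stacked into one harmonized table. This is a swapped-in technique for illustration; it does not reproduce how Spotfire's Data Relationships feature or TAMR works, and the standard element names, cutoff of 0.8, and sample rows are all hypothetical.

```python
import difflib
import re

# Minimal sketch of the matching step: manual overrides first, then fuzzy string
# similarity via difflib. Not how Spotfire Data Relationships or TAMR work.
# The standard element names and sample rows are hypothetical.


def normalize(name):
    """Lower-case a name and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def match_elements(columns, standard, overrides=None, cutoff=0.8):
    """Return {column: standard element name, or None if nothing is close enough}."""
    overrides = overrides or {}
    by_norm = {normalize(s): s for s in standard}
    mapping = {}
    for col in columns:
        if col in overrides:  # a match decided by hand from the data dictionaries
            mapping[col] = overrides[col]
            continue
        hits = difflib.get_close_matches(normalize(col), list(by_norm), n=1, cutoff=cutoff)
        mapping[col] = by_norm[hits[0]] if hits else None
    return mapping


def harmonize(rows, mapping):
    """Rename keys to standard element names, dropping unmatched columns."""
    return [{mapping[k]: v for k, v in row.items() if mapping.get(k)} for row in rows]


standard = ["Awardee Name", "Award Amount", "Place of Performance City"]
rows_a = [{"Awardee_Name": "Acme", "Award Amt": 100, "PlaceofPerformanceCity": "Reston"}]
rows_b = [{"RecipientName": "Acme", "Place of Performance City": "Reston"}]

map_a = match_elements(rows_a[0].keys(), standard)
# "RecipientName" is a synonym, not a spelling variant, so it needs a manual match.
map_b = match_elements(rows_b[0].keys(), standard, overrides={"RecipientName": "Awardee Name"})
merged = harmonize(rows_a, map_a) + harmonize(rows_b, map_b)
for row in merged:
    print(row)
```

Fuzzy matching handles "slightly different wording" such as "Award Amt" versus "Award Amount", but synonyms like "RecipientName" fall below the cutoff and come back as None. This is why the manual, dictionary-driven review still matters.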
