Federal Big Data Working Group Meetup Dr. Brand Niemann Director and Senior Data Scientist Semantic Community

Presentation transcript:

Federal Big Data Working Group Meetup Dr. Brand Niemann Director and Senior Data Scientist Semantic Community March 18,

Mission Statement
– Federal: Supports the Federal Big Data Initiative, but is not endorsed by the Federal Government or its agencies.
– Big Data: Supports the Federal Digital Government Strategy, which is "treating all content as data", so big data = all your content.
– Working Group: Data Science Teams composed of Federal Government and non-Federal Government experts producing big data products (How was the data collected? Where is it stored? What are the results?).
– Meetup: The world's largest network of local groups to revitalize local community and help people around the world self-organize, like the MOOCs (Massive Open Online Courses) being considered by the White House.
Co-organizers: Brand Niemann and Kate Goodier

Joint NSF-NIH Biomedical Big Data Research Meetup
"Thanks again for a wonderful gathering of deep thinkers at the NIH-NSF Big Data event -- that was terrific. Great line up of speakers."

Scientific Data: A View from the US
Dr. George Strawn, Director, NITRD/NCO and co-chair of the Federal Big Data Senior Steering Work Group:
– Public access mandated for "scientific results" supported by the U.S. government
– Federal agencies have submitted their "initial plans" for public access to scientific data to OSTP
– Digital Object Architecture: An "hour glass" for data? (As the Internet was an hour glass for networks: TCP/IP at the narrow point; many applications above, many implementations below)
– One result will be to make the scientific record into a first class scientific object

Activities
White House OSTP - MIT Big Data Privacy Workshop:
– Story and Network Analysis of Tweets: April 1st Meetup with Kate Goodier and Marc Smith
NIST Data Science Symposium:
– Poster and Story: Data Science Team Pilot with Information Services Office
White Paper for NIST and NITRD:
– "Making Big Data Small" using Data Science and Semantics: "Thanks again for your effort in putting this program together!"
Information Visualization MOOC:
– Story and Course Work: Forming Teams to Work with Clients for the Remaining 7 Weeks
DARPA Big Mechanism:
– Story and Pilot: April 15th Meetup with Mike Megginson, Northrop Grumman, and Fredrik Salvesen, YarcData (in planning)

Agenda
6:30 p.m. Tutorials (Proposed GMU Course) and Refreshments
– Continue Data Science Tutorial: Class 4, and Graph Databases and Bigdata SYSTAP Literature Survey of Graph Databases
7:00 p.m. Introductions and Announcements (10 seconds per individual, depending on the size of the group)
7:15 p.m. Featured Presentation/Demonstration (where did you get the data, where did you store the data, and what were your results?)
– Bryan Thompson, Chief Scientist of SYSTAP, LLC, will speak about their SYSTAP open source graph database platform. Highlights will include support for highly available replication clusters as well as their recent work with accelerated graph processing on GPUs at 3 billion traversed edges per second.
– See CSHALS 2014: Tech Talk and Poster in WikiWiki
8:30 p.m. Networking/Individual Demos (talk among yourselves and look at one another's work)
9:00 p.m. Continue Your Conversations Elsewhere (we need to clear out of the space)
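To make the "traversed edges per second" figure in the featured presentation concrete, here is a minimal sketch of what counting traversed edges means during a breadth-first search. The toy graph, the function name `bfs_count_edges`, and the adjacency-dictionary layout are assumptions made for illustration; this is not SYSTAP's implementation or benchmark method.

```python
from collections import deque

# Hypothetical toy graph as an adjacency dictionary (illustrative only).
GRAPH = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}

def bfs_count_edges(graph, start):
    """Breadth-first search that also counts every edge it examines."""
    visited = {start}
    queue = deque([start])
    traversed = 0
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            traversed += 1  # each outgoing edge examined counts as one traversal
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(neighbor)
    return visited, traversed

visited, traversed = bfs_count_edges(GRAPH, "a")
```

Dividing the traversed-edge count by the elapsed wall-clock time of the search gives a traversal rate; on a graph this small the timing would be dominated by noise, so the sketch stops at the count.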

Next Meetups
Sixth Meetup: April 1, 6:30 p.m.
– Network Analytics and Visualization of Big Data Privacy Workshop Tweets, Dr. Marc A. Smith, Chief Social Scientist, Connected Action Consulting Group, and Remarks by the President on Review of Signals Intelligence, Dr. Kate Goodier, Information Architect, Xcelerate Solutions
Seventh Meetup: April 15, 6:30 p.m.
– DARPA Big Mechanism, Mike Megginson, Northrop Grumman, and Fredrik Salvesen, YarcData (in planning)
Eighth Meetup: May 6, 6:30 p.m.
– Federating Big Data for Big Innovation, Dr. Jeanne Holm, Data.gov Evangelist
Ninth Meetup: May 18, 6:30 p.m.
– The Science Behind Data Science, Ruhollah Farchtchi, Director of Big Data, UNISYS
2nd Cloud, SOA, Semantics and Data Science Conference, June (in planning)

Overview
Practical Data Science for Data Scientists:
– 2/11 Specific Data Science Tools and Applications 1
– Chapters 7 & 8
Data Science for VIVO & Information Visualization MOOC (no time to cover):
– 7 Weeks of Course Work with Sci2 Tools
– Forming Teams to Work with Clients for Next 7 Weeks
NodeXL and Sci2 for Data Science (no time to cover):
– NodeXL: A free, open-source template for Microsoft® Excel® that makes it easy to explore network graphs.
– Sci2: A modular tool for science of science research & practice on scholarly datasets.

Practical Data Science for Data Scientists
Class 4: Providing On-Line Class With Private Tutoring

Resources
Required Textbook:
– Doing Data Science: Free Sampler (PDF)
Optional Supplemental Reading:
– Data Science Starter Kit
– DC Data Community and DC Data Community Calendar
Technology Requirements:
– Internet and Free Tools like Spotfire Cloud
– NodeXL (My Note: Current Focus)

Class 4
2/11 Specific Data Science Tools and Applications 1
– Discuss Reading: Chapters 7 and 8, Present and Discuss Team Homework Exercises, Hands-on Class Exercise, and Team Homework Exercise.
– My Resources: alysis_Tools
Hands-on Class Exercise:
– SAS and SAS Public Data Sets
– See Spotfire Web Player and Spotfire File, Spotfire Web Player and Spotfire File, and Spotfire Web Player and Spotfire File
– Exercise: Build Your Own Recommendation System

Discuss Reading
Chapter 7:
– How do companies extract meaning from the data they have? In this chapter we hear from two people with very different approaches to that question, namely William Cukierski from Kaggle and David Huffaker from Google.
Chapter 8:
– This is the most difficult chapter in the book for me to teach since I do not understand the Python code at the end and have never built a Recommendation Engine myself. I would welcome some help here.

Present and Discuss Team Homework Exercise
Get the Data: Go to Yahoo! Finance and download daily data from a stock that has at least eight years of data, making sure it goes from earlier to later. If you don't know how to do it, Google it.
– Yahoo: orical+Prices (CSV)
– See Spotfire Web Player and File
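The two checks in this homework (the data runs from earlier to later, and it covers at least eight years) can be sketched in Python. This is a minimal sketch, assuming the downloaded CSV has a `Date` column in ISO format and a `Close` column; those column names, the helper names `load_prices` and `spans_at_least`, and the three sample rows are illustrative assumptions, not real prices or the class's actual file.

```python
import csv
import io
from datetime import date

# Made-up sample rows standing in for a downloaded file (not real prices).
SAMPLE = "Date,Close\n2014-03-18,12.5\n2006-03-17,10.0\n2010-01-04,11.0\n"

def load_prices(csv_text):
    """Parse CSV text and return (date, close) rows sorted oldest first."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [(date.fromisoformat(r["Date"]), float(r["Close"])) for r in reader]
    rows.sort(key=lambda row: row[0])  # make sure it goes from earlier to later
    return rows

def spans_at_least(rows, years):
    """True if the last date is at least `years` calendar years after the first."""
    first, last = rows[0][0], rows[-1][0]
    return (last.year - years, last.month, last.day) >= (first.year, first.month, first.day)

rows = load_prices(SAMPLE)
has_eight_years = spans_at_least(rows, 8)
```

For a real download, the file contents would be read from disk and passed to `load_prices` in place of `SAMPLE`.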

Chapter 6 Timestamps and Financial Modeling
Web Player

Hands-on Class Exercise
SAS and SAS Public Data Sets:
– SAS: Spotfire Web Player and Spotfire File
– SAS Exercises: Spotfire Web Player and Spotfire File
– SAS Public Data Sets: Spotfire Web Player and Spotfire File
Exercise: Build Your Own Recommendation System
– I would welcome some help here.
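As one possible starting point for the recommendation exercise, here is a minimal sketch of user-based nearest-neighbor collaborative filtering with cosine similarity. It is not the code from the textbook; the ratings dictionary, user and item names, and the function names `cosine` and `recommend` are made up for illustration.

```python
from math import sqrt

# Hypothetical ratings: user -> {item: rating}. Made-up data for illustration.
RATINGS = {
    "ann": {"A": 5, "B": 4, "C": 1},
    "bob": {"A": 4, "B": 5, "D": 5},
    "cat": {"C": 5, "D": 1, "E": 4},
}

def cosine(u, v):
    """Cosine similarity between two sparse rating dictionaries."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[i] * v[i] for i in shared)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v)

def recommend(user, ratings, top_n=1):
    """Rank items the user has not rated by similarity-weighted average rating."""
    mine = ratings[user]
    scores, weights = {}, {}
    for other, theirs in ratings.items():
        if other == user:
            continue
        sim = cosine(mine, theirs)
        if sim <= 0:
            continue  # ignore users with nothing in common
        for item, rating in theirs.items():
            if item not in mine:
                scores[item] = scores.get(item, 0.0) + sim * rating
                weights[item] = weights.get(item, 0.0) + sim
    ranked = sorted(scores, key=lambda i: scores[i] / weights[i], reverse=True)
    return ranked[:top_n]

top_pick = recommend("ann", RATINGS)
```

In this toy data "ann" rates most like "bob", so bob's highly rated item "D" ranks first even though "cat" rated it low. A real class exercise would swap in actual ratings and compare against other approaches.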

SAS Public Data Sets-Spotfire Tutorial
Web Player

Team Homework Exercise
Read in next week's reading: Data Visualization for the Rest of Us
– See my Slides and Web Player.
– Start to create your own Hubway Data Visualization Challenge and eventually submit it for your class project and the challenge (now closed but still accepting submissions) if you want.
Form Teams (Same or New), Ask Me Questions, and Prepare to Present Next Week

A Data Science Big Mechanism for DARPA
DARPA wants to help the DoD get to the essence of cause and effect for cancer from reading the medical literature. The Federal Big Data Working Group Meetup has also been doing that with Semantic Medline - YarcData and Euretos BRAIN (Bio Relations and Intelligence Network).
– See the video for Cancer Immunotherapy (21 minutes), which Science magazine named the biggest breakthrough of 2013 at the end of that year and which Dr. Tom Rindflesch (the inventor of Semantic Medline) identified from Semantic Medline as a very important breakthrough in early 2013!

Data Science Data Mining Process
Business Understanding:
– Broad Agency Announcement (PDF) and Slide Presentation (PPT)
Data Understanding:
– Semantic Medline, Open Catalog, CSHALS* 2014, and "Starter kit" (to be provided)
Data Preparation:
– Knowledge Base of the Above
Modeling:
– Semantic Medline, Data Papers, and NanoPublications
Evaluation:
– Searchability, Discovery, and Reasoning
Deployment:
– Story and Knowledge Base in MindTouch, Excel, Spotfire, and Be Informed
* Conference on Semantics in Healthcare and Life Sciences

The Initial Knowledge Base - Data Ecosystem

Where did we find some structured data?

Where did we store the structured data?

Modeling: Approaches
Semantic Medline:
– Semantic MEDLINE Query: mesothelioma and Data Science for VIVO
Data Papers:
– Sepublica 2014: The Semantics for e-science in an intelligent Big Data Context
Nanopublications:
– The smallest unit of publishable information: an assertion about anything that can be uniquely identified and attributed to its author.
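The nanopublication definition above (one assertion, uniquely identified, attributed to its author) maps naturally onto a small data structure. The sketch below is one way to picture it in Python; modeling the assertion as a subject-predicate-object triple is an assumption, and the class names, field names, identifiers, and placeholder values are all hypothetical rather than part of any nanopublication standard.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Assertion:
    """A single claim, modeled here (by assumption) as a triple."""
    subject: str
    predicate: str
    obj: str

@dataclass(frozen=True)
class Nanopublication:
    """One assertion plus the minimum needed to identify and attribute it."""
    identifier: str  # hypothetical unique ID
    assertion: Assertion
    author: str

def by_author(pubs):
    """Group assertions by the author they are attributed to."""
    index = {}
    for pub in pubs:
        index.setdefault(pub.author, []).append(pub.assertion)
    return index

# Placeholder values only; not real findings or real identifiers.
PUBS = [
    Nanopublication("example:np1", Assertion("entity-A", "interacts_with", "entity-B"), "author-1"),
    Nanopublication("example:np2", Assertion("entity-B", "interacts_with", "entity-C"), "author-2"),
    Nanopublication("example:np3", Assertion("entity-A", "interacts_with", "entity-C"), "author-1"),
]
index = by_author(PUBS)
```

Freezing the dataclasses keeps each record immutable and hashable, which suits a unit of information that is meant to be cited as published rather than edited in place.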

How did we store the unstructured data?
Well-defined URLs; Knowledge and Glossary; Relational and Graph; Linked Data; Footnote and References; Metadata and Data Sources; Ready for NodeXL & Spotfire

Modeling: Examples
Most Recent: 500 citations, Start Date: 01/01/1900, End Date: 11/30/2013, 3169 predications extracted. Summarized for Substance Interactions.
Dr. Barend Mons: BRAIN
Dr. Tom Rindflesch: Semantic Medline

What did we find when we analyzed the data?
Web Player

What is our data story and product?
Data Ecosystem:
– BRAIN.xlsx
– DARPA.xlsx
Individual Tabs:
– DARPA Open Catalog: Bigdata SYSTAP is Category: Infrastructure and License: GPLv2
– DARPA Big Mechanism Knowledge Base: DARPA Big Mechanism Knowledge Base by Function (21); DARPA Big Mechanism Knowledge Base by Number of References (175)
– BRAIN Knowledge Base and Examples: BRAIN Knowledge Base by Function (References); Data Fairport Conference Dropbox Files by Type (PPTX)
– Data Science for VIVO & IVMOOC: Citations by Publisher (APS); Total Award Amount ($) by Principal Investigator (Geoffrey Fox)

12 Leading BI Tools and Analytic Platforms I Tested for OMB
Graph Databases Absent: Bigdata SYSTAP, Virtuoso, YarcData, etc.

Bigdata SYSTAP Literature Survey of Graph Databases
Awarded Best Paper in 2004! And 10 Years Later...