Semantic Search for NSF Decision Making Dr. Brand Niemann Director and Senior Enterprise Architect – Data Scientist Semantic Community

Presentation transcript:

Semantic Search for NSF Decision Making Dr. Brand Niemann Director and Senior Enterprise Architect – Data Scientist Semantic Community AOL Government Blogger April 4,

Overview
Background
NITRD Dashboards
Data.gov Developer Community
Research.gov Dashboard
Semantic MedLine
Some Next Steps
2

Background
My role at EPA as Senior Enterprise Architect and Data Scientist, as lead for several Federal CIO Council activities, and, since leaving government, as Director and Senior Enterprise Architect – Data Scientist of Semantic Community, has been to implement high-level direction as follows:
3

Background
Teri Takai (DoD CIO) - Harvard Leadership for a Networked World, Lead Practitioner. I am an Invited Practitioner who mentors students under her direction.
– Social Business Intelligence from Open Government Data
Letitia Long, Director of the National Geospatial-Intelligence Agency. I am the lead for the pilot demonstration for the NCOIC-NGA CRADA at the upcoming 13th SOA for eGov Conference, April 3rd.
– A Quint: Cross Information Sharing and Integration for the Intelligence Community
– Demonstration at the 13th SOA for E-Government Conference, April 3, 2012, at MITRE
Donna Roy, Executive Director of NIEM. She requested that I provide suggestions and demonstrations for evolving NIEM, which I have done twice.
– A Plan for Scaling NIEM to Big Data
– Build The NIEM Information Exchange Clearinghouse In The Cloud
Gus Hunt, CIA CTO. He challenged me to show how to make the CIA World Fact Book more semantic and to work with Digital Reasoning.
– CIA World Fact Book
– Digital Reasoning
4

Background
Sonny Bhagowalia, David McClure, and Jeanne Holm (Data.gov Program Executive, GSA Associate Administrator, and Data.gov Evangelist, respectively) challenged me to do data science for Data.gov.
– Data.gov
– Data.gov Developers Community Space Launched
Wyatt Kash, Editor in Chief for AOL Government, challenged me to build Shared Services like those Federal CIO Steven VanRoekel is asking for.
– Federal IT Dashboard in Motion and In Memory
Dennis Wisnosky, DoD CTO, and Walt Okon, DoD Senior Architect Engineer, challenged me to Build DoD in the Cloud and Federate It with Other DoD and non-DoD Architectures (e.g., TOGAF).
– Build DoD in the Cloud and Build TOGAF in the Cloud
– Enterprise Information Web for Semantic Interoperability at DoD
Dr. George Strawn, Director of the NSF NITRD and White House OSTP Staff to the CTO (Aneesh Chopra and Todd Park), challenged me to do data science dashboards.
– A NITRD Dashboard (March and April 2011)
– SIRA for Semantic Search (August 10, 2011)
– A Research.gov Dashboard (March 2012)
– Semantic MedLine (In process)
5

NITRD Dashboards 6 Note: Also see Build the NITRD Dashboard in the Cloud and Build the R&D Dashboard in the Cloud.

Data.gov Developer Community
Play the role of a data scientist from an agency, use a platform that supports the things below, and build an app that provides semantic search for NSF abstracts so that decision makers can identify future scientific research needs.
My distilled suggestions for the recent excellent Data.gov meeting are:
– Add a data scientist to the Data.gov team to lead a community of data scientists from the agencies and non-government organizations in a new community.
– Ensure that the new data.gov platform supports the sitemap and schema protocols with well-defined URLs for content, faceted search, and big data in memory.
– Encourage the new developer community to build their own data.gov sites to become both publishers and consumers of data to support the new data scientist community above.
Note: Invited by Jeanne Holm, Data.gov Evangelist, to give a presentation at the end of April.
7

Research.gov Dashboard
Build an app that provides semantic search for NSF abstracts so that decision makers can identify future scientific research needs.
Created a 176 MB Excel file (60,981 rows by 44 columns) for the Spotfire Dashboard.
– Get 2011 data from state tables?
Tried to extract text for Semantic Search with SIRA and Digital Reasoning, but found that the Abstract text is cut off and URLs are embedded in the Publications and Project Outcomes columns.
8
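
The two data problems noted on this slide (cut-off Abstract text and URLs embedded in the Publications and Project Outcomes columns) could be screened for before any text is sent to a semantic search tool. The following is a minimal sketch under stated assumptions, not the method used for the dashboard: the sample row is invented, the function names `split_urls` and `looks_truncated` are hypothetical, and the truncation test (an abstract that does not end in sentence punctuation) is only a rough heuristic.

```python
import re

# Deliberately simple pattern: an http(s) URL runs until the next whitespace.
URL_PATTERN = re.compile(r"https?://\S+")


def split_urls(text):
    """Return (text_without_urls, list_of_urls) for one cell of text."""
    # Trailing sentence punctuation is stripped from each captured URL.
    urls = [u.rstrip(".,;)") for u in URL_PATTERN.findall(text)]
    cleaned = URL_PATTERN.sub(" ", text)
    cleaned = re.sub(r"\s+", " ", cleaned).strip()
    return cleaned, urls


def looks_truncated(abstract):
    """Heuristic (an assumption): a complete abstract ends with '.', '!' or '?'."""
    stripped = abstract.rstrip()
    return bool(stripped) and stripped[-1] not in ".!?"


# Hypothetical row; only the column names come from the slide.
row = {
    "Abstract": "This award supports research on coastal sediment transp",
    "Publications": (
        "Smith, J. Sediment models. http://example.org/paper1.pdf "
        "See also http://example.org/data."
    ),
}

text, urls = split_urls(row["Publications"])
print("Text for semantic search:", text)
print("Embedded URLs:", urls)
print("Abstract looks cut off:", looks_truncated(row["Abstract"]))
```

A screen like this only flags suspect rows; it cannot recover the missing text, which is why the raw text data is still needed.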

Research.gov Spending & Results 9 Download Data Sets

Research.gov Dashboard 10

Sample of Hand Parsed Text 11 Note: We will need to get the raw text data to accomplish the objectives of this work.

Semantic MedLine Prototype: Home Semantic MEDLINE is a prototype Web application that summarizes MEDLINE citations returned by a PubMed search. Natural language processing is used to analyze salient content in titles and abstracts. This information is then presented in a graph that has links to the MEDLINE text processed. Currently, the results from 35 PubMed searches (including a variety of disorders and drugs) are available to be processed. The 500 most recent citations (from the date of the search) are available for further processing by Semantic MEDLINE. Begin at the Search tab by selecting a search; then move to the Summarize tab. Choose a summary type to specify the point of view of the summary (Treatment of Disease, Substance Interactions, Diagnosis, or Pharmacogenomics). After selecting the topic of the summary, click the Summarize and Visualize button. The graph appears below. Right click on an edge to display a MEDLINE citation. 12
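
The summarize-then-graph workflow described above can be illustrated with a small sketch. It assumes, as an illustration only, that each extracted statement is a (subject, predicate, object) record tied to one citation, and that a summary type corresponds to a set of predicates to keep. The sample records, the placeholder citation IDs, and the function name `build_graph` are all hypothetical and are not taken from the Semantic MEDLINE application.

```python
from collections import defaultdict

# Hypothetical records: (subject, predicate, object, citation_id).
predications = [
    ("Metformin", "TREATS", "Diabetes Mellitus", "PMID-0001"),
    ("Metformin", "TREATS", "Diabetes Mellitus", "PMID-0002"),
    ("Insulin", "TREATS", "Diabetes Mellitus", "PMID-0003"),
    ("Metformin", "INTERACTS_WITH", "Insulin", "PMID-0004"),
]


def build_graph(triples, keep_predicates):
    """Group citation IDs by (subject, predicate, object) edge.

    Only edges whose predicate is in keep_predicates are retained, which
    stands in for choosing a summary point of view.
    """
    edges = defaultdict(list)
    for subj, pred, obj, cite in triples:
        if pred in keep_predicates:
            edges[(subj, pred, obj)].append(cite)
    return dict(edges)


# A "Treatment of Disease" style view keeps only the treatment edges.
treatment_view = build_graph(predications, {"TREATS"})
for (subj, pred, obj), cites in treatment_view.items():
    print(f"{subj} -[{pred}]-> {obj} ({len(cites)} citations: {', '.join(cites)})")
```

Keeping the citation IDs on each edge mirrors the behavior described on the slide, where selecting an edge leads back to the underlying MEDLINE citation.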

Semantic MedLine Prototype: Search 13

Semantic MedLine Prototype: Summarize 14

Semantic MedLine 15

Semantic MedLine Prototype: Knowledgebase 16

Semantic MedLine: Predication Database 17 ftp://lhcftp.nlm.nih.gov/outgoing/cgsb/ Note: Large Tar and GZIP files!
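
Because the files are large tar and gzip archives, one option is to read them line by line rather than unpacking everything to disk first. The sketch below uses only the Python standard library `tarfile` module. The archive it reads is a tiny stand-in created on the spot; the member name `predications.txt`, its contents, and the function name `iter_lines` are hypothetical, and the real files' layout and encoding are not known from the slide.

```python
import io
import os
import tarfile
import tempfile


def iter_lines(archive_path, encoding="utf-8"):
    """Yield (member_name, line) pairs from a .tar.gz without unpacking it to disk."""
    with tarfile.open(archive_path, mode="r:gz") as archive:
        for member in archive:
            if not member.isfile():
                continue  # skip directories, links, and other non-file entries
            handle = archive.extractfile(member)
            for raw in handle:
                yield member.name, raw.decode(encoding, errors="replace").rstrip("\n")


# Build a tiny stand-in archive so the sketch runs end to end.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.tar.gz")
    payload = b"line one\nline two\n"
    with tarfile.open(path, mode="w:gz") as archive:
        info = tarfile.TarInfo(name="predications.txt")
        info.size = len(payload)
        archive.addfile(info, io.BytesIO(payload))

    lines = list(iter_lines(path))

for name, line in lines:
    print(name, "|", line)
```

For a real archive the generator would normally be consumed lazily (for example, counting or filtering lines in a loop) instead of being collected into a list as in this small demonstration.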

Semantic MedLine: Data Extraction 18

Semantic MedLine: Analytics 19 Web Player I have questions based on these analytics.

Semantic MedLine: Analytics 20 Web Player

Semantic MedLine: Analytics 21 Web Player

Some Next Steps
We will need to get the raw text data to accomplish the objectives of the work with the Research.gov Abstracts, Project Outcomes, etc.
We need to extract the large Semantic MedLine Predication Database files for Semantic Search with SIRA and Digital Reasoning.
22

AOL Government Stories
– Semantic Medline (Pending)
– HPN Health Prize for Health Data Palooza (Pending)
– From Catalyst to Semantic Synthesis - How the IC Finds More Needles in Bigger Haystacks (Pending)
– Challenges and Opportunities in Big Data: Defense Department Bets Big On Big Data
– Semantics and Ontologies for the Intelligence Community Working Toward Standards (Pending)
– Data.gov Developers Community Space Launched - Is Dr. Merkin In the House? (Pending)
– Building Trust Between Cloud Computing Providers and Suppliers
– Health Datapalooza Would Benefit From Real Innovation Investment
– Has NIEM Reached A Choke Point With Big Data
– Put Federal IT Dashboard Into Motion
– Why The Intelligence Community Loves Big Data
– Big Data Science Visualizations Past Present and Future
23

Challenges and Opportunities in Big Data 24

My Suggestions
I think it leaves us with a disconnected federal big data program between the science and intelligence communities, with the former considerably behind the latter.
As Professor Jim Hendler, RPI Computer Scientist, commented during the meeting: "Computer scientists like us have to move to the social science side of things to really do big data."
This new White House Initiative needs Todd Park's entrepreneurial spirit, Gus Hunt's experience, and DoD's new money, spent in a coordinated way with the IC and civilian agencies, to make big data across the federal government a reality.
25

Core Techniques and Technologies for Advancing Big Data Science & Engineering (BIGDATA) 26