Data Science for DTIC Data Ecosystem Dr. Brand Niemann Director and Senior Data Scientist/Data Journalist Semantic Community

Presentation transcript:

Data Science for DTIC Data Ecosystem Dr. Brand Niemann Director and Senior Data Scientist/Data Journalist Semantic Community December 13,

Overview
Master Data Repository Tool RFI: Due December 30th
GPO's FDsys: 1 Billion Items Served and Ready For More
Big Data: Google Page Rank
Data Science: Questions, Data Mining, Invert Bath Tub, and Digital Government Strategy
DTIC Site Map, Thesaurus, and Subject Categories
Knowledge Base: MindTouch
Knowledge Base Spreadsheet Linked Data Index: Excel
Analytics and Visualization: Spotfire
Semantic Search: Semantic Insights
Results: Conclusions and Next Steps

Master Data Repository Tool RFI: Due December 30th
DTIC’s goal is to consolidate, unify, manage, control, search, analyze, and disseminate scientific and technical data using a single tool.
If DTIC really wants a single tool, GPO's FDsys is probably the closest one I have worked on.
DTIC needs data scientists to build a data ecosystem with data science.
Map the RFI requirements to the Digital Government Strategy to show how this data science pilot meets and exceeds those requirements.

GPO's FDsys: 1 Billion Items Served and Ready For More
But, “We are not resting on our laurels,” said GPO’s Chief Technology Officer Ric Davis. A major refresh of FDsys is now in the planning stages, which will include an updated search engine and improved support for mobile devices.
FDsys uses Extensible Markup Language (XML) and an ISO standard format for archival information to enable searching across multiple collections, a feature not available in the original GPO website, GPO Access. “It was really a flat store of files with a search engine on top of it,” LaPlant said of the old GPO Access. “We needed something to better manage and preserve.”
The agency is evaluating cloud-based technology for FDsys as part of its upcoming major refresh, along with a new open source search engine, Solr, which promises fault-tolerant performance on a large scale.
In 2012 GPO began replacing its 30-year-old composition engine, Microcomp, with XML Professional Publisher (XPP) to enable the direct XML formatting of documents for both electronic and print publication. This eliminated the step of transforming documents for publication in XML.
My Comment: I taught XML training at GPO, showed them how to “author once and use many” (print, CD, Web, mobile), and suggested MindTouch, the state-of-the-art wiki with Solr (Lucene) in the Amazon Cloud that I use for all of my work, so I am still ahead of them after 15 years! I use four tools: MindTouch, Spotfire, Semantic Insights, and Be Informed.

Big Data: Google Page Rank
PageRank is an algorithm used by Google Search to rank websites in its search engine results. It was named after Larry Page, one of the founders of Google, and is a way of measuring the importance of website pages.
According to Google:
– PageRank works by counting the number and quality of links to a page to determine a rough estimate of how important the website is. The underlying assumption is that more important websites are likely to receive more links from other websites.
PageRank is now one of 200 ranking factors that Google uses to determine a page’s popularity. Google Panda is one of the other strategies Google now relies on to rank the popularity of pages.
Even though PageRank is no longer directly important for SEO purposes, back-links from more popular websites continue to push a webpage higher up in search rankings.
My Comment: Why not create big data pages that are data in relational and graph format?
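The link-counting idea described on this slide can be sketched in a few lines of Python. This is only an illustrative power-iteration version of the published PageRank idea, not Google's production ranking: the function name `pagerank`, the damping value of 0.85, the convergence tolerance, and the four-page link graph (`home`, `thesaurus`, `site_map`, `orphan`) are all assumptions made up for the example.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=200):
    """Return {page: score} for a graph given as {page: [pages it links to]}.

    Hypothetical helper for illustration; damping=0.85 is an assumed value.
    """
    # Collect every page that appears as a source or as a link target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    # Start with the same score for every page.
    rank = {p: 1.0 / n for p in pages}
    for _ in range(max_iter):
        # Score held by pages with no outbound links is spread over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        # Each page passes an equal share of its score to every page it links to.
        for page, targets in links.items():
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
        delta = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if delta < tol:
            break
    return rank


# Made-up link graph: "orphan" links out but nothing links to it.
example_links = {
    "home": ["thesaurus", "site_map"],
    "thesaurus": ["home"],
    "site_map": ["home", "thesaurus"],
    "orphan": ["home"],
}

scores = pagerank(example_links)
for page, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{page}: {score:.3f}")
```

In this toy graph the page with the most inbound links ranks highest and the page with no inbound links ranks lowest, which is the "more important pages receive more links" assumption quoted above. The same dictionary-of-links input could in principle be built from a site map stored in graph form.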

Data Science: Questions, Data Mining, Invert Bath Tub, and Digital Government Strategy
Answer Four Questions:
  How is the data collected?
  Where is it stored?
  What are the results?
  Why should we believe them?
Follow Data Mining Process:
  Business Understanding
  Data Understanding
  Data Preparation
  Modeling
  Evaluation
  Deployment
Invert the Activity Level Bathtub:
  Collection (Easy and Fast)
  Analysis (Maximize Time Spent)
  Communications (Easy and Fast)
Digital Government Strategy:
  Unstructured is Structured
  Unstructured and Structured Are Integrated
  Well-defined URLs
  Content (XML, Java, and APIs with Non-Web Formats Like PDF Converted)
  Data Ecosystem

DTIC Site Map

DTIC Thesaurus

DTIC Subject Categories

Knowledge Base: MindTouch
Data Science for DTIC Data Ecosystem

Knowledge Base Spreadsheet Linked Data Index: Excel

Analytics and Visualization: Spotfire
Web Player

Semantic Search: Semantic Insights
My Note: I requested use of Research Assistant and Research Librarian on DTIC Content.

Results: Conclusions and Next Steps
A Data Scientist Has Built a DTIC Data Ecosystem That Answers Four Basic Questions, Supports Data Mining, Inverts the Bath Tub, and Complies With the Digital Government Strategy.
The DTIC Data Ecosystem Was Built From the DTIC Web Site Map and Satisfies the RFI Requirements.
The DTIC Data Ecosystem Provides Semantic Search and Visualizations in MindTouch, Excel, and Spotfire.
Semantic Community Has Requested the Use of Research Assistant and Research Librarian Betas From Semantic Insights for Use on DTIC Content.