Data Science, Data Infrastructure, & Data Publications for the HHS IDEA Lab Dr. Brand Niemann Director and Senior Data Scientist/Data Journalist Semantic.

1 Data Science, Data Infrastructure, & Data Publications for the HHS IDEA Lab
Dr. Brand Niemann, Director and Senior Data Scientist/Data Journalist, Semantic Community
http://semanticommunity.info/
http://www.meetup.com/Federal-Big-Data-Working-Group/
http://semanticommunity.info/Data_Science/Federal_Big_Data_Working_Group_Meetup
December 1, 2014

2 The Profit and Data Enterprises
Marcus Lemonis (born November 16, 1973) is a Lebanese-born American businessman, investor, television personality and philanthropist. He is currently the chairman and CEO of Camping World and Good Sam Enterprises, and the star of The Profit, a CNBC reality show about saving small businesses through People, Process, and Products.
The Federal Big Data Working Group Meetup is also about helping government agencies develop:
– People: Data Scientists
– Process: Data Infrastructure
– Products: Data Publications
Some examples:
– EPA
– FDA
– NOAA
– HHS
And provide MOOCs for training and networking.
http://en.wikipedia.org/wiki/Marcus_Lemonis

3 Five MOOCs for Big Applications and Analytics
– Practical Data Science for Data Scientists by Niemann, Based on Schutt and O’Neil Book
– Data Science for Data Mining by Niemann, Based on North Book and Borne Class
– Federal Big Data Working Group Meetups by Niemann and Goodier
– Tackling the Challenges of Big Data, MIT ProfessionalX Online Course by Niemann, Based on Rus and Madden MOOC
– Data Science for Big Data Application and Analytics MOOC by Niemann, Based on Geoffrey Fox MOOC
See: Top 5 MOOCs for Data Science

4 Agenda
6:30 p.m. Welcome and Introduction
– Report on Recent HHS IDEA Lab Demo Meeting with Bryan Sivak (invited) and Damon Davis (invited) and HHS Data Science Data Publication Tutorial: Slides, Background, Data Science for Tackling the Challenges of Big Data (MIT Online Course)
7:00-7:15 p.m. Joe Pringle, Director of Health, Socrata. Slides and Demo Links:
– https://www.healthcare.gov/health-and-dental-plan-datasets-for-researchers-and-issuers/
– http://www.healthpocket.com/
– https://data.cityofchicago.org/
– https://github.com/Chicago
– http://www.chicagohealthatlas.org/
7:15 p.m. Brief Member Introductions and Refreshment Break
7:30 p.m. Alex Sherman and Kartik Verma, Deloitte Consulting for HHS NIH and MHS. Slides and Demo Link: http://semoss.org/
– GINAS: Advancing FDA's Ingredient Information System, Noel Southall, National Institutes of Health (also FDA involved) (invited). FDA has articulated its vision for a next-generation data system that serves as the central clearing house for ingredients in medical products. Meanwhile, the National Center for Advancing Translational Science at NIH has created its own substance tracking system to facilitate research efforts. Working with the FDA, this NIH team will test their software as a solution in the FDA environment.
– Fostering Scientific Insight through Data Federation, Brock Smith, National Institutes of Health (invited). This cross-departmental team consisting of individuals representing NIH, FDA and CDC recognizes a problem affecting scientists and their research goals. Because of the breadth and variety of resources, NIH researchers have difficulty synthesizing existing public data with their internally produced research findings and thus can easily lose valuable scientific insight. The team is testing the value of a web platform called SEMOSS that is designed to aggregate existing, fragmented health data while leveraging data analytic and visualization tools to enable scientists’ intuitive analysis and synthesis in their research.
8:30 p.m. Open Discussion
8:45 p.m. Networking
9:00 p.m. Depart

5 Calendar
November 4, 2014 - December 16, 2014, Tackling the Challenges of Big Data, MITProfessionalX Online Course, $545. (January 5th Meetup)
– https://mitprofessionalx.edx.org/courses/MITProfessionalX/6.BDX/2T2014/about
November 4, 2014 - "Diverse Data Analytics Applications", a joint George Mason University and IBM ASC Symposium. (This Meetup)
– Slides
November 13, 2014 - Automated Data Science: Data Science as a Service™ (DSaaS), Virginia Big Data Meetup. (This Meetup)
– http://www.meetup.com/Federal-Big-Data-Working-Group/events/217549932/
November 18-19, 2014 - Symposium on Predictive Analytics For Defense and Government. (December 15th Meetup)
– http://www.meetup.com/Federal-Big-Data-Working-Group/events/210087932/
January 5, 2015 - Data Science for NSF Polar Cyberinfrastructure and MIT Big Data Course
– http://www.meetup.com/Federal-Big-Data-Working-Group/events/217631412/

6 Data Science for NOAA Chief Data Officer and Big Data Predictive Analytics Meetup
Excellent presentation. Predictive Analytics in the Era of Big Data by Dave Vennergrund was precise and informative. Thanks for presenting. Great presentation on NOAA Big Data by Brand! Thanks!! Has anyone come across a list or database of all US businesses (free)? Very good quality talks. Hi, This is Ram, I am Java developer with 5 years experience. I want to get in Big data development projects what is best path to start for me. Sorry if this question is already asked earlier. Great video/sound app! The presentations were excellent as always.
http://www.meetup.com/Federal-Big-Data-Working-Group/events/213175262/

7 Semantic Insights Followup
Looking for interested individuals who wish to participate in our Natural Language Understanding and Reasoning research. We welcome educational institutions and individual researchers interested in working collaboratively with us.
Accounts are available for beta test:
– http://www.semanticinsights.com/
Applying High-speed Pattern Recognition to Generate Queryable Semantics from Big Data - Big Data is filtered and reduced in real-time for event and pattern discovery:
– Applying High-speed Pattern Recognition to Generate Queryable Semantics from Big Data

8 Data Science for Data Mining: Overview
I suggested "Data Mining for the Masses" by Matthew North. It uses the CRISP Data Mining Conceptual Model that is used in the Data Science for Business book I did the tutorial for.
GMU Professor Borne uses the title in his talks and the book in his Data Science Class:
– http://kirkborne.net/cds401fall2014/
Available at Amazon.com:
– http://www.amazon.com/Data-Mining-Masses-Matthew-North/dp/0615684378/ref=sr_1_1?ie=UTF8&qid=1346972841&sr=8-1&keywords=data+mining+for+the+masses
Book datasets are available:
– https://sites.google.com/site/dataminingforthemasses/
Recent book review:
– http://www.onlineprogrammingbooks.com/data-mining-masses/
Free PDF download of the book:
– http://dl.dropbox.com/u/31779972/DataMiningForTheMasses.pdf

9 Data Science for Data Mining: Tutorial
I will do a tutorial on this and would welcome anyone else doing and presenting on this as well. The steps I followed are as follows:
– I merged the 14 CSV files into one Excel Spreadsheet.
– I copied the Book PDF files into MindTouch by first creating the Table of Contents structure and then copying individual sections of the book to support the Exploratory Data Analysis I did with Spotfire.
– Instead of the book's text mining exercises and four text files in Chapter 12, I text mined the entire publication by building a structured knowledge base in the Excel Spreadsheet.
Question: Can we do RapidMiner with Spotfire? The Answer is Yes and is shown in the Spotfire Screen Captures below.
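The first step listed above, merging the book's CSV files into a single sheet, could be sketched in Python roughly as follows. The slides do not say how the merge was actually done, so this is only one possible approach: the function name `merge_csv_files`, the added `source_file` column, and the file names in the usage note are all hypothetical. It assumes each CSV has a header row and stacks the files while keeping the union of their columns.

```python
import csv
from pathlib import Path


def merge_csv_files(csv_paths, out_path, source_column="source_file"):
    """Stack several CSV files into one file, keeping the union of their columns.

    Hypothetical helper for illustration only; returns the number of data rows written.
    """
    rows = []
    fieldnames = [source_column]  # first column records which file each row came from
    for path in map(Path, csv_paths):
        with path.open(newline="", encoding="utf-8") as handle:
            reader = csv.DictReader(handle)
            # Extend the combined header with any column not seen in earlier files.
            for name in reader.fieldnames or []:
                if name not in fieldnames:
                    fieldnames.append(name)
            for row in reader:
                row[source_column] = path.name
                rows.append(row)

    with Path(out_path).open("w", newline="", encoding="utf-8") as handle:
        # restval fills columns a given file lacks; extrasaction skips stray
        # cells from rows that are longer than their own header.
        writer = csv.DictWriter(
            handle, fieldnames=fieldnames, restval="", extrasaction="ignore"
        )
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

A call such as `merge_csv_files(sorted(Path("datasets").glob("*.csv")), "merged.csv")` (directory and output names assumed) would produce a single CSV that Excel or Spotfire can open, with the `source_file` column indicating which chapter data set each row came from.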

10 GMU CDS 401 Syllabus
http://kirkborne.net/cds401fall2014/

11 GMU CDS 401 Reading Assignments
http://kirkborne.net/cds401fall2014/cds401-reading.htm

12 Data Science for Data Mining: Knowledge Base for Finding
Data Science for Data Mining
Google Chrome Find: Regression

13 Data Science for Data Mining: Spreadsheet for Finding
Excel Spreadsheet
Also 14 CSV files merged for Spotfire

14 Data Science for Data Mining: Spotfire Screen Captures
2.1. Cover Page
2.2. Chapter 03 Data Set
2.3. Chapter 04 Data Set
2.4. Chapter 05 Data Set
2.5. Chapter 06 Data Set
2.6. Chapter 07 Data Set Scoring
2.7. Chapter 07 Data Set Training
2.8. Chapter 08 Data Set: MyModel
2.9. Chapter 09 Data Set Scoring
2.10. Chapter 09 Data Set Training
2.11. Chapter 10 Data Set Scoring
2.12. Chapter 10 Data Set Training
2.13. Chapter 11 Data Set Scoring
2.14. Chapter 11 Data Set Training
2.15. Chapter 11 Exercise Training Data

15 Data Science for Data Mining: Spotfire Dashboard
Web Player

16 2014 George Mason and IBM Symposium on Diverse Data Analytics Applications
In the last decade, the data explosion and robust analytics tools have made “Big Data and Analytics” among the most popular terms used in the computer engineering and IT industry. In addition, the cloud, social, and mobile environments generate a tremendous amount of personalized, geospatial and temporal data that is extremely valuable to education, business operations, government services, and the intelligence community. According to the Forum for Innovation, 90 percent of the world's data has been produced in the last two years. The operational need and market demand keep growing stronger.
To create an opportunity to share Big Data and Analytics knowledge and technologies among academic institutions, industry leaders and government customers, George Mason University and IBM will host the conference "Big Data and Analytics 2014" on November 4th at George Mason University. For this conference, we have invited leaders and experts from academia, industry and government to discuss how big data and analytics hold the key to unlocking value.
If you are a student, you will learn topics like data mining, statistical models, predictive analytics, and data visualization; if you are a researcher, you can compare notes with analytics experts from George Mason University, IBM and other colleagues; if you are a technology provider, you will get an update on the cutting edge of analytics in industry and research.
Web Site and Slides

17 GMU Data Analytics Engineering, MS
http://catalog.gmu.edu/preview_program.php?catoid=25&poid=23972

18 Data Science for the HHS IDEA LAB: Knowledge Base
Data Science for the HHS IDEA LAB
The HHS IDEA Lab is cultivating innovation for a more modern and effective government. They are striving to better harness the talent of the workforce at HHS and remove barriers HHS employees are faced with so they can act. They are doing this through a three-pronged approach:
– Encouraging internal entrepreneurship by investing in HHS employees;
– Recognizing they don’t have all the answers inside government and are bringing in external talent to help; and
– Building communities of like-minded people across HHS to take on issues of strategic importance.

19 HHS IDEA Lab
HHS IDEA Lab Hosts Demo Day for 11 Teams to Pitch Potentially Game-Changing Projects for Continued Support to HHS Senior Leadership:
– Media Advisory for September 30, 2014, 10:30AM – 12:30AM
What is the HHS IDEA Lab?
– The approach the IDEA Lab takes is based on four tenets:
  – Innovation is a direct result of the freedom to experiment.
  – Design is critical to effectively communicate ideas.
  – Entrepreneurship allows us to take advantage of underutilized talent.
  – Action, above all else, is encouraged.
Data Science for the HHS IDEA LAB and Innovative Design, Development and Linkages of Databases Fellowship: My Tribute to George Thomas (July 2014)
– Still not decided

20 HHS IDEA Lab Hosts Demo Day
– "Shark Panelists", Dr. Taha Kass-Hout, FDA
– HHS IDEA Lab Director and CTO, Bryan Sivak (invited)
– HHS Health Data Initiative Director, Damon Davis, PAWG 2014 (invited)
– National Institutes of Health, Noel Southall, GINAS: Advancing FDA's Ingredient Information System (invited)
– National Institutes of Health, Brock Smith, Fostering Scientific Insight through Data Federation (SEMOSS) (invited)
– Alex Sherman, Deloitte Consulting LLP (Accepted)

21 FDA Analytics with SEMOSS
http://blog.semoss.org/2013/11/fda-analytics.html


