Data Science for and FamilyTree DNA
Dr. Brand Niemann
Director and Senior Data Scientist/Data Journalist
Semantic Community
February 16,
Introduction
Welcome:
– Federal Big Data Working Group Meetup
– Virginia Big Data Meetup
– Lotico Northern Virginia Semantic Web
– Other?
Data Science for the National Big Data R and D Initiative, February 2, 2015:
– NITRD Big Data Chronology (2012-present) and NITRD-GU Big Data Workshop: Dr. Moore agrees with IBM Watson that human curation is generally underappreciated and is the secret sauce in Big Data successes.
– Wendy Wigen’s Slides: Summary of RFIs and Dr. Sudarsan Rachuri, NIST, Smart Manufacturing Systems Design and Analysis.
– Calvin Andrus, CIA (Data Science: An Introduction) would "like to see more science in data science."
– Jim Burke: Conference call-in and online slides were very useful. I appreciate the extra mile efforts, and great, informative conversations.
Federal Big Data Working Group Meetup
– Federal: Supports the Federal Big Data Initiative, but is not endorsed by the Federal Government or its agencies.
– Big Data: Supports the Federal Digital Government Strategy, which is "treating all content as data", so big data = all your content.
– Working Group: Data Science Teams composed of Federal Government and non-Federal Government experts producing big data products (see Possible Team Presentations below).
– Meetup: The world's largest network of local groups to revitalize local community and help people around the world self-organize, like the MOOCs (Massive Open Online Courses) being considered by the White House.
The Profit and Data Enterprises
Marcus Lemonis (born November 16, 1973) is a Lebanese-born American businessman, investor, television personality, and philanthropist. He is currently the chairman and CEO of Camping World and Good Sam Enterprises, and the star of The Profit, a CNBC reality show about saving small businesses through People, Process, and Products.
The Federal Big Data Working Group Meetup is also about helping government agencies develop:
– People: Data Scientists
– Process: Data Infrastructure
– Products: Data Publications
Some examples:
– EPA
– FDA
– NOAA
– HHS
– Eastern Foundry
And provide MOOCs (Massive Open Online Courses) for training and networking.
Calendar
NITRD FASTER Bigdata at NSF, February 17, 2015:
– Dr. McHenry will discuss Brown Dog: A search engine for the other 99 percent (of data). Brown Dog seeks to develop a service that will make un-curated data accessible to scientists.
Mission Source Consulting Launch Party, February 28:
– Steven M. Hanmer, 12:00 PM to 4:00 PM, Eastern Foundry, 2011 Crystal Drive, Suite 400,
Data Science for Big Data Application and Analytics MOOC, March 2,
th Annual Government Big Data Forum, March 12, 2015
USDA CIO and ACDO on Open Data Plan and Roundtable, March 16, 2015
Government Technology & Innovation Incubator for Big Data Analytics II, TBA. Week of March 23, Need Sponsor
Data Science for Developers & Family Caregivers. April 6, 2015
The Wharton DC Alumni Innovation Summit, April 28-29, 2015
Data Science for Natural Medicines and Epigenetics (in planning), May 4,
Agenda
6:30 p.m. Welcome and Introduction (New Tutorial and Mentoring) Story, Slides for RootsTech 2015 Developer Challenge: Big Data from Everywhere for Families and Community Service, February 12–14, 2015 in Salt Lake City, Utah
7:10 p.m. Brief Member Introductions
7:15 p.m. Data Science for Story, Slides, and Tutorial
7:45 p.m. National Geographic Genographic Project and Big Data, Syed Ali, Data Scientist, Analytics Led Intelligence Slides. See FamilyTree DNA and National Geographic Genographic DNA test for deep ancestry
8:30 p.m. Open Discussion
8:45 p.m. Networking
9:00 p.m. Depart
Overview
January 13, 2015: FamilySearch Launches New App Gallery (more than 50 apps)
February 12–14, 2015: RootsTech 2015 Developer Challenge in Salt Lake City, Utah
My Entry: Big Data from Everywhere for Families and Community Service
My Partner Work: Data Science for
Syed Ali’s App: National Geographic Genographic Project and Big Data
You could be a partner and develop apps (e.g. A Billion Person Family Tree with MongoDB by Randall Wilson, Family Tree of Data: Provenance and Neo4, etc.)
“FamilySearch is a great resource, but FamilySearch alone can’t do everything. That is why we work with partners to provide complementary tools and resources and why the FamilySearch App Gallery is so important,” said Dennis Brimhall, FamilySearch CEO. “We’ve had partners for many years, and now we want to make it easier for our patrons to know about them and to find the apps they need.”
MyTableBox of MyFamily Tree
Person Template for Brand Lee Niemann
Mini-Tutorial: Sony Camcorder and Camtasia Video to YouTube Video
How is the data collected?
– Sony Camcorder and PowerPoint Slides.
Where is the data stored?
– Hard drive and DVD in MP4 format.
What are the results?
– MP4 files converted and uploaded to YouTube.
Why should we believe the results?
– Because I and others have done it successfully many times.
Data Science for Natural Medicines YouTube