The Anatomy and Physiology of Data Science, Peter Fox

Presentation transcript:

The Anatomy and Physiology of Data Science
Peter Fox 1 ( 1 Rensselaer Polytechnic Institute th St., Troy, NY, United States – see Acknowledgments)
AGUFM14 – ED31E-3455 (MS Hall A-C)

Glossary:
RPI – Rensselaer Polytechnic Institute
TWC – Tetherless World Constellation at Rensselaer Polytechnic Institute

Acknowledgments: TWC eScience Group; W3C Provenance Working Group
Sponsors: Rensselaer Polytechnic Institute; Tetherless World Constellation

MOTIVATION
• Whether the science (especially geosciences) community at large likes it or not, the co-opting of the term Data Science by the private sector has led to increased hype over data science as a career and as a means to solve challenging data problems, and to a lack of educational innovation in curricula for data science.
• If the full benefits of a new generation of statistical and analytical software tools that operate on high-performance computational infrastructure are to be attained, adequate attention to the 'science of data science' is needed. In this contribution, we present a science view of data science from both an education and a research perspective.
• We introduce a research agenda that explores the key challenges that must be met to serve the needs of research driven by large-scale data analytics.
• We focus on three as-yet untapped data science topics:
  • understanding scale in systems,
  • sparse systems, and
  • abductive reasoning.
• We conclude with a specific call to action to make progress on the aforementioned topics.

The Landscape – Data Ecosystem and What Makes Up a Data Scientist?

Learning Outcomes

Physiology (in a group)
• Definition of Science Hypotheses, Guiding Questions
• Finding and Integrating Datasets
• Presenting Analyses and Viz.
• Presenting Conclusions

• Institutions to provide reliable, high-functionality data infrastructures that facilitate analytics
• Provision of intermediate to advanced Statistics to undergraduates and early graduate students
• Well-curated datasets are made widely available along with developed models and validation statistics
• All results are under continuous scrutiny, and are traceable and verifiable

• To demonstrate knowledge of relevant analytic methods, to recognize and apply quantitative algorithms and techniques, and to interpret results.
• To demonstrate strategic thinking skills, combined with a solid technical foundation in data and model-driven decision-making.
• To develop the ability to apply critical and analytical methods to formulate and solve science, engineering, medical, and business problems.
• Examine real-world examples to place data-mining techniques in context, to develop data-analytic thinking, and to illustrate that their application is both art and science.
• Must effectively communicate analytic findings to non-specialists.
• Must develop and demonstrate a working knowledge of decision making under uncertainty, and be able to build optimization models that incorporate random parameters: static stochastic optimization, two-stage optimization with recourse, chance-constrained optimization, and sequential decision making.

Anatomy (as an individual)
• Data Life Cycle – Acquisition, Curation and Preservation
• Data Management and Products
• Forms of Analysis, Errors and Uncertainty
• Technical tools and standards

Anatomy is the study of the structure of, and relationships between, body parts. Physiology is the study of the function of body parts and the body as a whole.
[Figure labels (data-information-knowledge diagram): Data, Information, Knowledge; Producers, Consumers; Context; Presentation, Organization, Integration, Conversation, Creation, Gathering, Experience]

[Panel headings: BigData Science (Data Analytics): Anatomy & Physiology, Call To Action, Learning Outcomes. “Data” Science: Anatomy & Physiology, Call To Action]

Anatomy (individual)
• Intermediate Skill in parametric and non-parametric statistics
• Application of a broad spectrum of Data Mining and Machine Learning Algorithms
• Ability to cross-validate and optimize models
• Application to specific datasets

Through class lectures, practical sessions, written and oral presentation assignments and projects, students should:
• Develop and demonstrate skill in Data Collection and Data Management
• Demonstrate proficiency in Data/Information Product Generation
• Demonstrate science-driven Analysis and Presentation of Integrated Datasets from the Web
• Demonstrate the development and application of Data Models
• Convey knowledge of and apply Data and Metadata Standards, and explain Provenance
• Apply Data Life-Cycle principles and construct Data Workflows
• Develop and demonstrate skill in Data Tool Use and Evaluation

• Data Science across the curriculum
  • Same as “Calculus”
  • And … Intro to Statistics
• Data Management is Second Nature
  • Like operating an instrument
• Openness/sharing is the natural state
• As a whole, the Data Scientist works collaboratively and is recognized and rewarded by peers and organizations
• Data Science primarily advances the inductive conduct of science, but to understand scale in systems, accommodate sparse systems, and provide for abductive reasoning, data scientists must progress to data analyticists.

Data science is advancing the inductive conduct of science and is driven by the greater volumes, complexity and heterogeneity of data being made available over the Internet.
Data science combines aspects of data management, library science, computer science, and physical science using supporting cyberinfrastructure and information technology. It is changing the way all of these disciplines do both their individual and collaborative work. Key methodologies in application areas based on real research experience are taught to build a skill-set.

Data and Information analytics extends analysis (descriptive and predictive models to obtain knowledge from data) by using insight from analyses to recommend action or to guide and communicate decision-making. Thus, analytics is concerned less with individual analyses or analysis steps than with an entire methodology.

The world at large is confronted with increasingly large and complex sets of structured and unstructured information from sensors and instruments and generated by computer simulations; data is also "hidden" in websites, application servers, social networks and on mobile devices. In commerce and industry, analytics-driven enterprises are becoming mainstream. Yet there is a shortfall in the key educational skills needed to meet these growing needs. Traditional enterprises are moving toward analytics-driven approaches for core business functions. In government and corporations, cybersecurity problems are prevalent. Key topics include advanced statistical computing theory, multivariate analysis, and the application of computer science courses such as data mining and machine learning, as well as change detection by uncovering unexpected patterns in data.

[Figure: Lt. Cmdr Data, Star Trek TNG]
[Figure: Lt. Cmdr Data and Friends]
[Figure: Overused Venn diagram of the intersection of skills needed for Data Science (Drew Conway)]
[Figure: The Data-Information-Knowledge Ecosystem (Fox; derived)]

Physiology (term project)
• Definition of Science Hypotheses, with Prediction/Prescription Goal
• Cleaning and Preparing Datasets
• Validating and Verifying Models
• Presenting Ideas and Results