The Role of Statistics in Data Science, and Vice Versa

Slides:



Advertisements
Similar presentations
To find out more or to apply, please visit our career portal and post your CV. goodyear-dunlop.com/career The Opportunity Develop and apply skill to analyze.
Advertisements

Ability-Based Education at Alverno College. Proposed Outcomes for Session 1. To introduce you to Alvernos approach to designing integrative general education.
Program Goals Just Arent Enough: Strategies for Putting Learning Outcomes into Words Dr. Jill L. Lane Research Associate/Program Manager Schreyer Institute.
Intelligence Step 5 - Capacity Analysis Capacity Analysis Without capacity, the most innovative and brilliant interventions will not be implemented, wont.
Introduction to Competency-Based Residency Education
CRITICAL THINKING The Discipline The Skill The Art.
Note: Lists provided by the Conference Board of Canada
Mission Statement Benedictine Values The Business Division
Assessment of Undergraduate Programs Neeraj Mittal Department of Computer Science The University of Texas at Dallas.
Copyright © Allyn & Bacon (2010) Research is a Process of Inquiry Graziano and Raulin Research Methods: Chapter 2 This multimedia product and its contents.
Dr Jim Briggs Masterliness Not got an MSc myself; BA DPhil; been teaching masters students for 18 years.
Graduate Expectations. Critical Thinking & Life Management. IBT graduates are expected to: identify and demonstrate the essential employability skills.
The Purpose of Action Research
© 2005 Illinois Mathematics and Science Academy 1 Illinois Learning Standards Information from
SUNY Cortland Conceptual Framework … our shared vision for preparing candidates to work in P-12 schools.
Meeting SB 290 District Evaluation Requirements
Day 1 Session 2/ Programme Objectives
T 8.0 Central concepts:  The standards for all lessons should not be treated simply as a “to do” list.  Integration of the disciplines can be effective.
Foundations of Educating Healthcare Providers
Designing and implementing of the NQF Tempus Project N° TEMPUS-2008-SE-SMHES ( )
Software Engineering ‘The establishment and use of sound engineering principles (methods) in order to obtain economically software that is reliable and.
BUSINESS INFORMATICS descriptors presentation Vladimir Radevski, PhD Associated Professor Faculty of Contemporary Sciences and Technologies (CST) Linkoping.
Learning outcomes for BUSINESS INFORMATCIS Vladimir Radevski, PhD Associated Professor Faculty of Contemporary Sciences and Technologies (CST)
1 Issues in Assessment in Higher Education: Science Higher Education Forum on Scientific Competencies Medellin-Colombia Nov 2-4, 2005 Dr Hans Wagemaker.
Chapter 1 Defining Social Studies. Chapter 1: Defining Social Studies Thinking Ahead What do you associate with or think of when you hear the words social.
Considerations for Curricular Development & Change Donna Mannello, DC Logan University.
Lecture # 32 SCIENCE 1 ASSOCIATE DEGREE IN EDUCATION Professional Standards for Teaching Science.
Impact of the New ASA Undergraduate Curriculum Guidelines on the Hiring of Future Undergraduates Robert Vierkant Mayo Clinic, Rochester, MN.
What Is Action Research? Action Research is : Action Research is : - A research methodology - Participative - Responsive - Cyclic “A cycle of posing questions,
Outdoor Logistics.  Outdoor Logistics  Real purpose  Specific plans  Avoids comfort  Determines consequence  Personal confidence  Solves problems.
Amy Wagaman Amherst College Mathematics and Statistics.
8/23/ th ACS National Meeting, Boston, MA POGIL as a model for general education in chemistry Scott E. Van Bramer Widener University.
Maths No Problem; A Mastery Approach.
Overview of Intervention Mapping
Using core competencies in curriculum design
Module 10: Professional Practice
Robert P. King Department of Applied Economics April 14, 2017
Victorian Curriculum: Unpacking Design and Technologies
21st Century Skills in the Classroom
OUTCOME BASED EDUCATION
Big Data Panel Discussion
Day 1 Session 2/ Programme Objectives
CRITICAL CORE: Straight Talk.
Inquiry-based learning and the discipline-based inquiry
The Management Process
Knowledge and practice standards for emerging and initial teacher preparation for early childhood care and education PROFESSOR H.B. EBRAHIM (UNISA) DR.
Project Learning Tree Project Learning Tree is an education program designed for teachers and others working with youth from pre-school through 12th grade.
RAFA and the New Zealand Curriculum
Section 2: Science as a Process
Statistical Data Analysis
CASE STUDY BY: JESSICA PATRON.
Housekeeping: Candidate’s Statement
First-Stage Draft Plans for Gen Ed Revision
School of Information Management Nanjing University China
Federal Land Manager Environmental Database (FED)
A Must to Know - Testing IoT
Meeting LIS Competences to Serve Inclusive Community through Curriculum: Case Study in LIS Study Program UIN Sunan Kalijaga Yogyakarta Indonesia Marwiyah.
Grade 6 Outdoor School Program Curriculum Map
Information Technology (IT)
Nature of Science Understandings for HS
19 Continuous Change.
Victorian Curriculum: Unpacking Design and Technologies
Statistical Data Analysis
Computing and Mathematics
Computer Science Section
Mathematics in the Data Science Movement
Introduction to Basic Research Methods
Why Study MCA. What is MCA? Master of Computer Applications (MCA) is a three-year (six semesters) professional Master's Degree Course in India. The course.
NextGen STEM Teacher Preparation in WA State
Presentation transcript:

The Role of Statistics in Data Science, and Vice Versa Jessica Utts Professor of Statistics University of California, Irvine President, American Statistical Association Nicholas Horton Amherst College 1

Some Issues for Discussion How does statistics (as a discipline) view the emerging field of data science? What can statisticians contribute to data science? What elements of statistics are essential for data science education?

Overview and History Statistics has evolved along with technology and the growth of data Statistics from the 1990s ≠ Statistics today! Foundational goal is the same ASA’s vision statement says it well: “A world that relies on data and statistical thinking to drive discovery and inform decisions” But methods for achieving that goal have changed and expanded

A Very Early Adopter: John Tukey 1962, Annals of Mathematical Statistics Identified four driving forces in the “new science”: The formal theories of statistics Accelerating developments in computers and display devices The challenge, in many fields, of more and ever larger bodies of data The emphasis on quantification in an ever wider variety of disciplines

A Less Early Adopter: Leo Breiman, 2001 “Statistical Modeling: The Two Cultures” There are two cultures in the use of statistical modeling to reach conclusions from data. One assumes that the data are generated by a given stochastic data model. The other uses algorithmic models and treats the data mechanism as unknown. The statistical community has been committed to the almost exclusive use of data models… Algorithmic modeling, both in theory and practice, has developed rapidly in fields outside statistics. It can be used both on large complex data sets and as a more accurate and informative alternative to data modeling on smaller data sets. If our goal as a field is to use data to solve problems, then we need to move away from exclusive dependence on data models and adopt a more diverse set of tools. (Statistical Science, 2001, with discussants)

A Side Comment David Donoho’s “50 Years of Data Science” (2015) is worth reading. His version of Breiman’s 2 cultures: The “Generative [stochastic data] Modeling” culture seeks to develop stochastic models which fit the data, and then make inferences about the data-generating mechanism based on the structure of those models. The “Predictive [algorithmic] Modeling” culture prioritizes prediction… is effectively silent about the underlying mechanism generating the data, and allows for many different predictive algorithms, preferring to discuss only accuracy of prediction made by different algorithms on various datasets.

Identifies foundational data science fields: Fast Forward 14 Years: ASA Statement on Role of Data Science in Statistics, 2015 Identifies foundational data science fields: Database management Statistics and machine learning Distributed and parallel systems Encourages greater, mutually beneficial collaboration across these three fields Intersects with numerous disciplines and related research areas

Many ongoing disciplinary collaborations Some examples: Genomics (and personalized medicine) Health services research (electronic medical records) Business analytics (customer tracking) Smart cities (and sensor networks) Astronomy (data streams) And others…

ASA Statement, Continued Notes that statistics education must evolve to meet needs For example, address inclusion of data science in K-12, community college More later on other aspects of education Elucidates role of statistics in data science

From the ASA Statement: The Role of Statistics Framing questions statistically allows researchers to leverage data resources to extract knowledge and obtain better answers. The central dogma of statistical inference, that there is a component of randomness in data, enables researchers to formulate questions in terms of underlying processes and to quantify uncertainty in their answers. A statistical framework allows researchers to distinguish between causation and correlation and thus to identify interventions that will cause changes in outcomes.

The ASA Statement, continued It also allows them to establish methods for prediction and estimation, to quantify their degree of certainty, and to do all of this using algorithms that exhibit predictable and reproducible behavior. In this way, statistical methods aim to focus attention on findings that can be reproduced by other researchers with different data resources. Simply put, statistical methods allow researchers to accumulate knowledge.

The Statistical Inquiry Cycle Wild and Pfannkuch, 1999, International Statistical Review Problem, Plan, Data, Analysis, Conclusions

How to carry out PPDAC? “This scientific approach to statistical problem-solving is important for all data analysts. It needs to start in the first course and be a consistent theme in all subsequent courses.” - American Statistical Association Guidelines for Undergraduate Programs in Statistics (2014), http://www.amstat.org/asa/education/Curriculum-Guidelines-for-Undergraduate-Programs-in-Statistical-Science.aspx

How to carry out PPDAC? “Working with data requires extensive computing skills. To be prepared for statistics and data science careers, students need facility with professional statistical analysis software, the ability to wrangle data in various ways and algorithmic problem-solving. Students should be fluent in higher-level programming languages and facile with database systems.” - American Statistical Association Guidelines for Undergraduate Programs in Statistics (2014), http://www.amstat.org/asa/education/Curriculum-Guidelines-for-Undergraduate-Programs-in-Statistical-Science.aspx

How to carry out PPDAC? Statistical Methods and Theory: Need to understand issues of design, confounding, and bias, have a foundation in theoretical statistics principles for sound analyses, develop knowledge and gain experience applying a variety of statistical methods, assess appropriateness of methods, and communicate results

How to carry out PPDAC? Data Wrangling and Computation: Need to be facile with professional statistical software program in a higher-level language and think algorithmically, use simulation-based statistical techniques and undertake simulation studies, manage and wrangle data, and undertake analyses in reproducible manner

How to carry out PPDAC? Statistical Practice and Communication: Need to write clearly, speak fluently, and construct effective visual displays and compelling summaries, demonstrate ability to collaborate in teams and to organize and manage projects, incorporate ethical precepts into all aspects of their work, and communicate complex statistical methods in basic terms to managers and other audiences

How to carry out PPDAC? Discipline-Specific Knowledge: Need to apply statistical reasoning to domain-specific questions, translate research questions into statistical questions, and communicate results appropriate to different disciplinary audiences. Skills taken from undergraduate guidelines, but relevant at other levels as well

Park City Group Report (2016) Curriculum Guidelines for Undergraduate Programs in Data Science (DeVeaux + 24 other authors) Data science as science Interdisciplinary nature Data at the core Analytical (computational and statistical) thinking and problem-solving (New pathways for) mathematical foundations Flexibility http://www.amstat.org/asa/files/pdfs/EDU-DataScienceGuidelines.pdf

What do statisticians bring to the table? Importance of context Accounting for variability Design, confounding, and analysis of found (observational) data Understanding of inference, multiplicity and reproducibility issues Statistical analysis (PPDAC) cycle Long history of making decisions with data Experience working on multidisciplinary teams

Some Issues for Discussion How does statistics (as a discipline) view the emerging field of data science? What can statisticians contribute to data science? What elements of statistics are essential for data science education?