Intro to EDM Why EDM now? Which tools to use in class Week 1, video 1.


Big Data in Education

This textbook
• In this textbook, you’ll learn methods used for exploring big data in education

Two communities
• International Educational Data Mining Society
  • First event: EDM workshop in 2005 (at AAAI)
  • First conference: EDM2008
  • Publishing JEDM since 2009
• Society for Learning Analytics Research
  • First conference: LAK2011
  • Journal of Learning Analytics (founded 2012)

Two communities
• Joint goal of exploring the “big data” now available on learners and learning
• To promote
  • New scientific discoveries & to advance learning sciences
  • Better assessment of learners along multiple dimensions
    • Social, cognitive, emotional, meta-cognitive, etc.
    • Individual, group, institutional, etc.
  • Better real-time support for learners

EDM/LA is…
• “… escalating the speed of research on many problems in education.”
• “Not only can you look at unique learning trajectories of individuals, but the sophistication of the models of learning goes up enormously.”
  Arthur Graesser, Editor, Journal of Educational Psychology

EDM/LA is…
• “… great.”
  Me

EDM and LAK
• Despite the area’s newness, we’ve learned a few things about key problems
• This course is about methods that EDM/LAK researchers have found useful for those problems
• For a more theoretical overview of the field, see SoLAR MOOC

Where do methods come from?
• Some of the methods would be familiar to someone with a background in Data Mining or Machine Learning
• Some of the methods would be familiar to someone with a background in Psychometrics or traditional Statistics
• You don’t have to have either of these backgrounds to get something out of the course
• Pick and choose what you find most useful

A few words for data miners
• You’ll find that there are some current trends in data mining that aren’t represented
  • Some of those haven’t gotten here yet
  • Some of those haven’t been very useful yet
• Educational data is big, but it’s not Google-big
• I’ll be focusing on the methods of broadest usefulness, not coolest newestness

A few words for data miners
• Also, you may find some classic algorithms aren’t well represented
• One example is neural networks
• They haven’t been all that heavily used in EDM, and one reason is that over-fitting is a plague in the highly context-based (and not that big) data sets we use

Not that big?
• But the name of the course is big data in education!

Not that big?
• Big data in education is big
  • Big by comparison to most classical education research
  • Big compared to common data sets in many domains
• But it’s not Human Genome Project or Google big

It is big enough
• That differences in r² of … routinely come up as statistically significant (Wang, Heffernan, & Beck, 2011; Wang & Heffernan, 2013)
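To see why numerically small effects can reach significance at this scale, here is a minimal sketch. It uses the standard t statistic for a single Pearson correlation, t = r·sqrt((n − 2)/(1 − r²)), with a normal approximation for the p-value. That is a simplification: comparing r² between two models, as the cited papers do, calls for a different test. The r² value and the sample sizes below are illustrative assumptions, not figures from the slides.

```python
import math

def correlation_t_statistic(r_squared: float, n: int) -> float:
    """t statistic for testing whether a Pearson correlation differs from zero."""
    r = math.sqrt(r_squared)
    return r * math.sqrt((n - 2) / (1 - r_squared))

def two_sided_p_normal_approx(t: float) -> float:
    """Two-sided p-value from a normal approximation (reasonable for large n)."""
    return math.erfc(abs(t) / math.sqrt(2))

R_SQUARED = 0.001  # illustrative value, not taken from the slides
for n in (100, 10_000, 1_000_000):  # illustrative sample sizes
    t = correlation_t_statistic(R_SQUARED, n)
    p = two_sided_p_normal_approx(t)
    print(f"n={n:>9,}  t={t:6.2f}  p={p:.3g}")
```

The same r² that is indistinguishable from noise with 100 observations clears the conventional 1.96 threshold comfortably once n reaches the tens of thousands, which is one reason significance alone says little about practical importance in data sets this large.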

I will talk about statistical significance
• Sometimes
• But it will not be a focus of the class

Types of EDM/LA method (Baker & Siemens, in press; building off of Baker & Yacef, 2009)
• Prediction
  • Classification
  • Regression
  • Latent Knowledge Estimation
• Structure Discovery
  • Clustering
  • Factor Analysis
  • Domain Structure Discovery
  • Network Analysis
• Relationship mining
  • Association rule mining
  • Correlation mining
  • Sequential pattern mining
  • Causal data mining
• Distillation of data for human judgment
• Discovery with models

Prediction
• Develop a model which can infer a single aspect of the data (predicted variable) from some combination of other aspects of the data (predictor variables)
• Which students are off-task?
• Which students will fail the class?
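As a rough illustration of prediction, the sketch below fits about the simplest possible classifier: a single cutoff on one predictor variable. The data, the choice of idle time as the predictor, and the function names are all hypothetical. Real detectors use many predictors and are evaluated on held-out data rather than on the training set.

```python
# Hypothetical training data: (seconds idle during a problem, observed off-task?)
TRAINING = [
    (5, False), (8, False), (12, False), (20, False),
    (25, True), (30, True), (45, True), (60, True),
]

def fit_threshold(examples):
    """Pick the idle-time cutoff that classifies the most training examples correctly."""
    best_cutoff, best_correct = None, -1
    for cutoff in sorted({x for x, _ in examples}):
        # Count examples where "idle >= cutoff" agrees with the observed label.
        correct = sum((x >= cutoff) == label for x, label in examples)
        if correct > best_correct:
            best_cutoff, best_correct = cutoff, correct
    return best_cutoff

def predict_off_task(idle_seconds, cutoff):
    """Predicted variable (off-task or not) inferred from one predictor variable."""
    return idle_seconds >= cutoff

cutoff = fit_threshold(TRAINING)
print(cutoff, predict_off_task(40, cutoff), predict_off_task(10, cutoff))
```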

Structure Discovery
• Find structure and patterns in the data that emerge “naturally”
• No specific target or predictor variable
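Clustering is one structure discovery method, and a plain k-means loop shows the idea: no variable is singled out as a target, and the groups come from the data alone. This is a minimal sketch with hypothetical student features. The starting centroids are picked by hand so the run is deterministic, which is not how k-means is normally initialized.

```python
import math

# Hypothetical per-student features: (hints requested per problem, errors per problem)
STUDENTS = [(0.2, 0.5), (0.3, 0.4), (0.1, 0.6), (2.0, 1.8), (2.2, 2.1), (1.9, 2.0)]

def k_means(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then recompute centroids."""
    assignments = []
    for _ in range(iterations):
        assignments = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return assignments, centroids

# Starting centroids are chosen by hand here so the run is deterministic.
assignments, centroids = k_means(STUDENTS, [STUDENTS[0], STUDENTS[3]])
print(assignments)
print(centroids)
```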

Relationship Mining
• Discover relationships between variables in a data set with many variables
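Correlation mining is one of the relationship mining methods listed earlier. A minimal sketch, assuming a small table of hypothetical per-student variables: compute the Pearson correlation for every pair of variables and rank the pairs by strength.

```python
import math
from itertools import combinations

# Hypothetical per-student variables (one list entry per student).
DATA = {
    "hints_used":   [1, 4, 2, 8, 6, 3],
    "errors":       [2, 5, 3, 9, 7, 4],
    "minutes_idle": [5, 1, 4, 2, 6, 3],
    "post_test":    [90, 60, 80, 30, 45, 75],
}

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def mine_correlations(data):
    """Correlate every pair of variables and sort by absolute strength."""
    pairs = [(a, b, pearson(data[a], data[b])) for a, b in combinations(data, 2)]
    return sorted(pairs, key=lambda t: abs(t[2]), reverse=True)

for a, b, r in mine_correlations(DATA):
    print(f"{a:>12} vs {b:<12} r = {r:+.2f}")
```

With many variables the number of pairs grows quickly, so some of the strong-looking correlations will arise by chance. A real analysis would need to correct for multiple comparisons.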

Discovery with Models
• Pre-existing model (developed with EDM prediction methods… or clustering… or knowledge engineering)
• Applied to data and used as a component in another analysis
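The two steps can be sketched as follows. The "pre-existing model" here is a hypothetical stand-in (a fixed idle-time cutoff), and the student logs and scores are invented for illustration. The point is only the shape of the workflow: apply an existing model to new data, then treat its output as a variable in a second analysis.

```python
def off_task_detector(idle_seconds: float) -> bool:
    """Stand-in for a pre-existing model; the 25-second cutoff is a hypothetical value."""
    return idle_seconds >= 25

# Hypothetical new data: idle time per problem for each student, plus a post-test score.
STUDENT_LOGS = {
    "s1": ([5, 10, 8, 30], 88),
    "s2": ([40, 35, 12, 50], 52),
    "s3": ([6, 7, 9, 11], 93),
    "s4": ([28, 45, 33, 60], 41),
}

def off_task_rate(idle_times):
    """Step 1: apply the existing model to every logged problem and summarize."""
    flags = [off_task_detector(t) for t in idle_times]
    return sum(flags) / len(flags)

# Step 2: use the model's output as a variable in a second analysis,
# here a simple comparison of mean post-test scores between two groups.
rates = {s: off_task_rate(times) for s, (times, _) in STUDENT_LOGS.items()}
high = [score for s, (_, score) in STUDENT_LOGS.items() if rates[s] >= 0.5]
low = [score for s, (_, score) in STUDENT_LOGS.items() if rates[s] < 0.5]
print(rates)
print(sum(high) / len(high), sum(low) / len(low))
```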

Why now?
• Why didn’t EDM emerge in the early 1980s, like bioinformatics?

A lot of reasons
• One of the key ones: not enough data
• In the 1980s, collecting educational data was highly resource-intensive and difficult to scale
• Much of the data that was easily collectible was purely summative in nature
• Getting data on learning processes and learner behaviors, in field settings, required methods like
  • Quantitative field observations
  • Video recordings
  • Think-Aloud studies
• None of which scale easily

Fast-forward to today
• Lots of standardized exams
  • Still summative in nature
• But lots of students now use internet-based educational software in class
  • Can be used to get at learning processes and learner behaviors
  • At a fine-grained scale (can log behavior at a second-by-second level)
  • Data acquisition is very scalable
• And there are these things called MOOCs which you may have heard of…

PSLC DataShop (Koedinger et al., 2008, 2010)
• World’s leading public repository for educational software interaction data
• >250,000 hours of students using educational software
• >30 million student actions, responses & annotations
  • Actions: entering an equation, manipulating a vector, typing a phrase, requesting help
  • Responses: error feedback, strategic hints
  • Annotations: correctness, time, skill/concept
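To make the action / response / annotation distinction concrete, here is one way a single logged step could be represented. This is a hypothetical record layout for illustration only. The field names and sample rows are assumptions and do not reflect DataShop's actual schema or export format.

```python
from dataclasses import dataclass

@dataclass
class Transaction:
    """One logged student step. Field names are hypothetical, not DataShop's schema."""
    student: str
    action: str      # what the student did, e.g. "enter equation", "request help"
    response: str    # what the software did, e.g. "error feedback", "hint", "" if none
    correct: bool    # annotation: correctness
    seconds: float   # annotation: time taken
    skill: str       # annotation: skill/concept label

LOG = [
    Transaction("s1", "enter equation", "error feedback", False, 12.0, "slope"),
    Transaction("s1", "request help", "hint", False, 4.5, "slope"),
    Transaction("s1", "enter equation", "", True, 9.0, "slope"),
    Transaction("s2", "enter equation", "", True, 7.5, "intercept"),
]

def hours_logged(log):
    """Total time on task, in hours, summed over all logged steps."""
    return sum(t.seconds for t in log) / 3600

def help_request_rate(log):
    """Fraction of logged steps that were help requests."""
    return sum(t.action == "request help" for t in log) / len(log)

print(len(LOG), hours_logged(LOG), help_request_rate(LOG))
```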

Tools
• There are a bunch of tools you can use in this class.
• I don’t have strong requirements about which tools you choose to use.
• We’ll talk about them throughout the course.
• You may want to think about downloading or setting up accounts for
  • RapidMiner 5.3
  • SAS OnDemand for Academics
  • Weka
  • Microsoft Excel
  • Java
  • Matlab
• No hurry, but keep it in mind…

Closing thoughts
• EDM/LAK methods emerging for big data in education
• In this class, you’ll learn the key methods and how to use them for
  • Promoting scientific discovery
  • Driving intervention and improvements in educational software and systems
• Strengths & weaknesses of methods for different applications
• Is your analysis trustworthy? Is it applicable?