Core Methods in Educational Data Mining


Core Methods in Educational Data Mining HUDK4050 Spring 2017

Before we get started If you have a PC laptop (or a Mac set up to run PC applications), please copy RMLuc from this flash drive to your laptop

Welcome!

Administrative Stuff Is everyone signed up for class? If not, and you want to receive credit, please talk to me after class

Class Schedule

Class Schedule Updated versions will be available on the course webpage Readings are mostly available on the webpage Those not publicly available will be made available at https://drive.google.com/folderview?id=0B3e6NaCpKireVGdOQ0VPN29qMVE&usp=sharing

Class Schedule If any schedule changes happen due to unforeseen circumstances Online schedule will be kept up-to-date

Required Texts Baker, R.S. (2015) Big Data and Education. 2nd edition. http://www.columbia.edu/~rsb2162/bigdataeducation.html

Readings This is a graduate class I expect you to decide what is crucial for you And what you should skim to be prepared for class discussion and for when you need to know it in 8 years

Readings That said

Readings and Participation It is expected that you come to class, unless you have a very good reason not to It is expected that you watch Big Data and Education videos before class, so we can discuss them rather than me repeating them It is expected that you be prepared for class by skimming the readings to the point where you can participate effectively in class discussion

Course Goals This course covers methods from the emerging area of educational data mining. You will learn how to execute these methods in standard software packages And the limitations of existing implementations of these methods. Equally importantly, you will learn when and why to use these methods.

Course Goals Discussion of how EDM differs from more traditional statistical and psychometric approaches will be a key part of this course In particular, we will study how many of the same statistical and mathematical approaches are used in different ways in these research communities.

Assignments There will be 8 basic homeworks You choose 6 of them to complete: 3 from the first 4 (i.e., BHW 1-4) and 3 from the second 4 (i.e., BHW 5-8)

Basic homeworks Basic homeworks will be due before the class session where their topic is discussed

Why? These are not your usual homeworks Most homework is assigned after the topic is discussed in class, to reinforce what is learned This homework is (generally) due before the topic is discussed in class, to enable us to talk more concretely about the topic in class

How to do Basic Homework Use TutorShop account emailed to you If you do not have a TutorShop account, please email me right away

Assignments There will be 4 creative homeworks You choose 3 of them to complete You must complete the last creative homework

Creative homeworks Creative homeworks will be due after the class session where their topic is discussed

Why? These homeworks will involve creative application of the methods discussed in class, going beyond what we discuss in class

These homeworks These homeworks will not require flawless, perfect execution They will require personal discovery and learning from text and video resources Giving you a base to learn more from class discussion

Assignments Homeworks will be due at least 2 hours before the beginning of class (e.g. noon) on the due date Since you have a choice of homeworks, extensions will only be granted for instructor error or extreme circumstances Outside of these situations, late = 0 credit

You can not do extra work If you do extra assignments I will grade the first 3 of each 4 basic assignments I will grade creative assignments 1,2, and 4 I will give you feedback but no extra credit You cannot get extra credit by doing more assignments You cannot pick which assignments I grade after the fact Are there any questions about this?

Because of that You must be prepared to discuss your work in class You do not need to create slides But be prepared to have your assignment projected to discuss aspects of your assignment in class

Stressed out about not having done data mining before?

If you’re worried Come talk to me I try to find a way to accommodate every student

Homework All assignments for this class are individual assignments You must turn in your own work It cannot be identical to another student’s work (except where the Basic Assignments make all assignments identical) The goal of the Creative Assignments is to get diverse solutions we can discuss in class However, you are welcome to discuss the readings or technical details of the assignments with each other Including on the class discussion forum

Examples Buford can’t figure out the UI for the software tool. Alpharetta helps him with the UI. OK! Deanna is struggling to understand the item parameter in PFA to set up the mathematical model. Carlito explains it to her.

Examples Fernando and Evie do the assignment together from beginning to end, but write it up separately. Not OK Giorgio and Hannah do the assignment separately, but discuss their (fairly different) approaches over lunch OK!

Plagiarism and Cheating: Boilerplate Slide Don’t do it If you have any questions about what it is, talk to me before you turn in an assignment that involves either of these University regulations will be followed to the letter That said, I am not really worried about this problem in this class

Grading 6 of 8 Basic Assignments: 6% each (up to a maximum of 36%); 3 of 4 Creative Assignments: 13% each (up to a maximum of 39%); Class participation: 25%. PLUS: For every creative homework, there will be a special bonus of 20% for the best hand-in. “Best” will be defined in each assignment.
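As a quick check of the arithmetic on this slide, the base components can be added up in a few lines of Python. This is only a sketch: it assumes full credit on every counted assignment, the function name `base_grade` is hypothetical, and the best-hand-in bonus is left out because the slide does not say what the 20% is applied to.

```python
def base_grade(basic_done: int, creative_done: int, participation: float) -> float:
    """Base course percentage, excluding any best-hand-in bonus.

    Assumes full credit on each counted assignment; `participation`
    is a fraction between 0 and 1.
    """
    basic = min(basic_done, 6) * 6          # 6% each, capped at 36%
    creative = min(creative_done, 3) * 13   # 13% each, capped at 39%
    return basic + creative + 25 * participation


# Doing all 8 basic and all 4 creative homeworks earns no more than 6 and 3.
print(base_grade(6, 3, 1.0))
print(base_grade(8, 4, 1.0))
```

The caps in `min(...)` mirror the "up to a maximum of" wording: extra assignments do not raise the total.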

Accommodations for Students with Disabilities Please email me to set up a meeting so we can best accommodate you

Finding me Best way to reach me is email I am happy to set up meetings with you Better to set up a meeting with me than to just show up at my office

Discussion Forums Before emailing me, if you have a technical question or a question of general interest for the class Post to the Canvas forum! I will check there before I check my email And maybe one of your classmates will have the answer!

Questions Any questions on the syllabus, schedule, or administrative topics?

Who are you And why are you here? What kind of methods do you use in your research/work? What kind of methods do you see yourself wanting to use in the future?

This Class

To quote the Society for Learning Analytics Research: “the measurement, collection, analysis and reporting of data about learners and their contexts, for purposes of understanding and optimizing learning and the environments in which it occurs.” (www.solaresearch.org/mission/about) EDM and learning analytics methods have some similarities with traditional data mining methods, but as with the other areas where data mining methods have been common (bioinformatics, medical informatics, business analytics, data analysis methods in physics, and so on), the unique features of the domain of education lead to the development of unique methods.

Goals Joint goal of exploring the “big data” now available on learners and learning To promote New scientific discoveries & to advance science of learning Better assessment of learners along multiple dimensions Social, cognitive, emotional, meta-cognitive, etc. Individual, group, institutional, etc. Better real-time support for learners

The explosion in data is supporting a revolution in the science of learning Large-scale studies have always been possible… But it was hard to be large-scale and fine-grained And it was expensive

EDM is… “… escalating the speed of research on many problems in education.” “Not only can you look at unique learning trajectories of individuals, but the sophistication of the models of learning goes up enormously.” Arthur Graesser, Former Editor, Journal of Educational Psychology

Types of EDM/LA Method (Baker & Siemens, 2014; building off of Baker & Yacef, 2009)
Prediction: Classification, Regression, Latent Knowledge Estimation
Structure Discovery: Clustering, Factor Analysis, Domain Structure Discovery, Network Analysis
Relationship Mining: Association rule mining, Correlation mining, Sequential pattern mining, Causal data mining
Distillation of Data for Human Judgment
Discovery with Models

Prediction Develop a model which can infer a single aspect of the data (predicted variable) from some combination of other aspects of the data (predictor variables) Which students are bored? Which students will fail the class?
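To make the idea of inferring a predicted variable from predictor variables concrete, here is a minimal sketch of one very simple classifier, a nearest-centroid model. It is an illustration only, not the method used in the course; the two predictors (hours on task, hint requests), the labels, and the data values are all made up.

```python
from collections import defaultdict


def fit_centroids(rows):
    """Average the two predictor values separately for each label."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for (x1, x2), label in rows:
        s = sums[label]
        s[0] += x1
        s[1] += x2
        s[2] += 1
    return {label: (s[0] / s[2], s[1] / s[2]) for label, s in sums.items()}


def predict(centroids, point):
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    def dist(label):
        cx, cy = centroids[label]
        return (point[0] - cx) ** 2 + (point[1] - cy) ** 2
    return min(centroids, key=dist)


# Hypothetical training data: (hours on task, hint requests) -> course outcome.
training = [
    ((1.0, 9.0), "fail"),
    ((2.0, 8.0), "fail"),
    ((8.0, 1.0), "pass"),
    ((9.0, 2.0), "pass"),
]
model = fit_centroids(training)
print(predict(model, (2.0, 7.0)))
print(predict(model, (7.0, 3.0)))
```

The predicted variable here is the outcome label; the predictor variables are the two numbers in each tuple. A real analysis would also hold out data to check how well the model generalizes to new students.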

Structure Discovery Find structure and patterns in the data that emerge “naturally” No specific target or predictor variable What problems map to the same skills? Are there groups of students who approach the same curriculum differently? Which students develop more social relationships in MOOCs?

Structure Discovery
Different kinds of structure discovery algorithms find… different kinds of structure
Clustering: commonalities between data points
Factor analysis: commonalities between variables
Domain structure discovery: structural relationships between data points (typically items)
Network analysis: network relationships between data points (typically people)
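Clustering, the first of these, can be sketched with a tiny one-dimensional k-means. This is an illustration under simplifying assumptions: the data values are invented, the starting centroids are chosen by hand rather than at random, and the function name `kmeans_1d` is hypothetical.

```python
def kmeans_1d(values, centroids, iterations=10):
    """Plain k-means on one variable: assign each value to its nearest
    centroid, then move each centroid to the mean of its cluster."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # An empty cluster keeps its previous centroid.
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical data: average minutes per problem for six students.
minutes = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
centers, groups = kmeans_1d(minutes, centroids=[0.0, 6.0])
print(centers)
print(groups)
```

No target variable is involved: the two groups of students emerge from the data alone, which is what separates structure discovery from prediction.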

Relationship Mining Discover relationships between variables in a data set with many variables Association rule mining Correlation mining Sequential pattern mining Causal data mining

Relationship Mining Discover relationships between variables in a data set with many variables Are there trajectories through a curriculum that are more or less effective? Which aspects of the design of educational software have implications for student engagement?
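Correlation mining, one of the relationship mining methods, can be sketched as computing a correlation for every pair of variables and keeping the strong ones. The example below is a minimal illustration: the variable names, the data values, and the 0.8 threshold are all assumptions made for the example.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def correlation_mine(table, threshold=0.8):
    """Return every pair of variables whose |r| meets the threshold."""
    found = []
    for a, b in combinations(sorted(table), 2):
        r = pearson(table[a], table[b])
        if abs(r) >= threshold:
            found.append((a, b, r))
    return found


# Hypothetical per-student variables.
table = {
    "hints": [1, 2, 3, 4, 5],
    "errors": [2, 4, 6, 8, 10],
    "score": [10, 8, 6, 4, 2],
    "logins": [3, 1, 4, 1, 5],
}
for a, b, r in correlation_mine(table):
    print(a, b, round(r, 2))
```

With many variables, some pairs will look strongly correlated by chance alone, so a real analysis would normally apply a correction for multiple comparisons before interpreting the results.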

Discovery with Models Pre-existing model (developed with EDM prediction methods… or clustering… or knowledge engineering) Applied to data and used as a component in another analysis
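The sketch below illustrates the pattern: a pre-existing model is applied to log data, and its output becomes a variable in a second analysis. Everything specific here is hypothetical, including the rule inside `is_gaming` (a stand-in for a real, validated detector), the field names, the cutoff, and the data.

```python
from statistics import mean


def is_gaming(action):
    """Stand-in for a pre-existing model: flags very fast hint requests.
    A real detector would be developed and validated separately."""
    return action["type"] == "hint" and action["seconds"] < 2.0


def gaming_rate(actions):
    """Fraction of a student's actions that the model flags."""
    return mean(1.0 if is_gaming(a) else 0.0 for a in actions)


def compare_scores(students, cutoff=0.25):
    """Second analysis: mean post-test score for students above and below
    the cutoff on the model's output. Assumes both groups are non-empty."""
    high = [s["post_test"] for s in students if gaming_rate(s["actions"]) >= cutoff]
    low = [s["post_test"] for s in students if gaming_rate(s["actions"]) < cutoff]
    return mean(high), mean(low)


# Hypothetical log data for three students.
students = [
    {"post_test": 55, "actions": [
        {"type": "hint", "seconds": 1.0},
        {"type": "hint", "seconds": 0.5},
        {"type": "attempt", "seconds": 10.0},
    ]},
    {"post_test": 85, "actions": [
        {"type": "attempt", "seconds": 12.0},
        {"type": "hint", "seconds": 8.0},
        {"type": "attempt", "seconds": 9.0},
    ]},
    {"post_test": 65, "actions": [
        {"type": "hint", "seconds": 1.5},
        {"type": "attempt", "seconds": 20.0},
    ]},
]
high_mean, low_mean = compare_scores(students)
print(high_mean, low_mean)
```

The detector is not retrained here; it is treated as given, and the question of interest is how its output relates to another variable.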

Distillation of Data for Human Judgment Making complex data understandable by humans to leverage their judgment

Why now? Just plain more data available Education can start to catch up to research in Physics and Biology… from the year 1985

Why now? In particular, the amount of data available in education is orders of magnitude more than was available just a decade ago

Basic HW 1 Due in two weeks Note that this assignment requires the use of RapidMiner We will learn how to set up and use RapidMiner in the next class session this Wednesday So please install RapidMiner 5.3 on your laptop if possible before then (not the latest version) And bring your laptop to class

Let’s look at Basic HW 1’s User Interface

Questions? Concerns?

Background in Statistics This is not a statistics class But I will compare EDM methods to statistics throughout the class Most years, I offer a special session “An Inappropriately Brief Introduction to Frequentist Statistics” Would folks like me to schedule this?

Other questions or comments?

Next Class Wednesday, January 15 Clustering
Baker, R.S. (2015) Big Data and Education. Ch. 7, V1, V2, V3, V4, V5.
Bowers, A.J. (2010) Analyzing the Longitudinal K-12 Grading Histories of Entire Cohorts of Students: Grades, Data Driven Decision Making, Dropping Out and Hierarchical Cluster Analysis. Practical Assessment, Research & Evaluation (PARE), 15(7), 1-18.
Lee, J., Recker, M., Bowers, A.J., Yuan, M. (2016). Hierarchical Cluster Analysis Heatmaps and Pattern Analysis: An Approach for Visualizing Learning Management System Interaction Data. A poster presented at the annual International Conference on Educational Data Mining (EDM).
Assignment Basic 1 due

The End