Core Methods in Educational Data Mining

Slides:

Advertisements

Similar presentations

Discovery with Models Week 8 Video 1. Discovery with Models: The Big Idea  A model of a phenomenon is developed  Via  Prediction  Clustering  Knowledge.

Advertisements

Adapted by Doug Downey from Machine Learning EECS 349, Bryan Pardo Machine Learning Clustering.

Core Methods in Educational Data Mining HUDK4050 Fall 2014.

Math 5364 Notes Chapter 8: Cluster Analysis Jesse Crawford Department of Mathematics Tarleton State University.

Core Methods in Educational Data Mining HUDK4050 Fall 2014.

Advanced Methods and Analysis for the Learning and Social Sciences PSY505 Spring term, 2012 February 13, 2012.

Core Methods in Educational Data Mining HUDK4050 Fall 2014.

Core Methods in Educational Data Mining HUDK4050 Fall 2015.

ISCG8025 Machine Learning for Intelligent Data and Information Processing Week 3 Practical Notes Application Advice *Courtesy of Associate Professor Andrew.

Lecture 6 Spring 2010 Dr. Jianjun Hu CSCE883 Machine Learning.

Core Methods in Educational Data Mining HUDK4050 Fall 2014.

DATA MINING WITH CLUSTERING AND CLASSIFICATION Spring 2007, SJSU Benjamin Lam.

Data Mining Practical Machine Learning Tools and Techniques Chapter 4: Algorithms: The Basic Methods Section 4.8: Clustering Rodney Nielsen Many of these.

Machine Learning and Data Mining Clustering (adapted from) Prof. Alexander Ihler TexPoint fonts used in EMF. Read the TexPoint manual before you delete.

Core Methods in Educational Data Mining HUDK4050 Fall 2014.

Core Methods in Educational Data Mining HUDK4050 Fall 2015.

Special Topics in Educational Data Mining HUDK5199 Spring term, 2013 March 6, 2013.

Core Methods in Educational Data Mining HUDK4050 Fall 2015.

Data Analytics CMIS Short Course part II Day 1 Part 1: Clustering Sam Buttrey December 2015.

Core Methods in Educational Data Mining HUDK4050 Fall 2015.

Core Methods in Educational Data Mining HUDK4050 Fall 2014.

Core Methods in Educational Data Mining HUDK4050 Fall 2015.

Core Methods in Educational Data Mining HUDK4050 Fall 2014.

Special Topics in Educational Data Mining HUDK5199 Spring term, 2013 April 15, 2013.

HUDK5199: Special Topics in Educational Data Mining

Data Mining Classification and Clustering Techniques Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach, Kumar Introduction to Data Mining.

Core Methods in Educational Data Mining

Core Methods in Educational Data Mining

PREDICT 422: Practical Machine Learning

Core Methods in Educational Data Mining

Unsupervised Learning: Clustering

Unsupervised Learning: Clustering

PREDICT 422: Practical Machine Learning

Hierarchical Clustering

Advanced Methods and Analysis for the Learning and Social Sciences

Clustering CSC 600: Data Mining Class 21.

Core Methods in Educational Data Mining

A Genetic Algorithm Approach to K-Means Clustering

Core Methods in Educational Data Mining

Core Methods in Educational Data Mining

Boosting and Additive Trees

Machine Learning and Data Mining Clustering

Special Topics in Educational Data Mining

Core Methods in Educational Data Mining

Core Methods in Educational Data Mining

HUDK5199: Special Topics in Educational Data Mining

Core Methods in Educational Data Mining

K-means and Hierarchical Clustering

Core Methods in Educational Data Mining

Data Mining Practical Machine Learning Tools and Techniques

TOP DM 10 Algorithms C4.5 C 4.5 Research Issue:

Core Methods in Educational Data Mining

Revision (Part II) Ke Chen

Evaluation and Its Methods

Core Methods in Educational Data Mining

Revision (Part II) Ke Chen

Learning Analytics: Process & Theory

White Box Clustering in R

Data Mining – Chapter 4 Cluster Analysis Part 2

MIS2502: Data Analytics Clustering and Segmentation

MIS2502: Data Analytics Clustering and Segmentation

Support Vector Machine _ 2 (SVM)

Core Methods in Educational Data Mining

Core Methods in Educational Data Mining

Core Methods in Educational Data Mining

Core Methods in Educational Data Mining

Core Methods in Educational Data Mining

Asst. Prof. Sotarat Thammaboosadee, Ph.D.

Introduction to Machine learning

Machine Learning and Data Mining Clustering

Presentation transcript:

Core Methods in Educational Data Mining EDUC545-010 Spring2017

Discussion forums read-only for auditors Who here is impacted? For-credit students: are you OK with switching to Piazza?

RTFF

Introducing Prof. Alex Bowers Top researcher in the world in using big data for educational leadership All-around nifty guy

Thank you Alex!

RapidMiner Walkthrough Any difficulties getting through the RapidMiner Walkthrough?

Assignment B1

Difficulties with TutorShop?

Difficulties with RapidMiner?

Any questions about k-Means in the homework?

Any questions about expectation maximization in the homework?

Any questions about agglomerative clustering in the homework?

Any other questions about the homework?

Any general questions about k-Means?

If there’s time

Let’s play with clustering a bit Apply k-means using the following points and initial centroids I need 5 volunteers!

+3 -3 time 0 1 pknow

+3 -3 time 0 1 pknow

+3 -3 time 0 1 pknow

+3 -3 time 0 1 pknow

+3 -3 time 0 1 pknow

Any comments on exercise?

Why Is distortion/MSD good for choosing between randomized restarts But bad for choosing cluster size?

Why Isn’t cross-validated distortion/MSD good for choosing cluster size? Why doesn’t cross-validation fix the issue?

What Is the solution?

Is there a better way To choose the number of clusters Than just the adjusted fit?

Silhouette Analysis An increasingly popular method for determining how many clusters to use (Rousseeuw, 1987; Kaufman & Rousseeuw, 1990)

Silhouette Analysis Silhouette plot shows how close each point in a cluster is to points in adjacent clusters Silhouette values scaled from -1 to 1 Close to +1: Data point is far from adjacent clusters Close to 0: Data point is at boundary between clusters Close to -1: Data point is closer to other cluster than its own cluster

Silhouette Formula 𝑆 𝑖 = 𝐵 𝑖 −𝐴(𝑖) max⁡{𝐴 𝑖 , 𝐵(𝑖) For each data point i A(i) = average distance of i from all other data points in same cluster C C* = cluster with lowest average distance of i from all other data points in cluster c* B(i) = average dissimilarity of i from all other data points in cluster C* 𝑆 𝑖 = 𝐵 𝑖 −𝐴(𝑖) max⁡{𝐴 𝑖 , 𝐵(𝑖)

Example from http://scikit-learn Example from http://scikit-learn.org/ stable/auto_examples/cluster/ plot_kmeans_silhouette_analysis.html

Good clusters

Good clusters

Bad clusters

Bad clusters

Bad clusters

So in this example 2 and 4 clusters are reasonable choices 3, 5, and 6 clusters are not good choices

Questions? Comments?

What are the advantages? Of Gaussian Mixture Models

What are the advantages? Of Gaussian Mixture Models Why not use them all the time?

What are the advantages? Of Spectral Clustering

What are the advantages? Of Spectral Clustering Why not use it all the time?

What are the advantages? Of Hierarchical Clustering

What are the advantages? Of Hierarchical Clustering Why not use it all the time?

Clustering: Any Questions?

Factor Analysis .vs. Clustering What’s the difference?

Factor Analysis: Any Questions?

What… Are the general advantages of structure discovery algorithms (clustering, factor analysis) Compared to supervised/prediction modeling methods?

What… Are the general advantages of structure discovery algorithms (clustering, factor analysis) Compared to supervised/prediction modeling methods? What are the disadvantages?

Important point… If you cluster in a well-known domain, you are likely to obtain well-known findings

Because of this… Clustering is relatively popular But somewhat prone to uninteresting papers in education research Where usually a lot is already known So be thoughtful…

Any other questions?

Assignment B2 Classification in Prediction Due February 8

Next Class Wednesday, February 1: Regression in Prediction Readings Baker, R.S. (2015) Big Data and Education. Ch. 1, V2. Witten, I.H., Frank, E. (2011) Data Mining: Practical Machine Learning Tools and Techniques. Sections 4.6, 6.5. [on google drive] Pardos, Z.A., Baker, R.S., San Pedro, M.O.C.Z., Gowda, S.M., Gowda, S.M. (2014) Affective states and state tests: Investigating how affect and engagement during the school year predict end of year learning outcomes. Journal of Learning Analytics, 1 (1), 107-128.

The End