Core Methods in Educational Data Mining HUDK4050 Fall 2015.


Quick question: I’m now going to be in NY the first week of December. I was thinking of un-cancelling class on 12/1, in exchange for 12/8. What do people prefer? Let’s take a vote.

Assignment B7

Any questions about k-Means in the homework?

Any questions about expectation maximization in the homework?

Any questions about agglomerative clustering in the homework?

Any questions?

Any general questions about k-Means?

Let’s play with clustering a bit. Apply k-means using the following points and initial centroids. I need 5 volunteers!

[Figure, repeated over five slides: plot of the exercise points and centroids, axes labeled pknow and time, scaled 0 to 1]
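The exercise above can be sketched in code. This is a minimal pure-Python version of the standard k-means (Lloyd's) iteration, not the course's own implementation. The points and initial centroids are hypothetical (pknow, time) values made up for illustration, since the slide's actual data is not in the transcript.

```python
import math

def kmeans(points, centroids, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid update until stable."""
    centroids = [tuple(c) for c in centroids]
    assignment = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid (Euclidean distance).
        new_assignment = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # no point changed cluster, so the algorithm has converged
        assignment = new_assignment
        # Update step: move each centroid to the mean of its assigned points.
        for j in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # an empty cluster keeps its old centroid
                centroids[j] = tuple(sum(x) / len(members) for x in zip(*members))
    return assignment, centroids

# Hypothetical (pknow, time) points, both scaled 0..1; NOT the slide's data.
points = [(0.1, 0.2), (0.15, 0.1), (0.2, 0.25), (0.8, 0.9), (0.85, 0.8), (0.9, 0.95)]
initial_centroids = [(0.0, 0.0), (1.0, 1.0)]
labels, centers = kmeans(points, initial_centroids)
print(labels)
print(centers)
```

Each pass through the loop corresponds to one round of the in-class exercise: reassign every point, then recompute the centroids.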

Any comments on exercise?

Why is distortion/MSD good for choosing between randomized restarts, but bad for choosing cluster size?
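One way to see the issue is to compute distortion for several values of k. The sketch below assumes distortion means the mean squared distance (MSD) from each point to its nearest centroid; the data, function names, and restart count are hypothetical. For a fixed k, comparing restarts by distortion is a like-for-like comparison. Across k, distortion keeps shrinking and reaches zero when every point is its own cluster, so it always favors more clusters.

```python
import math
import random

def distortion(points, centroids):
    """Mean squared distance (MSD) from each point to its nearest centroid."""
    return sum(min(math.dist(p, c) ** 2 for c in centroids) for p in points) / len(points)

def kmeans_centroids(points, k, rng, max_iter=100):
    """k-means from a random start: k distinct data points as initial centroids."""
    centroids = rng.sample(points, k)
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Recompute each centroid as its cluster mean; empty clusters stay put.
        updated = [
            tuple(sum(x) / len(c) for x in zip(*c)) if c else centroids[j]
            for j, c in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids

def best_of_restarts(points, k, restarts=10, seed=0):
    """Same k, different random starts: the lowest distortion wins."""
    rng = random.Random(seed)
    runs = [kmeans_centroids(points, k, rng) for _ in range(restarts)]
    return min(distortion(points, c) for c in runs)

# Hypothetical data with two visible groups; not from the slides.
points = [(0.1, 0.2), (0.15, 0.1), (0.2, 0.25), (0.8, 0.9), (0.85, 0.8), (0.9, 0.95)]
by_k = {k: best_of_restarts(points, k) for k in range(1, len(points) + 1)}
for k, d in by_k.items():
    print(k, round(d, 4))
```

In this toy data the two-group structure is obvious, yet distortion alone would still pick k equal to the number of points.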

Why isn’t cross-validated distortion/MSD good for choosing cluster size? Why doesn’t cross-validation fix the issue?

What is the solution?

Is there a better way to choose the number of clusters than just the adjusted fit?

Silhouette Analysis: an increasingly popular method for determining how many clusters to use (Rousseeuw, 1987; Kaufman & Rousseeuw, 1990)

Silhouette Analysis
Silhouette plot shows how close each point in a cluster is to points in adjacent clusters
Silhouette values scaled from -1 to 1
– Close to +1: Data point is far from adjacent clusters
– Close to 0: Data point is at boundary between clusters
– Close to -1: Data point is closer to other cluster than its own cluster

Silhouette Formula
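The formula itself is not in the transcript. The standard definition from Rousseeuw (1987), which is presumably what this slide showed, is given below. The symbols a(i) and b(i) follow the usual convention and are not taken from the slide.

```latex
% a(i): mean distance from point i to the other points in its own cluster
% b(i): smallest mean distance from point i to the points of any other cluster
s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}}, \qquad -1 \le s(i) \le 1
```

When a(i) is much smaller than b(i), s(i) approaches +1; when the two are about equal, s(i) is near 0; when a(i) exceeds b(i), s(i) is negative. By convention, s(i) = 0 for a point that is alone in its cluster.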

Example from stable/auto_examples/cluster/plot_kmeans_silhouette_analysis.html

Good clusters

Bad clusters

So in this example, 2 and 4 clusters are reasonable choices; 3, 5, and 6 clusters are not good choices.
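To make this kind of comparison concrete, here is a minimal pure-Python sketch of per-point silhouette values, using the standard definition s(i) = (b - a) / max(a, b) from Rousseeuw (1987). The data and both labelings are hypothetical, and this is not the code behind the example plots. A labeling that matches the visible groups should score a much higher mean silhouette than one that mixes them.

```python
import math

def silhouette_values(points, labels):
    """Per-point silhouette s(i) = (b - a) / max(a, b); needs at least 2 clusters."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    values = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            values.append(0.0)  # convention: a singleton cluster scores 0
            continue
        # a: mean distance to the other members of the point's own cluster
        # (the sum includes the point itself at distance 0, hence len(own) - 1).
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster.
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items() if other_lab != lab
        )
        values.append((b - a) / max(a, b))
    return values

# Hypothetical data and labelings; not from the slides.
points = [(0.1, 0.2), (0.15, 0.1), (0.2, 0.25), (0.8, 0.9), (0.85, 0.8), (0.9, 0.95)]
good_labels = [0, 0, 0, 1, 1, 1]  # matches the two visible groups
bad_labels = [0, 1, 0, 1, 0, 1]   # mixes the two groups

good = silhouette_values(points, good_labels)
bad = silhouette_values(points, bad_labels)
good_mean = sum(good) / len(good)
bad_mean = sum(bad) / len(bad)
print(round(good_mean, 3), round(bad_mean, 3))
```

To choose the number of clusters, one would run the clustering for each candidate k and compare both the mean silhouette and the spread of per-point values within each cluster.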

Questions? Comments?

What are the advantages of Gaussian Mixture Models? Why not use them all the time?

What are the advantages of Spectral Clustering? Why not use it all the time?

What are the advantages of Hierarchical Clustering? Why not use it all the time?

Clustering: Any Questions?

Factor Analysis vs. Clustering: What’s the difference?

Factor Analysis: Any Questions?

What are the general advantages of structure discovery algorithms (clustering, factor analysis) compared to supervised/prediction modeling methods? What are the disadvantages?

Important point… If you cluster in a well-known domain, you are likely to obtain well-known findings

Because of this…
Clustering is relatively popular
But somewhat prone to uninteresting papers in education research
– Where usually a lot is already known
So be thoughtful…

Amershi & Conati (2009): Any questions?

Bowers (2010): Any questions?

Any other questions?

Assignment B8: Sequential Pattern Mining. Due next Tuesday.

Next Class
Wednesday, November 19: Association Rule Mining
Readings
– Baker, R.S. (2015) Big Data and Education. Ch. 5, V3.
– Merceron, A., Yacef, K. (2008) Interestingness Measures for Association Rules in Educational Data. Proceedings of the 1st International Conference on Educational Data Mining.
– Bazaldua, D.A.L., Baker, R.S., San Pedro, M.O.Z. (2014) Combining Expert and Metric-Based Assessments of Association Rule Interestingness. Proceedings of the 7th International Conference on Educational Data Mining.

The End