Clustering Seminar Social Media Mining University UC3M Date May 2017

Slides:



Advertisements
Similar presentations
DEPARTMENT OF INFORMATION STUDIES Andy Dawson LIS1510 Library and Archives Automation Issues Basics of XHTML Andy Dawson Department of Information Studies,
Advertisements

Copyright Jiawei Han, modified by Charles Ling for CS411a
Clustering.
SEEM Tutorial 4 – Clustering. 2 What is Cluster Analysis?  Finding groups of objects such that the objects in a group will be similar (or.
Hierarchical Clustering, DBSCAN The EM Algorithm
Albert Gatt Corpora and Statistical Methods Lecture 13.
Clustering Basic Concepts and Algorithms
PARTITIONAL CLUSTERING
MASTERY OBJECTIVE: Learn parts of an html document Learn basic html tags HTML-An Introduction.
GNANA SUNDAR RAJENDIRAN JOYESH MISHRA RISHI MISHRA FALL 2008 BIOINFORMATICS Clustering Method for Repeat Analysis in DNA sequences.
Highlights Lecture on the image part (10) Automatic Perception 16
Introduction to WEKA Aaron 2/13/2009. Contents Introduction to weka Download and install weka Basic use of weka Weka API Survey.
Overview of Search Engines
Minimum Spanning Trees and Clustering By Swee-Ling Tang April 20, /20/20101.
Introduction to Data Mining Group Members: Karim C. El-Khazen Pascal Suria Lin Gui Philsou Lee Xiaoting Niu.
File System Management File system management encompasses the provision of a way to store your data in a computer, as well as a way for you to find and.
Prepared by: Mahmoud Rafeek Al-Farra College of Science & Technology Dep. Of Computer Science & IT BCs of Information Technology Data Mining
1 Motivation Web query is usually two or three words long. –Prone to ambiguity –Example “keyboard” –Input device of computer –Musical instruments How can.
CS 478 – Tools for Machine Learning and Data Mining Clustering Quality Evaluation.
Bioinformatics Curriculum Issues, goals, curriculum.
Compiled By: Raj Gaurang Tiwari Assistant Professor SRMGPC, Lucknow Unsupervised Learning.
Clustering Algorithm CS 157B JIA HUANG. Definition Data clustering is a method in which we make cluster of objects that are somehow similar in characteristics.
Clustering Patrice Koehl Department of Biological Sciences National University of Singapore
Tutorial 8 Gene expression analysis 1. How to interpret an expression matrix Expression data DBs - GEO Clustering –Hierarchical clustering –K-means clustering.
Clustering, performance evaluation, and Term Project 1.Term Project 2.Resource for review.
1 Pattern Recognition: Statistical and Neural Lonnie C. Ludeman Lecture 28 Nov 9, 2005 Nanjing University of Science & Technology.
Quiz Title Your name goes here. Question 1 Click here for answer Click here for answer Go to question 2 Go to question 2.
Cluster Analysis What is Cluster Analysis? Types of Data in Cluster Analysis A Categorization of Major Clustering Methods Partitioning Methods.
Clustering Machine Learning Unsupervised Learning K-means Optimization objective Random initialization Determining Number of Clusters Hierarchical Clustering.
Machine Learning Lecture 4: Unsupervised Learning (clustering) 1.
Information Retrieval in Practice
Department of Computer Science & Engineering
Clustering Anna Reithmeir Data Mining Proseminar 2017
What Is Cluster Analysis?
Data Mining: Basic Cluster Analysis
Hierarchical Clustering
Semi-Supervised Clustering
IAT 355 Clustering ______________________________________________________________________________________.
Clustering Patrice Koehl Department of Biological Sciences
LIS1510 Library and Archives Automation Issues Basics of XHTML
Hierarchical Clustering
Data Mining K-means Algorithm
K-Means Seminar Social Media Mining University UC3M Date May 2017
Link-Based Ranking Seminar Social Media Mining University UC3M
Hierarchical Clustering
Vector Space Model Seminar Social Media Mining University UC3M
Supervised Learning Seminar Social Media Mining University UC3M
Evaluation of Relational Operations
Topic 3: Cluster Analysis
Community detection in graphs
K-means and Hierarchical Clustering
Unit 3 Exam.
CSE572, CBS598: Data Mining by H. Liu
Discovering Functional Communities in Social Media
Hierarchical and Ensemble Clustering
Análisis de Cluster.
Ying Dai Faculty of software and information science,
Ying Dai Faculty of software and information science,
Ying Dai Faculty of software and information science,
Clustering Wei Wang.
Document Structure & HTML
Text Categorization Berlin Chen 2003 Reference:
Group 9 – Data Mining: Data
Ying Dai Faculty of software and information science,
Topic 5: Cluster Analysis
SEEM4630 Tutorial 3 – Clustering.
Information Retrieval and Web Design
Presentation transcript:

Clustering Seminar Social Media Mining University UC3M Date May 2017 Lecturer Carlos Castillo http://chato.cl/ Sources: Mohammed J. Zaki, Wagner Meira, Jr., Data Mining and Analysis: Fundamental Concepts and Algorithms, Cambridge University Press, May 2014. Part 3. [download] Evimaria Terzi: Data Mining course at Boston University http://www.cs.bu.edu/~evimaria/cs565-13.html

Why these sizes? Why 3 groups instead of 2?

Clustering Given a set of elements (e.g. documents) Group similar elements together So that: Inside a group, elements are similar Across groups, elements are different

Why do we cluster? Clustering results are used: As a stand-alone tool to get insight into data distribution Visualization of clusters may unveil important information As a preprocessing step for other algorithms Efficient indexing or compression often relies on clustering

Applications Image Processing Cluster images based on their visual content Web Cluster groups of users based on their access patterns on webpages Cluster webpages based on their content Bioinformatics Cluster similar proteins together (similarity wrt chemical structure and/or functionality etc) Many more…

http://dx.doi.org/10.1109/IVL.2000.853847

http://musicmachinery.com/2013/09/22/5025/

http://www.nature.com/articles/srep00196/figures/2

Clustering questions How many clusters? Given as input or determined by algorithm How good is a clustering? Intra similarity, inter similarity, number of clusters Can an element belong to > 1 cluster? Hard clustering vs Soft clustering

How many clusters? Boston University Slideshow Title Goes Here

Types of clusterings Partitional Boston University Slideshow Title Goes Here Partitional each object belongs in exactly one cluster Hierarchical a set of nested clusters organized in a tree

Partitional algorithms Boston University Slideshow Title Goes Here partition the n objects into k clusters each object belongs to exactly one cluster the number of clusters k is given in advance

Partitional clustering Boston University Slideshow Title Goes Here Original points Partitional clustering

Example: 1-dimensional clustering Communism Socialism Liberalism Conservatism Monarchism Fascism

Parenthesis: 2D political spectrum http://www.termometropolitico.it/119350_dai-modelli-collocazione-nello-spazio-politico-test-per-elezioni-europee-2014.html

1D political spectrum (Spain) http://elpais.com/elpais/2015/10/17/media/1445097073_329371.html

1 dimensional clustering 5 11 13 16 25 36 38 39 42 60 62 64 67 How would you cluster this data? Why?

1 dimensional clustering 5 11 13 16 25 36 38 39 42 60 62 64 67 What about now, how would you cluster?

Two very important metrics Minimum inter-cluster distance (should be large) Maximum intra-cluster distance (should be small)

1 dimensional clustering 5 11 13 16 25 36 38 39 42 60 62 64 67 5 11 13 16 25 36 38 39 42 60 62 64 67 5 11 13 16 25 36 38 39 42 60 62 64 67 Exercise: For each of these 3 clusterings: Compute minimum inter-cluster distance. Compute maximum intra-cluster distance. Answer: clustering_exercise_01_answer.txt