Clustering
Shallow Processing Techniques for NLP
Ling570, November 30, 2011

Roadmap
  Clustering: motivation & applications
  Clustering approaches
  Evaluation

Clustering
  Task: Given a set of objects, create a set of clusters over those objects
  Applications:
    Exploratory data analysis
    Document clustering
    Language modeling: generalization for class-based LMs
    Unsupervised word sense disambiguation
    Automatic thesaurus creation
    Unsupervised part-of-speech tagging
    Speaker clustering, ...

Example: Document Clustering
  Input: set of individual documents
  Output: sets of document clusters
  Many different types of clustering:
    Category: news, sports, weather, entertainment
    Genre: similar styles, e.g., blogs, tweets, newswire
    Author
    Language ID: language clusters
    Topic: documents on the same topic (OWS, debt supercommittee, Seattle Marathon, Black Friday, ...)

Example: Word Clustering
  Input: words
    Barbara, Edward, Gov, Mary, NFL, Reds, Scott, Sox, ballot, finance, inning, payments, polls, profit, quarterback, researchers, science, score, scored, seats
  Output: word clusters
  Example clusters (from the NYT):
    ballot, polls, Gov, seats
    profit, finance, payments
    NFL, Reds, Sox, inning, quarterback, scored, score
    researchers, science
    Scott, Mary, Barbara, Edward

Questions
  What should a cluster represent? Similarity among objects
  How can we create clusters?
  How can we evaluate clusters?
  How can we improve NLP with clustering?
  (Due to F. Xia)

Similarity
  Between two instances
  Between an instance and a cluster
  Between clusters

Similarity Measures
  Given x = (x1, x2, ..., xn) and y = (y1, y2, ..., yn):
    Euclidean distance: d(x, y) = sqrt( sum_i (xi - yi)^2 )
    Manhattan distance: d(x, y) = sum_i |xi - yi|
    Cosine similarity: cos(x, y) = sum_i (xi * yi) / (|x| |y|)
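
These measures can be made concrete with a minimal Python sketch (plain lists, no external libraries; the function names are illustrative, not part of the course code).

```python
import math

def euclidean(x, y):
    # square root of the sum of squared coordinate differences
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

def manhattan(x, y):
    # sum of absolute coordinate differences
    return sum(abs(xi - yi) for xi, yi in zip(x, y))

def cosine(x, y):
    # dot product divided by the product of the vector lengths
    dot = sum(xi * yi for xi, yi in zip(x, y))
    norm_x = math.sqrt(sum(xi * xi for xi in x))
    norm_y = math.sqrt(sum(yi * yi for yi in y))
    return dot / (norm_x * norm_y) if norm_x and norm_y else 0.0

print(euclidean([0, 0], [3, 4]))     # 5.0
print(manhattan([0, 0], [3, 4]))     # 7
print(cosine([1, 2, 3], [2, 4, 6]))  # 1.0: parallel vectors, regardless of length
```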

Clustering Algorithms

Types of Clustering
  Flat vs. hierarchical:
    Flat: partition the data into k clusters
    Hierarchical: nodes form a hierarchy
  Hard vs. soft:
    Hard: each object is assigned to exactly one cluster
    Soft: allows degrees of membership, and membership in more than one cluster; often a probability distribution over cluster membership

Hierarchical Clustering

Hierarchical vs. Flat
  Hierarchical clustering:
    More informative; good for data exploration
    Many algorithms, none good for all data
    Computationally expensive
  Flat clustering:
    Fairly efficient
    Simple baseline algorithm: k-means
    Probabilistic models use the EM algorithm

Clustering Algorithms
  Flat clustering: k-means, k-medoids
  Hierarchical clustering: greedy, bottom-up clustering

K-Means Clustering
  Initialize: randomly select k initial centroids (a centroid is the center, i.e., mean, of a cluster)
  Iterate until the clusters stop changing:
    Assign each instance to the nearest cluster (the cluster whose centroid is nearest)
    Recompute each cluster centroid as the mean of the instances in the cluster
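
The loop above can be written down directly; here is a short sketch (not the course implementation) assuming dense numeric vectors and Euclidean distance, with illustrative helper names.

```python
import math
import random

def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def mean(vectors):
    # coordinate-wise mean of a non-empty list of vectors
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def k_means(instances, k, max_iters=100):
    # Initialize: randomly select k initial centroids
    centroids = random.sample(instances, k)
    assignment = None
    for _ in range(max_iters):
        # Assign each instance to the nearest centroid
        new_assignment = [min(range(k), key=lambda c: euclidean(x, centroids[c]))
                          for x in instances]
        if new_assignment == assignment:   # clusters stopped changing
            break
        assignment = new_assignment
        # Recompute each centroid as the mean of the instances in its cluster
        for c in range(k):
            members = [x for x, a in zip(instances, assignment) if a == c]
            if members:                    # keep the old centroid if a cluster empties
                centroids[c] = mean(members)
    return centroids, assignment
```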

K-Means: 1 step

K-Means
  Running time: O(kn) distance computations per iteration, for n instances and k clusters
  Converges in a finite number of steps
  Issues:
    Need to pick the number of clusters k
    Finds only a local optimum
    Sensitive to outliers
    Requires Euclidean distance: what about enumerable classes (e.g., colors)?

Medoid
  Medoid: the element of a cluster with the highest average similarity to the other elements in the cluster
  Finding the medoid:
    For each element p, compute f(p), the average similarity of p to the other elements in the cluster
    Select the element with the highest f(p)
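
A sketch of the medoid computation, taking f(p) to be the average similarity of p to the other cluster members as described above (the helper name and the sim argument are illustrative).

```python
def medoid(cluster, sim):
    # f(p): average similarity of p to the other elements of the cluster
    def f(p):
        others = [q for q in cluster if q is not p]
        return sum(sim(p, q) for q in others) / len(others) if others else 0.0
    # the medoid is the element with the highest f(p)
    return max(cluster, key=f)
```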

K-Medoids
  Initialize: select k instances at random as medoids
  Iterate until there are no changes:
    Assign each instance to the cluster with the nearest medoid
    Recompute the medoid of each cluster
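
A sketch of the k-medoids loop above. It repeats the medoid helper so the block is self-contained, and the random initialization shown here is only one option (Q2 of the homework uses a fixed initialization instead).

```python
import random

def medoid(cluster, sim):
    # element with the highest average similarity to the others (as in the sketch above)
    def f(p):
        others = [q for q in cluster if q is not p]
        return sum(sim(p, q) for q in others) / len(others) if others else 0.0
    return max(cluster, key=f)

def k_medoids(instances, k, sim, max_iters=100):
    # Initialize: select k instances at random as medoids
    medoids = random.sample(instances, k)
    assignment = None
    for _ in range(max_iters):
        # Assign each instance to the cluster with the most similar medoid
        new_assignment = [max(range(k), key=lambda c: sim(x, medoids[c]))
                          for x in instances]
        if new_assignment == assignment:   # no changes: stop
            break
        assignment = new_assignment
        # Recompute the medoid of each cluster
        for c in range(k):
            members = [x for x, a in zip(instances, assignment) if a == c]
            if members:
                medoids[c] = medoid(members, sim)
    return medoids, assignment

# Usage, with the cosine helper from the earlier sketch:
# medoids, assignment = k_medoids(vectors, 3, cosine)
```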

Greedy, Bottom-Up Hierarchical Clustering
  Initialize: make an individual cluster for each instance
  Iterate until all instances are in the same cluster:
    Merge the two most similar clusters
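
A naive sketch of greedy bottom-up (agglomerative) clustering; cluster similarity here is the average over member pairs (average link), which is just one of several possible linkage choices.

```python
def average_link(c1, c2, sim):
    # average pairwise similarity between members of the two clusters
    return sum(sim(x, y) for x in c1 for y in c2) / (len(c1) * len(c2))

def agglomerative(instances, sim):
    # Initialize: one cluster per instance
    clusters = [[x] for x in instances]
    merges = []
    # Iterate until all instances are in the same cluster
    while len(clusters) > 1:
        # Find the two most similar clusters
        pairs = [(i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))]
        i, j = max(pairs, key=lambda ij: average_link(clusters[ij[0]], clusters[ij[1]], sim))
        # Merge them and record the merge (this sequence of merges is the hierarchy)
        merges.append((clusters[i], clusters[j]))
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)] + [merged]
    return merges
```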

Evaluation

Evaluation
  With respect to a gold standard:
    Accuracy: for each cluster, assign the most common label to all of its items
    Rand index
    F-measure
  Alternatives:
    Extrinsic evaluation
    Human inspection

Configuration
  Given:
    A set of objects O = {o1, o2, ..., on}
    A partition X = {x1, ..., xr}
    A partition Y = {y1, ..., ys}
  Count the pairs of objects:
                              In same set in X    In different sets in X
    In same set in Y                  a                     d
    In different sets in Y            c                     b

Rand Index
  Measure of cluster similarity (Rand, 1971)
  Rand index = (a + b) / (a + b + c + d), the fraction of object pairs on which X and Y agree
  No agreement: 0; full agreement: 1
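
Computed directly from the pair counts defined in the table above, the Rand index looks like this; partitions are represented as per-object cluster labels, and the function name is illustrative.

```python
from itertools import combinations

def rand_index(labels_x, labels_y):
    # labels_x[i], labels_y[i]: cluster of object i under partitions X and Y
    a = b = c = d = 0
    for i, j in combinations(range(len(labels_x)), 2):
        same_x = labels_x[i] == labels_x[j]
        same_y = labels_y[i] == labels_y[j]
        if same_x and same_y:            # together in both partitions
            a += 1
        elif not same_x and not same_y:  # apart in both partitions
            b += 1
        elif same_x:                     # together in X, apart in Y
            c += 1
        else:                            # apart in X, together in Y
            d += 1
    return (a + b) / (a + b + c + d)

print(rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0: same partition, relabeled
```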

Precision & Recall
  Assume X is the gold-standard partition and Y is the system-generated partition
  For each pair of items that appear together in a cluster in Y: the pair is correct if the items also appear together in a cluster in X
  From these pairwise decisions we can compute P, R, and F-measure
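
A sketch of pairwise precision, recall, and F-measure under this convention: the system's same-cluster pairs are the predictions and the gold standard's same-cluster pairs are the targets (function names are illustrative).

```python
from itertools import combinations

def same_cluster_pairs(labels):
    # unordered pairs of object indices that share a cluster
    return {(i, j) for i, j in combinations(range(len(labels)), 2)
            if labels[i] == labels[j]}

def pairwise_prf(gold_labels, sys_labels):
    gold_pairs = same_cluster_pairs(gold_labels)
    sys_pairs = same_cluster_pairs(sys_labels)
    correct = len(gold_pairs & sys_pairs)
    precision = correct / len(sys_pairs) if sys_pairs else 0.0
    recall = correct / len(gold_pairs) if gold_pairs else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return precision, recall, f_measure
```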

HW #10 (slides due to F. Xia)
  Unsupervised POS tagging: word clustering by neighboring-word cooccurrence
  Create feature vectors:
    Features: counts of adjacent word occurrences, e.g., L=he:10 or R=run:3
  Perform clustering: k-medoids algorithm (with cosine similarity)
  Evaluate clusters: cluster mapping + accuracy

Q1: create_vectors
  Usage: create_vectors.* training_file word_file feat_file outfile
  training_file: one sentence per line: w1 w2 w3 ... wn
  word_file: list of words to cluster, one "word freq" pair per line
  feat_file: list of words to use as features, one "feat freq" pair per line
  outfile: one line per word in word_file
    Format: word L=he 10 L=she 5 ... R=gone 2 R=run 3 ...

Features
  Features have the form (L|R)=xx freq, where:
    xx is a word in feat_file
    L or R is the position (left or right neighbor) where the feature appeared
    freq is the number of times xx appeared in that position in the training file
  Example: if 'New York' appears 540 times in the corpus, the vector for York includes L=New 540 ... R=New 0 ...

Vector File
  One line per word in word_file
  Lines should be in the same order as word_file
  Features should be sorted alphabetically by feature name, e.g., L=an 3 L=the 10 ... R=aqua 1 R=house 5
  Feature sorting aids the cosine computation
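
A hedged sketch of the Q1 step (not the official solution): for each word to cluster, count the feat_file words occurring immediately to its left and right in the training file, then write one line per target word with the features sorted alphabetically. Whether zero-count features are also written (as in the R=New 0 example above) is left to the assignment spec.

```python
import sys
from collections import defaultdict

def create_vectors(training_file, word_file, feat_file, outfile):
    # words to cluster, and words allowed as features (first token of each line)
    targets = [line.split()[0] for line in open(word_file) if line.strip()]
    feats = {line.split()[0] for line in open(feat_file) if line.strip()}
    target_set = set(targets)

    counts = defaultdict(lambda: defaultdict(int))   # word -> feature -> count
    for line in open(training_file):
        tokens = line.split()
        for i, w in enumerate(tokens):
            if w not in target_set:
                continue
            if i > 0 and tokens[i - 1] in feats:                 # left neighbor
                counts[w]['L=' + tokens[i - 1]] += 1
            if i + 1 < len(tokens) and tokens[i + 1] in feats:   # right neighbor
                counts[w]['R=' + tokens[i + 1]] += 1

    with open(outfile, 'w') as out:
        for w in targets:                             # keep word_file order
            pairs = sorted(counts[w].items())         # alphabetical by feature name
            out.write(w + ' ' + ' '.join(f'{f} {c}' for f, c in pairs) + '\n')

if __name__ == '__main__':
    create_vectors(*sys.argv[1:5])
```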

Q2: k_medoids
  Usage: k_medoids.* vector_file num_clusters sys_cluster_file
  vector_file: created in Q1
  num_clusters: number of clusters to create
  sys_cluster_file: output representing the clustering of the vectors
    Format: medoid w1 w2 w3 ... wn, where medoid is the medoid representing the cluster and w1 ... wn are the words in the cluster

Q2: K-Medoids
  Similarity measure: cosine similarity
  Initial medoids: medoid i is placed at a fixed instance index determined by i, N, and C (formula given in the assignment), where N is the number of words to cluster and C is the number of clusters
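
The slide's exact initialization formula is not reproduced above; purely as a placeholder, the sketch below picks medoid i at an evenly spaced index in the word list. This spacing is an assumption for illustration, not the assignment's formula.

```python
def initial_medoid_indices(N, C):
    # ASSUMPTION: evenly spaced instances; substitute the formula from the assignment
    return [(i * N) // C for i in range(C)]

print(initial_medoid_indices(20, 4))  # [0, 5, 10, 15]
```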

Mapping Sys to Gold: One-to-One
  Find the highest number in the matrix
  Remove the corresponding row and column
  Repeat until all rows are removed
  Example (rows = system clusters, columns = gold clusters):
         g1   g2   g3
    s1    2   10    9
    s2    7    4    2
    s3    0    9    6
    s4    5    0    3
  Mapping: s1 => g2 (10), s2 => g1 (7), s3 => g3 (6)
  acc = (10 + 7 + 6) / sum
  (Due to F. Xia)

Mapping Sys to Gold: Many-to-One
  Find the highest number in the matrix
  Remove the corresponding row (but not the column)
  Repeat until all rows are removed
  Using the same matrix:
  Mapping: s1 => g2 (10), s2 => g1 (7), s3 => g2 (9), s4 => g1 (5)
  acc = (10 + 7 + 9 + 5) / sum
  (Due to F. Xia)
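
Both mapping strategies can be driven by one greedy routine over a contingency matrix of counts (rows = system clusters, columns = gold clusters); this is a sketch following the procedure above, with illustrative names.

```python
def greedy_mapping(matrix, one_to_one):
    # matrix[s][g]: number of items in system cluster s whose gold label is g
    rows, cols = set(range(len(matrix))), set(range(len(matrix[0])))
    mapping, matched = {}, 0
    while rows and cols:
        # find the highest remaining cell
        s, g = max(((s, g) for s in rows for g in cols),
                   key=lambda sg: matrix[sg[0]][sg[1]])
        mapping[s] = g
        matched += matrix[s][g]
        rows.remove(s)          # always remove the row
        if one_to_one:
            cols.remove(g)      # one-to-one also removes the column
    total = sum(sum(row) for row in matrix)
    return mapping, matched / total

# The matrix from the slides: rows 0..3 are s1..s4, columns 0..2 are g1..g3
matrix = [[2, 10, 9],
          [7, 4, 2],
          [0, 9, 6],
          [5, 0, 3]]
print(greedy_mapping(matrix, one_to_one=True))   # s1->g2, s2->g1, s3->g3; acc = 23/57
print(greedy_mapping(matrix, one_to_one=False))  # adds s3->g2 (9), s4->g1 (5); acc = 31/57
```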

Q3: calculate_accuracy
  Usage: calculate_accuracy.* sys_clust gold_clust flag map_file acc_file
  sys_clust: output of Q2 (format: medoid w1 w2 ...)
  gold_clust: gold standard in the same format
  flag: 0 = one-to-one mapping; 1 = many-to-one mapping
  map_file: mapping of system clusters to gold clusters (sys_clust_num => gold_clust_num count)
  acc_file: overall accuracy only

Experiments
  Compare different numbers of words and different feature representations
  Compare different mapping strategies for accuracy
  Tabulate the results