Community Architectures for Network Information Systems

Presentation transcript:

Community Architectures for Network Information Systems
Visualization and Navigation of Document Information Spaces Using a Self-Organizing Map
Daniel X. Pape
Community Architectures for Network Information Systems
dpape@canis.uiuc.edu
www.canis.uiuc.edu
CSNA’98, 6/18/98

Overview
- Self-Organizing Map (SOM) Algorithm
- U-Matrix Algorithm for SOM Visualization
- SOM Navigation Application
- Document Representation and Collection Examples
- Problems and Optimizations
- Future Work

Basic SOM Algorithm: Input
- Number (n) of feature vectors (x)
- Format: vector name: a, b, c, d
- Examples:
  1: 0.1, 0.2, 0.3, 0.4
  2: 0.2, 0.3, 0.3, 0.2

Basic SOM Algorithm: Output
- A neural network map of (M) nodes
- Each node has an associated weight vector (m) of the same dimensionality as the input feature vectors
- Examples:
  m1: 0.1, 0.2, 0.3, 0.4
  m2: 0.2, 0.3, 0.3, 0.2

Basic SOM Algorithm: Output (cont.)
- Nodes are laid out in a grid
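One convenient way to represent such a map, used by the sketches below, is a NumPy array holding one weight vector per grid node. This is an assumption for illustration, not the original data structure; the sizes are arbitrary.

    import numpy as np

    rows, cols, n_dims = 10, 10, 4                 # illustrative grid and vector sizes
    rng = np.random.default_rng(0)
    M = rng.random((rows, cols, n_dims))           # M[i, j] is node (i, j)'s weight vector m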

Basic SOM Algorithm: Other Parameters
- Number of timesteps (T)
- Learning rate (eta)

Basic SOM Algorithm

    SOM() {
        foreach timestep t {
            foreach feature vector fv {
                wnode = find_winning_node(fv)
                update_local_neighborhood(wnode, fv)
            }
        }
    }

    find_winning_node(fv) {
        foreach node n {
            compute the distance between n's weight vector m and fv
        }
        return the node with the smallest distance
    }

    update_local_neighborhood(wnode, x) {
        foreach node n in the neighborhood of wnode {
            m = m + eta * [x - m]
        }
    }
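For concreteness, here is a minimal NumPy sketch of that training loop. The grid size, the decaying learning-rate and neighborhood-radius schedules, and the Gaussian neighborhood function are illustrative assumptions; the slides only specify T and eta.

    import numpy as np

    def train_som(X, rows=10, cols=10, T=100, eta0=0.5, sigma0=3.0, seed=0):
        rng = np.random.default_rng(seed)
        n_dims = X.shape[1]
        M = rng.random((rows, cols, n_dims))              # one weight vector m per node
        grid = np.stack(np.meshgrid(np.arange(rows), np.arange(cols),
                                    indexing="ij"), axis=-1)
        for t in range(T):
            eta = eta0 * (1.0 - t / T)                    # learning rate decays over time
            sigma = max(sigma0 * (1.0 - t / T), 0.5)      # neighborhood radius shrinks
            for x in X:
                # find_winning_node: node whose weight vector is closest to x
                d = np.linalg.norm(M - x, axis=-1)
                win = np.unravel_index(np.argmin(d), d.shape)
                # update_local_neighborhood: pull nodes near the winner toward x
                g = np.exp(-np.sum((grid - np.array(win)) ** 2, axis=-1)
                           / (2.0 * sigma ** 2))
                M += eta * g[..., None] * (x - M)
        return M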

U-Matrix Visualization
- Provides a simple way to visualize cluster boundaries on the map
- Simple algorithm: for each node in the map, compute the average of the distances between its weight vector and those of its immediate neighbors
- This average distance is a measure of how similar a node is to its neighbors
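A rough Python sketch of that computation, assuming the (rows, cols, n_dims) map array M from above and a 4-neighborhood; the original map may use a different neighborhood shape.

    import numpy as np

    def u_matrix(M):
        """Average distance between each node's weight vector and its grid neighbors."""
        rows, cols, _ = M.shape
        U = np.zeros((rows, cols))
        for i in range(rows):
            for j in range(cols):
                dists = []
                for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):   # immediate neighbors
                    ni, nj = i + di, j + dj
                    if 0 <= ni < rows and 0 <= nj < cols:
                        dists.append(np.linalg.norm(M[i, j] - M[ni, nj]))
                U[i, j] = np.mean(dists)
        return U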

U-Matrix Visualization: Interpretation
- The U-Matrix measurements can be encoded as greyscale values in an image, or as altitudes on a terrain landscape that represents the document space
- The valleys (dark areas) are the clusters of data; the mountains (light areas) are the boundaries between the clusters
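One possible way to render those measurements as a greyscale image, assuming the M array and u_matrix() sketches above; the matplotlib rendering is an assumption, not part of the original system.

    import matplotlib.pyplot as plt

    U = u_matrix(M)                        # U-Matrix from the sketch above
    plt.imshow(U, cmap="gray")             # low/dark = cluster, high/light = boundary
    plt.colorbar(label="average distance to neighbors")
    plt.show()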

U-Matrix Visualization
- Example: a dataset of random three-dimensional points, arranged in four obvious clusters

U-Matrix Visualization Four (color-coded) clusters of three-dimensional points

U-Matrix Visualization Oblique projection of a terrain derived from the U-Matrix

U-Matrix Visualization Terrain for a real document collection

Current Labeling Procedure
- Feature vectors are encoded as 0's and 1's
- Weight vectors have real values from 0 to 1
- Sort each node's weight vector dimensions by element value; the dimension with the greatest value is the "best" noun phrase for that node
- Aggregate nodes with the same "best" noun phrase into groups
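A hedged sketch of that labeling step: for each node, take the dimension (noun phrase) with the largest weight, then group nodes that share the same label. The `phrases` list, giving the noun phrase for each dimension, is an assumed input.

    import numpy as np
    from collections import defaultdict

    def label_nodes(M, phrases):
        """phrases[k] is the noun phrase for dimension k of the weight vectors."""
        rows, cols, _ = M.shape
        groups = defaultdict(list)
        for i in range(rows):
            for j in range(cols):
                best = int(np.argmax(M[i, j]))        # dimension with the greatest value
                groups[phrases[best]].append((i, j))  # aggregate nodes sharing a "best" phrase
        return groups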

U-Matrix Navigation
- 3D Space-Flight
- Hierarchical Navigation

Document Data
- Noun phrases are extracted from each document
- The set of unique noun phrases is computed; each noun phrase becomes a dimension of the data set
- Each document is represented by a binary vector, with a 1 or a 0 denoting the presence or absence of each noun phrase

Document Data: Example
- 10 total noun phrases: alexander, king, macedonians, darius, philip, horse, soldiers, battle, army, death
- Each element of the feature vector is a 1 or a 0:
  1: 1, 1, 0, 0, 1, 1, 0, 0, 0, 0
  2: 0, 1, 0, 1, 0, 0, 1, 1, 1, 1
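A small sketch of this binary encoding, assuming noun-phrase extraction has already produced one set of phrases per document; the function and variable names are illustrative.

    import numpy as np

    def binary_vectors(doc_phrases):
        """doc_phrases: one set of noun phrases per document."""
        vocab = sorted(set().union(*doc_phrases))          # each unique phrase becomes a dimension
        index = {p: k for k, p in enumerate(vocab)}
        X = np.zeros((len(doc_phrases), len(vocab)), dtype=np.uint8)
        for d, phrases in enumerate(doc_phrases):
            for p in phrases:
                X[d, index[p]] = 1                         # 1 = phrase present, 0 = absent
        return X, vocab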

Document Collection Examples

Problems
- As document sets get larger, the feature vectors get longer and use more memory
- Execution time grows to unrealistic lengths

Solutions?
- Need algorithm refinements for sparse feature vectors
- Need a faster way to do the find_winning_node() computation
- Need a better way to do the update_local_neighborhood() computation

Sparse Vector Optimization
- Intelligent support for sparse feature vectors saves on memory usage and greatly improves the speed of the weight vector update computation
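The slides do not spell out the refinement, but one plausible sketch exploits the fact that for a binary x the update m = m + eta·(x − m) scales every component by (1 − eta) and adds eta only where x is 1; keeping that scale as a separate scalar makes each update proportional to the number of 1-bits rather than the full dimensionality. This is an assumption, not the original implementation.

    import numpy as np

    class ScaledWeightVector:
        """Weight vector stored as scale * v, so sparse updates touch only the 1-bits."""
        def __init__(self, m):
            self.v = np.asarray(m, dtype=float)
            self.scale = 1.0                      # true weight vector is scale * v

        def update(self, ones, eta):
            """Apply m <- m + eta*(x - m) for a binary x given by its 1-indices `ones`."""
            self.scale *= (1.0 - eta)             # shrink every component at once
            self.v[list(ones)] += eta / self.scale
            if self.scale < 1e-6:                 # renormalize occasionally for stability
                self.v *= self.scale
                self.scale = 1.0

        def dense(self):
            return self.scale * self.v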

Faster find_winning_node()
- SOM weight vectors become partially ordered very quickly

Faster find_winning_node() U-Matrix Visualization of an Initial, Unordered SOM

Faster find_winning_node() Partially Ordered SOM after 5 timesteps

Faster find_winning_node() Don’t do a global search for the winner Start search from last known winner position Pro: usually finds a new winner very quickly Con: this new search for a winner can sometimes get stuck in a local minima

Better Neighborhood Update
- Nodes get told to "update" quite often
- A node's weight vector is made public only during a find_winning_node() search
- With the local find_winning_node() search, a lazy neighborhood weight vector update can be performed

Better Neighborhood Update
- Cache update requests: each node stores the winning node and feature vector for each update request
- The node performs the update computations called for by the stored requests only when asked for its weight vector
- The number of requests can possibly be reduced by averaging the feature vectors in the cache
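A hedged sketch of such a lazy node: update requests (winning node position, feature vector) are queued and only applied when the weight vector is actually read. The Gaussian neighborhood factor and the fixed eta are illustrative choices, not details from the original system.

    import numpy as np

    class LazyNode:
        def __init__(self, pos, m, eta=0.1, sigma=2.0):
            self.pos = np.asarray(pos, dtype=float)   # this node's grid coordinates
            self._m = np.asarray(m, dtype=float)
            self.eta, self.sigma = eta, sigma
            self._pending = []                        # cached update requests

        def request_update(self, winner_pos, x):
            self._pending.append((np.asarray(winner_pos, dtype=float),
                                  np.asarray(x, dtype=float)))

        @property
        def m(self):                                  # weight vector is "made public" here
            for winner_pos, x in self._pending:       # apply the stored requests in order
                g = np.exp(-np.sum((self.pos - winner_pos) ** 2) / (2.0 * self.sigma ** 2))
                self._m += self.eta * g * (x - self._m)
            self._pending.clear()
            return self._m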

New Execution Times

Future Work
- Parallelization
- Label Problem

Label Problem
- The current procedure is not very good
- Cluster boundaries
- Term selection

Cluster Boundaries
- Image processing
- Geometric

Cluster Boundaries Image processing example:

Term Selection
- Too many unique noun phrases
  - "Knee" of frequency curve
- Too many dimensions in the feature vector data
  - "Knee" of frequency curve
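A possible sketch of cutting at the knee: rank noun phrases by document frequency and keep those before the largest drop in the curve. The largest-drop heuristic is just one way to define the knee and is not taken from the original work.

    import numpy as np

    def select_terms(phrase_doc_freq):
        """phrase_doc_freq: dict mapping noun phrase -> number of documents containing it."""
        ranked = sorted(phrase_doc_freq.items(), key=lambda kv: kv[1], reverse=True)
        freqs = np.array([f for _, f in ranked], dtype=float)
        knee = int(np.argmax(freqs[:-1] - freqs[1:])) + 1   # index just after the largest drop
        return [p for p, _ in ranked[:knee]]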