03.10.2007, ADBIS 2007. A Method for Comparing Self-Organizing Maps: Case Studies of Banking and Linguistic Data. Toomas Kirt, Ene Vainik, Leo Võhandu.

Presentation transcript:

A Method for Comparing Self-Organizing Maps: Case Studies of Banking and Linguistic Data
Toomas Kirt, Ene Vainik, Leo Võhandu

Outline
Self-Organizing Map; Dimensionality Reduction; Matrix Reordering; Methodology of Similarity Measurement; Case Studies of Banking and Linguistic Data

Self-Organizing Map (SOM)
The SOM is a feedforward artificial neural network that uses an unsupervised learning algorithm. It projects nonlinear relationships between high-dimensional input data onto a two-dimensional output grid (map).
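To make the projection onto a grid concrete, here is a minimal sketch of SOM training in plain Python. It is not the implementation used in the study: the rectangular grid, Gaussian neighbourhood, linear decay schedules and the names `train_som` and `best_matching_unit` are all assumptions chosen for illustration.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return (row, col) of the unit whose weight vector is closest to x."""
    best, best_d2 = None, float("inf")
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            d2 = sum((wk - xk) ** 2 for wk, xk in zip(w, x))
            if d2 < best_d2:
                best, best_d2 = (r, c), d2
    return best


def train_som(data, rows, cols, epochs=20, lr0=0.5, sigma0=None, seed=0):
    """Train a tiny rectangular SOM (illustrative assumptions throughout)."""
    rng = random.Random(seed)
    dim = len(data[0])
    sigma0 = sigma0 or max(rows, cols) / 2.0
    # Random initial weight vector for every grid unit.
    weights = [[[rng.random() for _ in range(dim)] for _ in range(cols)]
               for _ in range(rows)]
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):  # one shuffled pass over the data
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                  # learning rate decays linearly
            sigma = max(sigma0 * (1.0 - frac), 0.5)  # neighbourhood radius shrinks
            br, bc = best_matching_unit(weights, x)
            for r in range(rows):
                for c in range(cols):
                    grid_d2 = (r - br) ** 2 + (c - bc) ** 2
                    h = math.exp(-grid_d2 / (2.0 * sigma ** 2))
                    w = weights[r][c]
                    # Pull the unit towards x, more strongly near the winner.
                    for k in range(dim):
                        w[k] += lr * h * (x[k] - w[k])
            step += 1
    return weights
```

After training, `best_matching_unit(weights, x)` gives the map position of a data item, which is the information the later neighbourhood comparison relies on.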

Dimensionality Reduction
To reduce the dimensionality of the data we use principal component analysis (PCA). PCA transforms the data linearly and projects the original data onto a new set of variables, called the principal components, while retaining as much as possible of the variation present in the data set. The principal components are uncorrelated and ordered so that the first few represent most of the variation of the original variables.
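The idea that the first component captures most of the variation can be sketched with the covariance matrix and power iteration. This is an illustrative stand-in, not the authors' procedure: it finds only the leading component, and the function names are hypothetical.

```python
import math


def covariance_matrix(data):
    """Sample covariance of the rows in `data` (list of equal-length lists)."""
    n, dim = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(dim)]
    centred = [[row[j] - means[j] for j in range(dim)] for row in data]
    return [[sum(r[i] * r[j] for r in centred) / (n - 1) for j in range(dim)]
            for i in range(dim)]


def leading_component(cov, iterations=200):
    """Power iteration: (eigenvalue, unit eigenvector) for the largest eigenvalue.

    Assumes the start vector is not orthogonal to the leading eigenvector.
    """
    dim = len(cov)
    v = [1.0 / math.sqrt(dim)] * dim
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(dim)) for i in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    # Rayleigh quotient v^T C v gives the eigenvalue for the unit vector v.
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(dim))
                     for i in range(dim))
    return eigenvalue, v


def explained_ratio(cov, eigenvalue):
    """Share of total variance (trace of the covariance) carried by one component."""
    return eigenvalue / sum(cov[i][i] for i in range(len(cov)))
```

For data lying exactly on a line, the leading component points along that line and its explained ratio is 1; for real data the ratio shows how much variation a single component retains.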

Matrix Reordering
Matrix reordering is a structuring method for graphs (and general data tables). The method reorganizes the vertices of the neighbourhood graph according to a specific property, the monotonicity of the system. The main goal of such matrix analysis is to permute rows and columns so as to maximize the similarity of neighbouring elements.
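The goal of placing similar rows and columns next to each other can be illustrated with a simple greedy ordering. Note that this is a swapped-in heuristic for illustration only, not the monotone-systems procedure the slide refers to; `greedy_order` and `reorder` are hypothetical names.

```python
def greedy_order(matrix):
    """Greedy ordering: each next row is the unplaced row most similar to the last.

    Illustrative heuristic only; similarity is the number of matching entries.
    """
    n = len(matrix)

    def matches(a, b):
        return sum(1 for x, y in zip(matrix[a], matrix[b]) if x == y)

    # Start from the row with the largest sum (first such row on ties).
    start = max(range(n), key=lambda i: sum(matrix[i]))
    order = [start]
    remaining = [i for i in range(n) if i != start]
    while remaining:
        nxt = max(remaining, key=lambda i: matches(order[-1], i))
        order.append(nxt)
        remaining.remove(nxt)
    return order


def reorder(matrix, order):
    """Apply the same permutation to rows and columns of a square matrix."""
    return [[matrix[i][j] for j in order] for i in order]


# Two interleaved groups: items 0 and 2 belong together, as do 1 and 3.
interleaved = [
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
]
order = greedy_order(interleaved)
blocks = reorder(interleaved, order)
```

On this small example the permutation brings the two groups together, so the reordered matrix shows two blocks along the diagonal.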

Methodology of Similarity Measurement
When different sources of data describe the same phenomenon but were collected differently, or the number of variables varies, we have to assess whether the results of the two analyses are similar. Because the SOM projects close units of the input space onto nearby map units, the local neighbourhood should remain quite similar. In this paper we propose a simple method for comparing the results of different self-organizing maps, based on measuring the similarity of local neighbourhoods.

Methodology of Similarity Measurement I
First, to analyze the general organization, the resulting map is examined visually: clusters and their borders are identified, along with the general orientation and the locations of data items. Thereafter, the matrix of neighbourhood relations is formed.
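One way the matrix of neighbourhood relations could be formed is sketched below. The transcript does not state the grid type or distance used, so the rectangular grid, the Chebyshev distance, the function name and the example labels are all assumptions.

```python
def neighbourhood_matrix(positions, labels, radius=1):
    """Binary matrix: entry [i][j] is 1 if items i and j lie within `radius` on the map.

    `positions` maps each label to its (row, col) map unit. Chebyshev distance
    on a rectangular grid is an assumption made for this sketch.
    """
    n = len(labels)
    matrix = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # an item is not counted as its own neighbour
            r1, c1 = positions[labels[i]]
            r2, c2 = positions[labels[j]]
            if max(abs(r1 - r2), abs(c1 - c2)) <= radius:
                matrix[i][j] = 1
    return matrix


# Hypothetical map positions for three items (not taken from the study).
positions = {"joy": (0, 0), "pride": (0, 1), "fear": (3, 3)}
labels = sorted(positions)  # ['fear', 'joy', 'pride']
range_1 = neighbourhood_matrix(positions, labels, radius=1)
```

Building one such matrix per map, over the same ordered list of labels, yields two matrices that can be compared entry by entry.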

Methodology of Similarity Measurement II
The Simple Matching Coefficient (SMC) rates positive and negative matches equally and can be used when positive and negative values carry equal weight. The Jaccard coefficient (J) is used when negative and positive matches carry different weights; it ignores negative matches. If the values of the Jaccard coefficient and the SMC are below 0.5, the number of positive matches is less than half of the total matches.
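The two coefficients have standard definitions: with f11 positive matches, f00 negative matches and f10, f01 mismatches, SMC = (f11 + f00) / (f11 + f10 + f01 + f00) and J = f11 / (f11 + f10 + f01). The sketch below applies them to two binary neighbourhood matrices; counting each unordered pair once over the upper triangle is an assumption, as is the function name.

```python
def similarity_coefficients(a, b):
    """Return (SMC, Jaccard) for two symmetric 0/1 matrices of equal size.

    Each unordered pair (i, j), i < j, is counted once (an assumption).
    """
    f11 = f10 = f01 = f00 = 0
    n = len(a)
    for i in range(n):
        for j in range(i + 1, n):
            if a[i][j] and b[i][j]:
                f11 += 1      # neighbours on both maps
            elif a[i][j]:
                f10 += 1      # neighbours only on the first map
            elif b[i][j]:
                f01 += 1      # neighbours only on the second map
            else:
                f00 += 1      # neighbours on neither map
    total = f11 + f10 + f01 + f00
    union = f11 + f10 + f01
    smc = (f11 + f00) / total if total else 0.0
    jaccard = f11 / union if union else 0.0
    return smc, jaccard


# Illustrative matrices for four items (not data from the study).
map_a = [[0, 1, 1, 0], [1, 0, 0, 0], [1, 0, 0, 1], [0, 0, 1, 0]]
map_b = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
smc, jaccard = similarity_coefficients(map_a, map_b)
```

Here two pairs are neighbours on both maps, two on neither, and two on only one map, so the SMC (4/6) is higher than the Jaccard coefficient (2/4) because it also credits the shared non-neighbours.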

Methodology of Similarity Measurement III
The fourth part of the similarity measurement consists of finding the extent to which the two neighbourhood matrices are identical, that is, their maximum isomorphic subset. Here we can perform a new meta-level analysis, reordering and visualizing the isomorphic subgraph to see the information shared by the two maps.
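When both maps are built over the same labelled items, the vertex correspondence is fixed, and under that assumption the shared structure reduces to the neighbour pairs present in both matrices. The sketch below takes this simplified view; it is not a general maximum common subgraph search, and the names and labels are hypothetical.

```python
def shared_edges(a, b, labels):
    """Neighbour pairs present in both symmetric 0/1 matrices.

    Assumes rows and columns of `a` and `b` refer to the same items in the
    same order, so no search over vertex mappings is needed.
    """
    n = len(labels)
    return [(labels[i], labels[j])
            for i in range(n) for j in range(i + 1, n)
            if a[i][j] and b[i][j]]


def shared_items(edges):
    """Sorted list of items that take part in at least one shared pair."""
    return sorted({item for edge in edges for item in edge})


# Illustrative matrices and labels (not data from the study).
labels = ["w1", "w2", "w3", "w4"]
map_a = [[0, 1, 1, 0], [1, 0, 0, 0], [1, 0, 0, 1], [0, 0, 1, 0]]
map_b = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
common = shared_edges(map_a, map_b, labels)
```

The resulting edge list is itself a small graph, so it can be reordered and drawn like any other neighbourhood matrix to show what the two maps agree on.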

Case Studies
We use two sets of data to illustrate the method of similarity measurement. The first comes from research into the concepts of emotion in the Estonian language. The survey consisted of two parts, and as a result two different data matrices describe the same set of emotion concepts. We analyze whether, and to what extent, the results of the two tasks are comparable. The second data set is banking data. In this case the purpose of our meta-analysis is to detect whether, and to what degree, the dimensionality reduction method (PCA) applied to the data has preserved its structure.

Study of Estonian Concepts of Emotion
The purpose of the study was to discover the hidden structure of the Estonian emotion concepts. Two lexical tasks were carried out, providing information about emotion concepts either through their relation to episodes of emotional experience or through semantic interrelations of emotion terms (synonymy and antonymy). Twenty-four emotion concepts were selected for the study. In the first task, participants had to evaluate the meaning of every single word against a set of seven bipolar scales. In the second task, the same participants had to elicit emotion terms similar and opposite in meaning to the same 24 stimulus words.

Study of Estonian Concepts of Emotion
Figure: a) Results of Task 1 (24 concepts evaluated on the seven bipolar scales); b) Results of Task 2 (24 concepts arranged according to relations of synonymy and antonymy).

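Study of Estonian Concepts of Emotion
Table: Neighbourhood similarity of the SOM of conceptual data.
Columns: Neighbourhood range | Task 1 neighbours | Task 2 neighbours | Similar neighbours | Total neighbours | SMC | Jaccard coefficient
(The table values are missing from this transcript.)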

Study of Estonian Concepts of Emotion
Figure: Reordered and visualized isomorphic subgraph of lexical data (Task 1 vs. Task 2). Neighbourhood range 2.

Study of Banking Data
The second data set is used to illustrate the impact of dimensionality reduction by PCA on the SOM maps. The aim of the study is to measure the similarity between the results of SOM mapping of the original data and of the reduced data. The data set consists of banking data. We used 133 public quarterly reports of individual banks, each consisting of a balance sheet and a profit and loss statement (income statement). The 50 most important variables were selected to form a short financial statement of a bank.

Study of Banking Data
Figure: a) SOM of the 50 original variables; b) SOM of 26 principal components describing 95% of the variation.
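Choosing how many principal components to keep for a target share of variation (such as the 95% mentioned above) amounts to accumulating the eigenvalues of the covariance matrix in descending order. A minimal sketch follows; the function name and the example eigenvalues are illustrative and are not the values from the banking data.

```python
def components_for_variance(eigenvalues, threshold=0.95):
    """Smallest number of leading components whose variance share reaches `threshold`."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    cumulative = 0.0
    for k, value in enumerate(ordered, start=1):
        cumulative += value
        if cumulative / total >= threshold:
            return k
    return len(ordered)


# Made-up eigenvalues for illustration only.
example = [6.0, 3.0, 0.7, 0.2, 0.1]
k95 = components_for_variance(example, threshold=0.95)
```

With these example values the first two components cover about 90% of the variation, so a third is needed to pass 95%.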

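Study of Banking Data
Table: Neighbourhood similarity of the SOM of banking data.
Columns: Experiment | Neighbourhood range | Original neighbours | PCA neighbours | Similar neighbours | Total neighbours | SMC | Jaccard coefficient
Rows (Experiment): Orig vs. PCA 95%; Orig vs. PCA 50%; Orig vs. PCA 95%; Orig vs. PCA 50%
(The table values are missing from this transcript.)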

Study of Banking Data
Figure: Reordered and visualized isomorphic subgraph of banking data (Original vs. PCA 95% variation). Neighbourhood range 1.

Conclusions
The case studies give an overview of the possibilities for measuring the similarity between self-organizing maps on the basis of topology and neighbourhood relations. We find local neighbourhood relations between the data items and measure the similarity of these relations by coefficients and by finding an isomorphic subgraph. The maximum isomorphic subgraph was introduced as a measure of the similarity between the SOMs, but it became a new meta-level tool.

Thank you!