Graph-based cluster labeling using Growing Hierarchical SOM. Mahmoud Rafeek Alfarra, College of Science & Technology. The Second International …

Presentation transcript:

Graph-based cluster labeling using Growing Hierarchical SOM
Prepared by: Mahmoud Rafeek Alfarra, College of Science & Technology, and Ayman Shehda Ghabayen, College of Science & Technology
The Second International Conference of Applied Science & Natural …

Outline
- Labeling: what and why?
- Graph-based representation
- Growing Hierarchical SOM
- Extraction of cluster labels

Labeling: What and Why?
- Cluster labeling is the process of selecting descriptive labels (keywords) for the clusters produced by a clustering algorithm.

Labeling: What and Why?
- Cluster labeling grows more important as document collections grow larger.
- It helps in processing news, threads, blogs, reviews, and search results.

Labeling: What and Why?
[Figure: the overall pipeline. A document collection passes through a preprocessing step into the DIG model; a clustering-plus-labeling process maps the graphs onto a SOM layer (0G0, 0G1, …, 0Gs) that grows into a hierarchical SOM (1G0, 1G1, …, 2G1, 2G2), yielding labeled clusters.]

Graph-based Representation
[Figure: an example document graph with term nodes A, B, X, D, N, C, S; edges are annotated with sentence/position pairs such as (2,3), (3,3), (1,3), (1,1) and phrases ph1-ph5.]

Graph-based Representation
- Captures the salient features of the data.
- DIG model: a directed graph in which a document is represented as a vector of sentences.
- Phrase indexing information is stored in the graph nodes themselves, in the form of document tables.
[Figure: nodes for river, rafting, adventures, fishing linked by edges e0, e1, e2; each node's document table records where a phrase occurs, e.g. e0 in S1(1), S2(2), S3(1), where S1(2) reads "sentence 1, term position 2"; rows such as Doc 1 {0,0,3}, Doc 2 {0,0,2}, Doc 3 {0,0,1} hold per-document term-frequency entries.]
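To make the node-level bookkeeping concrete, here is a minimal Python sketch of a DIG-style index. The names (DIG, doc_table, add_document) are illustrative assumptions, not the authors' code; the point is that each term node keeps per-document (sentence, position) entries while consecutive-term edges preserve phrases.

```python
from collections import defaultdict

class DIG:
    """Minimal Document Index Graph sketch: one node per unique term,
    directed edges between consecutive terms (so phrases are preserved),
    and a per-node document table of (sentence number, position) pairs."""

    def __init__(self):
        # term -> {doc_id -> [(sentence_no, position), ...]}
        self.doc_table = defaultdict(lambda: defaultdict(list))
        # directed phrase edges between consecutive terms
        self.edges = set()

    def add_document(self, doc_id, sentences):
        for s_no, sentence in enumerate(sentences, start=1):
            terms = sentence.lower().split()
            for pos, term in enumerate(terms, start=1):
                self.doc_table[term][doc_id].append((s_no, pos))
            # every consecutive pair of terms becomes an ordered edge
            for a, b in zip(terms, terms[1:]):
                self.edges.add((a, b))
```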

Graph-based Representation: Example
- Document 1: river rafting. mild river rafting. river rafting trips.
- Document 2: wild river adventures. river rafting vacation plan.
- Document 3: fishing trips. fishing vacation plan. booking fishing trips. river fishing.
[Figure: the cumulative document index graph over the nodes mild, river, rafting, trips, wild, adventures, vacation, plan, booking, fishing as the documents are merged one by one.]
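Feeding the slide's example documents into the sketch above shows how the shared subgraph accumulates as documents are added (a usage illustration only):

```python
dig = DIG()
dig.add_document(1, ["river rafting", "mild river rafting", "river rafting trips"])
dig.add_document(2, ["wild river adventures", "river rafting vacation plan"])
dig.add_document(3, ["fishing trips", "fishing vacation plan",
                     "booking fishing trips", "river fishing"])

print(("river", "rafting") in dig.edges)   # True: a phrase edge shared by docs 1 and 2
print(sorted(dig.doc_table["rafting"]))    # [1, 2]: documents containing "rafting"
```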

Growing Hierarchical SOM

- Determining the winning node: the input document graph (Gi) is compared against the graph of each of the n nodes in the SOM (Gs); the match is scored by the significance of the phrases shared between Gi and Gs, normalized by the length of Gi.
[Figure: an input document graph Gi with nodes v1-v7 and edges e0-e5 matched against a SOM node graph Gs.]
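A hedged sketch of the winner selection, assuming a graph is represented by its set of phrase edges and that "phrase significance" is simply the count of matching edges; the slides do not give the exact scoring function, so that part is an assumption:

```python
def phrase_significance(gi_edges, gs_edges):
    # assumed scoring: number of phrase edges shared by the two graphs
    return len(gi_edges & gs_edges)

def winning_node(gi_edges, som_node_graphs):
    """Pick the SOM node whose graph best matches the input graph Gi,
    normalizing by the size of Gi (the slide's 'length of Gi')."""
    def score(gs_edges):
        return phrase_significance(gi_edges, gs_edges) / max(len(gi_edges), 1)
    return max(range(len(som_node_graphs)),
               key=lambda i: score(som_node_graphs[i]))
```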

Growing Hierarchical SOM
- Neuron updating in the graph domain: we update a neuron's graph by adding the matching phrases rather than only individual terms (nodes), because adding phrases has a stronger effect than adding terms; adding a matching phrase can be seen as adding an ordered pair of nodes.
[Figure: graphs G1 and G2 over nodes A, B, C, D, E, X, Y with edges e0-e5, before and after the update.]
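A minimal sketch of this update, assuming the neuron adopts a learning-rate-controlled fraction of the input's unmatched phrase edges; the fraction-based adoption rule is my assumption, since the slides state only that matching phrases (ordered node pairs) are added:

```python
import random

def update_neuron(gs_edges, gi_edges, learning_rate=0.5):
    """Move a neuron's graph toward the input graph by adopting a
    fraction of the input's phrase edges it does not yet contain;
    each adopted phrase is an ordered pair of term nodes."""
    candidates = list(gi_edges - gs_edges)
    random.shuffle(candidates)
    take = round(learning_rate * len(candidates))
    return gs_edges | set(candidates[:take])
```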

Overall Document Clustering Process

Extracting Labels of Clusters
- To extract the keywords, we build a table for each cluster with the following columns:

Term | TF | Locations {T, L, B, b} | No. of matching phrases (MP) | Weight

Weight = (f1*T + f2*L + f3*B + f4*b) * MP * 0.6
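The weight formula translates directly into code. A sketch, assuming f1..f4 are tunable factors for the four location classes {T, L, B, b}, whose expansion the slides do not give; the default factor values here are placeholders:

```python
def label_weight(T, L, B, b, mp, f=(1.0, 0.8, 0.5, 0.3)):
    """Weight = (f1*T + f2*L + f3*B + f4*b) * MP * 0.6.
    T, L, B, b: the term's occurrence counts per location class;
    mp: number of matching phrases; f: placeholder location factors."""
    f1, f2, f3, f4 = f
    return (f1 * T + f2 * L + f3 * B + f4 * b) * mp * 0.6
```

Ranking a cluster's terms by this weight and keeping the top-weighted ones yields the cluster's label, as the worked table on the next slide illustrates.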

Extracting Labels of Clusters
[Figure: a cluster of terms T1-T11 connected by matching-phrase edges.]

Term | F-weight | Matching phrases (# MP) | Net weight
T2 | - | (T2,T3), (T2,T5) | 6.16
T3 | - | (T2,T3), (T5,T3) | 5.28
T5 | - | (T2,T5), (T8,T5), (T5,T3) | 6.40
T8 | - | (T8,T5) | 6.36

Thank You … Questions