Name: Emad Zargoun. ID number: 135042. Eastern Mediterranean University, Department of Computing and Technology. "ITEC547 - Text Mining". Prof. Dr. Nazife Dimiriler.


My outline
1. Overview of clustering
2. Web service clustering
3. Ontologies improve text document clustering
4. Heterarchy and Core Ontology
5. Compiling Background Knowledge into the Text Document Representation
6. Conclusion

Overview of clustering
Definition of clustering
Clustering in text mining can be defined in several ways:
- The procedure of dividing texts into several clusters, where each cluster contains related texts and differs from the other clusters.
- A grouping of data objects such that the objects within a group are similar to one another and different from the objects in other groups.

Web service clustering  web services are distributed autonomous software components that are self-describing and designed by different vendors to provide business functions to other applications through an internet connection.  some major providers have even decided to advertise their services through their human- readable websites,. For example, Google’s and Amazon’s web services

Web service clustering  The mechanism for clustering web services to bootstrap is a service search engine.  That in web services we use web service description files (WSDL)files

Web service clustering  The clustering of web service files is different from the traditional web service discovery problem because there are no queries to match against.  the idea of representing a web service using document vectors is still relevant.  gathering the features for a WSDL file is not as simple as collecting description documents when assuming no central UDDI registries.

Web service clustering  system that can automatically cluster a group of WSDL files obtained by querying a search engine (e.g., Google).  process of mining four types of features of a WSDL file. 1. the content of the web service is characterized by the application-specific terms located in the WSDL file

Web service clustering
2. The context of the web service is represented by the application-specific terms appearing in all index web pages of publicly accessible parent directories of the current directory containing the WSDL file.
3. The service host is the second- and top-level portion of the domain name (i.e., a segment of the authority part of the URI) of the host containing the WSDL file.
4. The service name is the name of the WSDL file.
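Features 3 and 4 can be sketched as follows. This sketch uses Python's standard URL parsing rather than any particular extraction method; the function names, the example URL, and the choice to drop the file extension are assumptions made for illustration.

```python
import posixpath
from urllib.parse import urlparse


def identify_service_host(wsdl_url):
    """Return the second- and top-level labels of the host name.

    Naive assumption: the registrable domain is always the last two labels,
    which is wrong for suffixes such as "co.uk".
    """
    hostname = urlparse(wsdl_url).hostname or ""
    labels = hostname.split(".")
    return ".".join(labels[-2:])


def identify_service_name(wsdl_url):
    """Return the WSDL file name without its extension (an assumption)."""
    filename = posixpath.basename(urlparse(wsdl_url).path)
    return posixpath.splitext(filename)[0]


# Hypothetical URL of a WSDL file found by a search engine.
url = "http://api.example.com/services/weather/WeatherForecast.wsdl"
print(identify_service_host(url))  # example.com
print(identify_service_name(url))  # WeatherForecast
```

The parent directories needed for the context feature (`/services/weather/` and `/services/` in this URL) could be derived from the same parsed path.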

Web service clustering

From the previous figure:
- The word analyzer begins by tokenizing the WSDL or HTML files to construct the initial sets C and X.
- Non-words are removed from these sets.
- The words in the two sets are conflated and analyzed for their content-bearing property in order to remove function words.
- The remaining content words in the two sets are then clustered to identify application-specific terms and general computing terms.
- Regular expressions are used to extract the service name, s_name, and the service host address, s_host. These steps are implemented as the modules identifyServiceHost and identifyServiceName.
- As an example, the service name of this WSDL file ...
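The first three word-analyzer steps can be sketched roughly as below. The camelCase splitting, the tiny function-word list, and the plural-stripping "conflation" are all simplifying assumptions; the slides do not specify the tokenizer, stemmer, or content-bearing test that was actually used, and the final step of clustering content words is not shown.

```python
import re

# Tiny illustrative function-word list; a real system would use a fuller one.
FUNCTION_WORDS = {"the", "a", "an", "of", "to", "for", "and", "in", "by"}


def tokenize(text):
    """Split on non-alphanumeric characters and on camelCase boundaries."""
    tokens = []
    for chunk in re.split(r"[^A-Za-z0-9]+", text):
        tokens.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|[0-9]+", chunk))
    return tokens


def conflate(word):
    """Crude conflation: lowercase and strip a plural 's' (not a real stemmer)."""
    word = word.lower()
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def content_words(text):
    """Tokenize, drop non-words, conflate, and remove function words."""
    words = [t for t in tokenize(text) if t.isalpha() and len(t) > 1]
    stems = [conflate(w) for w in words]
    return [s for s in stems if s not in FUNCTION_WORDS]


print(content_words("GetForecastsByCity: returns the forecast for 1 city"))
# ['get', 'forecast', 'city', 'return', 'forecast', 'city']
```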

Web service clustering

The web service clusters produced based on the four types of features.

Ontologies improve text document clustering
- Text document clustering can benefit from integrating an explicit conceptual account of terms, as found in ontologies like WordNet.
- The clustering is then performed with Bi-Section-KMeans, which has been shown to perform as well as other text clustering algorithms.
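The bisecting idea can be sketched as: start with one cluster and repeatedly split one cluster in two with 2-means until the desired number of clusters is reached. The sketch below makes several assumptions not taken from the slides: it always splits the largest cluster, uses squared Euclidean distance on toy 2-D points instead of document vectors, and picks the initial centers deterministically (the point farthest from the centroid, then the point farthest from that one) rather than at random.

```python
def mean(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    dim = len(points[0])
    return tuple(sum(p[i] for p in points) / len(points) for i in range(dim))


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def two_means(points, iterations=20):
    """k-means with k=2; returns (half0, half1), or None if no split exists."""
    centroid = mean(points)
    c0 = max(points, key=lambda p: sq_dist(p, centroid))
    c1 = max(points, key=lambda p: sq_dist(p, c0))
    halves = None
    for _ in range(iterations):
        new = ([], [])
        for p in points:
            new[0 if sq_dist(p, c0) <= sq_dist(p, c1) else 1].append(p)
        if not new[0] or not new[1]:
            break  # degenerate assignment; keep the last valid split
        halves = new
        c0, c1 = mean(new[0]), mean(new[1])
    return halves


def bisecting_kmeans(points, k):
    """Split the largest cluster in two until there are k clusters."""
    clusters = [list(points)]
    while len(clusters) < k:
        clusters.sort(key=len)
        largest = clusters.pop()
        if len(largest) < 2:
            clusters.append(largest)
            break
        halves = two_means(largest)
        if halves is None:  # e.g. all points identical: split by position
            mid = len(largest) // 2
            halves = (largest[:mid], largest[mid:])
        clusters.extend(halves)
    return clusters


# Toy data: three well-separated groups of points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10),
          (20, 0), (20, 1), (21, 0)]
for cluster in bisecting_kmeans(points, 3):
    print(sorted(cluster))
```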

Heterarchy and Core Ontology
Definition 1 (Core Ontology): A core ontology is a sign system which consists of:
- A lexicon: the lexicon L contains a set of natural language terms.
- A set of concepts C*.
- The reference function F, which links terms in the lexicon to the concepts they refer to.
- A heterarchy H: concepts are taxonomically related by the directed, acyclic, transitive, reflexive relation H.

Example
- lexicon L = {Hotel, Grand Hotel, Hotel Schwarzer Adler, Accommodation, …}
- concepts C* = {ROOT, HOTEL, ACCOMMODATION, …}
- reference function F = {(Hotel, HOTEL), (Grand Hotel, HOTEL), (Hotel Schwarzer Adler, HOTEL), …}, i.e., "Hotel", "Grand Hotel" and "Hotel Schwarzer Adler" refer to the concept HOTEL.
- heterarchy H = {(HOTEL, ACCOMMODATION), (ACCOMMODATION, ROOT), …}
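The example can be written down as plain data structures. This is only a sketch: the variable and function names are made up, the entries elided with "…" in the example are left out, and the heterarchy is stored as direct (subconcept, superconcept) edges whose reflexive, transitive closure is computed on demand.

```python
LEXICON = {"Hotel", "Grand Hotel", "Hotel Schwarzer Adler", "Accommodation"}
CONCEPTS = {"ROOT", "HOTEL", "ACCOMMODATION"}

# Reference function F as (term, concept) pairs.
REFERENCE = {
    ("Hotel", "HOTEL"),
    ("Grand Hotel", "HOTEL"),
    ("Hotel Schwarzer Adler", "HOTEL"),
}

# Heterarchy H as direct (subconcept, superconcept) edges.
HETERARCHY = {("HOTEL", "ACCOMMODATION"), ("ACCOMMODATION", "ROOT")}


def ref_concepts(term):
    """Concepts that a lexicon term refers to."""
    return {concept for t, concept in REFERENCE if t == term}


def superconcepts(concept):
    """The concept itself plus everything above it in the heterarchy."""
    seen = {concept}
    frontier = [concept]
    while frontier:
        current = frontier.pop()
        for child, parent in HETERARCHY:
            if child == current and parent not in seen:
                seen.add(parent)
                frontier.append(parent)
    return seen


print(ref_concepts("Grand Hotel"))     # {'HOTEL'}
print(sorted(superconcepts("HOTEL")))  # ['ACCOMMODATION', 'HOTEL', 'ROOT']
```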

Compiling Background Knowledge into the Text Document Representation
There are three sets of strategies for compiling background knowledge into the text document representation.
Term vs. Concept Vector Strategies
- Enriching the term vectors with concepts from the core ontology has two benefits.
- First, it resolves synonyms.
- Second, it introduces more general concepts, which help to identify related topics.
- For instance, a document about beef may not be recognized as related to a document that covers a similar topic using different terms.
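Both benefits can be seen in a small sketch that adds concept counts to a term vector. The term-to-concept table is invented for illustration (including the "pork", "car" and "automobile" entries) and is not taken from WordNet or from the slides.

```python
from collections import Counter

# Hypothetical mapping from terms to the concepts they refer to,
# including one more general concept for the two meat terms.
TERM_CONCEPTS = {
    "beef": ["BEEF", "MEAT"],
    "pork": ["PORK", "MEAT"],
    "car": ["CAR"],
    "automobile": ["CAR"],
}


def enrich(term_vector):
    """Return the term vector with concept counts added alongside the terms."""
    enriched = Counter(term_vector)
    for term, count in term_vector.items():
        for concept in TERM_CONCEPTS.get(term, []):
            enriched[concept] += count
    return enriched


doc_a = Counter({"beef": 2, "car": 1})
doc_b = Counter({"pork": 1, "automobile": 3})

print(set(doc_a) & set(doc_b))                  # set()
print(sorted(set(enrich(doc_a)) & set(enrich(doc_b))))  # ['CAR', 'MEAT']
```

Before enrichment the two documents share no dimension. Afterwards they share CAR (synonyms resolved) and MEAT (a more general concept introduced).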

Compiling Background Knowledge into the Text Document Representation  Strategies for Disambiguation  The assignment of terms to concepts in Word net is ambiguous.  adding or replacing terms by concepts may add noise to the representation and may induce a loss of information.  We have 3 strategies in the disambiguation

Compiling Background Knowledge into the Text Document Representation  All Concepts (“all”). The baseline strategy is not to do anything about disambiguation and consider all concepts for augmenting the text document representation.  First Concept (“first”). Wordnet returns an ordered list of concepts when applying Ref C to a set of terms. Thereby, the ordering is supposed to reflect how common it is that term reflects a concept in “standard” English language. m ore common term meanings are listed before less common ones.

Compiling Background Knowledge into the Text Document Representation  Disambiguation by Context (“context”). The sense of a term t that refers to several different concepts Ref C(t) := {b, c,...} may be disambiguated by a simplified version of first strategy

Compiling Background Knowledge into the Text Document Representation  Strategies for considering the concept hierarchy  The third set of strategies varies the amount of background knowledge.  principal idea is that if a term like ‘beef’ appears, one does not only represent the document by the concept corresponding to ‘beef’, but also by the concepts corresponding to ‘meat’ and ‘food’ etc. up to a certain level of generality.

Conclusion  Clustering web services into functional similar groups can greatly reduce the search space of a service discovery task. Therefore, it can be seen as a predecessor of web service discovery or an important functionality provided by future service search engines.  Clustering based on all three document vectors (word vector, concept vector, category vector) also gets significantly better results than the baseline, but does not outperform clustering based only on word vector and category vector.

References  Web service clustering using text mining techniques(Int. J. Agent- Oriented Software Engineering, Vol. X, No. Y,, Wei Liu* and Wilson Wong )  Ontologies Improve Text Document Clustering(Andreas Hotho, Steffen Staab, Gerd Stumme,Institute AIFB, University of Karlsruhe,germany)  E. Agirre and G. Rigau. Word sense disambiguation using conceptual density. In Proc. of COLING’96, 1996.

Thanks for your attention