Text Based Similarity Metrics and Delta for Semantic Web Graphs

Text Based Similarity Metrics and Delta for Semantic Web Graphs Krishnamurthy Koduvayur Viswanathan Monday, June 28,
Presentation transcript:

Text Based Similarity Metrics and Delta for Semantic Web Graphs
Krishnamurthy Viswanathan and Tim Finin, University of Maryland, Baltimore County

Motivation
Text similarity is very useful in information retrieval for near-duplicate and similarity detection.

Problem
Given a collection of SW graphs as RDF documents, identify pairs of graphs that are similar.
Generate a delta for pairs of graphs identified as having a versioning relationship.

Contributions
Defined text-based similarity metrics characterizing relations between SW graphs.
Evaluated these metrics for three specific cases of similarity:
Case 1: Same classes and properties used, but the graphs differ only in literal content.
Case 2: The graphs differ only in base URI.
Case 3: Different versions of the same SW graph. In addition, when this case is detected, generate a delta between the two versions.

Approach
Input: corpus of SWDs
Convert to n-triples format
Convert to canonical form
Create reduced forms
Compute text-based similarity metrics
Identify pairs of similar documents
Identify ontology versions
Generate delta between versions

SW Graph Canonicalization
Assigns uniform identifiers to blank nodes.
Provides a deterministic order to statements.
Empirical method that works for most examples.

Input statements:
    <person:John> <a:livesIn> _:x .
    _:x <a:IsPartOf> "USA" .
    <person:John> <a:likes> "cheese" .
    _:x <a:hasCapital> _:y .

Blank nodes replaced by "~" and statements ordered (trailing comments record the original identifiers):
    "~" <a:hasCapital> "~" . # _:x _:y
    "~" <a:IsPartOf> "USA" . # _:x
    <person:John> <a:likes> "cheese" .
    <person:John> <a:livesIn> "~" . # _:x

Canonical form:
    _:g2 <a:hasCapital> _:g1 .
    _:g2 <a:IsPartOf> "USA" .
    <person:John> <a:likes> "cheese" .
    <person:John> <a:livesIn> _:g2 .

BNode Table
    Old bnode identifier    New bnode identifier
    _:y                     _:g1
    _:x                     _:g2

Four reduced forms
Only literals from the original n-triple file.
All non-literal content from the original n-triple file.
Base URI of every node replaced by "".
Literals and base URIs replaced by "".

Classification
Similarity metrics are computed for each candidate pair.
Naïve Bayes classifier: similarity in classes and properties.
Naïve Bayes/SVM classifier: difference only in base URI.
SVM classifier: versioning relationship.

Generating Deltas
Subtractive delta: Version1 Except Version2.
Additive delta: Version2 Except Version1.

Evaluation
Three datasets of 400+ semantic web documents for training and testing.
17 combinations of similarity metrics tested: Jaccard, Containment, Cosine similarity, Hamming distance between Simhash fingerprints.

Type of Similarity, True Positives, False Positives, Precision, Recall
Similarity in classes & properties: 0.986, 0.014, 0.987
Difference only in base URI: 0.988, 0.012
Versioning Relationship: 0.909, 0.091, 0.913

UMBC AN HONORS UNIVERSITY IN MARYLAND
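The poster names four text-based metrics (Jaccard, Containment, Cosine similarity, and Hamming distance between Simhash fingerprints) without defining them. The following is a minimal sketch of the usual textbook definitions, not the authors' implementation. Whitespace tokenization, the shingle size `k=3`, the MD5-based token hash, and all function names are assumptions made for illustration.

```python
import hashlib
import math
from collections import Counter


def shingles(text, k=3):
    """Return the set of k-token shingles of a whitespace-tokenized text."""
    tokens = text.split()
    if len(tokens) < k:
        return {tuple(tokens)} if tokens else set()
    return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def jaccard(a, b):
    """|A & B| / |A | B|; treated as 1.0 when both sets are empty."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def containment(a, b):
    """|A & B| / |A|: the fraction of A that also appears in B."""
    if not a:
        return 1.0
    return len(a & b) / len(a)


def cosine(text_a, text_b):
    """Cosine of the angle between the two term-frequency vectors."""
    ta, tb = Counter(text_a.split()), Counter(text_b.split())
    dot = sum(ta[t] * tb[t] for t in ta.keys() & tb.keys())
    norm_a = math.sqrt(sum(v * v for v in ta.values()))
    norm_b = math.sqrt(sum(v * v for v in tb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def simhash(text, bits=64):
    """Simhash fingerprint: per-bit vote over the hashes of all tokens."""
    weights = [0] * bits
    for token in text.split():
        digest = hashlib.md5(token.encode("utf-8")).digest()
        h = int.from_bytes(digest[:bits // 8], "big")
        for i in range(bits):
            weights[i] += 1 if (h >> i) & 1 else -1
    fingerprint = 0
    for i, w in enumerate(weights):
        if w > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming(a, b):
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")
```

In a pipeline like the one on the poster, each metric would be computed on a pair of reduced forms, for example `jaccard(shingles(doc_a), shingles(doc_b))`, and the resulting values would form the feature vector passed to the classifier.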
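The canonicalization example in the transcript can be approximated as follows. This is a hedged sketch, not the authors' algorithm: it assumes statements are ordered case-insensitively after every blank node is masked as "~", and that new identifiers are handed out in order of first appearance in that ordering. The poster does not state its numbering rule, so the labels produced here may differ from those in the BNode Table. The regular expression and the name `canonicalize` are illustrative assumptions.

```python
import re

# Simplified blank-node pattern; it would also match "_:x" inside a literal,
# which a real N-Triples parser would not.
BNODE = re.compile(r"_:[A-Za-z0-9]+")


def canonicalize(ntriples):
    """Return (canonical statements, old-to-new blank node mapping)."""
    lines = [ln.strip() for ln in ntriples.splitlines() if ln.strip()]

    # Order statements by their text with blank nodes masked, ignoring case.
    # Ties fall back to the original text, so this is a heuristic and not a
    # full graph canonicalization.
    ordered = sorted(lines, key=lambda ln: (BNODE.sub('"~"', ln).lower(), ln))

    mapping = {}

    def relabel(match):
        old = match.group(0)
        if old not in mapping:
            mapping[old] = "_:g{}".format(len(mapping) + 1)
        return mapping[old]

    canonical = [BNODE.sub(relabel, ln) for ln in ordered]
    return canonical, mapping
```

Two serializations of the same graph that differ only in blank node labels and statement order then yield the same list of statements, which is what makes a line-based text comparison meaningful.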
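The two deltas described under Generating Deltas are set differences over statements. A minimal sketch, assuming both versions have already been converted to canonical n-triples so that equal statements are textually identical; the function names `delta` and `apply_delta` are hypothetical.

```python
def delta(version1, version2):
    """Compute the subtractive and additive deltas between two versions.

    Each version is an iterable of canonical n-triples statements.
    """
    v1, v2 = set(version1), set(version2)
    return {
        "subtractive": sorted(v1 - v2),  # Version1 Except Version2
        "additive": sorted(v2 - v1),     # Version2 Except Version1
    }


def apply_delta(version1, d):
    """Rebuild Version2 from Version1 by removing, then adding, statements."""
    result = set(version1) - set(d["subtractive"])
    result |= set(d["additive"])
    return sorted(result)
```

Applying the delta to the first version reproduces the statement set of the second, which gives a simple consistency check for any delta produced this way.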