NJVR: The NanJing Vocabulary Repository

NJVR: The NanJing Vocabulary Repository
Gong Cheng, Min Liu, Yuzhong Qu
Submitted to ISWC 2012 (Evaluations and Experiments)

Categories of accepted papers
- Experimental studies: comparing a spectrum of approaches to a particular problem and, through extensive experiments, providing a comprehensive perspective on the underlying phenomena or approaches.
- Analyses of experimental results: providing insights into the nature or characteristics of the studied phenomena, including negative results.
- Result verification: verifying or refuting published results and, through the renewed analysis, helping to advance the state of the art.
- Benchmarking: focusing on datasets and algorithms for comprehensible and systematic evaluation of existing and future systems.

Outline
- WHY publish NJVR?
- HOW is NJVR constructed?
- WHAT constitutes NJVR?
- WHERE can NJVR be used?

Motivation
- Vocabulary-oriented problems: summarization, ranking, matching
- Real-world vocabularies

Existing vocabulary repositories
- Manually submitted. Size: hundreds. Access: browsing.
- Automatically crawled. Size: thousands. Access: via searching.

Main contribution
NanJing Vocabulary Repository (NJVR)
- Source: Falcons (crawled from the real Semantic Web)
- Size: 2,996 vocabularies, from 261 PLDs (pay-level domains); their instantiations in 4.1 billion RDF triples, from 5,805 PLDs
- Access: downloadable

Crawling
Initialization of the URI pool
- Downloaded from other repositories (e.g. pingthesemanticweb.com, schemaweb.info)
- Samples and entry points of LOD
- Retrieved from other search engines (e.g. Swoogle, Google)
Running from 2007 to May 2011

Constitution and statistical analysis
- Vocabulary description
- Vocabulary instantiation

Vocabulary description
- 455,718 terms with authoritative description documents
- 396,023 classes, 59,868 properties (overlap: 173)
- 2,996 vocabularies, from 261 PLDs
- Great variety

Vocabulary instantiation
- NJVR: 4.1 billion RDF triples in 15.9 million RDF documents, from 5,805 PLDs
- For comparison, BTC 2011: 2.1 billion RDF triples, from 791 PLDs

Vocabulary instantiation (cont.)
Instantiations of 115,707 classes and 25,963 properties (from 1,874 vocabularies)

Experiments
- Vocabulary ranking
- Vocabulary matching
- Vocabulary mining

Vocabulary ranking
Vocabulary reference graph (excluding RDF, RDFS and OWL)
Measures of centrality:
- Indegree
- Eigenvector, PageRank (with a damping factor of 0.85), HITS authority
- Betweenness
- Closeness
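As a rough illustration of two of these centrality measures, the sketch below computes indegree and PageRank (damping factor 0.85, as on the slide) over a small reference graph. The graph `EDGES`, the function names and the even redistribution of rank from vocabularies without outgoing references are assumptions made for this example, not details taken from the NJVR experiments.

```python
from collections import defaultdict


def indegree(edges):
    """Count how many references each vocabulary receives."""
    counts = defaultdict(int)
    for src, dst in edges:
        counts[src] += 0  # ensure vocabularies with no incoming edge appear
        counts[dst] += 1
    return dict(counts)


def pagerank(edges, damping=0.85, iterations=100):
    """Power-iteration PageRank over a directed reference graph."""
    nodes = sorted({n for edge in edges for n in edge})
    out_links = {n: [] for n in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Assumption: rank held by nodes without outgoing references
        # is spread evenly over all nodes.
        dangling = sum(rank[n] for n in nodes if not out_links[n])
        base = (1.0 - damping) / len(nodes) + damping * dangling / len(nodes)
        new_rank = {n: base for n in nodes}
        for src in nodes:
            for dst in out_links[src]:
                new_rank[dst] += damping * rank[src] / len(out_links[src])
        rank = new_rank
    return rank


# Hypothetical toy graph: (referencing vocabulary, referenced vocabulary).
EDGES = [
    ("foaf", "dc"),
    ("sioc", "foaf"),
    ("sioc", "dc"),
    ("doap", "foaf"),
    ("dc", "skos"),
]

if __name__ == "__main__":
    ranks = pagerank(EDGES)
    for name in sorted(ranks, key=ranks.get, reverse=True):
        print(f"{name}: pagerank={ranks[name]:.3f} indegree={indegree(EDGES)[name]}")
```

Indegree only counts direct references, whereas PageRank also rewards being referenced by vocabularies that are themselves well referenced, so the two rankings can differ.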

Vocabulary ranking (cont.)

Vocabulary matching
- Vocabulary similarity: the lexical similarities between the constituent terms of two vocabularies, added up (in a sophisticated way)
- Matchable vocabularies
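The slide does not say how the term-level similarities are combined, so the following is only a minimal sketch of the general idea. It assumes each term is paired with its best lexical match in the other vocabulary and the best-match scores are averaged; the use of `difflib.SequenceMatcher` as the lexical measure and the function names are likewise choices made for this example.

```python
from difflib import SequenceMatcher


def lexical_similarity(a, b):
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def vocabulary_similarity(terms_a, terms_b):
    """Average best-match lexical similarity between two term lists.

    Assumption: this aggregation stands in for the unspecified
    'sophisticated' combination mentioned in the presentation.
    """
    if not terms_a or not terms_b:
        return 0.0
    best_a = [max(lexical_similarity(a, b) for b in terms_b) for a in terms_a]
    best_b = [max(lexical_similarity(a, b) for a in terms_a) for b in terms_b]
    return (sum(best_a) + sum(best_b)) / (len(best_a) + len(best_b))


if __name__ == "__main__":
    # Hypothetical local names of terms from three vocabularies.
    people_1 = ["Person", "name"]
    people_2 = ["Person", "fullName"]
    billing = ["Invoice", "amount"]
    print(vocabulary_similarity(people_1, people_2))
    print(vocabulary_similarity(people_1, billing))
```

Two vocabularies could then be treated as matchable when this score exceeds a chosen threshold; the threshold itself would have to be tuned on real data.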

Vocabulary mining
Association rule mining
- Item: vocabulary
- Transaction: the set of vocabularies instantiated in an RDF document
- Rule: {vi, vj, …} → {vm, vn, …}
Sampling
- 10,000 RDF documents
- ≤50 from any single PLD
- Excluding RDF, RDFS and OWL
Results
- 19 rules under support=0.05, confidence=0.80
- 207 rules under support=0.01, confidence=0.80
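To make the roles of support and confidence concrete, here is a small brute-force sketch of association rule mining in which each transaction is the set of vocabularies instantiated in one document. The documents in `DOCS`, the `max_size` cap and the exhaustive enumeration are assumptions for illustration; the presentation does not state which mining algorithm was used.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=3):
    """Return {itemset: support} for itemsets meeting min_support."""
    n = len(transactions)
    items = sorted({item for t in transactions for item in t})
    frequent = {}
    for size in range(1, max_size + 1):
        for candidate in combinations(items, size):
            cset = frozenset(candidate)
            support = sum(1 for t in transactions if cset <= t) / n
            if support >= min_support:
                frequent[cset] = support
    return frequent


def association_rules(transactions, min_support, min_confidence, max_size=3):
    """Return (antecedent, consequent, support, confidence) tuples."""
    frequent = frequent_itemsets(transactions, min_support, max_size)
    rules = []
    for itemset, support in frequent.items():
        if len(itemset) < 2:
            continue
        for k in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), k):
                antecedent = frozenset(antecedent)
                # Every subset of a frequent itemset is itself frequent,
                # so its support is already in the table.
                confidence = support / frequent[antecedent]
                if confidence >= min_confidence:
                    rules.append(
                        (antecedent, itemset - antecedent, support, confidence)
                    )
    return rules


# Hypothetical transactions: vocabularies instantiated per RDF document.
DOCS = [
    {"foaf", "dc"},
    {"foaf", "dc", "sioc"},
    {"foaf", "dc"},
    {"foaf"},
    {"skos"},
]

if __name__ == "__main__":
    for lhs, rhs, sup, conf in association_rules(DOCS, 0.5, 0.8):
        print(sorted(lhs), "->", sorted(rhs), f"support={sup:.2f} confidence={conf:.2f}")
```

Lowering the support threshold admits rarer vocabulary combinations, which is consistent with the slide reporting more rules at support=0.01 than at support=0.05.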

Conclusions
- 1 vocabulary repository
- 3 sets of experiments