1
NJVR: The NanJing Vocabulary Repository
Gong Cheng, Min Liu, Yuzhong Qu
Submitted to ISWC 2012 (Evaluations and Experiments)
2
Categories of accepted papers
- Experimental studies that compare a spectrum of approaches to a particular problem and, through extensive experiments, provide a comprehensive perspective on the underlying phenomena or approaches.
- Analyses of experimental results that provide insights into the nature or characteristics of the studied phenomena, including negative results.
- Result verification, focusing on verifying or refuting published results and, through renewed analysis, helping to advance the state of the art.
- Benchmarking, focusing on datasets and algorithms for comprehensible and systematic evaluation of existing and future systems.
4
Outline
- WHY publish NJVR?
- HOW is NJVR constructed?
- WHAT constitutes NJVR?
- WHERE can NJVR be used?
5
Motivation
Vocabulary-oriented problems: summarization, ranking, matching
These problems need to be studied on real-world vocabularies
6
Existing vocabulary repositories
- Manually submitted: size in the hundreds; access by browsing
- Automatically crawled: size in the thousands; access via searching
7
Main contribution: the NanJing Vocabulary Repository (NJVR)
- Source: Falcons (crawled from the real Semantic Web)
- Size: 2,996 vocabularies from 261 pay-level domains (PLDs), and their instantiations in 4.1 billion RDF triples from 5,805 PLDs
- Access: downloadable
8
Crawling
Initialization of the URI pool:
- Downloaded from other repositories (e.g. pingthesemanticweb.com, schemaweb.info)
- Samples and entry points of Linked Open Data (LOD)
- Retrieved from other search engines (e.g. Swoogle, Google)
The crawler ran from 2007 to May 2011.
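The crawl setup can be pictured as a seed pool merged from several sources, followed by a breadth-first traversal. The sketch below is a minimal illustration under assumptions, not the Falcons crawler: the seed lists and URIs are hypothetical, and `fetch_links` is a hypothetical callback standing in for downloading a document and extracting the URIs it mentions (stubbed here with an in-memory link map).

```python
from collections import deque
from typing import Callable, Iterable, List


def init_uri_pool(*sources: Iterable[str]) -> List[str]:
    """Merge seed URIs from several sources, keeping first-seen order and dropping duplicates."""
    pool: List[str] = []
    seen = set()
    for source in sources:
        for uri in source:
            if uri not in seen:
                seen.add(uri)
                pool.append(uri)
    return pool


def crawl(seeds: List[str], fetch_links: Callable[[str], Iterable[str]], max_docs: int = 1000) -> List[str]:
    """Breadth-first crawl from the URI pool; returns URIs in the order they were processed.

    fetch_links is a hypothetical stand-in for "download the document and extract its URIs".
    """
    frontier = deque(seeds)
    visited = set(seeds)
    processed: List[str] = []
    while frontier and len(processed) < max_docs:
        uri = frontier.popleft()
        processed.append(uri)
        for link in fetch_links(uri):
            if link not in visited:  # enqueue each URI at most once
                visited.add(link)
                frontier.append(link)
    return processed


# Hypothetical seed lists standing in for the three sources on the slide.
from_repositories = ["http://a.example/1", "http://b.example/2"]
from_lod = ["http://b.example/2", "http://c.example/3"]
from_search_engines = ["http://d.example/4"]
link_map = {
    "http://a.example/1": ["http://e.example/5"],
    "http://e.example/5": ["http://a.example/1"],
}

seeds = init_uri_pool(from_repositories, from_lod, from_search_engines)
order = crawl(seeds, lambda uri: link_map.get(uri, []))
print(order)
```

Deduplicating at enqueue time (rather than at dequeue time) keeps the frontier small when many documents link to the same URIs.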
9
Constitution and statistical analysis
- Vocabulary description
- Vocabulary instantiation
10
Vocabulary description
- 455,718 terms with authoritative description documents: 396,023 classes and 59,868 properties (overlap: 173 terms)
- 2,996 vocabularies from 261 PLDs
- Great variety among the vocabularies
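The term count is consistent with inclusion-exclusion over classes and properties, reading "overlap: 173" as the number of terms that are both a class and a property (our interpretation), with $C$ the set of classes, $P$ the set of properties, and $T$ the set of terms:

```latex
|T| = |C \cup P| = |C| + |P| - |C \cap P| = 396{,}023 + 59{,}868 - 173 = 455{,}718
```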
11
Vocabulary instantiation
- 4.1 billion RDF triples in 15.9 million RDF documents, from 5,805 PLDs
- For comparison, BTC 2011 (Billion Triple Challenge 2011 dataset): 2.1 billion RDF triples, from 791 PLDs
12
Vocabulary instantiation (cont.)
Instantiated terms: 115,707 classes and 25,963 properties, from 1,874 vocabularies
13
Experiments
- Vocabulary ranking
- Vocabulary matching
- Vocabulary mining
14
Vocabulary ranking
Vocabulary reference graph (excluding RDF, RDFS and OWL)
Measures of centrality:
- Indegree
- Eigenvector, PageRank (with a damping factor of 0.85), HITS authority
- Betweenness
- Closeness
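Two of the listed centrality measures can be sketched on a toy reference graph. This is a minimal illustration under assumptions, not the authors' code: we assume an edge (a, b) means vocabulary a references a term of vocabulary b, the vocabulary names are hypothetical, and only indegree and PageRank (power iteration, damping 0.85, dangling mass spread uniformly) are shown; HITS, eigenvector, betweenness and closeness are omitted.

```python
from typing import Dict, Iterable, List, Set, Tuple

EXCLUDED = {"rdf", "rdfs", "owl"}  # built-in vocabularies left out of the graph


def build_reference_graph(vocabularies: Iterable[str], references: Iterable[Tuple[str, str]]):
    """Nodes are vocabularies; edge (a, b) = a references a term of b (assumed input format)."""
    refs = list(references)
    endpoints = {v for edge in refs for v in edge}
    nodes = sorted((set(vocabularies) | endpoints) - EXCLUDED)
    edges = {(a, b) for a, b in refs if a != b and a not in EXCLUDED and b not in EXCLUDED}
    return nodes, edges


def indegree(nodes: List[str], edges: Set[Tuple[str, str]]) -> Dict[str, int]:
    deg = {v: 0 for v in nodes}
    for _, dst in edges:
        deg[dst] += 1
    return deg


def pagerank(nodes, edges, damping=0.85, max_iter=200, tol=1e-12) -> Dict[str, float]:
    """Power iteration; ranks sum to 1 after every step."""
    n = len(nodes)
    if n == 0:
        return {}
    out_links = {v: [] for v in nodes}
    for src, dst in edges:
        out_links[src].append(dst)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Vocabularies that reference no others spread their rank uniformly.
        dangling_mass = sum(rank[v] for v in nodes if not out_links[v])
        base = (1.0 - damping) / n + damping * dangling_mass / n
        new_rank = {v: base for v in nodes}
        for src in nodes:
            for dst in out_links[src]:
                new_rank[dst] += damping * rank[src] / len(out_links[src])
        converged = sum(abs(new_rank[v] - rank[v]) for v in nodes) < tol
        rank = new_rank
        if converged:
            break
    return rank


# Hypothetical toy data: A, B and D all reference C; C references D.
vocabs = ["A", "B", "C", "D", "rdfs", "owl"]
refs = [("A", "C"), ("B", "C"), ("D", "C"), ("C", "D"), ("A", "rdfs"), ("C", "owl")]
nodes, edges = build_reference_graph(vocabs, refs)
deg = indegree(nodes, edges)
ranks = pagerank(nodes, edges)
print(deg)
print(sorted(ranks, key=ranks.get, reverse=True))
```

References to RDF, RDFS and OWL are dropped before ranking, mirroring the exclusion on the slide; otherwise those built-in vocabularies would dominate almost any centrality measure.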
15
Vocabulary ranking (cont.)
16
Vocabulary matching
- Vocabulary similarity: aggregating (in a sophisticated way) the lexical similarities between the vocabularies' constituent terms
- Matchable vocabularies
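The slide does not spell out the aggregation scheme, so the following is only a simple stand-in to illustrate the idea of lifting term-level lexical similarity to vocabulary-level similarity. Both choices are our assumptions, not the paper's method: lexical similarity is Python's `difflib.SequenceMatcher.ratio()` on lower-cased local names, and aggregation is a symmetric best-match average. The term names are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Sequence


def lexical_similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1] (difflib ratio; a stand-in measure)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def vocabulary_similarity(terms_a: Sequence[str], terms_b: Sequence[str]) -> float:
    """Symmetric best-match average: each term is paired with its most similar term
    in the other vocabulary, and the best scores from both sides are averaged."""
    if not terms_a or not terms_b:
        return 0.0
    best_a = [max(lexical_similarity(t, u) for u in terms_b) for t in terms_a]
    best_b = [max(lexical_similarity(u, t) for t in terms_a) for u in terms_b]
    return (sum(best_a) + sum(best_b)) / (len(best_a) + len(best_b))


# Hypothetical term lists (local names only).
vocab_x = ["Person", "name", "knows"]
vocab_y = ["Agent", "fullName", "knows"]
print(round(vocabulary_similarity(vocab_x, vocab_y), 3))
```

Averaging best matches from both directions keeps the score symmetric in spirit and stops a small vocabulary from scoring high merely by being contained in a large one.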
17
Vocabulary mining: association rule mining
- Item: a vocabulary
- Transaction: the set of vocabularies instantiated in an RDF document
- Rule: {vi, vj, …} ⇒ {vm, vn, …}
Sampling
- 10,000 RDF documents, at most 50 from any single PLD
- RDF, RDFS and OWL excluded
Results
- 19 rules under support = 0.05, confidence = 0.80
- 207 rules under support = 0.01, confidence = 0.80
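The sampling and mining setup can be sketched as follows. This is a simplified illustration, not the authors' implementation: the document tuples and vocabulary names are hypothetical, and rule mining is restricted to single-item rules {x} ⇒ {y} (the slide's rules may have larger item sets). Support is the fraction of transactions containing both items; confidence is support({x, y}) / support({x}).

```python
import random
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Set, Tuple

EXCLUDED = {"rdf", "rdfs", "owl"}

Doc = Tuple[str, str, Set[str]]  # (document id, PLD, vocabularies instantiated in it)


def sample_documents(docs: List[Doc], n: int, per_pld_cap: int, seed: int = 0) -> List[Doc]:
    """Random sample of up to n documents with at most per_pld_cap from any single PLD."""
    rng = random.Random(seed)
    shuffled = docs[:]
    rng.shuffle(shuffled)
    taken: Dict[str, int] = defaultdict(int)
    sample: List[Doc] = []
    for doc in shuffled:
        if len(sample) == n:
            break
        pld = doc[1]
        if taken[pld] < per_pld_cap:
            taken[pld] += 1
            sample.append(doc)
    return sample


def mine_pair_rules(transactions: List[Set[str]], min_support: float, min_confidence: float):
    """Return sorted (lhs, rhs, support, confidence) for rules {lhs} => {rhs}."""
    n = len(transactions)
    item_count: Dict[str, int] = defaultdict(int)
    pair_count: Dict[Tuple[str, str], int] = defaultdict(int)
    for t in transactions:
        items = sorted(set(t) - EXCLUDED)  # drop built-in vocabularies
        for x in items:
            item_count[x] += 1
        for x, y in combinations(items, 2):
            pair_count[(x, y)] += 1
    rules = []
    for (x, y), c in pair_count.items():
        support = c / n
        if support < min_support:
            continue
        for lhs, rhs in ((x, y), (y, x)):
            confidence = c / item_count[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)


# Hypothetical toy transactions.
toy_transactions = [{"a", "b"}, {"a", "b", "rdf"}, {"a"}, {"a", "b"}, {"c"}]
rules = mine_pair_rules(toy_transactions, min_support=0.05, min_confidence=0.80)
print(rules)  # → [('b', 'a', 0.6, 1.0)]
```

To run on a sample, pass `[vocabs for _, _, vocabs in sample_documents(docs, 10_000, 50)]` as the transactions. The per-PLD cap keeps a few very large publishers from dominating the co-occurrence counts.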
18
Conclusions
- One vocabulary repository (NJVR)
- Three sets of experiments (vocabulary ranking, matching and mining)