NJVR: The NanJing Vocabulary Repository

1 NJVR: The NanJing Vocabulary Repository
Gong Cheng, Min Liu, Yuzhong Qu Submitted to ISWC 2012 (Evaluations and Experiments)

2 Categories of accepted papers
Experimental studies comparing a spectrum of approaches to a particular problem and, through extensive experiments, providing a comprehensive perspective on the underlying phenomena or approaches.
Analyses of experimental results providing insights on the nature or characteristics of studied phenomena, including negative results.
Result verification focusing on verifying or refuting published results and, through the renewed analysis, helping to advance the state of the art.
Benchmarking, focusing on datasets and algorithms for comprehensible and systematic evaluation of existing and future systems.

4 Outline
WHY to publish NJVR?
HOW to construct NJVR?
WHAT constitutes NJVR?
WHERE to use NJVR?

5 Motivation
Vocabulary-oriented problems: summarization, ranking, matching
Real-world vocabularies

6 Existing vocabulary repositories
Manually submitted: size in the hundreds; access by browsing
Automatically crawled: size in the thousands; access via searching

7 Main contribution
NanJing Vocabulary Repository (NJVR)
Source: Falcons (crawled from the real Semantic Web)
Size: 2,996 vocabularies, from 261 pay-level domains (PLDs); their instantiations in 4.1 billion RDF triples, from 5,805 PLDs
Access: downloadable

8 Crawling
Initialization of the URI pool:
  Downloaded from other repositories (e.g. pingthesemanticweb.com, schemaweb.info)
  Samples and entry points of LOD
  Retrieved from other search engines (e.g. Swoogle, Google)
Running from 2007 to May 2011

9 Constitution and statistical analysis
Vocabulary description
Vocabulary instantiation

10 Vocabulary description
455,718 terms with authoritative description documents
396,023 classes, 59,868 properties (overlap: 173)
2,996 vocabularies, from 261 PLDs
Great variety

11 Vocabulary instantiation
4.1 billion RDF triples in 15.9 million RDF documents, from 5,805 PLDs
BTC 2011: 2.1 billion RDF triples, from 791 PLDs

12 Vocabulary instantiation (cont.)
Instantiations of 115,707 classes, 25,963 properties (1,874 vocabularies)

13 Experiments
Vocabulary ranking
Vocabulary matching
Vocabulary mining

14 Vocabulary ranking
Vocabulary reference graph (excluding RDF, RDFS and OWL)
Measures of centrality:
  Indegree
  Eigenvector, PageRank (with a damping factor of 0.85), HITS authority
  Betweenness
  Closeness
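
Two of the centrality measures listed above, indegree and PageRank with a damping factor of 0.85, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the edge-list input format, the function names, and the handling of vocabularies with no outgoing references (their rank is spread evenly over all nodes) are assumptions. The input is assumed to already exclude RDF, RDFS and OWL, as on the slide.

```python
def indegree(edges):
    """Count how many distinct vocabularies reference each vocabulary."""
    nodes = {n for edge in edges for n in edge}
    degree = {n: 0 for n in nodes}
    for _, target in set(edges):
        degree[target] += 1
    return degree


def pagerank(edges, damping=0.85, iterations=100):
    """Power-iteration PageRank over a directed vocabulary reference graph.

    `edges` is a list of (referencing, referenced) pairs (assumed format).
    Nodes without outgoing edges distribute their rank uniformly (assumption).
    """
    nodes = sorted({n for edge in edges for n in edge})
    out = {v: set() for v in nodes}
    for source, target in edges:
        out[source].add(target)

    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        dangling = sum(rank[v] for v in nodes if not out[v])
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for source in nodes:
            for target in out[source]:
                new_rank[target] += damping * rank[source] / len(out[source])
        rank = new_rank
    return rank


# Toy reference graph with hypothetical vocabulary names.
toy_edges = [("v1", "v2"), ("v3", "v1"), ("v3", "v2"), ("v4", "v1")]
toy_indegree = indegree(toy_edges)
toy_pagerank = pagerank(toy_edges)
```

In the toy graph, v1 and v2 tie on indegree, but PageRank separates them because v2 is referenced by the well-referenced v1.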

15 Vocabulary ranking (cont.)

16 Vocabulary matching
Vocabulary similarity: adding up (in a sophisticated way) the lexical similarities between their constituent terms
Matchable vocabularies
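
The slide does not say how the term-level similarities are combined, so the following is only a simple stand-in to make the idea concrete. It assumes each vocabulary is given as a list of term labels, uses Python's `difflib.SequenceMatcher` as the lexical similarity, and averages each term's best match in both directions. All function names are hypothetical.

```python
from difflib import SequenceMatcher


def term_similarity(a, b):
    """Case-insensitive lexical similarity between two term labels, in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def vocabulary_similarity(terms_a, terms_b):
    """Aggregate term similarities into one vocabulary-level score.

    Stand-in aggregation (assumption): for every term, take its best match
    in the other vocabulary, average those scores, and do so symmetrically.
    """
    if not terms_a or not terms_b:
        return 0.0

    def directed(xs, ys):
        return sum(max(term_similarity(x, y) for y in ys) for x in xs) / len(xs)

    return (directed(terms_a, terms_b) + directed(terms_b, terms_a)) / 2.0


# Hypothetical term labels for illustration only.
same = vocabulary_similarity(["Person", "name"], ["person", "Name"])
different = vocabulary_similarity(["Person", "name"], ["Protein", "sequence"])
```

A pair of vocabularies could then be treated as matchable when the score exceeds a chosen threshold; the threshold itself is not given in the source.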

17 Vocabulary mining
Association rule mining
  Item: vocabulary
  Transaction: the set of vocabularies instantiated in an RDF document
  Rule: {vi, vj, …} → {vm, vn, …}
Sampling
  10,000 RDF documents
  ≤50 from any single PLD
  Excluding RDF, RDFS and OWL
Results
  19 rules under support=0.05, confidence=0.80
  207 rules under support=0.01, confidence=0.80
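
The setup above (vocabularies as items, RDF documents as transactions, a per-PLD cap when sampling, then support and confidence thresholds) can be sketched as below. This is a brute-force illustration under stated assumptions, not the mining tool used for the reported results: the `(pld, vocabularies)` input format, the function names, the random seed, and the `max_size` limit on itemsets are all hypothetical. Transactions are assumed to already exclude RDF, RDFS and OWL.

```python
import random
from collections import defaultdict
from itertools import combinations


def sample_documents(docs, total=10000, per_pld=50, seed=0):
    """Sample up to `total` documents, with at most `per_pld` from any PLD.

    `docs` is a list of (pld, set_of_vocabularies) pairs (assumed format).
    """
    rng = random.Random(seed)
    by_pld = defaultdict(list)
    for pld, vocabularies in docs:
        by_pld[pld].append(vocabularies)
    capped = []
    for pld in sorted(by_pld):
        group = by_pld[pld]
        capped.extend(rng.sample(group, min(per_pld, len(group))))
    return rng.sample(capped, min(total, len(capped)))


def mine_rules(transactions, min_support, min_confidence, max_size=3):
    """Return (antecedent, consequent, support, confidence) tuples."""
    n = len(transactions)
    if n == 0:
        return []
    # Count every itemset of up to max_size items (brute force).
    counts = defaultdict(int)
    for transaction in transactions:
        items = sorted(transaction)
        for k in range(1, max_size + 1):
            for combo in combinations(items, k):
                counts[frozenset(combo)] += 1
    frequent = {s: c / n for s, c in counts.items() if c / n >= min_support}

    rules = []
    for itemset, support in frequent.items():
        for k in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), k):
                antecedent = frozenset(lhs)
                # Every subset of a frequent itemset is itself frequent.
                confidence = support / frequent[antecedent]
                if confidence >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, support, confidence))
    return rules


# Toy transactions with hypothetical vocabulary names.
toy = [{"a", "b"}, {"a", "b"}, {"a", "b", "c"}, {"a"}, {"c"}]
toy_rules = mine_rules(toy, min_support=0.5, min_confidence=0.8)
```

On the toy data only {b} → {a} survives: {a, b} has support 0.6, and every document containing b also contains a, whereas {a} → {b} reaches a confidence of only 0.75.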

18 Conclusions
1 vocabulary repository
3 sets of experiments

