RDF graph summaries 金成 2014/11/3.


2 Graph Summary
A graph summary is a compact representation that captures the essential information of the original data graph. Most summaries replace the data graph with a homomorphic graph that ideally contains fewer nodes and edges than the data graph. Each approach produces a different graph summary. [Picture from Campinas S, Perry T E, Ceccarelli D, et al. Introducing RDF graph summary with application to assisted SPARQL formulation[C]]
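A minimal Python sketch of one such homomorphic summary, a quotient graph. It assumes the data graph is a list of (subject, predicate, object) triples and that every node is mapped to a block by a caller-supplied `block_of` function (here a hypothetical type lookup). The triples and type names are illustrative, not taken from the slides.

```python
from collections.abc import Callable, Hashable, Iterable

Triple = tuple[str, str, str]


def quotient_summary(triples: Iterable[Triple],
                     block_of: Callable[[str], Hashable]) -> set[tuple]:
    """Collapse each node into its block; keep one summary edge per (block, predicate, block)."""
    return {(block_of(s), p, block_of(o)) for s, p, o in triples}


def is_homomorphism(triples: Iterable[Triple],
                    block_of: Callable[[str], Hashable],
                    summary: set[tuple]) -> bool:
    """The node mapping is a homomorphism if every data edge lands on a summary edge."""
    return all((block_of(s), p, block_of(o)) in summary for s, p, o in triples)


# Hypothetical data: the block of a node is its type.
TYPE_OF = {
    "db:player/1": "Player", "db:player/2": "Player",
    "db:band/1": "Band", "db:record/1": "Record", "db:record/2": "Record",
}
triples = [
    ("db:player/1", "eg:teammate", "db:player/2"),
    ("db:player/1", "eg:interestedIn", "db:band/1"),
    ("db:band/1", "eg:create", "db:record/1"),
    ("db:band/1", "eg:create", "db:record/2"),
]
summary = quotient_summary(triples, TYPE_OF.__getitem__)
print(len(summary), is_homomorphism(triples, TYPE_OF.__getitem__, summary))
```

Here five nodes and four edges collapse into three nodes and three edges. A quotient map is a homomorphism by construction, so the check only matters for summaries built some other way.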

3 Pattern extraction

4 RDF graph sample
db:player/1 eg:country "Argentina"; eg:birthday "1987/6/24"; rdf:type "player"; eg:friends "Beckham"; eg:teammate "Neymar"; eg:interestedIn "The Rolling Stones".
db:player/2 eg:country "England"; rdf:type "player".
db:player/3 eg:country "Brazil"; eg:birthday "1992/2/5".
db:band/1 eg:manager "Bob"; eg:create "Far East Tour"; eg:create "Voodoo Lounge".
db:record/1 rdf:type "Far East Tour".
db:record/2 rdf:type "Voodoo Lounge".
Figure 2: an RDF graph. Figure 3: RDF statements.

5 DFS frequent pattern extraction
DFS code: each edge is encoded as a 5-tuple (i, j, l_i, l_(i,j), l_j), where i and j are the discovery times of the edge's two endpoints in the depth-first search, and the remaining three entries are the integer codes of the source node label, the edge label and the target node label.
Figure 4: original graph. Figure 5: summary pattern.
Table 1: mapping of classes and properties to integers
Properties: country 1, birthday 2, friends 3, teammate 4, interestedIn 5, manager 6, create 7
Types of subjects or objects: player 8, band 9, record 10, literal
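A minimal Python sketch of how one DFS code can be produced for a labelled graph. It uses the integer codes of Table 1; the slide gives no code for "literal", so 0 is used as a hypothetical placeholder, and the two-edge pattern is illustrative. It emits the code of a single traversal; gSpan-style mining additionally searches for the minimum (canonical) DFS code, which is not shown.

```python
# Integer codes from Table 1. "literal" has no code on the slide;
# 0 is a hypothetical placeholder used only in this sketch.
LABELS = {
    "country": 1, "birthday": 2, "friends": 3, "teammate": 4,
    "interestedIn": 5, "manager": 6, "create": 7,
    "player": 8, "band": 9, "record": 10, "literal": 0,
}


def dfs_code(nodes, edges, start):
    """Return one DFS code as a list of 5-tuples (i, j, l_i, l_ij, l_j).

    nodes: {node_id: node_label}; edges: [(u, v, edge_label)], traversed as undirected.
    i and j are DFS discovery indices; labels are mapped through LABELS.
    """
    adjacency = {n: [] for n in nodes}
    for k, (u, v, label) in enumerate(edges):
        adjacency[u].append((v, label, k))
        adjacency[v].append((u, label, k))
    discovery = {start: 0}
    used_edges = set()
    code = []

    def visit(u):
        # Explore neighbours by edge-label code, then node id, so the output is deterministic.
        for v, label, k in sorted(adjacency[u], key=lambda t: (LABELS[t[1]], str(t[0]))):
            if k in used_edges:
                continue
            used_edges.add(k)
            is_forward = v not in discovery  # otherwise a backward edge to an ancestor
            if is_forward:
                discovery[v] = len(discovery)
            code.append((discovery[u], discovery[v],
                         LABELS[nodes[u]], LABELS[label], LABELS[nodes[v]]))
            if is_forward:
                visit(v)

    visit(start)
    return code


# Hypothetical pattern: a player with a country literal and a birthday literal.
pattern_nodes = {"p": "player", "c": "literal", "b": "literal"}
pattern_edges = [("p", "c", "country"), ("p", "b", "birthday")]
print(dfs_code(pattern_nodes, pattern_edges, "p"))
# [(0, 1, 8, 1, 0), (0, 2, 8, 2, 0)]
```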

6 Clustering linked data sources
Input: the RDF statements of slide 4. Each individual is first described by the set of properties it uses:
CD1 {country, birthday, type, friends, teammate, interestedIn}
CD2 {country, type}
CD3 {country, birthday, type}
CD4 {manager, create}
CD5 {type}
CD6 {type}
[Pool of individuals]
The individuals are then grouped by label:
Cluster of label player: CD1, CD2, CD3
Cluster of label band: CD4
Cluster of label record: CD5, CD6
[Possible clusters]
Figure 6: summary processing
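A minimal Python sketch of the two steps above: derive each individual's property set from the triples, then group individuals by label. The labelling rule (taking the label from the IRI path, e.g. "db:player/1" to "player") is a hypothetical stand-in for however the original approach assigns labels. Derived strictly from the statements as listed, db:player/3 uses only {country, birthday}, whereas the slide's CD3 also lists type.

```python
from collections import defaultdict

# Statements of slide 4 as (subject, property, object); prefixes dropped from property names.
TRIPLES = [
    ("db:player/1", "country", "Argentina"), ("db:player/1", "birthday", "1987/6/24"),
    ("db:player/1", "type", "player"), ("db:player/1", "friends", "Beckham"),
    ("db:player/1", "teammate", "Neymar"), ("db:player/1", "interestedIn", "The Rolling Stones"),
    ("db:player/2", "country", "England"), ("db:player/2", "type", "player"),
    ("db:player/3", "country", "Brazil"), ("db:player/3", "birthday", "1992/2/5"),
    ("db:band/1", "manager", "Bob"), ("db:band/1", "create", "Far East Tour"),
    ("db:band/1", "create", "Voodoo Lounge"),
    ("db:record/1", "type", "Far East Tour"), ("db:record/2", "type", "Voodoo Lounge"),
]


def property_sets(triples):
    """Describe each subject by the set of properties it uses (the CDs of Figure 6)."""
    sets = defaultdict(set)
    for subject, prop, _ in triples:
        sets[subject].add(prop)
    return dict(sets)


def label_from_iri(subject):
    """Hypothetical labelling rule: 'db:player/1' -> 'player'."""
    return subject.split(":", 1)[1].split("/", 1)[0]


def cluster_by_label(sets, label_of):
    """Group subjects sharing a label; each cluster is a sorted list of subjects."""
    clusters = defaultdict(list)
    for subject in sorted(sets):
        clusters[label_of(subject)].append(subject)
    return dict(clusters)


sets = property_sets(TRIPLES)
clusters = cluster_by_label(sets, label_from_iri)
print(clusters)
```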

7 Latent topic extraction
Conceptual patterns:
Conceptual Motif Patterns (CM patterns): generate random graphs that contain all nodes of the original graph and accept only those whose node degree distribution is similar to that of the original graph. A t-test then compares the occurrence frequency of each pattern in the original graph against its frequencies in the accepted random graphs.
Mutual Information Patterns (MI patterns): measure the strength of the relationship between classes with an estimate of their mutual information.
Percolating Patterns: combine the matched conceptual patterns.
Figure 7: graph summary processing
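A minimal Python sketch of the two statistics named above, under stated assumptions: mutual information is estimated from hypothetical (subject class, object class) pairs observed on edges, and the CM-pattern comparison is written as a one-sample t statistic of the original pattern count against the counts in the accepted random graphs. The original work's exact estimator and test setup may differ.

```python
import math
from collections import Counter
from statistics import mean, stdev


def mutual_information(pairs):
    """Estimate I(X;Y) in bits from observed (subject class, object class) edge pairs."""
    n = len(pairs)
    joint = Counter(pairs)
    left = Counter(x for x, _ in pairs)
    right = Counter(y for _, y in pairs)
    # p(x,y) * log2(p(x,y) / (p(x) p(y))), where p(x,y)/(p(x)p(y)) = c*n / (count(x)*count(y))
    return sum((c / n) * math.log2(c * n / (left[x] * right[y]))
               for (x, y), c in joint.items())


def t_statistic(original_count, random_counts):
    """One-sample t statistic of the original pattern count against the counts
    observed in the accepted random graphs (needs at least two random graphs)."""
    return (original_count - mean(random_counts)) / (
        stdev(random_counts) / math.sqrt(len(random_counts)))


# Hypothetical inputs
print(mutual_information([("Player", "Band"), ("Band", "Record")]))  # 1.0
print(t_statistic(6, [2, 3, 4]))  # about 5.196
```

A large positive t value suggests the pattern occurs more often in the original graph than chance, given its degree distribution, would explain.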

8 ExpLOD: a SPARQL assistance tool
Input: the RDF statements of slide 4. Each element is labelled with its RDF usage prefix: 'P' for predicates, 'C' for classes, 'I' for instances and 'L' for literals.
Figure 8: applying bisimulation labels to RDF. Figure 9: class usage summary. Figure 10: predicate usage summary.
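A minimal Python sketch of a bisimulation-based summary, computed by naive partition refinement over the slide 4 statements. It is an assumption-laden simplification: predicates are kept as edge labels, only 'I' (subjects) and 'L' (all other nodes) serve as initial labels, and ExpLOD's own labelling scheme and bisimulation variant are not reproduced.

```python
from collections import defaultdict

# Statements of slide 4 as (subject, predicate, object); prefixes dropped.
TRIPLES = [
    ("db:player/1", "country", "Argentina"), ("db:player/1", "birthday", "1987/6/24"),
    ("db:player/1", "type", "player"), ("db:player/1", "friends", "Beckham"),
    ("db:player/1", "teammate", "Neymar"), ("db:player/1", "interestedIn", "The Rolling Stones"),
    ("db:player/2", "country", "England"), ("db:player/2", "type", "player"),
    ("db:player/3", "country", "Brazil"), ("db:player/3", "birthday", "1992/2/5"),
    ("db:band/1", "manager", "Bob"), ("db:band/1", "create", "Far East Tour"),
    ("db:band/1", "create", "Voodoo Lounge"),
    ("db:record/1", "type", "Far East Tour"), ("db:record/2", "type", "Voodoo Lounge"),
]


def bisimulation_blocks(triples, initial_label):
    """Forward bisimulation by naive partition refinement.

    Two nodes end up in the same block iff they share an initial label and, for
    every predicate, reach the same set of blocks. Returns {node: block id}.
    """
    out = defaultdict(list)
    nodes = set()
    for s, p, o in triples:
        out[s].append((p, o))
        nodes.update((s, o))
    label = {n: initial_label(n) for n in nodes}
    while True:
        signature = {n: (label[n], frozenset((p, label[o]) for p, o in out[n]))
                     for n in nodes}
        ids = {}
        refined = {n: ids.setdefault(signature[n], len(ids)) for n in sorted(nodes)}
        # Refinement only ever splits blocks, so an unchanged count means a stable partition.
        if len(ids) == len(set(label.values())):
            return refined
        label = refined


SUBJECTS = {s for s, _, _ in TRIPLES}
blocks = bisimulation_blocks(TRIPLES, lambda n: "I" if n in SUBJECTS else "L")
print(len(set(blocks.values())))  # 6
```

The two records collapse into one block, each player keeps its own block because their outgoing predicates differ, and all literals share a single block.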

9 Adding user control
SNAP groups nodes based on user-selected attributes and relationships. k-SNAP builds on SNAP and lets the user control the number of groups k, and hence the resolution of the summary.
Figure 11: SNAP summary, user-defined: {rdf:type} {interestedIn, create}. Figure 12: k-SNAP summaries at different resolutions (k).
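A minimal Python sketch of SNAP-style grouping: start from groups of nodes sharing the values of the user-selected attributes, then split groups until every member participates in each user-selected relationship with the same set of groups. Treating relationships as directed (outgoing edges only) is an assumption, and the graph below is hypothetical: unlike slide 4, interestedIn and create point to resources here. k-SNAP, which looks for a grouping with exactly k groups, is not shown.

```python
def snap(attributes, edges, selected_attrs, selected_rels):
    """SNAP-style grouping (sketch).

    attributes: {node: {attribute: value}}; edges: [(source, relationship, target)].
    Returns {node: group id}.
    """
    out = {n: [] for n in attributes}
    for source, rel, target in edges:
        if rel in selected_rels:  # relationships the user did not select are ignored
            out[source].append((rel, target))
    # Initial grouping: identical values for the selected attributes.
    group = {n: tuple(attributes[n].get(a) for a in selected_attrs) for n in attributes}
    while True:
        signature = {n: (group[n], frozenset((r, group[t]) for r, t in out[n]))
                     for n in attributes}
        ids = {}
        refined = {n: ids.setdefault(signature[n], len(ids)) for n in sorted(attributes)}
        if len(ids) == len(set(group.values())):  # no group was split: done
            return refined
        group = refined


ATTRIBUTES = {
    "db:player/1": {"rdf:type": "player"}, "db:player/2": {"rdf:type": "player"},
    "db:band/1": {"rdf:type": "band"},
    "db:record/1": {"rdf:type": "record"}, "db:record/2": {"rdf:type": "record"},
}
EDGES = [
    ("db:player/1", "interestedIn", "db:band/1"),
    ("db:player/1", "teammate", "db:player/2"),  # not selected, so ignored
    ("db:band/1", "create", "db:record/1"),
    ("db:band/1", "create", "db:record/2"),
]
groups = snap(ATTRIBUTES, EDGES, ["rdf:type"], {"interestedIn", "create"})
print(groups)
```

The two players start in one group but are split because only db:player/1 is interestedIn a band; the two records stay together.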

10 A scalable approach
Pipeline: metadata extraction → resource sampling → entity/topic extraction → profile graphs → profile representation. [Picture from Fetahu B, Dietze S, Nunes B P, et al. A scalable approach for efficiently generating structured dataset topic profiles[M]]
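A minimal Python sketch of the resource-sampling and topic-ranking steps, assuming each resource has already been annotated with topics (the `resource_topics` mapping and its values are hypothetical). The paper's own entity/topic extraction and ranking are not reproduced; topics are simply ranked by the share of sampled resources that mention them.

```python
import random
from collections import Counter


def topic_profile(resource_topics, sample_ratio, seed=0):
    """Sample a fraction of the resources and rank topics by the share of
    sampled resources annotated with them. Returns [(topic, score), ...]."""
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    resources = sorted(resource_topics)
    k = max(1, round(sample_ratio * len(resources)))
    sample = rng.sample(resources, k)
    counts = Counter(topic for r in sample for topic in set(resource_topics[r]))
    return sorted(((topic, c / k) for topic, c in counts.items()),
                  key=lambda item: (-item[1], item[0]))


# Hypothetical annotations
resource_topics = {
    "db:player/1": ["Football", "Argentina"],
    "db:player/2": ["Football", "England"],
    "db:band/1": ["Rock music"],
}
print(topic_profile(resource_topics, sample_ratio=1.0))
```

Lowering `sample_ratio` trades accuracy of the profile for speed on large datasets, which is the point of sampling before extraction.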

11 Extracting core knowledge
Figure 14: processing pipeline. Figure 15: corresponding RDF processing.

12 Schema extraction

13 Schema construction
What to extract? The center that covers or represents most of the information in the dataset: individuals, entities, properties, ……
Measures: individual ranking, TF-IDF, LDA, ……
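A minimal Python sketch of TF-IDF as one of the measures listed above. It uses a hypothetical framing: each class is treated as a document and the property occurrences of its instances as terms, so a high score marks a property that is frequent for one class but rare across the others. The property lists are illustrative.

```python
import math
from collections import Counter


def tfidf(documents):
    """documents: {doc_id: [term, ...]} -> {doc_id: {term: tf * idf}}."""
    n_docs = len(documents)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for terms in documents.values() for term in set(terms))
    scores = {}
    for doc, terms in documents.items():
        counts = Counter(terms)
        scores[doc] = {t: (c / len(terms)) * math.log(n_docs / df[t])
                       for t, c in counts.items()}
    return scores


# Hypothetical: property occurrences of each class's instances
documents = {
    "player": ["country", "birthday", "type", "friends", "teammate",
               "interestedIn", "country", "type"],
    "band": ["manager", "create", "create"],
    "record": ["type", "type"],
}
scores = tfidf(documents)
print(max(scores["band"], key=scores["band"].get))  # create
```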

14 Web schema construction
Table 2: list of ontologies found in an e-commerce dataset. Table 3: web schema content and statistics. [Tables from Ashraf J, Hadzic M. Web schema construction based on web ontology usage analysis[M]]

15 Visual summary: LODex
Figure 16: LODex architecture. Figure 17: a visual sample. [Pictures from Benedetti F, Bergamaschi S, Po L. A Visual Summary for Linked Open Data sources[J]]
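LODex takes the URL of a SPARQL endpoint as input. A minimal Python sketch of the kind of aggregate SPARQL queries a schema-level visual summary could be built from, and of how a GET request URL is formed under the SPARQL 1.1 Protocol. The endpoint URL is hypothetical, and these two queries are illustrative, not LODex's actual query set.

```python
from urllib.parse import urlencode

# Instances per class
CLASS_QUERY = """SELECT ?class (COUNT(?s) AS ?instances)
WHERE { ?s a ?class }
GROUP BY ?class
ORDER BY DESC(?instances)"""

# Links between classes, grouped by property
PROPERTY_QUERY = """SELECT ?domain ?property ?range (COUNT(*) AS ?links)
WHERE { ?s a ?domain ; ?property ?o . ?o a ?range }
GROUP BY ?domain ?property ?range"""


def query_url(endpoint, query):
    """Build a SPARQL 1.1 Protocol GET URL: the query goes in the `query` parameter."""
    return endpoint + "?" + urlencode({"query": query})


url = query_url("https://example.org/sparql", CLASS_QUERY)  # hypothetical endpoint
print(url)
```

Sending the request, e.g. with `urllib.request.urlopen(urllib.request.Request(url, headers={"Accept": "application/sparql-results+json"}))`, needs a live endpoint and is left out; the class counts become nodes and the property counts become labelled edges of the visual graph.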

16 Summarization
Approach | Year | User-customized | Application | Input | Output
DFS-based | 2010 | No | Representing the dataset | An RDF graph | A graph
Clustering | 2013 | | Data integration / query formulation | RDF statements |
Latent topic | 2012 | | Topic mining | Multi-graph | Topics
ExpLOD | | | Query assistance | A dataset | Two kinds of graphs
User-control | 2008 | A little | Multi-level inquiry | | Multi-resolution graphs
Scalable | 2014 | | Topic extraction | Datasets | Central types or properties
Extracting core knowledge | 2007 | | | | Path clusters
Schema construction | 2011 | | Ontology recognition | | Ontology usage
Visual summary | | | Exploring datasets | The URL of a SPARQL endpoint | Visual graph
Table 4: summary of all approaches

