Entity Description Pattern Extraction and Their Usage in Entity Query. Websoft, 蒋继东, 2014/4/21
Contents: Goal, Interface design, Workflow, Measure, Experiment
Our goal: (1) extract EDPs from a given RDF dataset; (2) automatically generate a form-based query using the EDPs.
What is an EDP? A dataset is composed of many entities. For each entity, we collect its classes and properties; the resulting collection is called an Entity Description Pattern (EDP).
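The definition above can be sketched in a few lines. This is a minimal illustration, assuming the dataset is available as (subject, predicate, object) tuples and that class membership is expressed with an `rdf:type` predicate; the function name `entity_edp` and the sample triples are hypothetical and not taken from the slides.

```python
RDF_TYPE = "rdf:type"  # assumed spelling of the type predicate in the input


def entity_edp(triples, entity):
    """Return the EDP of `entity`: the set of its classes and properties."""
    pattern = set()
    for subject, predicate, obj in triples:
        if subject != entity:
            continue
        if predicate == RDF_TYPE:
            pattern.add(obj)        # the object of rdf:type is a class
        else:
            pattern.add(predicate)  # any other predicate is a property
    return frozenset(pattern)


# Hypothetical sample data, for illustration only.
triples = [
    ("ex:alice", "rdf:type", "Person"),
    ("ex:alice", "name", "Alice"),
    ("ex:alice", "hasSister", "ex:carol"),
    ("ex:bob", "rdf:type", "Person"),
    ("ex:bob", "name", "Bob"),
]

print(sorted(entity_edp(triples, "ex:alice")))  # ['Person', 'hasSister', 'name']
```

A frozenset is used so that an EDP can later serve as a dictionary key when patterns are counted.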
Entity query interface. Input a class: Person. Query form: Select ?x where { ?x a Person. } EDP hierarchy: [actor, actor_name, actor_actorid, performance, page] (50603); [director, director_directorid, director_name, label, page] (…); [editor, editor_name, editor_editorid, label, page] (…)
Entity query interface. Person → EDP1: [actor, actor_name, actor_actorid, performance, page] (50603). Query form: Select ?x where { ?x a Person. ?x a actor. ?x actor_name ___. ?x actor_actorid ___. ?x performance ___. ?x page ___. } Other EDPs: [director, director_directorid, director_name, label, page] (…); [editor, editor_name, editor_editorid, label, page] (…)
Entity query interface. EDP path: Person → EDP1 → EDP1.1. EDP1.1: [actor, actor_name, actor_actorid, label, performance, page] (32471). Query form: Select ?x where { ?x a Person. ?x a actor. ?x actor_name ___. ?x actor_actorid ___. ?x performance ___. ?x page ___. ?x label ___. } Other EDPs: [actor, actor_name, actor_actorid, performance, page, sameAs] (…); [actor, actor_name, actor_actorid, performance, page, hasPhotoCollection] (…)
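The query forms on the interface slides follow a regular shape: one type pattern for the input class, one type pattern per class in the chosen EDP, and one blank to fill in per property. A minimal sketch of that generation step follows. It assumes the caller already knows which EDP members are classes and which are properties; the function `build_query_form` is a hypothetical name, and the output is a template with `___` blanks, not an executable SPARQL query.

```python
def build_query_form(input_class, edp_classes, edp_properties, blank="___"):
    """Build a query-form template in the style shown on the slides."""
    patterns = [f"?x a {input_class}."]
    # One type pattern per class in the EDP (skipping the input class itself).
    patterns += [f"?x a {cls}." for cls in edp_classes if cls != input_class]
    # One fill-in-the-blank pattern per property in the EDP.
    patterns += [f"?x {prop} {blank}." for prop in edp_properties]
    return "Select ?x where { " + " ".join(patterns) + " }"


# EDP1 from the slides: the class "actor" plus four properties.
query = build_query_form(
    "Person",
    ["actor"],
    ["actor_name", "actor_actorid", "performance", "page"],
)
print(query)
```

Moving down the EDP path (for example from EDP1 to EDP1.1) only appends further patterns, such as `?x label ___.`, to the same template.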
Workflow: EDP extraction. Divide a given dataset into entities and extract the EDPs, together with their frequencies, from them. Example: {[Person, hasParent, name, shoesize], [Person, hasSister, name], [Woman, name, hasSister]}
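The extraction step can be sketched as grouping triples by subject and counting how many entities share each resulting pattern. This is an illustrative sketch, not the authors' implementation; it assumes triples arrive as (subject, predicate, object) tuples with an `rdf:type` predicate for classes, and the entity identifiers and literal values below are made up.

```python
from collections import Counter, defaultdict


def extract_edps(triples, type_predicate="rdf:type"):
    """Group triples by subject and count how often each EDP occurs."""
    descriptions = defaultdict(set)
    for subject, predicate, obj in triples:
        # Classes come from the object of rdf:type, properties from the predicate.
        descriptions[subject].add(obj if predicate == type_predicate else predicate)
    return Counter(frozenset(d) for d in descriptions.values())


# Hypothetical entities that reproduce the three example EDPs.
triples = [
    ("e1", "rdf:type", "Person"), ("e1", "hasParent", "e9"),
    ("e1", "name", "A"), ("e1", "shoesize", "42"),
    ("e2", "rdf:type", "Person"), ("e2", "hasSister", "e3"), ("e2", "name", "B"),
    ("e3", "rdf:type", "Woman"), ("e3", "name", "C"), ("e3", "hasSister", "e2"),
    ("e4", "rdf:type", "Person"), ("e4", "hasSister", "e3"), ("e4", "name", "D"),
]

edps = extract_edps(triples)
for edp, frequency in edps.most_common():
    print(sorted(edp), frequency)
```

Here `e2` and `e4` share the pattern [Person, hasSister, name], so that EDP has frequency 2 while the other two have frequency 1.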
Workflow: EDP selection. Top k or coverage ratio? EDP hierarchy over the EDPs: [Person, name]; [Person, name, hasSister]; [Woman, name, hasSister]; [Person, name, hasParent, shoesize]
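The two selection options, top k versus coverage ratio, can be compared with a small sketch. It assumes EDP frequencies are held in a dictionary and that the hierarchy is approximated by plain set inclusion, ignoring subclass and sub-property relations. The function names and the frequency values are hypothetical, chosen only to make the example concrete.

```python
def rank_edps(edp_counts):
    """Sort (EDP, frequency) pairs by descending frequency."""
    return sorted(edp_counts.items(), key=lambda kv: kv[1], reverse=True)


def select_top_k(edp_counts, k):
    """Keep the k most frequent EDPs."""
    return [edp for edp, _ in rank_edps(edp_counts)[:k]]


def select_by_coverage(edp_counts, ratio):
    """Keep the most frequent EDPs until `ratio` of all entities is covered."""
    total = sum(edp_counts.values())
    selected, covered = [], 0
    for edp, count in rank_edps(edp_counts):
        if covered >= ratio * total:
            break
        selected.append(edp)
        covered += count
    return selected


def hierarchy_edges(edps):
    """Return every (smaller, larger) pair where one EDP strictly contains the other."""
    return [(parent, child) for parent in edps for child in edps if parent < child]


# Hypothetical frequencies for the four EDPs on the slide.
edp_counts = {
    frozenset({"Person", "name"}): 6,
    frozenset({"Person", "name", "hasSister"}): 3,
    frozenset({"Woman", "name", "hasSister"}): 1,
    frozenset({"Person", "name", "hasParent", "shoesize"}): 2,
}

print(len(select_top_k(edp_counts, 2)))            # 2
print(len(select_by_coverage(edp_counts, 0.75)))   # 2
print(len(hierarchy_edges(list(edp_counts))))      # 2
```

With set inclusion alone, [Person, name] is the parent of the two other Person patterns, while [Woman, name, hasSister] stays unconnected. Linking it to the rest would need the subclass knowledge that this sketch leaves out.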
Measure: Sub-EDP; maximum frequent itemsets mining; maximum coverage. Sub-EDP: included, subclass, sub-property… Maximum frequent itemsets mining: threshold? Maximum coverage: EDP rank.
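The role of the threshold in the frequent-itemset measure can be illustrated with a brute-force sketch. It assumes the slide refers to the standard notion of maximal frequent itemsets: itemsets whose support reaches a minimum threshold and that have no frequent proper superset. The enumeration below is exponential in the number of items and is meant for toy inputs only; the function names and the threshold value are hypothetical.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Enumerate all itemsets contained in at least `min_support` transactions."""
    items = sorted(set().union(*transactions))
    frequent = {}
    for size in range(1, len(items) + 1):
        found = False
        for candidate in combinations(items, size):
            itemset = frozenset(candidate)
            support = sum(1 for t in transactions if itemset <= t)
            if support >= min_support:
                frequent[itemset] = support
                found = True
        if not found:
            break  # supersets of infrequent sets cannot be frequent
    return frequent


def maximal_frequent_itemsets(transactions, min_support):
    """Keep only the frequent itemsets that have no frequent proper superset."""
    frequent = frequent_itemsets(transactions, min_support)
    return {s: n for s, n in frequent.items()
            if not any(s < other for other in frequent)}


# Entity descriptions matching the three example EDPs from the extraction slide.
transactions = [
    frozenset({"Person", "hasParent", "name", "shoesize"}),
    frozenset({"Person", "hasSister", "name"}),
    frozenset({"Woman", "name", "hasSister"}),
]

maximal = maximal_frequent_itemsets(transactions, min_support=2)
for itemset, support in maximal.items():
    print(sorted(itemset), support)
```

With a threshold of 2, the maximal frequent itemsets are [Person, name] and [hasSister, name], each with support 2. Raising the threshold to 3 leaves only [name], which shows how strongly the result depends on the threshold.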
Experiment: Jamendo. 1047950 triples, 335925 entities, 34 EDPs. The top 8 EDPs covered 90 percent of the entities, and the top 12 EDPs covered more than 99 percent. Few hierarchical relationships.
Experiment: LinkedMDB. 6148121 triples, 694399 entities, 8460 EDPs. The top 40 EDPs covered 83.3 percent of the entities. Abundant hierarchical relationships.
Thanks