Graph Data Management Lab, School of Computer Science Put conference information here.

Presentation on theme: "Graph Data Management Lab, School of Computer Science Put conference information here."— Presentation transcript:

1 YAGO
Graph Data Management Lab, School of Computer Science, GDM@FUDAN
http://gdm.fudan.edu.cn
Email: zerup123@gmail.com
Reporter: Qi Liu

2 What is YAGO?
- A semantic web
- A knowledge base
- A combination of WordNet and Wikipedia

3 Semantic web
- Advocated by the W3C (World Wide Web Consortium)
- Aimed at reconstructing the WWW
- A standard framework: RDF (Resource Description Framework)

4 What is YAGO?
- A semantic web
- A knowledge base
- A combination of WordNet and Wikipedia

5 Knowledge base
- To be: A special database for knowledge management
- To do: Provides a means for collecting, organising, searching and utilising information
- Three types:
  - Machine-readable knowledge bases (DBpedia)
  - Human-readable knowledge bases (Wikipedia)
  - Knowledge base analysis and design

6 What is YAGO?
- A semantic web
- A knowledge base
- A combination of WordNet and Wikipedia

7 WordNet
- To be: A lexical database for English since 1985
- To do:
  - Groups words into synsets
  - Provides short, general definitions
  - Records the semantic relations between these synsets
- 25 basic noun groups & 15 verb groups

8 Key Concepts
- Ontology vs Taxonomy
- Lexicon: the bridge between a language and the knowledge expressed in that language
- Syntactic (there vs their)
- Semantic (sight vs site)
- Pragmatic (infer vs imply)

9 Figure 1: Hierarchy of top-level categories in KR ontology
- See also http://www.jfsowa.com/ontology/toplevel.htm

10 Semantics of YAGO
- Five relations: Domain, Range, subRelationOf, Type, subClassOf
- Entities: Domain Relation Range Literal......
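The model on this slide can be sketched as a set of (subject, relation, object) triples in which the five relations are ordinary relation names. This is a minimal illustration, not YAGO's actual implementation: the `Fact` type, the `objects_of` helper, the lowercase relation spellings and all entity names are assumptions made for the example.

```python
from typing import NamedTuple


class Fact(NamedTuple):
    """One (subject, relation, object) triple."""
    subject: str
    relation: str
    obj: str


# Illustrative facts only; the entity names are assumptions of this sketch.
FACTS = {
    Fact("AlbertEinstein", "type", "physicist"),
    Fact("physicist", "subClassOf", "scientist"),
    Fact("scientist", "subClassOf", "person"),
    Fact("bornInYear", "domain", "person"),
    Fact("bornInYear", "range", "year"),
    Fact("givenNameOf", "subRelationOf", "means"),
}


def objects_of(subject, relation):
    """Return every object linked to `subject` by `relation`."""
    return {f.obj for f in FACTS if f.subject == subject and f.relation == relation}
```

Note that relations such as `bornInYear` appear as subjects of `domain` and `range` facts, so relations are themselves entities in this representation.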

11 Axiomatic rules

12 Reasoning rules
- Correctness and completeness
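The rule bodies themselves did not survive in this transcript. As a hedged illustration of what rule-based reasoning over such facts can look like, the sketch below assumes three rules: transitivity of subClassOf and subRelationOf, inheritance of type along subClassOf, and inheritance of facts along subRelationOf. The function name `deductive_closure` and the sample entities are hypothetical.

```python
def deductive_closure(facts):
    """Apply the three assumed rules until no new fact can be derived."""
    closed = set(facts)
    while True:
        new = set()
        for (x, r, y) in closed:
            for (y2, r2, z) in closed:
                if y != y2:
                    continue
                # Assumed rule 1: subClassOf and subRelationOf are transitive.
                if r == r2 and r in ("subClassOf", "subRelationOf"):
                    new.add((x, r, z))
                # Assumed rule 2: an instance of a class is an instance of its superclasses.
                if r == "type" and r2 == "subClassOf":
                    new.add((x, "type", z))
            # Assumed rule 3: (r subRelationOf r_super) and (x r y) imply (x r_super y).
            for (r1, rel, r_super) in closed:
                if rel == "subRelationOf" and r1 == r:
                    new.add((x, r_super, y))
        if new <= closed:
            return closed  # fixpoint reached: nothing new was derived
        closed |= new
```

Because the set of entities and relations is finite, the loop terminates. Every derived fact follows from the assumed rules and the input (correctness with respect to those rules), and the fixpoint contains everything those rules can derive (completeness).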

13 The YAGO system
- Knowledge extraction
- YAGO storage
- Enriching YAGO

14 Knowledge extraction
- TYPE relation
- SUBCLASSOF relation
- MEANS relation
- Other relations
- Meta-relations

15 TYPE relation extraction
- The Wikipedia Category System
  - Types: conceptual, administrative, relational, thematic
- Identifying Conceptual Categories
  - Conceptual => TYPE
  - Administrative and relational ones: excluded by hand
  - Employ shallow linguistic parsing (Noun Group Parser) on the remaining two types of categories
  - E.g. Naturalized citizens of United States
  - Domain and range extracted at the same time
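The shallow parsing step can be illustrated with a deliberately naive sketch that splits a category name into pre-modifier, head and post-modifier. The heuristic that a plural head noun signals a conceptual category is an assumption of this sketch, as are the preposition list, the crude plural test and the function names. A real noun group parser is considerably more careful.

```python
# Assumption: the post-modifier starts at the first of these prepositions.
PREPOSITIONS = {"of", "in", "from", "by", "at", "for", "with", "on"}
# Assumption: a tiny list of irregular plurals that the "ends in s" test would miss.
IRREGULAR_PLURALS = {"people", "children", "men", "women"}


def split_noun_group(category):
    """Split a category name into (pre-modifier, head, post-modifier)."""
    words = category.split()
    cut = next((i for i, w in enumerate(words) if w.lower() in PREPOSITIONS), len(words))
    group, post = words[:cut], words[cut:]
    if not group:
        return "", "", " ".join(post)
    return " ".join(group[:-1]), group[-1], " ".join(post)


def is_conceptual(category):
    """Guess that a category is conceptual when its head noun looks plural."""
    _, head, _ = split_noun_group(category)
    head = head.lower()
    if head in IRREGULAR_PLURALS:
        return True
    return head.endswith("s") and not head.endswith("ss")
```

For the slide's example, the head is "citizens", so the category is treated as conceptual. A thematic category such as "History of Japan" has the singular head "History" and is rejected.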

16 SUBCLASSOF relation extraction
- Wikipedia categories
  - DAG (directed acyclic graph)
  - Reflect merely the thematic structure
  - Use only the leaf categories of Wikipedia
- Integrating WordNet Synsets
  - Match or prefer WordNet
- Establishing subClassOf
  - American people in Japan
- Exceptions
  - Correct manually

17 Means relation extraction
- Exploiting WordNet Synsets
  - A synset {urban center, metropolis, city}
  - Attach a class for the synset ‘city’
- Exploiting Wikipedia Redirects
  - Search “Einstein, Albert”, redirected to “Albert Einstein”
- Parsing Person Names
  - givenNameOf subRelationOf means
  - familyNameOf subRelationOf means
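A rough sketch of the redirect and person-name steps follows. It assumes, for illustration only, that the first token of a name is the given name and the last token is the family name. The function names and the shape of the redirect table are hypothetical.

```python
def name_facts(full_name):
    """Derive givenNameOf and familyNameOf facts from a person's full name."""
    parts = full_name.split()
    if len(parts) < 2:
        return []  # a single token cannot be split into given and family name
    return [
        (parts[0], "givenNameOf", full_name),
        (parts[-1], "familyNameOf", full_name),
    ]


def redirect_facts(redirects):
    """Turn a {redirect title: target title} table into means facts."""
    return [(source, "means", target) for source, target in redirects.items()]
```

Because givenNameOf and familyNameOf are sub-relations of means, each derived name fact also implies a means fact for the same pair.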

18 Other relations extraction
- BornInYear & DiedInYear
- EstablishedIn & LocatedIn
- WrittenInYear
- PoliticianOf
- HasWonPrize
- Filtering the Results

19 Meta-relations extraction
- Descriptions
  - Individual DESCRIBES URL
- Witness
  - Fact FoundIn URL (of its witness page)
  - ExtractedBy
- Context
  - Linkages between A and B: A Context B

20 Knowledge extraction
- TYPE relation
- SUBCLASSOF relation
- MEANS relation
- Other relations
- Meta-relations

21 The YAGO system
- Knowledge extraction
- YAGO storage
- Enriching YAGO

22 YAGO storage
- Model independent of storage
- Storage: text files, XML, database tables, RDF
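To illustrate that the model does not depend on the storage format, the sketch below writes facts to a tab-separated text stream and reads them back unchanged. The tab-separated layout and the function names are assumptions of this example, not YAGO's actual file format.

```python
import csv
import io


def dump_facts(facts, handle):
    """Write one fact per line as subject<TAB>relation<TAB>object."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    for fact in sorted(facts):
        writer.writerow(fact)


def load_facts(handle):
    """Read facts back from a tab-separated stream."""
    return {tuple(row) for row in csv.reader(handle, delimiter="\t") if row}


# Round trip through an in-memory text buffer.
facts = {
    ("AlbertEinstein", "type", "physicist"),
    ("physicist", "subClassOf", "scientist"),
}
buffer = io.StringIO()
dump_facts(facts, buffer)
buffer.seek(0)
restored = load_facts(buffer)
```

The same set of triples could equally be written to XML, database tables or RDF. Only the two functions would change, not the facts.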

23 Enriching YAGO
- Add the fact (x, r, y)
  - Map x, y to existing entities (word sense disambiguation)
  - If mapping failed, add a new entity
  - Map r to the YAGO ontology
  - If mapping succeeded, add a FoundIn relation
  - If mapping failed, add a new fact!
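One possible reading of this procedure is sketched below. It assumes that a candidate fact already present in the knowledge base only gains another FoundIn witness, and that anything else is stored as a new fact. The `KnowledgeBase` class, the `resolve` callback standing in for word sense disambiguation, the relation alias table and the example.org URLs are all hypothetical.

```python
class KnowledgeBase:
    def __init__(self, relation_aliases):
        # Assumed table mapping surface relation names to ontology relations.
        self.relation_aliases = dict(relation_aliases)
        self.entities = set()
        self.witnesses = {}  # fact -> set of URLs the fact was found in

    def add_fact(self, x, r, y, source_url, resolve=None):
        """Add (x, r, y); return True if the fact is new, False if only a witness was added."""
        resolve = resolve or (lambda name: name)
        x, y = resolve(x), resolve(y)      # map x, y to existing entities
        self.entities.update((x, y))       # unmapped names become new entities
        r = self.relation_aliases.get(r, r)  # map r to the ontology's relation
        fact = (x, r, y)
        is_new = fact not in self.witnesses
        # Known fact: record one more FoundIn witness. Unknown fact: store it.
        self.witnesses.setdefault(fact, set()).add(source_url)
        return is_new


kb = KnowledgeBase({"born in": "bornInYear"})
canonical = {"Einstein": "Albert Einstein"}
first = kb.add_fact("Einstein", "born in", "1879", "http://example.org/a",
                    resolve=lambda name: canonical.get(name, name))
second = kb.add_fact("Albert Einstein", "bornInYear", "1879", "http://example.org/b")
```

Here `first` is `True` because the fact is new, while `second` is `False` because the same fact was found again on a second page and only its witness set grows.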

24 Summary on YAGO1
- 1M entities & 5M facts
- Accuracy around 95%


26 YAGO2: In Time, Space and Many Languages
- YAGO: about 100 manually defined relations
- Build the YAGO2 architecture on the following kinds of rules:
  - Factual rules: e.g. exceptions, definitions of all relations, domains, ranges and classes
  - Implication rules: infer new facts from the facts in the database
  - Replacement rules: normalize numbers, tags and other formats
  - Extraction rules: extract facts from a given source text
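Replacement and extraction rules can be pictured as regular-expression patterns applied in sequence. The sketch below is an illustration under assumptions: the two replacement patterns, the single "was born in" extraction pattern and the function names are invented for this example and are not YAGO2's actual rule set.

```python
import re

# Assumed replacement rules: (pattern, replacement) pairs applied in order.
REPLACEMENT_RULES = [
    (re.compile(r"<[^>]+>"), ""),              # drop markup tags
    (re.compile(r"(?<=\d),(?=\d{3}\b)"), ""),  # 1,000,000 -> 1000000
]

# Assumed extraction rules: (pattern, relation) pairs with named groups s and o.
EXTRACTION_RULES = [
    (re.compile(r"(?P<s>[A-Z][\w ]+?) was born in (?P<o>\d{4})"), "bornInYear"),
]


def normalize(text):
    """Apply every replacement rule to the text."""
    for pattern, replacement in REPLACEMENT_RULES:
        text = pattern.sub(replacement, text)
    return text


def extract(text):
    """Normalize the text, then collect one fact per extraction-rule match."""
    text = normalize(text)
    return [
        (match.group("s"), relation, match.group("o"))
        for pattern, relation in EXTRACTION_RULES
        for match in pattern.finditer(text)
    ]
```

Running the replacement rules first means the extraction patterns only ever see normalized text, which keeps each extraction pattern simple.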

27 Temporal Dimension
- People: wasBornOnDate & diedOnDate
- Groups: wasCreatedOnDate & wasDestroyedOnDate
- Artifacts (buildings, songs, cities): [same as above]
- Events: startedOnDate & endedOnDate
  => startExistingOnDate & endExistingOnDate
- Facts
- Entities in a fact
  => subjectStartRelation & objectStartRelation
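The mapping from type-specific date relations to a common existence span can be sketched as a lookup. This assumes the generalisation applies to all entity types listed on the slide. The relation names are taken from the slide, while the function name and the sample facts are illustrative.

```python
# Assumption: each of these relations marks the start or end of an entity's existence.
EXISTENCE_START = {"wasBornOnDate", "wasCreatedOnDate", "startedOnDate"}
EXISTENCE_END = {"diedOnDate", "wasDestroyedOnDate", "endedOnDate"}


def existence_facts(facts):
    """Derive startExistingOnDate / endExistingOnDate facts from specific date facts."""
    derived = set()
    for subject, relation, obj in facts:
        if relation in EXISTENCE_START:
            derived.add((subject, "startExistingOnDate", obj))
        elif relation in EXISTENCE_END:
            derived.add((subject, "endExistingOnDate", obj))
    return derived


sample = {
    ("AlbertEinstein", "wasBornOnDate", "1879-03-14"),
    ("AlbertEinstein", "diedOnDate", "1955-04-18"),
    ("AlbertEinstein", "type", "physicist"),
}
span = existence_facts(sample)
```

Facts with other relations, such as the `type` fact in the sample, contribute nothing to the existence span.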

28 GEO-SPATIAL Dimension
- All physical objects have a location in space!
- Define it with geographical coordinates, i.e. latitude and longitude
  => yagoGeoCoordinates
  => hasGeoCoordinates
- Two sources:
  - Wikipedia
  - GeoNames
  - locatedIn & hasGeoCoordinates &
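A coordinate pair can be modelled as a small validated value type that is attached to an entity through hasGeoCoordinates. The class and function names and the sample coordinates below are assumptions of this sketch. Only the relation name comes from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeoCoordinates:
    """A latitude/longitude pair in decimal degrees."""
    latitude: float
    longitude: float

    def __post_init__(self):
        if not -90 <= self.latitude <= 90:
            raise ValueError("latitude must lie between -90 and 90 degrees")
        if not -180 <= self.longitude <= 180:
            raise ValueError("longitude must lie between -180 and 180 degrees")


def geo_fact(entity, coordinates):
    """Attach coordinates to an entity as a hasGeoCoordinates fact."""
    return (entity, "hasGeoCoordinates", coordinates)


# Approximate, illustrative coordinates only.
fact = geo_fact("Shanghai", GeoCoordinates(31.23, 121.47))
```

Making the value type frozen keeps it hashable, so such facts can be stored in sets alongside ordinary string triples.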

29 Textual Dimension
- hasWikipediaAnchorText
- hasWikipediaCategory
- hasCitationTitle
  subClassOf hasContext
  Integrating UWN to include 200 languages

