Graph Data Management Lab, School of Computer Science
YAGO
  Reporter: Qi Liu
What is YAGO?
  A semantic web
  A knowledge base
  A combination of WordNet and Wikipedia
Semantic web
  Advocated by the W3C (World Wide Web Consortium)
  Aims to restructure the WWW
  A standard framework: RDF (Resource Description Framework)
Knowledge base
  What it is: a special database for knowledge management
  What it does: provides a means for collecting, organising, searching and utilising information
  Three types:
    Machine-readable knowledge bases (DBpedia)
    Human-readable knowledge bases (Wikipedia)
    Knowledge base analysis and design
WordNet
  What it is: a lexical database for English, developed since 1985
  What it does:
    Groups words into synsets (sets of synonyms)
    Provides short, general definitions
    Records the semantic relations between these synsets
  25 basic noun groups and 15 verb groups
Key Concepts
  Ontology vs taxonomy
  Lexicon: the bridge between a language and the knowledge expressed in that language
    Syntactic (there vs their)
    Semantic (sight vs site)
    Pragmatic (infer vs imply)
Figure 1: Hierarchy of top-level categories in the KR ontology
Semantics of YAGO
  Five relations: domain, range, subRelationOf, type, subClassOf
  Entities: Domain, Relation, Range, Literal, ...
Axiomatic rules
Reasoning rules
  Correctness and completeness
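The rules themselves did not survive on these slides. As a rough sketch only, assume rules of the kind usually attached to the five relations: subRelationOf propagates facts, subClassOf is transitive and propagates type, and domain and range assign types. Under those assumptions, reasoning can be illustrated as repeated rule application until no new fact appears. The function name `closure` and the sample facts are illustrative, not taken from YAGO.

```python
def closure(facts):
    """Apply the assumed rules until no new fact appears (a fixpoint)."""
    known = set(facts)
    while True:
        new = set()
        for (a, r, b) in known:
            for (c, s, d) in known:
                # (a subRelationOf b), (c a d)  =>  (c b d)
                if r == "subRelationOf" and s == a:
                    new.add((c, b, d))
                # (a type b), (b subClassOf d)  =>  (a type d)
                if r == "type" and s == "subClassOf" and c == b:
                    new.add((a, "type", d))
                # (a subClassOf b), (b subClassOf d)  =>  (a subClassOf d)
                if r == "subClassOf" and s == "subClassOf" and c == b:
                    new.add((a, "subClassOf", d))
                # (a domain b), (c a d)  =>  (c type b)
                if r == "domain" and s == a:
                    new.add((c, "type", b))
                # (a range b), (c a d)  =>  (d type b)
                if r == "range" and s == a:
                    new.add((d, "type", b))
        if new <= known:
            return known
        known |= new


# Illustrative facts only.
sample = {
    ("givenNameOf", "subRelationOf", "means"),
    ("Albert", "givenNameOf", "Albert_Einstein"),
    ("Albert_Einstein", "type", "physicist"),
    ("physicist", "subClassOf", "scientist"),
    ("scientist", "subClassOf", "person"),
}
derived = closure(sample)
```

The loop terminates because every derived fact is built from symbols already present, so only finitely many facts can ever be added.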
The YAGO system
  Knowledge extraction
  YAGO storage
  Enriching YAGO
Knowledge extraction
  TYPE relation
  SUBCLASSOF relation
  MEANS relation
  Other relations
  Meta-relations
TYPE relation extraction
  The Wikipedia category system
    Types: conceptual, administrative, relational, thematic
  Identifying conceptual categories
    Conceptual categories yield TYPE facts
    Administrative and relational categories: excluded by hand
    Apply shallow linguistic parsing (Noun Group Parser) to the remaining two types
    E.g. Naturalized citizens of the United States
  Domain and range are extracted at the same time
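A minimal sketch of the shallow parsing step, assuming a category name splits into a pre-modifier, a head noun and a post-modifier, and that a plural head signals a conceptual category. The preposition list, the irregular-plural list and both function names are hypothetical simplifications, not the actual Noun Group Parser.

```python
# Hypothetical, deliberately small lists; a real parser would be far more thorough.
PREPOSITIONS = {"of", "in", "from", "by", "at", "with"}
IRREGULAR_PLURALS = {"people", "men", "women", "children"}


def split_category(name):
    """Split a category name into (pre-modifier, head, post-modifier)."""
    words = name.split()
    # The post-modifier starts at the first preposition, if there is one.
    cut = next(
        (i for i, w in enumerate(words) if w.lower() in PREPOSITIONS),
        len(words),
    )
    group, post = words[:cut], words[cut:]
    head = group[-1] if group else ""
    pre = group[:-1]
    return " ".join(pre), head, " ".join(post)


def looks_conceptual(name):
    """Naive test: the head noun is plural (assumed plural heuristic)."""
    _, head, _ = split_category(name)
    h = head.lower()
    return h in IRREGULAR_PLURALS or (h.endswith("s") and not h.endswith("ss"))


parts = split_category("Naturalized citizens of the United States")
```

The plural check is crude: singular nouns that end in "s" would be misclassified, which is one reason a hand-checked exception list is still needed.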
SUBCLASSOF relation extraction
  Wikipedia categories
    Form a DAG (directed acyclic graph)
    Reflect merely the thematic structure
    Use only the leaf categories of Wikipedia
  Integrating WordNet synsets
    When names match, prefer WordNet
  Establishing subClassOf
    E.g. American people in Japan
  Exceptions
    Corrected manually
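One way to picture how a leaf category is attached to a WordNet class: look up the pre-modifier plus head first, then the head alone, and take the first synset found. This is a sketch under that assumption. `TOY_WORDNET`, `TOY_SINGULAR`, the synset labels and `superclass_for` are all made up for illustration and are not real WordNet identifiers.

```python
# Toy stand-in for WordNet: word or phrase -> synsets, most frequent first.
TOY_WORDNET = {
    "person": ["person#1", "person#2"],
    "city": ["city#1"],
}
# Toy singular forms; a real system would use a stemmer or lemmatizer.
TOY_SINGULAR = {"people": "person", "cities": "city"}


def superclass_for(pre, head):
    """Return the assumed WordNet superclass for a category, or None."""
    singular = TOY_SINGULAR.get(head.lower(), head.lower())
    # Try the more specific phrase first, then fall back to the head alone.
    candidates = [f"{pre} {singular}".strip().lower(), singular]
    for candidate in candidates:
        synsets = TOY_WORDNET.get(candidate)
        if synsets:
            return synsets[0]
    return None


# "American people in Japan": pre-modifier "American", head "people".
parent = superclass_for("American", "people")
```

Returning None marks the cases that would be left for manual correction.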
MEANS relation extraction
  Exploiting WordNet synsets
    A synset: {urban center, metropolis, city}
    Introduce a class for the synset ('city'); each word of the synset MEANS that class
  Exploiting Wikipedia redirects
    Searching for "Einstein, Albert" redirects to "Albert Einstein"
  Parsing person names
    givenNameOf subRelationOf means
    familyNameOf subRelationOf means
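The three sources of MEANS facts can be sketched as three small functions. The function names are hypothetical, and the name parser assumes the last token is the family name, which is a simplification that does not hold for all names.

```python
def means_from_synset(words, cls):
    """Every word of a synset means the class introduced for that synset."""
    return [(w, "means", cls) for w in words]


def means_from_redirects(redirects):
    """redirects maps a redirecting title to the page it points to."""
    return [(src, "means", dst) for src, dst in redirects.items()]


def name_facts(full_name):
    """Assumption: last token is the family name, the rest is the given name."""
    parts = full_name.split()
    if len(parts) < 2:
        return []
    given, family = " ".join(parts[:-1]), parts[-1]
    return [
        (given, "givenNameOf", full_name),
        (family, "familyNameOf", full_name),
    ]


synset_facts = means_from_synset(["urban center", "metropolis", "city"], "city")
redirect_facts = means_from_redirects({"Einstein, Albert": "Albert Einstein"})
person_facts = name_facts("Albert Einstein")
```

Because givenNameOf and familyNameOf are sub-relations of means, the name facts also imply the corresponding means facts.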
Other relations extraction
  BornInYear and DiedInYear
  EstablishedIn and LocatedIn
  WrittenInYear
  PoliticianOf
  HasWonPrize
  Filtering the results
Meta-relations extraction
  Descriptions
    Individual DESCRIBES URL
  Witness
    Fact FoundIn URL (of its witness page)
    ExtractedBy
  Context
    Linkages between A and B: A Context B
YAGO storage
  Model independent of storage
  Storage: text files, XML, database tables, RDF
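To illustrate that the model is independent of storage, the sketch below writes the same facts once as tab-separated text and once as a database table. The four-column layout (id, subject, relation, object), the sample facts and the function names are assumptions made for this example, not YAGO's actual file or table schema.

```python
import csv
import io
import sqlite3

# Illustrative facts: (fact id, subject, relation, object).
facts = [
    ("f1", "Albert_Einstein", "type", "physicist"),
    ("f2", "physicist", "subClassOf", "scientist"),
]


def to_tsv(rows):
    """Serialize facts as tab-separated text, one fact per line."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()


def to_table(rows):
    """Load facts into an in-memory SQLite table and return the connection."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE facts "
        "(id TEXT PRIMARY KEY, subject TEXT, relation TEXT, object TEXT)"
    )
    conn.executemany("INSERT INTO facts VALUES (?, ?, ?, ?)", rows)
    return conn


text = to_tsv(facts)
conn = to_table(facts)
types = conn.execute(
    "SELECT subject, object FROM facts WHERE relation = ?", ("type",)
).fetchall()
```

Both functions consume the same list of tuples, so swapping the backend does not touch the facts themselves.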
Enriching YAGO
  Add the fact (x, r, y)
    Map x and y to existing entities (word sense disambiguation)
      If the mapping fails, add a new entity
    Map r to the YAGO ontology
    If the fact maps to an existing fact, add a FoundIn relation
    Otherwise, add a new fact
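The enrichment steps above can be sketched as follows. `ToyYago` is a hypothetical class. Its `resolve` method is a placeholder for word sense disambiguation that only does exact-name matching, and the relation whitelist stands in for mapping r to the ontology.

```python
class ToyYago:
    """Hypothetical miniature knowledge base, for illustration only."""

    def __init__(self, relations):
        self.relations = set(relations)  # relations known to the ontology
        self.entities = set()
        self.facts = {}                  # (x, r, y) -> fact id
        self.found_in = []               # (fact id, "FoundIn", url)

    def resolve(self, name):
        # Placeholder for word sense disambiguation: exact match,
        # otherwise the name is added as a new entity.
        self.entities.add(name)
        return name

    def add_fact(self, x, r, y, url):
        """Add (x, r, y) found at url; return True if the fact is new."""
        if r not in self.relations:
            raise ValueError(f"relation {r!r} is not in the ontology")
        key = (self.resolve(x), r, self.resolve(y))
        is_new = key not in self.facts
        if is_new:
            self.facts[key] = f"f{len(self.facts) + 1}"
        # Assumption: the source page is recorded as a witness for new
        # facts too, not only for facts that already existed.
        self.found_in.append((self.facts[key], "FoundIn", url))
        return is_new


kb = ToyYago(relations={"BornInYear"})
first = kb.add_fact("Albert Einstein", "BornInYear", "1879", "http://example.org/a")
second = kb.add_fact("Albert Einstein", "BornInYear", "1879", "http://example.org/b")
```

Seeing the same fact on a second page adds only a new witness, so the number of facts stays stable while the evidence for each fact grows.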
Summary on YAGO1
  1M entities and 5M facts
  Accuracy around 95%
YAGO2: In Time, Space and Many Languages
  YAGO: about 100 manually defined relations
  The YAGO2 architecture is built on four kinds of rules:
    Factual rules
      E.g. exceptions, definitions of all relations, domains, ranges and classes
    Implication rules
      Infer new facts from the facts in the database
    Replacement rules
      Normalize numbers, tags and other formats
    Extraction rules
      Extract facts from a given source text
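A small sketch of how replacement rules and extraction rules might cooperate, assuming both are expressed as regular expressions: replacement rules normalize the text first, then extraction rules turn matches into facts. The patterns, the rule tables and the function names are hypothetical and are not YAGO2's actual rules.

```python
import re

# Hypothetical replacement rules: (pattern, replacement), applied in order.
REPLACEMENT_RULES = [
    (re.compile(r"<[^>]+>"), ""),            # strip markup tags
    (re.compile(r"(?<=\d),(?=\d{3})"), ""),  # 1,879 -> 1879
]

# Hypothetical extraction rules: (pattern with groups x and y, relation).
EXTRACTION_RULES = [
    (re.compile(r"(?P<x>[A-Z][\w ]+?) was born in (?P<y>\d{4})"), "BornInYear"),
]


def normalize(text):
    """Apply every replacement rule to the text, in order."""
    for pattern, replacement in REPLACEMENT_RULES:
        text = pattern.sub(replacement, text)
    return text


def extract(text):
    """Normalize the text, then emit one fact per extraction-rule match."""
    clean = normalize(text)
    found = []
    for pattern, relation in EXTRACTION_RULES:
        for match in pattern.finditer(clean):
            found.append((match.group("x"), relation, match.group("y")))
    return found


extracted = extract("<b>Albert Einstein</b> was born in 1879.")
```

Keeping the rules in data tables rather than in code is the point of the design: new relations can be covered by adding a rule, without changing the extractor.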
Temporal Dimension
  People: wasBornOnDate and diedOnDate
  Groups: wasCreatedOnDate and wasDestroyedOnDate
  Artifacts (buildings, songs, cities): [same as above]
  Events: startedOnDate and endedOnDate
    => startExistingOnDate and endExistingOnDate
  Facts: entities in a fact
    => subjectStartRelation and objectStartRelation
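The mapping from type-specific relations to the two generic existence relations can be sketched as a lookup, assuming each specific relation simply implies its generic counterpart. The two sets and the function name are illustrative groupings of the relation names listed above.

```python
# Assumed grouping of the relation names listed on the slide.
START_RELATIONS = {"wasBornOnDate", "wasCreatedOnDate", "startedOnDate"}
END_RELATIONS = {"diedOnDate", "wasDestroyedOnDate", "endedOnDate"}


def existence_facts(facts):
    """Derive generic existence-span facts from type-specific date facts."""
    derived = []
    for subject, relation, value in facts:
        if relation in START_RELATIONS:
            derived.append((subject, "startExistingOnDate", value))
        elif relation in END_RELATIONS:
            derived.append((subject, "endExistingOnDate", value))
    return derived


span = existence_facts([
    ("Albert_Einstein", "wasBornOnDate", "1879-03-14"),
    ("Albert_Einstein", "diedOnDate", "1955-04-18"),
    ("Albert_Einstein", "type", "physicist"),
])
```

With this in place, a query about when an entity existed does not need to know whether the entity is a person, a group, an artifact or an event.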
Geo-Spatial Dimension
  All physical objects have a location in space
  Define it with geographical coordinates, i.e. latitude and longitude
    => yagoGeoCoordinates
    => hasGeoCoordinates
  Two sources: Wikipedia and GeoNames
  locatedIn and hasGeoCoordinates
Textual Dimension
  hasWikipediaAnchorText
  hasWikipediaCategory
  hasCitationTitle
    subClassOf hasContext
  Integrating UWN (Universal WordNet) to cover 200 languages