Gaby Nativ, SDBI 2007
Motivation Other Ontologies System overview YAGO Dive IN LEILA NAGA Conclusion
Which NASA astronaut was born when Elvis was born?
Problem: Web pages are designed to be read by people, not machines. Solution: the Semantic Web, in which the meaning of information and services is defined, so that both people and machines can use web content.
Knowledge representation language. Individuals: instances or objects. Classes: concepts or types of objects. Relations: ways in which classes and objects can be related to one another. Facts: instances of a relation between individuals, classes, or relations, e.g. (Elvis Presley, isA, Singer).
Directed labeled multigraph $G = (V, E, L_V, L_E)$: $V$ is a set of vertices; $E \subseteq V \times V$ is a multiset of edges; $L_V$ is a set of individual and class labels; $L_E$ is a set of relation labels. With each edge we associate a confidence value.
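The graph model can be sketched in a few lines of Python. This is a minimal illustration, assuming vertex names double as their labels and that confidence is a plain float; the names `KnowledgeGraph`, `Edge`, and `add_fact` are hypothetical and not taken from the YAGO code base.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Edge:
    source: str        # start vertex
    relation: str      # relation label from L_E, e.g. "type"
    target: str        # end vertex
    confidence: float  # confidence value attached to this edge

@dataclass
class KnowledgeGraph:
    # Vertex names double as their labels (individuals and classes).
    vertices: set = field(default_factory=set)
    # A list rather than a set, so duplicate edges are kept (a multiset).
    edges: list = field(default_factory=list)

    def add_fact(self, source, relation, target, confidence=1.0):
        self.vertices.update((source, target))
        self.edges.append(Edge(source, relation, target, confidence))

    def out_edges(self, vertex):
        return [e for e in self.edges if e.source == vertex]

    def relation_labels(self):
        return {e.relation for e in self.edges}
```

Keeping edges in a list is what makes this a multigraph: the same (source, relation, target) triple can appear twice, for example when two extractors report it with different confidences.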
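[Figure: example knowledge graph with panels Words, Individuals, Classes, Relations. The words "Elvis Presley" and "The King" are linked by means to Elvis; Elvis is linked by born to 1935; an unknown individual (?) is linked by born to 1935 and by type to astronaut; the classes astronaut, person, and entity are connected by subclass edges.]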
Assembling the ontology manually: WordNet, SUMO, Gene Ontology, etc. Problem: usually low coverage.
WordNet: a semantic lexicon for the English language, developed at Princeton University since 1985. It groups English words into synsets, provides short, general definitions, and records various semantic relations. It contains about 150,000 words organized in over 115,000 synsets.
SUMO (Suggested Upper Merged Ontology): concerned with meta-level concepts. First released in December 2000; maintained by Articulate Software.
Gene Ontology (GO): part of a larger effort, the Open Biomedical Ontologies. Constructed in 1998 with three models: biological processes, cellular components, and molecular functions. As of 2005, GO contained over 19,000 terms.
Automated ontology extraction: KnowItAll (University of Washington) and TextToOnto (University of Karlsruhe) use pattern matching and machine learning techniques. Problem: usually low accuracy (50% to 92%).
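[Figure: system architecture. Backend: LEILA and other knowledge acquisition tools extract knowledge from the Web into the YAGO knowledge base; NAGA performs query processing and ranking. User side: a browser interface for query input and output, with tunable parameters.]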
YAGO is based on a simple, decidable model and is an extensible ontology. High coverage: YAGO knows over 1.7M entities and 14M facts. High quality: an empirical evaluation shows 95% accuracy.
Assemble the ontology from Wikipedia: good coverage, with 7.83M entities across all languages.
Good Accuracy
LEILA uses deep linguistic analysis and machine learning techniques (SVM). Input: a binary target relation and a set of Web documents. Output: all pairs of entities that are in the target relation.
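[Figure: Elvis, born 1935, has type American_singer; which of Wikipedia's higher categories (People_by_occupation, Business, Social_group) should become its superclass is unclear (?). Panel label: Classes.]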
Each WordNet synset becomes a class of YAGO. Extract only Wikipedia's leaf categories. Exclude individuals already known to WordNet, e.g. Albert Einstein will be excluded (about 15,000 cases). When WordNet and Wikipedia conflict in meaning, prefer WordNet: "Time exposure" is a common noun in WordNet, but an album title in Wikipedia.
[Mock Wikipedia page for Elvis Presley; placeholder text omitted.] Categories: 1935_births → (Elvis Presley, bornInYear, 1935). Exploit relational categories, which yield relations such as bornInYear, diedInYear, and establishedIn.
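Relational categories can be turned into facts by matching the category name against patterns. The sketch below assumes category names use underscores, as on the slide; the three patterns and the helper name `facts_from_categories` are hypothetical, and YAGO's actual pattern set is not reproduced here.

```python
import re

# Hypothetical patterns: category-name regex -> relation it signals.
RELATIONAL_PATTERNS = [
    (re.compile(r"^(\d{3,4})_births$"), "bornInYear"),
    (re.compile(r"^(\d{3,4})_deaths$"), "diedInYear"),
    (re.compile(r"^(?:Organizations|Companies)_established_in_(\d{3,4})$"), "establishedIn"),
]

def facts_from_categories(entity, categories):
    """Turn relational category names into (entity, relation, value) facts.
    Categories that match no pattern are ignored."""
    facts = []
    for category in categories:
        for pattern, relation in RELATIONAL_PATTERNS:
            match = pattern.match(category)
            if match:
                facts.append((entity, relation, match.group(1)))
                break  # first matching pattern wins
    return facts
```

For example, the categories of the Elvis page yield a bornInYear fact from 1935_births, while a conceptual category such as American_singers produces nothing here and is left to the class-extraction step.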
[Mock Wikipedia page for Elvis Presley; placeholder text omitted.] Categories: American_singers → (Elvis Presley, type, American_singer). Exploit conceptual categories, which yield type and subClassOf facts.
[Mock Wikipedia page for Elvis Presley; placeholder text omitted.] Categories: Rock'n_Roll_Music. Avoid thematic categories: Rock'n_Roll_Music is a topic, not a class of Elvis, so it must not yield a type fact.
Shallow linguistic noun-phrase parsing, e.g. "American singers of German origin": premodifier "American", head "singers", postmodifier "of German origin". Heuristic: if the head is a plural word, the category is conceptual.
Pling stemmer: identifies plural heads and stems them to the singular (e.g. singers → singer).
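The plural-head heuristic can be sketched as follows. This is a toy under stated assumptions: the postmodifier is taken to start at the first preposition, the head is the word just before it, and a crude suffix-stripping singularizer stands in for the Pling stemmer (it is not that tool and does not reproduce its interface). `split_noun_phrase`, `naive_singular`, and `classify_category` are hypothetical names.

```python
# Hypothetical set of words assumed to start a postmodifier.
PREPOSITIONS = {"of", "in", "from", "by", "for", "with", "at", "on"}

def split_noun_phrase(category):
    """Shallow split of a category name into (premodifier, head, postmodifier)."""
    words = category.replace("_", " ").split()
    cut = next((i for i, w in enumerate(words) if w.lower() in PREPOSITIONS), len(words))
    cut = max(cut, 1)  # the head is at least the first word
    return " ".join(words[:cut - 1]), words[cut - 1], " ".join(words[cut:])

def naive_singular(word):
    """Crude stand-in for a plural stemmer: handles a few regular plurals only."""
    lower = word.lower()
    if lower.endswith("ies") and len(lower) > 4:
        return word[:-3] + "y"
    if lower.endswith(("ches", "shes", "sses", "xes")):
        return word[:-2]
    if lower.endswith("s") and not lower.endswith("ss"):
        return word[:-1]
    return word

def classify_category(category):
    """Plural head -> conceptual: (kind, class name, singular head for WordNet lookup).
    Singular head -> treated as thematic."""
    pre, head, post = split_noun_phrase(category)
    singular = naive_singular(head)
    if singular == head:
        return ("thematic", None, None)
    class_name = "_".join(part for part in (pre, singular, post) if part).replace(" ", "_")
    return ("conceptual", class_name, singular.lower())
```

The singular head ("singer") is what would then be looked up among WordNet synsets to attach the new class as a subclass. Relational categories such as 1935_births are assumed to be handled before this step, since their head "births" is also plural.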
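[Figure: Elvis is linked by born to 1935 and by type to American_singer; subclass edges lead from American_singer to Singer and from Singer to Person; the words "singer" and "Elvis Presley" are linked by means to their entities.]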
Storing witnesses: for each individual, store the URL of the corresponding Wikipedia page. Storing confidence: each fact carries a confidence value.
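YAGO: A Core of Semantic Knowledge. [Figure: example graph with provenance. Elvis is linked by born to 1935 and by type to American_singer; the subclass chain continues to the WordNet senses Singer#1 and Person#3; the words "singer" and "Elvis Presley" are linked by means; facts are linked by FoundIn to wiki/Elvis_Presley and by ExtractedBy to LEILA.]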
Fact: (Elvis, is_a, singer). But only from 1953 to 1977, and we know this from Wikipedia.
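#1: (Elvis, is_a, singer); #2: (#1, time, [1953, 1977]); #3: (#1, source, Wikipedia). [Figure: the type edge from Elvis to singer, annotated with its time and its source Wikipedia; the fact is also linked to LEILA with confidence 0.93.]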
A YAGO ontology over a set of relations $R$ (e.g. type, subClassOf), a set of common entities $C$ (e.g. entity, class, relation), and a set of fact identifiers $I$ is a function $y : I \to (R \cup C \cup I) \times R \times (R \cup C \cup I)$. We can talk about facts, e.g. (#1, source, Wikipedia); additional arguments, e.g. (#1, time, [1953, 1977]); and relations, e.g. (time, hasRange, time_interval).
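The mapping from fact identifiers to triples can be sketched as a small store in which a fact's arguments may themselves be identifiers, so that facts about facts are ordinary facts. This is an illustrative sketch; the class name `FactStore`, the `#n` identifier format, and the choice to give an identical triple the same identifier are assumptions of this example.

```python
from itertools import count

class FactStore:
    """Maps fact identifiers ('#1', '#2', ...) to (arg1, relation, arg2) triples.
    Arguments may be entities, relations, or other fact identifiers."""

    def __init__(self):
        self._facts = {}           # identifier -> triple
        self._ids = {}             # triple -> identifier (kept one-to-one here)
        self._counter = count(1)

    def add(self, arg1, relation, arg2):
        triple = (arg1, relation, arg2)
        if triple in self._ids:    # re-adding a triple returns its existing id
            return self._ids[triple]
        fact_id = f"#{next(self._counter)}"
        self._facts[fact_id] = triple
        self._ids[triple] = fact_id
        return fact_id

    def __getitem__(self, fact_id):
        return self._facts[fact_id]

    def about(self, fact_id):
        """All (identifier, triple) pairs whose first argument is fact_id."""
        return [(i, t) for i, t in self._facts.items() if t[0] == fact_id]
```

With this store, the time and source annotations of the Elvis fact become `store.add(f1, "time", ...)` and `store.add(f1, "source", "Wikipedia")`, where `f1` is the identifier returned for the base fact.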
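subClassOf has type acyclicTransitiveRelation: (subClassOf, type, acyclicTransitiveRelation). Axioms & rules, e.g. (x, is_a, y), (y, subClassOf, z) ⇒ (x, is_a, z). [Figure: singer subClassOf person, so an individual of type singer also has type person.]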
Types Relations
{(r1, subRelationOf, r2), (x, r1, y)} → (x, r2, y)
{(r, type, acyclicTransitiveRelation), (x, r, y), (y, r, z)} → (x, r, z)
{(r, domain, c), (x, r, y)} → (x, type, c)
{(r, range, c), (x, r, y)} → (y, type, c)
{(x, type, c1), (c1, subClassOf, c2)} → (x, type, c2)
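The five rules can be applied by naive forward chaining until a fixpoint is reached. Because the rules only recombine constants already present (plus `type`), the loop must terminate. This sketch assumes facts are plain string triples, ignores fact identifiers and confidence values, reads the domain rule's premise as (x, r, y), and uses the hypothetical helper name `deductive_closure`; it is not YAGO's implementation.

```python
def deductive_closure(facts):
    """Apply the five rules until no new fact appears (a fixpoint).
    Facts are (x, relation, y) triples of strings."""
    closure = set(facts)
    while True:
        new = set()
        for (a, r, b) in closure:
            for (c, s, d) in closure:
                # Rule 1: (r1, subRelationOf, r2), (x, r1, y) -> (x, r2, y)
                if r == "subRelationOf" and s == a:
                    new.add((c, b, d))
                # Rule 2: r acyclic transitive, (x, r, y), (y, r, z) -> (x, r, z)
                if r == s and b == c and (r, "type", "acyclicTransitiveRelation") in closure:
                    new.add((a, r, d))
                # Rule 3: (r, domain, c), (x, r, y) -> (x, type, c)
                if r == "domain" and s == a:
                    new.add((c, "type", b))
                # Rule 4: (r, range, c), (x, r, y) -> (y, type, c)
                if r == "range" and s == a:
                    new.add((d, "type", b))
                # Rule 5: (x, type, c1), (c1, subClassOf, c2) -> (x, type, c2)
                if r == "type" and s == "subClassOf" and b == c:
                    new.add((a, "type", d))
        if new <= closure:
            return closure
        closure |= new
```

Each pass compares every pair of facts, so the cost is quadratic per iteration; that is acceptable for illustrating the semantics but not for millions of facts.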
Axioms, e.g. (x, is_a, y), (y, subClassOf, z) ⇒ (x, is_a, z). Starting from facts {f1, ..., f5}: deriving facts yields the closure {f1, ..., f10}, which is finite; eliminating derivable facts yields the base {f1, f2, f3}, which is unique.
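Consistency: a YAGO ontology $y$ is consistent iff $\nexists\, x, r : (r, \mathit{type}, \mathit{acyclicTransitiveRelation}) \in D(y) \wedge (x, r, x) \in D(y)$, where $D(y)$ is the deductive closure of $y$. Since $D(y)$ is finite, the consistency of a YAGO ontology is decidable.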
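Because the closure is finite, the consistency test is a terminating search for a self-loop. The sketch below checks only the acyclic transitive relations and applies only transitivity to them (a simplification: the other deduction rules, such as subRelationOf, are ignored); `is_consistent` is a hypothetical name and facts are plain string triples.

```python
def is_consistent(facts):
    """Return False iff some acyclic transitive relation r yields (x, r, x)
    once its transitive closure is computed."""
    acyclic = {x for (x, r, c) in facts
               if r == "type" and c == "acyclicTransitiveRelation"}
    for rel in acyclic:
        pairs = {(x, y) for (x, r, y) in facts if r == rel}
        # Naive transitive closure: add (x, z) for (x, y), (y, z) until stable.
        while True:
            new = {(x, z) for (x, y) in pairs for (y2, z) in pairs if y == y2}
            if new <= pairs:
                break
            pairs |= new
        if any(x == y for (x, y) in pairs):
            return False  # a cycle produced (x, rel, x)
    return True
```

A cycle such as singer subClassOf person, person subClassOf entity, entity subClassOf singer produces (singer, subClassOf, singer) and is reported as inconsistent.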
Is Lake Victoria "locatedIn" Tanzania? When should an entity be an individual, and when a class? E.g. physics is an individual of the class science.
[Chart: number of facts per ontology. KnowItAll: 30,000; SUMO, WordNet, OpenCyc: values lost in extraction; Cyc: 2,000,000; YAGO: 14,000,000.]
inf.mpg.de/~suchanek/downloads/yago/ Which astronaut was born in the same year as Elvis? Query: "Elvis Presley" bornInYear $year; $astro bornInYear $year; $astro isa astronaut. 20 results.
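A query like this is a conjunction of triple patterns whose `$` terms are variables; answering it means finding every variable binding under which all patterns are facts. The backtracking matcher below is a minimal sketch of that idea over a toy fact list; NAGA's own query processing and ranking are not reproduced, and `match` is a hypothetical name.

```python
def is_var(term):
    return term.startswith("$")

def match(query, facts, binding=None):
    """Yield every variable binding under which all query triples are facts.
    Patterns are bound one at a time, backtracking on conflicts."""
    binding = binding or {}
    if not query:
        yield dict(binding)
        return
    pattern, rest = query[0], query[1:]
    for fact in facts:
        new = dict(binding)
        ok = True
        for term, value in zip(pattern, fact):
            if is_var(term):
                # Bind a fresh variable, or check it agrees with its binding.
                if new.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            yield from match(rest, facts, new)
```

Usage: with the toy facts below, the astronaut query binds `$year` to 1935 and `$astro` to Roger_Chaffee; each yielded dictionary is one answer.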
Roger Bruce Chaffee, born February 15, 1935, was a U.S. Navy pilot who became an American astronaut in the Apollo program. He died during training in the Apollo 1 fire.
EVIDENCE QUERY: search for evidence for a certain hypothesis, e.g. Max Planck isA Physicist; Max Planck bornIn Kiel. DISCOVERY QUERY: discover pieces of missing information. [Figure: Max Planck and $X, both isA Physicist, with a bornInYear edge to $Y.]
REGULAR EXPRESSION QUERY: the user may be interested in a certain path of relations between pieces of information, e.g. $X isA scientist; Liu (givenNameOf|familyNameOf) $X, or $X isA river; $X locatedIn* Africa.
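The `*` in `locatedIn*` means "zero or more locatedIn edges", so checking the path amounts to a graph reachability test. The breadth-first search below is a minimal sketch of the river example under toy facts; `star_path_exists` and `rivers_in` are hypothetical names, and alternation such as givenNameOf|familyNameOf is not covered.

```python
from collections import deque

def star_path_exists(facts, source, relation, target):
    """True iff target is reachable from source via zero or more `relation`
    edges, i.e. the path expression relation* connects them."""
    seen, queue = {source}, deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return True
        for (x, r, y) in facts:
            if x == node and r == relation and y not in seen:
                seen.add(y)
                queue.append(y)
    return False

def rivers_in(facts, region):
    """Answer: $X isA river; $X locatedIn* region."""
    return sorted(x for (x, r, c) in facts
                  if r == "isA" and c == "river"
                  and star_path_exists(facts, x, "locatedIn", region))
```

Tracking visited nodes keeps the search finite even if the locatedIn edges contain a cycle.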
RELATEDNESS QUERY: find broad relations between pieces of information, e.g. Bohr connect Einstein. Both are physicists and both are scientists; there are Moon craters and asteroids named after them; Tom Cruise connects them by being a vegetarian.
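One simple way to find a connection between two entities is a shortest-path search over the knowledge graph, following edges in either direction. The sketch below returns a single shortest chain of facts; it is a simplification, since a relatedness query as described above returns several ranked connections. `connect` is a hypothetical name and the facts are toy data.

```python
from collections import deque

def connect(facts, a, b):
    """Shortest chain of facts linking a and b, following edges in either
    direction; returns the facts on the path, or None if unconnected."""
    neighbours = {}
    for fact in facts:
        x, _, y = fact
        neighbours.setdefault(x, []).append((y, fact))
        neighbours.setdefault(y, []).append((x, fact))
    previous = {a: None}
    queue = deque([a])
    while queue:
        node = queue.popleft()
        if node == b:
            path = []
            while previous[node] is not None:  # walk back to the start
                node, fact = previous[node]
                path.append(fact)
            return path[::-1]
        for other, fact in neighbours.get(node, []):
            if other not in previous:
                previous[other] = (node, fact)
                queue.append(other)
    return None
```

Ties between equally short connections are broken by the order of the fact list, which is one reason a real system needs an explicit ranking.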
The answer to a query Q is a subgraph A of the knowledge graph that matches Q. [Figure: Q has Max Planck and $X both of type Physicist, with a bornInYear edge to $Y; the answer A binds $X to Mihajlo Pupin and $Y to 1858.]
Ranking combines three measures: extraction confidence; the informativeness of a fact (e.g. Albert_Einstein isA physicist is more informative than Albert_Einstein isA person); and the compactness of the answer graph (e.g. for "How are Einstein and Bohr related?", "both won the Nobel Prize" is preferred over a connection through Tom Cruise).
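To make the three measures concrete, here is a toy linear combination of them. Everything about the formula is an assumption of this sketch: informativeness is taken as the negative log of a class's share of all entities, compactness as one over the number of facts, and the weights are arbitrary placeholders. NAGA's own ranking model is not reproduced here.

```python
import math

def informativeness(cls, class_sizes, total_entities):
    """Hypothetical measure: rarer classes are more informative."""
    return -math.log(class_sizes[cls] / total_entities)

def score(answer, class_sizes, total_entities, weights=(1.0, 1.0, 1.0)):
    """Toy score for an answer graph of (x, relation, y, confidence) facts:
    weighted sum of mean confidence, mean informativeness of type facts,
    and compactness (fewer facts scores higher)."""
    w_conf, w_info, w_compact = weights
    confidence = sum(f[3] for f in answer) / len(answer)
    type_facts = [f for f in answer if f[1] == "type"]
    info = (sum(informativeness(f[2], class_sizes, total_entities) for f in type_facts)
            / len(type_facts)) if type_facts else 0.0
    compactness = 1.0 / len(answer)
    return w_conf * confidence + w_info * info + w_compact * compactness
```

With hypothetical class sizes in which physicist is far rarer than person, the physicist answer outranks the person answer, and adding an extra fact to an otherwise identical answer lowers its score.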
Evaluation: 55 queries from TREC 2005/2006, 12 queries from the work on SphereSearch, and 18 regular expression queries. The queries were posed to Google, Yahoo! Answers, and NAGA at the same time.
Semantic Web vision. System overview. YAGO is based on a logically clean model, with an accuracy of around 95%. YAGO is 7 times larger than the largest competitor. Future work: investigate the relationship between OWL 1.1 and the YAGO model.
References: "YAGO: A Core of Semantic Knowledge"; "NAGA: Harvesting, Searching and Ranking Knowledge"; "LEILA: Learning to Extract Information by Linguistic Analysis" (Fabian M. Suchanek, Gjergji Kasneci, Gerhard Weikum …). Available at
Questions?