OntoSoar: Feeding a Growing Ontology
CS 652 Information Extraction and Integration, Fall 2012
Peter Lindes
pl 12/4/2012 OntoSoar 1
Project Goals
Use linguistic technologies to:
– Find more facts
– Learn new categories and relations
Technologies to be used:
– OntoES
– Link Grammar Parser
– LG-Soar
– Soar
– Discourse Representation Theory
What is OntoES?
A system for building OSM models
Capable of representing extraction ontologies
Processes text to extract facts
What is Soar?
A cognitive architecture
A system that implements that architecture
Major elements:
– Short- and long-term memories
– Decision procedure
– Perception and action modules
– Various kinds of learning
Example applications:
– TacAirSoar
– BOLT Project
OntoSoar
Raw OCR’d Text Example
Segmented Text
Charles Christopher Lathrop, N. Y. City, b. 1817, d. 1865, son of Mary Ely and Gerard Lathrop ;
m. 1856, Mary Augusta Andruss, 992 Broad St., Newark, N. J., who was b. 1825, dau. of Judge Caleb Halstead Andruss and Emma Sutherland Goble.
Mrs. Lathrop died at her home, 992 Broad St., Newark, N. J., Friday morning, Nov. 4,
The funeral services were held at her residence on Monday, Nov. 7, 1898, at half-past two o'clock P. M.
Their children:
1. Charles Halstead, b. 1857, d
William Gerard, b. 1858, d
Theodore Andruss, b. i
Emma Goble, b
Parsing and Semantics
Charles Halstead, b. 1857, d
Charles Halstead was born in 1857 and died in
[Link Grammar linkage diagram; link labels shown: Xp, Ss, Wd, VJlsi, VJrsi, MVp, G, Pa, IN]
LEFT-WALL Charles.b Halstead was.v-d born.a in.r 1857 and.j-v died.v-d in.r
Predicates:
person(P1)
named(P1, "Charles Halstead")
born(P1, "1857")
died(P1, "1861")
Extracted facts:
Person(P1)
Person_Name(P1, "Charles Halstead")
Person_BirthDate(P1, "1857")
Person_DeathDate(P1, "1861")
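The rewrite from the abbreviated fragment to the full sentence shown on this slide could be done with a few pattern rules. The sketch below is a minimal illustration in Python; the rule table and the `expand` function are assumptions made for this example and are not OntoSoar's actual implementation. The death year in the usage line is taken from the died(P1, "1861") predicate above.

```python
import re

# Hypothetical rewrite rules (pattern, replacement); not taken from OntoSoar.
RULES = [
    (re.compile(r",\s*b\.\s*(\d{4})"), r" was born in \1"),
    (re.compile(r",\s*d\.\s*(\d{4})"), r" and died in \1"),
]


def expand(fragment: str) -> str:
    """Rewrite genealogical abbreviations into words a parser can handle."""
    for pattern, replacement in RULES:
        fragment = pattern.sub(replacement, fragment)
    return fragment


print(expand("Charles Halstead, b. 1857, d. 1861"))
# prints: Charles Halstead was born in 1857 and died in 1861
```

A real rule set would also need to cover abbreviations such as "m." and "dau." that appear in the segmented text.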
More Complex Parsing
Charles Christopher Lathrop, N. Y. City, was born in 1817 and died in 1865 and was the son of Mary Ely and Gerard Lathrop ;
[Link Grammar linkage diagram; link labels shown: Ss, MXs, Xd, Xc, G, Pv, MVp, IN, VJlsi, VJrsi, Ost, Ju, Ds, Mp, SJls, SJrs]
Charles.b Christopher.b Lathrop, N. Y. City, was.v-d born.v in.r
and.j-v died.v-d in.r 1865 and.j-v was.v-d the son.n of Mary.b Ely.m and.j-n
Gerard.m Lathrop [;]
More Complex Semantics
Predicates:
person(P2)
named(P2, "Charles Christopher Lathrop")
place(GE1)
named(GE1, "N. Y. City")
livedIn(P2, GE1)
born(P2, "1817")
died(P2, "1865")
person(P3)
named(P3, "Mary Ely")
son(P2, P3)
person(P4)
named(P4, "Gerard Lathrop")
son(P2, P4)
couple(P3, P4)
Extracted facts:
Person(P2)
Person(P3)
Person(P4)
Person_Name(P2, "Charles Christopher Lathrop")
Person_Name(P3, "Mary Ely")
Person_Name(P4, "Gerard Lathrop")
Person_BirthDate(P2, "1817")
Person_DeathDate(P2, "1865")
Parent_has_Child(P3, P2)
Parent_has_Child(P4, P2)
Male(P2)
Parent_with_Parent(P3, P4)
GeoEntity(GE1)
GeoEntity_Name(GE1, "N. Y. City")
Person_livedIn_GeoEntity(P2, GE1)
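The mapping from predicates to facts on this slide is not purely a renaming: son(P2, P3) becomes Parent_has_Child(P3, P2) with the arguments swapped, and it also implies Male(P2). A minimal Python sketch of such a conversion follows. The tuple representation, the `RENAME` and `OBJECT_SETS` tables, and the `to_facts` function are assumptions made for illustration, not the rules OntoSoar actually uses.

```python
# Hypothetical one-to-one renamings from predicate names to ontology names.
RENAME = {
    "born": "Person_BirthDate",
    "died": "Person_DeathDate",
    "livedIn": "Person_livedIn_GeoEntity",
    "couple": "Parent_with_Parent",
}
# Hypothetical mapping from entity predicates to object sets.
OBJECT_SETS = {"person": "Person", "place": "GeoEntity"}


def to_facts(predicates):
    """Convert (name, arg, ...) predicate tuples into ontology-style facts."""
    # First pass: remember which object set each identifier belongs to.
    kind = {args[0]: OBJECT_SETS[name]
            for name, *args in predicates if name in OBJECT_SETS}
    facts = []

    def add(fact):
        if fact not in facts:  # Male(P2) is implied by both son(...) predicates
            facts.append(fact)

    for name, *args in predicates:
        if name in OBJECT_SETS:
            add((OBJECT_SETS[name], args[0]))
        elif name == "named":
            add((kind[args[0]] + "_Name", *args))
        elif name == "son":
            child, parent = args
            add(("Parent_has_Child", parent, child))  # argument order is reversed
            add(("Male", child))
        elif name in RENAME:
            add((RENAME[name], *args))
    return facts


PREDICATES = [
    ("person", "P2"), ("named", "P2", "Charles Christopher Lathrop"),
    ("place", "GE1"), ("named", "GE1", "N. Y. City"), ("livedIn", "P2", "GE1"),
    ("born", "P2", "1817"), ("died", "P2", "1865"),
    ("person", "P3"), ("named", "P3", "Mary Ely"), ("son", "P2", "P3"),
    ("person", "P4"), ("named", "P4", "Gerard Lathrop"), ("son", "P2", "P4"),
    ("couple", "P3", "P4"),
]
facts = to_facts(PREDICATES)
for fact in facts:
    print(fact)
```

Run on the predicates from this slide, the sketch yields the same 15 facts listed above, though in a different order.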
More Learning
He graduated B. A. from Rensselaer Polytechnic College, Troy, N. Y
[Link Grammar linkage diagram; link labels shown: MXs, Os, Js, Xd, Xca, Ss, G, Mp]
he graduated.v-d B. A. from Rensselaer Polytechnic College, Troy.b, N. Y.
Predicates:
pro3SingMasc(X1)
institution(I1)
named(I1, "Rensselaer Polytechnic College")
graduatedFrom(X1, I1)
person(X1)
place(GE2)
named(GE2, "Troy, N. Y.")
locatedIn(I1, GE2)
person(P5)
named(P5, "Gardner Bullard")
sameAs(X1, P5)
Extracted facts:
Person(P5)
GeoEntity(GE2)
Person_Name(P5, "Gardner Bullard")
GeoEntity_Name(GE2, "Troy, N. Y.")
Male(P5)
Institution(I1)
Institution_Name(I1, "Rensselaer Polytechnic College")
Person_graduatedFrom_Institution(P5, I1)
Institution_locatedIn_GeoEntity(I1, GE2)
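Once sameAs(X1, P5) links the pronoun referent to a known person, every predicate about X1 can be restated about P5, which is how graduatedFrom(X1, I1) ends up as a fact about P5. The Python sketch below shows only that substitution step. It assumes the sameAs link has already been found; how DRT selects the antecedent is not covered, and the `resolve` function is a hypothetical name.

```python
def resolve(predicates):
    """Replace pronoun referents with their antecedents using sameAs links."""
    # Map each pronoun referent to the entity it was resolved to.
    alias = {args[0]: args[1]
             for name, *args in predicates if name == "sameAs"}
    resolved = []
    for name, *args in predicates:
        if name == "sameAs":
            continue  # the link itself is no longer needed
        rewritten = (name, *[alias.get(arg, arg) for arg in args])
        if rewritten not in resolved:  # person(X1) and person(P5) collapse
            resolved.append(rewritten)
    return resolved


PREDICATES = [
    ("pro3SingMasc", "X1"), ("person", "X1"),
    ("institution", "I1"), ("named", "I1", "Rensselaer Polytechnic College"),
    ("graduatedFrom", "X1", "I1"),
    ("place", "GE2"), ("named", "GE2", "Troy, N. Y."),
    ("locatedIn", "I1", "GE2"),
    ("person", "P5"), ("named", "P5", "Gardner Bullard"),
    ("sameAs", "X1", "P5"),
]
resolved = resolve(PREDICATES)
for predicate in resolved:
    print(predicate)
```

After substitution, pro3SingMasc(P5) remains available as the basis for a fact such as Male(P5).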
Processing Steps
Gather section of text
Segment into sentence fragments
Parse with the LG-Parser
Build predicates with LG-Soar
Resolve pronouns using DRT
Convert predicates to facts
Match extracted facts against conceptual model
Record facts that match
Learn from partial matches
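The last three steps can be pictured as a split of the extracted facts into those the conceptual model already covers and those it does not. The Python sketch below is a simplified illustration: the `MODEL` contents, the `match_facts` and `learn` functions, and the choice to treat every fact with an unknown name as a learning candidate are all assumptions, not a description of how OntoSoar or OntoES does it.

```python
# Hypothetical conceptual model: fact name -> number of arguments.
MODEL = {
    "Person": 1,
    "Person_Name": 2,
    "Male": 1,
    "GeoEntity": 1,
    "GeoEntity_Name": 2,
}


def match_facts(facts, model):
    """Split facts into those the model covers and those it does not."""
    recorded, unmatched = [], []
    for fact in facts:
        name, *args = fact
        if model.get(name) == len(args):
            recorded.append(fact)
        else:
            unmatched.append(fact)
    return recorded, unmatched


def learn(unmatched, model):
    """Return a copy of the model extended with the unmatched fact names."""
    extended = dict(model)
    for name, *args in unmatched:
        extended.setdefault(name, len(args))
    return extended


FACTS = [
    ("Person", "P5"), ("GeoEntity", "GE2"),
    ("Person_Name", "P5", "Gardner Bullard"),
    ("GeoEntity_Name", "GE2", "Troy, N. Y."),
    ("Male", "P5"), ("Institution", "I1"),
    ("Institution_Name", "I1", "Rensselaer Polytechnic College"),
    ("Person_graduatedFrom_Institution", "P5", "I1"),
    ("Institution_locatedIn_GeoEntity", "I1", "GE2"),
]
recorded, unmatched = match_facts(FACTS, MODEL)
grown = learn(unmatched, MODEL)
print(len(recorded), "recorded;", len(unmatched), "candidates for learning")
```

With this assumed starting model, the Institution facts from the "More Learning" slide are the ones left over as candidates for new categories and relations.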