1
An event-based denotational semantics for natural language queries of data represented in triple stores
Richard Frost, Randy Fortier and Bryan St. Amour
School of Computer Science, University of Windsor
ICSC 2013
2
Objectives of our research
To create an efficient, modular natural-language (NL) speech interface to graphical data, which enables answers to questions to be computed directly from the data.
[Figure: each word of a question "xxx xx xxxxx xxx xxxxxx x xx x xx?" is linked by an arrow to the data below it.]
Data = {(a,r1,c), (a,r2,f), (c,r3,g), ...}
Efficient: polynomial time and space complexity.
Modular: new language constructs can be added without affecting any existing code.
Graphical data: binary-relational triple stores, converted relational data, semantic web RDF data.
3
Why do we need a compositional semantics?
"How many states which are members of the United Nations have capitals in the southern hemisphere?"
Information retrieval systems can answer this only if a similar statement, together with its answer, is already in the data store.
Even then, the statement would need to be updated whenever a new member joins the U.N. or a change of capital affects the result.
4
Progress so far
X-SAIGA: an environment for constructing language processors as modular executable specifications of attribute grammars. Based on a top-down parser with polynomial space and time complexity for arbitrary (ambiguous, left-recursive) CFGs.
SpeechWeb: an architecture for creating speech interfaces to hyperlinked applications on the Web.
NL semantics for conventional relational databases.
Demo: search YouTube for "SpeechWeb".
NEXT STEP: SEMANTICS FOR GRAPHICAL DATA
5
A breakthrough: Montague's (1970s) approach to natural-language semantics (simplified)
English: "Mars spins"
Higher-order Intensional Logic (IL): [[ Mars ]] [[ spins ]] = λp(p e_mars) spins_pred
=> spins_pred e_mars
Data model: => True
6
Montague Semantics (MS)
"every moon spins"
( [[ every ]] [[ moon ]] ) [[ spins ]]
= (λpλq ∀x(p x → q x) moon_pred) spins_pred
=> λq ∀x(moon_pred x → q x) spins_pred
=> ∀x(moon_pred x → spins_pred x)
=> True (if all things that are moons spin)
7
MS is polymorphic
"Mars and Venus spin"
=> ( [[ and ]] [[ Mars ]] [[ Venus ]] ) [[ spin ]]
=> (λsλt (λr(s r & t r)) λp(p e_mars) λp(p e_venus)) spins_pred
=> λr(λp(p e_mars) r & λp(p e_venus) r) spins_pred
=> λp(p e_mars) spins_pred & λp(p e_venus) spins_pred
=> spins_pred e_mars & spins_pred e_venus
=> True & True
=> True
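The Montague-style derivations above can be mimicked with ordinary higher-order functions. The following is a minimal Python sketch, not the authors' implementation: the universe of entities and the contents of the two predicates are hypothetical, chosen only so that the three example sentences evaluate as on the slides.

```python
# Hypothetical toy model: the entities and predicate extensions are assumptions.
UNIVERSE = {"mars", "venus", "phobos", "deimos"}
SPINNERS = {"mars", "venus", "phobos", "deimos"}
MOONS = {"phobos", "deimos"}


def spins_pred(x):
    return x in SPINNERS


def moon_pred(x):
    return x in MOONS


# [[ Mars ]] = λp(p e_mars): a term takes a predicate and applies it to its entity.
mars = lambda p: p("mars")
venus = lambda p: p("venus")

# [[ every ]] = λpλq ∀x(p x → q x): the quantifier ranges over the whole universe.
every = lambda p: lambda q: all((not p(x)) or q(x) for x in UNIVERSE)

# [[ and ]] = λsλt λr(s r & t r): combines two terms into a new term.
term_and = lambda s: lambda t: lambda r: s(r) and t(r)

mars_spins = mars(spins_pred)
every_moon_spins = every(moon_pred)(spins_pred)
mars_and_venus_spin = term_and(mars)(venus)(spins_pred)
```

Note that `every` has to test every entity in `UNIVERSE`, which hints at why a predicate-based formulation scales poorly on large data.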
8
MS is very powerful
The semantics covers a large subset of classical first-order English.
- does (((every moon) $and (every planet)) spin)
- how_many (moons $that (orbit (a (red planet)))) (were (discovered_by (the (person $who (discovered Nereid)))))
- which planet (is (orbited_by (no moon)))
It covers intensions and modal expressions (although our work does not).
The meaning of words can be defined in terms of other words.
[[ discoverer ]] = [[ person $who (discovered (a thing)) ]]
9
Montague Semantics is ideally suited as a basis for computerized query processors
Denotational: every word and phrase has a well-defined mathematical meaning (denotation).
Compositional: the meaning of a phrase is obtained from the meanings of its parts through simple function application.
Referentially transparent: the meaning of a phrase, after syntactic disambiguation, is always the same.
There is a one-to-one correspondence between syntactic and semantic rules.
BUT
10
Shortcomings of MS for query processing
Computationally intractable: e.g. ∀x(moon_pred x → spins_pred x)
No explicit denotation for transitive verbs: they are left uninterpreted until the end, and then a syntactic rewrite is used to give an IL expression.
Prepositional phrases are not easy to accommodate in the entity-based semantics of MS.
Needs an intermediate language: IL needs to be mapped to the triple store/binary-relational/RDF data model OR to another intermediate language (although Montague said that IL was dispensable).
11
Our semantics
Has the 4 Montagovian properties: denotational, modular, etc.
Computationally tractable: set based rather than predicate based.
Event based: able to easily accommodate prepositional phrases.
Has an explicit denotation for transitive verbs, enabling accommodation of phrases such as "wrote or interpreted".
No intermediate language: NL denotations are defined directly in terms of basic triple store operations. This differs from many other NL query approaches, which map NL to SQL or SPARQL.
12
An example datastore: 5 events
{(EV 1000, REL "type", TYPE "born_ev"),
(EV 1000, REL "subject", ENT "capone"),
(EV 1000, REL "date", ENTNUM 1899),
(EV 1001, REL "type", TYPE "join_ev"),
(EV 1001, REL "subject", ENT "capone"),
(EV 1001, REL "object", ENT "fpg"),
(EV 1002, REL "type", TYPE "membership"),
(EV 1002, REL "subject", ENT "capone"),
(EV 1002, REL "object", ENT "thief_set"),
(EV 1002, REL "date", ENTNUM 1918),
(EV 1004, REL "type", TYPE "steal_ev"),
(EV 1004, REL "subject", ENT "capone"),
(EV 1004, REL "object", ENT "car_1"),
(EV 1005, REL "type", TYPE "smoke_ev"),
(EV 1005, REL "subject", ENT "capone")}
New facts are easy to add, e.g. (EV 1000, REL "location", ENT "brooklyn")
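As a rough illustration of the event-based layout, the datastore can be held as a plain set of triples. This Python sketch makes some simplifying assumptions: the EV/REL/ENT/TYPE/ENTNUM tags are dropped, and the helper `describe` is a hypothetical name that does not come from the slides.

```python
# Each triple is (event id, relation, value); the slide's tags are omitted.
store = {
    (1000, "type", "born_ev"), (1000, "subject", "capone"), (1000, "date", 1899),
    (1001, "type", "join_ev"), (1001, "subject", "capone"), (1001, "object", "fpg"),
    (1002, "type", "membership"), (1002, "subject", "capone"),
    (1002, "object", "thief_set"), (1002, "date", 1918),
    (1004, "type", "steal_ev"), (1004, "subject", "capone"), (1004, "object", "car_1"),
    (1005, "type", "smoke_ev"), (1005, "subject", "capone"),
}

# Attaching a location to the birth event is one extra triple;
# no schema change and no change to existing triples is needed.
store.add((1000, "location", "brooklyn"))


def describe(ev):
    """Collect every relation/value pair recorded for one event (hypothetical helper)."""
    return {rel: val for (e, rel, val) in store if e == ev}


birth = describe(1000)
```

Because every fact hangs off an event identifier, extra detail such as a place or a date can be added to an event without touching the triples already stored.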
13
Basic retrieval operators
getts (ANY, REL "subject", ENT "capone") => {(EV 1000, REL "subject", ENT "capone"), (EV 1001, REL "subject", ENT "capone"), etc.}
getts can be used to define other basic operators. Definitions are in the paper.
Example uses:
get_subjs_for_events {EV 1000, EV 1009} => {ENT "capone", ENT "torrio"}
get_members "thief_set" => {ENT "capone"}
get_subjs_of_event_type "born_ev" => {ENT "capone"}
We can now define the semantics using these basic operators.
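The slide defers the operator definitions to the paper, so the following Python sketch is only one plausible reading of them. It assumes untagged triples, a wildcard object `ANY`, and helper names `firsts` and `thirds`; the way `get_members` combines "membership" events with their objects is likewise an assumption.

```python
ANY = object()  # wildcard marker standing in for ANY on the slide

STORE = {
    (1000, "type", "born_ev"), (1000, "subject", "capone"), (1000, "date", 1899),
    (1001, "type", "join_ev"), (1001, "subject", "capone"), (1001, "object", "fpg"),
    (1002, "type", "membership"), (1002, "subject", "capone"),
    (1002, "object", "thief_set"), (1002, "date", 1918),
    (1004, "type", "steal_ev"), (1004, "subject", "capone"), (1004, "object", "car_1"),
    (1005, "type", "smoke_ev"), (1005, "subject", "capone"),
}


def getts(pattern, store=STORE):
    """Return all triples matching the pattern; ANY matches any field."""
    return {t for t in store
            if all(p is ANY or p == f for p, f in zip(pattern, t))}


def firsts(triples):
    return {t[0] for t in triples}


def thirds(triples):
    return {t[2] for t in triples}


def get_subjs_for_events(evs):
    """Subjects of the given events."""
    return {s for ev in evs for s in thirds(getts((ev, "subject", ANY)))}


def get_subjs_of_event_type(ev_type):
    """Subjects of all events of the given type."""
    return get_subjs_for_events(firsts(getts((ANY, "type", ev_type))))


def get_members(set_name):
    """Subjects of membership events whose object is the named set."""
    member_evs = firsts(getts((ANY, "type", "membership")))
    in_set = firsts(getts((ANY, "object", set_name)))
    return get_subjs_for_events(member_evs & in_set)
```

With the five-event datastore, `get_members("thief_set")` and `get_subjs_of_event_type("born_ev")` both return `{"capone"}`, matching the example uses above. The slide's `EV 1009` and `torrio` belong to a larger datastore that is not shown.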
14
Our new semantics
Note: in the paper, and from now on, a word in bold italics stands for its denotation, e.g. thief = [[ thief ]]
thief = get_members "thief_set"
e.g. thief => {ENT "capone"}
smokes = get_subjs_of_event_type "smoke_ev"
e.g. smokes => {ENT "capone"}
capone setofents = (ENT "capone") ∈ setofents
e.g. capone smokes => True
a nph vbph = #(nph ⋂ vbph) ~= 0
term_and tmph1 tmph2 = f where f setofevs = (tmph1 setofevs) & (tmph2 setofevs)
e.g. ((a thief) $term_and capone) smokes => True
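These set-based denotations translate almost directly into code. In the Python sketch below, the sets `thief` and `smokes` are written out by hand as stand-ins for the results of `get_members "thief_set"` and `get_subjs_of_event_type "smoke_ev"`; the term `torrio` is a hypothetical addition used only to show a False result.

```python
# Stand-ins for sets that would be retrieved from the triple store.
thief = {"capone"}
smokes = {"capone"}


def capone(setofents):
    """A proper noun denotes a test on a set of entities."""
    return "capone" in setofents


def torrio(setofents):
    """Hypothetical second term, not in the five-event datastore."""
    return "torrio" in setofents


def a(nph):
    """a nph vbph = #(nph ⋂ vbph) ~= 0, curried so that (a nph) is itself a term."""
    return lambda vbph: len(nph & vbph) != 0


def term_and(tmph1, tmph2):
    """Conjunction of two terms: both must hold of the same set."""
    return lambda s: tmph1(s) and tmph2(s)


capone_smokes = capone(smokes)
a_thief_and_capone_smoke = term_and(a(thief), capone)(smokes)
torrio_smokes = torrio(smokes)
```

Every operation here is a membership test or a set intersection, so no quantification over the whole universe of entities is needed.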
15
Our new semantics: major contribution 1
join = make_trans "join_ev"
e.g. join (a gang) => {ENT "capone", ENT "torrio"}
Definition:
make_trans event_type = f
where f tmph = {subj | (subj, evs) ∈ (make_image event_type) & tmph (⋃ {map thirds (getts (ev, REL "object", ANY)) | ev ∈ evs})}
where, for example:
make_image "join_ev" => {(ENT "capone", {EV 1001, EV 1003}), (ENT "torrio", {EV 1009})}
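The definition of make_trans can be sketched in Python as follows. This is an illustrative reading, not the paper's code: triples are untagged, `make_image` is reconstructed from its example output, and events 1003 and 1009 (with the object `gang_x`) are hypothetical, added so that the image matches the one shown on the slide.

```python
ANY = object()  # wildcard marker

STORE = {
    (1001, "type", "join_ev"), (1001, "subject", "capone"), (1001, "object", "fpg"),
    # Hypothetical events, chosen to reproduce the make_image example.
    (1003, "type", "join_ev"), (1003, "subject", "capone"), (1003, "object", "gang_x"),
    (1009, "type", "join_ev"), (1009, "subject", "torrio"), (1009, "object", "fpg"),
}


def getts(pattern):
    return {t for t in STORE
            if all(p is ANY or p == f for p, f in zip(pattern, t))}


def thirds(triples):
    return {t[2] for t in triples}


def make_image(event_type):
    """Map each subject to the set of events of this type it takes part in."""
    image = {}
    for ev, _, _ in getts((ANY, "type", event_type)):
        for subj in thirds(getts((ev, "subject", ANY))):
            image.setdefault(subj, set()).add(ev)
    return image


def make_trans(event_type):
    """Denotation of a transitive verb: a function from a term to a set of subjects."""
    def f(tmph):
        result = set()
        for subj, evs in make_image(event_type).items():
            # Union of the objects of all of this subject's events.
            objs = set()
            for ev in evs:
                objs |= thirds(getts((ev, "object", ANY)))
            if tmph(objs):
                result.add(subj)
        return result
    return f


def a(nph):
    return lambda vbph: len(nph & vbph) != 0


gang = {"fpg"}  # stand-in for get_members "gang_set"
join = make_trans("join_ev")
joiners = join(a(gang))
```

Because `join` is an ordinary value, it can be passed around and combined with other verb denotations, which is what makes phrases such as "wrote or interpreted" straightforward to handle.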
16
Prepositional phrases: major contribution 2
steal_with_time tmph date = {subj | (subj, evs) ∈ image_steal & tmph (⋃ {thirds (getts (ev, REL "object", ANY)) | ev ∈ evs & date (thirds (getts (ev, REL "date", ANY)))})}
The date argument is used to "filter" the events.
e.g. steal_with_time (a car) (date_1918) => {ENT "capone"}
Note: we need to generalize this and create a more powerful version of the make_trans function (this should not be too difficult).
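The date filter can be illustrated with a small Python sketch. Several things here are assumptions: the example datastore gives no date for EV 1004, so a date of 1918 is supplied; event 1006 is entirely hypothetical; and `date_term` and `term_or` are guessed by analogy with the definitions of capone and term_and.

```python
ANY = object()  # wildcard marker

STORE = {
    (1004, "type", "steal_ev"), (1004, "subject", "capone"), (1004, "object", "car_1"),
    (1004, "date", 1918),  # assumed: the slide's datastore has no date for EV 1004
    # Hypothetical second theft, outside the queried years.
    (1006, "type", "steal_ev"), (1006, "subject", "torrio"), (1006, "object", "car_2"),
    (1006, "date", 1920),
}


def getts(pattern):
    return {t for t in STORE
            if all(p is ANY or p == f for p, f in zip(pattern, t))}


def thirds(triples):
    return {t[2] for t in triples}


def make_image(event_type):
    image = {}
    for ev, _, _ in getts((ANY, "type", event_type)):
        for subj in thirds(getts((ev, "subject", ANY))):
            image.setdefault(subj, set()).add(ev)
    return image


def a(nph):
    return lambda vbph: len(nph & vbph) != 0


def date_term(year):
    """A date denotes a test on a set of values, like a proper noun (assumed)."""
    return lambda values: year in values


def term_or(t1, t2):
    """Disjunction of two terms (assumed, by analogy with term_and)."""
    return lambda s: t1(s) or t2(s)


image_steal = make_image("steal_ev")


def steal_with_time(tmph, date):
    result = set()
    for subj, evs in image_steal.items():
        objs = set()
        for ev in evs:
            # Keep only the events whose date satisfies the date term.
            if date(thirds(getts((ev, "date", ANY)))):
                objs |= thirds(getts((ev, "object", ANY)))
        if tmph(objs):
            result.add(subj)
    return result


car = {"car_1", "car_2"}  # stand-in for get_members "car_set"
date_1915, date_1918 = date_term(1915), date_term(1918)
thieves_1918 = steal_with_time(a(car), date_1918)
```

The filter is applied per event before the objects are collected, so a subject is returned only if a matching object comes from an event that also satisfies the date.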
17
The result: a wide range of English NL queries
e.g. "Which gangster who stole a car in 1915 or 1918 joined a gang that was joined by Torrio?"
⇩
which (gangster $that (steal_with_time (a car) (date_1915 $term_or date_1918))) (join (a (gang $that (joined_by torrio))))
⇩
{ENT "capone"}
The brackets are introduced by the parser, which produces more than one bracketed expression for ambiguous input.
18
Next steps
1. Generalize the method for accommodating prepositional phrases and create a more powerful version of the make_trans function to cover queries such as "who stole a car in Brooklyn in 1915" (our solution is briefly described in the paper).
2. Extend the parser of the existing NL speech query processor to accommodate prepositional phrases.
3. Replace the entity-based NL semantics of the existing query processor with the new event-based semantics.
4. Interface the new query processor with an RDF semantic web data source (this will require converting RDF triples to event-based triples).
5. Develop methods for optimising queries to semantic web data.
⇩
An NL speech query interface to semantic web data
19
References for previous work
PARSING:
Frost, R., Hafiz, R. and Callaghan, P. (2007) Modular and efficient top-down parsing for ambiguous left-recursive grammars. In: 10th ACL, IWPT, 109-120.
Hafiz, R. and Frost, R. (2010) Lazy combinators for executable specifications of general attribute grammars. Proceedings of the 12th International Symposium on Practical Aspects of Declarative Languages (PADL), LNCS 5937, 167-182.
SPEECH RECOGNITION:
Frost, R. A. (2005) A call for a public-domain SpeechWeb. CACM 48 (11), 45-49.
Frost, R. A., Ma, X. and Shi, Y. (2007) A browser for a public-domain SpeechWeb. WWW 2007, 1307-1308.
SEMANTICS:
Frost, R. A. (2006) Realization of natural language interfaces using lazy functional programming. ACM Comp. Surv. 38 (4), Article 11.
Frost, R. A. and Fortier, R. (2007) An efficient denotational semantics for natural language database queries. NLDB 07, LNCS 4592, 12-24.
YouTube: SpeechWeb => http://www.youtube.com/watch?v=Axa-n4etdZE
20
Acknowledgements
Rahmatullah Hafiz, Paul Callaghan, Nabil Abdullah, Ali Karaki, Paul Meyer, Jon Donais, Matthew Clifford, Shane Peelar, Stephen Karamatos, Walid Mnaymneh, Rob Mavrinac, Cai Filiault
NSERC: Natural Sciences and Engineering Research Council of Canada
Research Services, University of Windsor