
1 Learning to Transform Natural to Formal Language. By Rohit J. Kate, Yuk Wah Wong, and Raymond J. Mooney. Presented by Ping Zhang.

2 May 13th, 2006. Overview: Background; SILT; CLANG and GEOQUERY; Semantic Parsing Using Transformation Rules; String-based Learning; Tree-based Learning; Experiments; Future Work; Conclusion.

3 Natural Language Processing (NLP). Natural language is human language, e.g. English. The reason to process NL: to provide a much more user-friendly interface. Problems: NL is very complex; NL has many ambiguities; so far, NL cannot be used to program a computer.

4 Classification of Languages. The traditional classification (Chomsky hierarchy): regular grammars; context-free grammars (formal languages); context-sensitive grammars; unrestricted grammars (natural language). All current programming languages are less flexible than context-sensitive languages; for example, C++ is a restricted context-sensitive language.

5 An Approach to Processing NL. Map a natural language to a formal query or command language, so that NL interfaces to complex computing and AI systems can be developed more easily: English → (map) → Formal Language → (compiler/interpreter).

6 Grammar Terms. A grammar is G = (N, T, S, P), where N is a finite set of non-terminal symbols; T is a finite set of terminal symbols; S is the starting non-terminal symbol, S ∈ N; and P is a finite set of productions of the form x -> y. For example: Noun -> "computer"; AssignmentStatement -> i := 10; Statements -> Statement ; Statements.
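The 4-tuple on the slide above can be written down directly; a minimal sketch in Python (the class and field names are illustrative, not from SILT):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Production:
    lhs: str     # a non-terminal, e.g. "Noun"
    rhs: tuple   # right-hand side: a sequence of terminals and non-terminals

@dataclass
class Grammar:
    nonterminals: set   # N: finite set of non-terminal symbols
    terminals: set      # T: finite set of terminal symbols
    start: str          # S: starting non-terminal, must be in N
    productions: list   # P: finite set of productions x -> y

# The slide's example production Noun -> "computer":
g = Grammar(
    nonterminals={"S", "Noun"},
    terminals={"computer"},
    start="S",
    productions=[Production("Noun", ("computer",))],
)

# Sanity check on the definition: S must be a member of N.
assert g.start in g.nonterminals
```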

7 SILT. SILT stands for Semantic Interpretation by Learning Transformations. Transformation rules map substrings in NL sentences, or subtrees in their syntactic parse trees, to subtrees of the formal-language parse tree. SILT learns transformation rules from training data: pairs of NL sentences and manually translated formal-language statements. Two target formal languages: CLANG and GEOQUERY.

8 CLANG. A formal language used for coaching robotic soccer in the RoboCup Coach Competition. The CLANG grammar consists of 37 non-terminals and 133 productions. All tactics and behaviors are expressed in terms of if-then rules. An example: ( (bpos (penalty-area our) ) (do (player-except our {4} ) (pos (half our) ) ) ) — "If the ball is in our penalty area, all our players except player 4 should stay in our half."

9 GEOQUERY. A database query language for a small database of U.S. geography. The database contains about 800 facts. Based on Prolog, augmented with meta-predicates. An example: answer(A, count(B, (city(B), loc(B, C), const(C, countryid(usa))), A)) — "How many cities are there in the US?"

10 Two Methods. String-based transformation learning directly maps strings of the NL sentences to the parse tree of the formal language. Tree-based transformation learning maps subtrees to subtrees between the two languages; it assumes the syntactic parse trees of the NL sentences, and a parser to produce them, are provided.

11 Semantic Parsing. Semantic parsing is done by pattern matching: patterns are found in the NL side, and templates are based on productions of the formal language, so a rule maps an NL phrase to a formal expression. Rule representation for the two methods, for the example "TEAM UNUM has the ball" with template CONDITION -> (bowner TEAM {UNUM}): the string-based pattern is the phrase itself, while the tree-based pattern is its parse subtree, (S (NP TEAM UNUM) (VP (VBZ has) (NP (DT the) (NN ball)))).

12 An Example of Parsing. 1. "If our player 4 has the ball, our player 4 should shoot." 2. "If TEAM UNUM has the ball, TEAM UNUM should ACTION." with TEAM = our, UNUM = 4, ACTION = (shoot). 3. "If CONDITION, TEAM UNUM should ACTION." with CONDITION = (bowner our {4}), TEAM = our, UNUM = 4, ACTION = (shoot). 4. "If CONDITION, DIRECTIVE." with CONDITION = (bowner our {4}), DIRECTIVE = (do our {4} (shoot)). 5. RULE = ( (bowner our {4}) (do our {4} (shoot) ) ).

13 Variations of the Rule Representation. SILT allows patterns to skip some words or nodes, e.g. "if CONDITION, DIRECTIVE." may skip an intervening "then", to deal with non-compositionality. SILT allows constraints on rule application: "in REGION" matches "CONDITION -> (bpos REGION)" only if "in REGION" follows "the ball". SILT allows templates with multiple productions: "TEAM player UNUM has the ball in REGION" maps to CONDITION -> (and (bowner TEAM {UNUM}) (bpos REGION)).

14 Learning Transformation Rules. Input: a training set T of NL sentences paired with formal representations, and the set of productions Π in the formal grammar. Output: a learned rule base L. Algorithm: Parse all formal representations in T using Π. Collect positive examples P(π) and negative examples N(π) for each π ∈ Π. L = ∅. Until all positive examples are covered, or no more good rules can be found for any π ∈ Π, do: R' = FindBestRules(Π, P, N); L = L ∪ R'; apply the rules in L to the sentences in T. Given an NL sentence S: S is a positive example for π if π is used in the formal representation of S, and a negative example for π otherwise.
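The outer covering loop of slide 14 can be sketched as follows. This is a simplified reading of the slide: the search for rules is abstracted behind a caller-supplied `find_best_rule`, which is assumed to return a rule and the positive sentences it covers (the slide itself does not fix this interface), and the rewriting of sentences in T after each round is omitted.

```python
def learn_rules(training_pairs, productions, find_best_rule):
    """Covering loop: keep finding rules until all positives are
    covered or no more good rules exist for any production."""
    # Positive/negative examples per production, as defined on the slide:
    # a sentence is positive for p if p is used in its formal form.
    positives = {p: {s for s, formal in training_pairs if p in formal}
                 for p in productions}
    negatives = {p: {s for s, formal in training_pairs if p not in formal}
                 for p in productions}
    learned = []
    while any(positives.values()):
        found = False
        for p in productions:
            if not positives[p]:
                continue
            result = find_best_rule(p, positives[p], negatives[p])
            if result is None:
                continue
            rule, covered = result
            learned.append(rule)
            positives[p] -= covered   # remove covered positives
            found = True
        if not found:  # no more good rules for any production
            break
    return learned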

15 Issues in SILT Learning. Non-compositionality. Rule cooperation: rules are learned in order, so an over-general ancestor will lead to a group of over-general child rules, and no rule can cooperate with rules of that kind. Two approaches can address this: 1. find the single best rule across all competing productions in each iteration; 2. over-generate rules, then find a subset that can cooperate.

16 FindBestRule() for String-based Learning. Input: the set of productions Π in the formal grammar, and sets of positive examples P(π) and negative examples N(π) for each π ∈ Π. Output: the best rule BR. Algorithm: R = ∅. For each production π ∈ Π: let Rπ be the maximally specific rules derived from P(π); repeat k = 1000 times: choose r1, r2 ∈ Rπ at random, g = GENERALIZE(r1, r2, π), add g to Rπ; then R = R ∪ Rπ. BR = argmax over r ∈ R of goodness(r). Remove the positive examples covered by BR from P.
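The randomized inner search of slide 16 for a single production can be sketched as below. `generalize` and `goodness` are supplied by the caller here, since the slide treats them as subroutines; the interface is an assumption.

```python
import random

def find_best_rule(specific_rules, generalize, goodness, k=1000, seed=0):
    """Start from maximally specific rules, repeatedly generalize
    random pairs, and keep the candidate with the highest goodness."""
    rng = random.Random(seed)  # seeded for reproducibility in this demo
    candidates = list(specific_rules)
    for _ in range(k):
        r1 = rng.choice(specific_rules)
        r2 = rng.choice(specific_rules)
        candidates.append(generalize(r1, r2))
    return max(candidates, key=goodness)
```

A toy run with the slide-17 patterns: if `generalize` merges "TEAM 's penalty box" and "TEAM penalty area" into "TEAM penalty", and `goodness` prefers shorter patterns, the search returns "TEAM penalty".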

17 FindBestRule() (cont.). goodness(r) scores candidate rules (the formula appears as a figure in the original slide and is not recoverable from this transcript). GENERALIZE takes r1 and r2, two transformation rules based on the same production. For example, for π: REGION -> (penalty-area TEAM), pattern 1 "TEAM 's penalty box" and pattern 2 "TEAM penalty area" generalize to "TEAM penalty".
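The slide's example keeps exactly the tokens common to both patterns, in order. A longest-common-subsequence over tokens reproduces it; whether SILT's GENERALIZE is exactly token LCS is an assumption of this sketch.

```python
def generalize(pattern1, pattern2):
    """Token-level longest common subsequence of two string patterns."""
    a, b = pattern1.split(), pattern2.split()
    # Classic LCS dynamic program: m[i][j] = LCS length of a[i:], b[j:].
    m = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) - 1, -1, -1):
        for j in range(len(b) - 1, -1, -1):
            m[i][j] = (m[i + 1][j + 1] + 1 if a[i] == b[j]
                       else max(m[i + 1][j], m[i][j + 1]))
    # Walk the table to recover the common tokens in order.
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i]); i += 1; j += 1
        elif m[i + 1][j] >= m[i][j + 1]:
            i += 1
        else:
            j += 1
    return " ".join(out)

print(generalize("TEAM 's penalty box", "TEAM penalty area"))  # -> TEAM penalty
```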

18 Tree-based Learning. Uses a similar FindBestRules() algorithm, but GENERALIZE finds the largest common subgraph of the two rules' patterns. For example, for π: REGION -> (penalty-area TEAM), pattern 1 is the parse subtree for "TEAM 's penalty box", pattern 2 is the parse subtree for "TEAM penalty area", and the generalization is their largest common subgraph, containing TEAM and (NN penalty).

19 Experiments. For CLANG: 300 pieces were selected randomly from log files of the 2003 RoboCup Coach Competition; each formal instruction was translated into English by a human; the average length of an NL sentence is 22.52 words. For GEOQUERY: 250 questions were collected from undergraduate students; all English queries were translated manually; the average length of an NL sentence is 6.87 words.

20 Results for CLANG.

21 Results for CLANG (cont.).

22 Results for GEOQUERY.

23 Results for GEOQUERY (cont.).

24 Running Time. Training times are reported in minutes.

25 Future Work. Though improved, SILT still lacks the robustness of statistical parsing; its hard-matching symbolic rules are sometimes too brittle. A more unified implementation of tree-based SILT would allow directly comparing the methods and evaluating the benefit of using an initial syntactic parser.

26 Conclusion. SILT is a novel approach that learns transformation rules mapping NL sentences into a formal language. It shows better overall performance than previous approaches. NLP still has a long way to go.

27 Thank you! Questions or comments?

