Solving geometry problems: combining text and diagram interpretation Minjoon Seo 1, Hannaneh Hajishirzi 1, Ali Farhadi 1,2, Oren Etzioni 2, Clint Malcolm 1 Sep. 20 EMNLP
Geometry Word Problems In the diagram at the right, circle O has a radius of 5, and CE = 2. Diameter AC is perpendicular to chord BD. What is the length of BD? a)2b) 4c) 6 d) 8e) 10 EBD A O 5 2 C 2
Why geometry problems? Solving geometry word problems is challenging in AI Part of broader scope of solving math word problems (Kushman et al., 2014; Hosseini et al., 2014; Roy et al., 2015; Dai et al., 2015; Shi et al., 2015) Interesting interplay between natural language and vision
Why geometry problems? Solving geometry word problems is challenging in AI Part of broader scope of solving math word problems (Kushman et al., 2014; Hosseini et al., 2014; Roy et al., 2015; Dai et al., 2015; Shi et al., 2015) Interesting interplay between natural language and vision In the diagram at the right, circle O has a radius of 5, and CE = 2. Diameter AC is perpendicular to chord BD. What is the length of BD? a)2b) 4c) 6 d) 8e) 10
Why geometry problems? Solving geometry word problems is challenging in AI Part of broader scope of solving math word problems (Kushman et al., 2014; Hosseini et al., 2014; Roy et al., 2015; Dai et al., 2015; Shi et al., 2015) Interesting interplay between natural language and vision 5 EBD A O 5 2 C
Why geometry problems? Solving geometry word problems is challenging in AI Part of broader scope of solving math word problems (Kushman et al., 2014; Hosseini et al., 2014; Roy et al., 2015; Dai et al., 2015; Shi et al., 2015) Interesting interplay between natural language and vision Closely related to language & vision and grounded language acquisition Requires semantic understanding of each modality Has well-defined metric Interesting to NLP: unique characteristics of the geometry word problems.
Challenge #1 Interaction between Text and Diagram Previous work in semantic parsing and relation extraction does not consider another modality (Zettlemoyer and Collins, 2005; Kate and Mooney, 2007; Poon and Domingos, 2009; Kwiatkowski et al., 2013; Flanigan et al., 2014; Reddy et al., 2014; Berant et al., 2014; Cowie and Lehnert, 1996; Culotta and Sorensen, 2004) 7 In the diagram at the right, the line is tangent to the circle. A B O C
Challenge #2: Lexical Ambiguity 8 8 A B O C Line OC bisects line AB, and line OC bisects angle AOB.
Challenge #3: Implication 9 Circle O has a radius of 5. EBD A O 5 2 C Equals(RadiusOf(O), 5)
Challenge #4: Syntactic Complication 10 A B C D AB and CD are perpendicular. AB and CD are perpendicular to EF. A B C D E F
11 B DE A C In triangle ABC, line DE is parallel with line AC, DB equals 4, AD is 8, and DE is 5. Find AC. (a) 9 (b) 10 (c) 12.5 (d) 15 (e) 17 (d) 15 GeoS: Overview
Diagram Text facts answer (1) (2) (3) (1) Diagram understanding (Seo et al., 2014) (2) Text parsing (3) Solving 12 Help
Diagram-aided text parsing In triangle ABC, line DE is parallel with line AC, DB equals 4, AD is 8, and DE is 5. Find AC. (a) 9 (b) 10 (c) 12.5 (d) 15 (e) 17 B DE A C 13 Text Input Logical form Difficult to directly map text to a long logical form!
Diagram-aided text parsing In triangle ABC, line DE is parallel with line AC, DB equals 4, AD is 8, and DE is 5. Find AC. (a) 9 (b) 10 (c) 12.5 (d) 15 (e) 17 B DE A C IsTriangle(ABC) Parallel(AC, DE) Parallel(AC, DB) Equals(LengthOf(DB), 4) Equals(LengthOf(AD), 8) Equals(LengthOf(DE), 5) Equals(4, LengthOf(AD)) … Over-generated literals … Text scores n/a … Diagram scores Selected subset 14 Text Input Logical form Our method
Step 1. Literal over-generation In triangle ABC, line DE is parallel with line AC, DB equals 4, AD is 8, and DE is 5. Find AC. (a) 9 (b) 10 (c) 12.5 (d) 15 (e) 17 B DE A C IsTriangle(ABC) Parallel(AC, DE) Parallel(AC, DB) Equals(LengthOf(DB), 4) Equals(LengthOf(AD), 8) Equals(LengthOf(DE), 5) Equals(4, LengthOf(AD)) … Over-generated literals 15
Step 1. Generating literals “Lines AB and CD are perpendicular to EF” IsLine(AB) IsLine(CD) IsLine(EF) Perpendicular(AB, CD) Perpendicular(CD, EF) Perpendicular(AB, EF) 16
Step 1. Generating literals “Lines AB and CD are perpendicular to EF” IsLine(AB) IsLine(CD) IsLine(EF) Perpendicular(AB, CD) Perpendicular(CD, EF) Perpendicular(AB, EF) Red literals are false. 17
Concepts Lines AB and CD are perpendicular to EF AB Perpendicular CD EF IsLine 18
Lexicon We built lexicon from training data and textbooks Lexicon maps geometry-related words (or phrases) to concepts Some concepts are obtained via simple regular expressions Single word can map to two or more concepts Word or phraseConcept “Perpendicular” Perpendicular “Lies on” PointLiesOnLine, PointLiesOnCircle “CD” line, arc “ABC” triangle, angle 19
Relations Lines AB and CD are perpendicular to EF AB Perpendicular CD EF IsLine 20
Relations Lines AB and CD are perpendicular to EF AB Perpendicular CD EF IsLine Unary relation IsLine(EF) 21
Relations Lines AB and CD are perpendicular to EF AB Perpendicular CD EF IsLine Binary relation Perpendicular(AB, CD) 22
Step 2. Text scores of literals In triangle ABC, line DE is parallel with line AC, DB equals 4, AD is 8, and DE is 5. Find AC. (a) 9 (b) 10 (c) 12.5 (d) 15 (e) 17 B DE A C IsTriangle(ABC) Parallel(AC, DE) Parallel(AC, DB) Equals(LengthOf(DB), 4) Equals(LengthOf(AD), 8) Equals(LengthOf(DE), 5) Equals(4, LengthOf(AD)) … Over-generated literals … Text scores 23
Relation score Lines AB and CD are perpendicular to EF AB Perpendicular CD EF IsLine
Relation classification Supervision: annotated logical forms Training data: all possible relations from training questions Relations found in annotations: positive All others: negative Logistic regression with L2 regularization Features: Stanford dependency parse Part of speech tags Type of concept (line, circle, triangle, predicate, etc.) 25 IsLine->AB IsLine->CD IsLine->EF Perpendicular->AB, CD Perpendicular->CD, EF Perpendicular->AB, EF
Text scores of literals Equals RadiusOf O Literal Label for edge Edge (relation) Question text Logistic regression parameters to be learned “Circle O has radius of 5” 26
Step 3. Diagram scores of literals In triangle ABC, line DE is parallel with line AC, DB equals 4, AD is 8, and DE is 5. Find AC. (a) 9 (b) 10 (c) 12.5 (d) 15 (e) 17 B DE A C IsTriangle(ABC) Parallel(AC, DE) Parallel(AC, DB) Equals(LengthOf(DB), 4) Equals(LengthOf(AD), 8) Equals(LengthOf(DE), 5) Equals(4, LengthOf(AD)) … Over-generated literals … Text scores n/a … Diagram scores 27
Step 3. Diagram scores of literals B DE AC TEXTDIAGRAM Parallel(AC, DB) “AC and DB are parallel with DE and AD, respectively.”
Step 3. Diagram scores of literals B DE AC 29 TEXTDIAGRAM Parallel(AC, DB) Parallel(AC, DE) “AC and DB are parallel with DE and AD, respectively.”
Step 3. Diagram scores of literals B DE AC Diagram understanding in geometry questions (Seo et al., 2014) 30 TEXTDIAGRAM Parallel(AC, DB) Parallel(AC, DE) “AC and DB are parallel with DE and AD, respectively.”
Step 4. Subset selection In triangle ABC, line DE is parallel with line AC, DB equals 4, AD is 8, and DE is 5. Find AC. (a) 9 (b) 10 (c) 12.5 (d) 15 (e) 17 B DE A C IsTriangle(ABC) Parallel(AC, DE) Parallel(AC, DB) Equals(LengthOf(DB), 4) Equals(LengthOf(AD), 8) Equals(LengthOf(DE), 5) Equals(4, LengthOf(AD)) … Over-generated literals … Text scores n/a … Diagram scores Selected subset 31
Step 4. Subset selection IsTriangle(ABC) Parallel(AC, DE) Parallel(AC, DB) Equals(LengthOf(DB), 4) Equals(LengthOf(AD), 8) Equals(LengthOf(DE), 5) Equals(4, LengthOf(AD)) Equals(LengthOf(AC), 4) Parallel(DE, DB) Equals(LengthOf(DB, 8) Find(LengthOf(AC)) Find(LengthOf(DE)) Find(LengthOf(DB)) … IsTriangle(ABC) Parallel(AC, DE) Equals(LengthOf(DB), 4) Equals(LengthOf(AD), 8) Equals(LengthOf(DE), 5) Find(LengthOf(AC)) Affinity (diagram+text) Coherence (covers all facts, no conflict) 32 High text and diagram scores Cover all facts in text Literals don’t conflict
0ptimization algorithm affinity coherence Bad news: combinatorial optimization is NP-hard Good news: objective function is submodular Greedy algorithm efficiently finds a solution with bounded distance to the optimum. Starting from empty set, greedily add the next best literal to the set. 33
Solving Diagram Text facts answer (1) (2) (3) Help 34
Numerical solver LiteralEquation Equals(LengthOf(AB),d)(A x -B x ) 2 +(A y -B y ) 2 -d 2 = 0 Parallel(AB, CD)(A x -B x )(C y -D y )-(A y -B y )(C x -D x ) = 0 PointLiesOnLine(B, AC)(A x -B x )(B y -C y )-(A y -B y )(B x -C x ) = 0 Perpendicular(AB,CD)(A x -B x )(C x -D x )+(A y -B y )(C y -D y ) = 0 Find the solution to the equation system Use off-the-shelf numerical minimizers (Wales and Doye, 1997; Kraft, 1988) Numerical solver can choose not to answer question 35 Translate literals to numeric equations
Dataset Training questions (67 questions, 121 sentences) Seo et al., 2014 High school geometry questions Test questions (119 questions, 215 sentences) We collected them SAT (US college entrance exam) geometry questions We manually annotated the text parse of all questions Dataset is publicly available at: geometry.allenai.org 36
Experiment 1: answering questions SAT Score (%) *** 0.25 penalty for incorrect answer 37
Experiment 1: answering questions SAT Score (%) *** 0.25 penalty for incorrect answer 38
Experiment 1: answering questions SAT Score (%) *** 0.25 penalty for incorrect answer 39
Experiment 1: answering questions SAT Score (%) *** 0.25 penalty for incorrect answer 40
Experiment 2: Improving dependency parsing E C BD A “BD is perpendicular to AC at point E.” Obtain top-k dependency parses, and re-rank them based on GeoS result 41
Demo (geometry.allenai.org/demo) 42
Oracle Studies 43
Oracle Studies 44
Oracle Studies 45
Oracle Studies 46
Failure Modes 47
Summary First end-to-end system for solving high school geometry problems Achieved 55% on official and practice SAT geometry questions Text parsing in the presence of diagram 48
Future work Expand text parsing algorithm to other grounded language acquisition domains Improvements in solving geometry problems: Increase data size Weakly-supervised learning More interaction between text and diagram More transparent solver Numerical solver is black box Logical solver: gives more feedback from the solution 49
Thank you! For more information, please visit: geometry.allenai.org 50
Two-stage parsing Natural Language Formal LanguageIntermediate Representation Easy Hard! 51 (Kwiatkowski et al., 2013) “Circle O has radius of 5” Equals(RadiusOf(O), 5) Bridged(RadiusOf(O), 5)
Two-stage parsing: examples Natural language “Circle O has a radius of 5.” IntermediateBridged(RadiusOf(O), 5) Formal language Equals(RadiusOf(O), 5) Natural language“AM and CM bisect BAC and BCA.” Intermediate Bisects(AM, BAC) ∧ CC(AM, CM) ∧ CC(BAC, BCA) Formal language Bisects(AM,BAC) ∧ Bisects(CM, BCA) 52
Affinity score function Each literal has text score and diagram score Affinity score is the sum of text and diagram scores of literals 53
Coherence score function “DE is parallel with AB, and EF equals 5.” Parallel(DE, EF) Equals(AB, 5) Equals(EF, 5) Parallel(DE, AB) Parallel(DE, EF) Equals(AB, 5) Parallel(DE, AB) Equals(EF, 5) High coverage, high redundancy Low coverage, low redundancy High coverage, low redundancy coverage redundancy 54
Numerical solver AB is perpendicular to BC, AB = 3 and BC = 4. What is the length of AB? a) 3 b) 4 c) 5 d) 6 e) 7 Perpendicular(AB, BC) Equals(LengthOf(AB), 3) Equals(LengthOf(BC, 4) Equals(LengthOf(AC), What) A BC 2 variables for each point (x, y) 1 variable for unknown ( What ) One equation for each literal Simultaneously satisfy 4 equations with 7 variables (3 variables are free due to translation and rotation) 55
Experiment 2: semantic parsing 56