Incremental Text Structuring with Hierarchical Ranking
Erdong Chen, Benjamin Snyder, Regina Barzilay
Incremental Text Structuring
  Traditional approach: batch-mode generation
    Text is viewed as one-time creation
  Alternative: incremental generation
    Newsfeeds
    Wikipedia: 3.8 million edits per month, 38 edits per article
June 28, 2007
Barack Obama (Wikipedia Article)
  Barack Obama is a Democratic politician from Illinois. He is currently running for the United States Senate, which would be the highest elected office he has held thus far.
  Biography
  Obama's father is Kenyan; his mother is from Kansas. He himself was born in Hawaii, where his mother and father met at the University of Hawaii. Obama's father left his family early on, and Obama was raised in Hawaii by his mother.
  Created in 2004 (5 sentences)
Barack Obama (Wikipedia Article)
  5907 revisions up to 2007 (>400 sentences)
Generation Architecture
  Content Selection
  Structuring (our focus)
  Surface Realization
Task Definition
  Input: text is organized hierarchically
  Output: insertion point
Sample Insertion
  He received his B.A. degree in 1983, then worked for one year at Business International Corporation. In 1985, Obama moved to Chicago to direct a non-profit project assisting local churches to organize job training programs. In 1990, The New York Times reported his election as the Harvard Law Review's "first black president in its 104-year history." He entered Harvard Law School in 1988.
Sample Features
  Topical Features
    Word overlap with section
    Word overlap with paragraph
  Positional Features
    Last paragraph of article or not?
    First section of article or not?
  Temporal Features
    Temporal order within paragraph
  Color key: red = section feature, blue = paragraph feature
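A minimal sketch of how the topical and positional features above might be computed. The article representation (a list of sections, each a list of paragraph strings), the tokenizer, and the feature names `sec:overlap`, `sec:first`, `par:overlap`, and `par:last` are assumptions made for illustration, not the authors' implementation. Temporal features are left out.

```python
import re

def tokens(text):
    """Lowercased word types of a text (a deliberately crude tokenizer)."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))

def word_overlap(sentence, unit_text):
    """Fraction of the sentence's word types that also occur in unit_text."""
    sent = tokens(sentence)
    if not sent:
        return 0.0
    return len(sent & tokens(unit_text)) / len(sent)

def section_features(sentence, article, sec_idx):
    """Section-level features: overlap with the whole section, first section or not."""
    section_text = " ".join(article[sec_idx])
    return {
        "sec:overlap": word_overlap(sentence, section_text),
        "sec:first": 1.0 if sec_idx == 0 else 0.0,
    }

def paragraph_features(sentence, article, sec_idx, par_idx):
    """Paragraph-level features: overlap with the paragraph, last paragraph of article or not."""
    is_last = sec_idx == len(article) - 1 and par_idx == len(article[-1]) - 1
    return {
        "par:overlap": word_overlap(sentence, article[sec_idx][par_idx]),
        "par:last": 1.0 if is_last else 0.0,
    }

# Toy article: two sections, three paragraphs in total.
article = [
    ["Obama was born in Hawaii.", "He was raised by his mother."],
    ["He entered Harvard Law School in 1988."],
]
sentence = "Obama moved to Hawaii."
sec_feats = section_features(sentence, article, 0)
par_feats = paragraph_features(sentence, article, 1, 0)
```

Keeping section and paragraph features in separate vectors is what lets a hierarchical model score, and later update, each layer on its own.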
Motivation for Hierarchical Model
  Paragraph error vs. section error
  Goal: the model should be sensitive to the type of error
  If a section has been predicted wrongly, then errors in the paragraph should not be taken into account.
Hierarchical Decomposition of Features
  s: insertion sentence
  Local feature vector
  Aggregate feature vector
  Insertion point
  Root path
  Paragraph feature, section feature
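One way to read the decomposition above: each node on the root path of an insertion point has its own local feature vector, and the aggregate feature vector combines them. The sketch below assumes the aggregate is the plain sum of the local vectors along the path, which the slide does not spell out. The sparse dict representation and the feature names are illustrative.

```python
from collections import defaultdict

def aggregate(local_vectors):
    """Sum sparse local feature vectors (dicts) along a root path."""
    total = defaultdict(float)
    for vec in local_vectors:
        for name, value in vec.items():
            total[name] += value
    return dict(total)

# Root path of one paragraph: its section node, then the paragraph node itself.
section_vec = {"sec:overlap": 0.5, "sec:first": 1.0}
paragraph_vec = {"par:overlap": 0.25, "par:last": 0.0}
phi = aggregate([section_vec, paragraph_vec])
```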
Hierarchical Ranking Model: Decoding
  Predicted solution
  W: feature weight
  Paragraph feature score
  Section feature score
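A sketch of decoding under the assumption that a paragraph's score is its section feature score plus its paragraph feature score, both computed with a single weight vector, and that the predicted solution is the highest-scoring paragraph. The function names and the toy numbers are hypothetical.

```python
def dot(weights, vec):
    """Dot product of a sparse weight dict and a sparse feature dict."""
    return sum(weights.get(name, 0.0) * value for name, value in vec.items())

def decode(weights, section_vecs, paragraph_vecs):
    """Return ((section, paragraph), score) for the best-scoring paragraph.

    section_vecs[i] is the local vector of section i;
    paragraph_vecs[i][j] is the local vector of paragraph j in section i.
    """
    best, best_score = None, float("-inf")
    for i, sec_vec in enumerate(section_vecs):
        sec_score = dot(weights, sec_vec)
        for j, par_vec in enumerate(paragraph_vecs[i]):
            score = sec_score + dot(weights, par_vec)
            if score > best_score:
                best, best_score = (i, j), score
    return best, best_score

weights = {"sec:overlap": 1.0, "par:overlap": 2.0}
section_vecs = [{"sec:overlap": 0.8}, {"sec:overlap": 0.5}]
paragraph_vecs = [
    [{"par:overlap": 0.1}, {"par:overlap": 0.2}],
    [{"par:overlap": 0.5}],
]
best, best_score = decode(weights, section_vecs, paragraph_vecs)
```

In this toy case the winning paragraph sits in the section with the lower section score, because its paragraph score more than makes up the difference.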
Hierarchical Ranking Model: Training
  Only update weights at the first divergent (green) layer, at nodes a and b
Flat Training vs. Hierarchical Training
  Φ: aggregate feature vector
  Reference solution
  Predicted solution
  Local feature vector
  a: highest divergent node of reference solution
  b: highest divergent node of predicted solution
  Flat:
  Hierarchical:
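The two update rules can be sketched as perceptron-style updates. This is an assumed reading of the legend above, not the authors' code: the flat update moves the weights toward the aggregate vector of the reference solution and away from that of the predicted solution, while the hierarchical update uses only the local vectors of the highest divergent nodes a and b. Paths are hypothetical lists of `(node_id, local_vector)` pairs of equal depth.

```python
def add_scaled(weights, vec, scale):
    """In place: weights += scale * vec, for sparse dict vectors."""
    for name, value in vec.items():
        weights[name] = weights.get(name, 0.0) + scale * value

def flat_update(weights, ref_path, pred_path):
    """Update with every node on both paths (aggregate vectors) when the paths differ."""
    if [node for node, _ in ref_path] == [node for node, _ in pred_path]:
        return
    for _, vec in ref_path:
        add_scaled(weights, vec, +1.0)
    for _, vec in pred_path:
        add_scaled(weights, vec, -1.0)

def hierarchical_update(weights, ref_path, pred_path):
    """Update only at the first layer where the two paths diverge."""
    for (ref_id, ref_vec), (pred_id, pred_vec) in zip(ref_path, pred_path):
        if ref_id != pred_id:
            add_scaled(weights, ref_vec, +1.0)   # node a
            add_scaled(weights, pred_vec, -1.0)  # node b
            return

# The prediction picked the wrong section, so the paths diverge at the section layer.
ref_path = [("sec0", {"sec:overlap": 0.8}), ("sec0/par1", {"par:overlap": 0.2})]
pred_path = [("sec1", {"sec:overlap": 0.5}), ("sec1/par0", {"par:overlap": 0.5})]

w_flat, w_hier = {}, {}
flat_update(w_flat, ref_path, pred_path)
hierarchical_update(w_hier, ref_path, pred_path)
```

After a section-level mistake, `w_hier` changes only the section feature weight, whereas `w_flat` also shifts the paragraph feature weight, which is the behavior the motivation slide argues against.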
Previous Work on Text Structuring
  Corpus-based Approach (Lapata, 2003; Karamanis et al., 2004; Okazaki et al., 2004; Barzilay and Lapata, 2005; Bollegala et al., 2006; Elsner and Charniak, 2007)
    Focus on relatively short texts
    Based on flat text representation
  Symbolic Approach (McKeown, 1985; Kasper, 1989; Reiter and Dale, 1990; Hovy, 1993; Maier and Hovy, 1993; Moore and Paris, 1993; Reiter and Dale, 1997)
    Hand-crafted sentence planner
    Based on tree-like text representation
Previous Work on Hierarchical Learning
  Hierarchical Classification (Cai and Hofmann, 2004; Dekel et al., 2004; Cesa-Bianchi et al., 2006a; Cesa-Bianchi et al., 2006b)
    Input: a flat feature vector
    Output: labels from a fixed label hierarchy
    Model parameter: a different weight vector for each label node
  Hierarchical Ranking (our method)
    Input: a hierarchy with a fixed depth
    Output: a leaf node within the hierarchy
    Model parameter: a single weight vector
Experimental Set-Up
  Task: sentence insertion
  Domain: biography; “Living People” category from Wikipedia
  Gold standard: insertion positions from update log of Wikipedia entries
  Evaluation measures
    Section accuracy
    Paragraph accuracy
    Tree distance
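The three evaluation measures can be sketched as follows, assuming each insertion point is identified by a `(section, paragraph)` pair and the article tree has a root, a section layer, and a paragraph layer. Under that assumption the tree distance in edges is 0 for the correct paragraph, 2 for a wrong paragraph in the right section, and 4 for a wrong section. The function names are hypothetical.

```python
def tree_distance(pred, ref):
    """Edges between two paragraph leaves in a root -> section -> paragraph tree."""
    if pred == ref:
        return 0
    if pred[0] == ref[0]:
        return 2  # up to the shared section, down to the other paragraph
    return 4      # up to the root, down through the other section

def evaluate(preds, refs):
    """Return (section accuracy, paragraph accuracy, average tree distance)."""
    n = len(refs)
    section_acc = sum(p[0] == r[0] for p, r in zip(preds, refs)) / n
    paragraph_acc = sum(p == r for p, r in zip(preds, refs)) / n
    avg_dist = sum(tree_distance(p, r) for p, r in zip(preds, refs)) / n
    return section_acc, paragraph_acc, avg_dist

preds = [(0, 1), (0, 0), (2, 1), (1, 0)]
refs = [(0, 1), (0, 2), (1, 1), (1, 0)]
section_acc, paragraph_acc, avg_dist = evaluate(preds, refs)
```

Tree distance gives partial credit that paragraph accuracy does not: a miss inside the right section costs 2 edges rather than 4.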
Corpus
  Corpus: 4051 sentence/article pairs
    Training set: 3240 pairs (80%)
    Test set: 811 pairs (20%)
  Corpus statistics
    Average # of sentences: 32.9
    Average # of sections: 3.1
    Average # of paragraphs: 10.9
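A quick arithmetic check of the split reported above, using only the numbers on the slide. The variable names are illustrative.

```python
total, train, test = 4051, 3240, 811

# The two subsets should partition the corpus.
split_is_complete = (train + test == total)

# Shares of the corpus, in percent, rounded to one decimal place.
train_share = round(100 * train / total, 1)
test_share = round(100 * test / total, 1)
```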
Human Evaluation
  Randomly selected 80 sentence/article pairs
  Four judges; each judge took 40 pairs, and every sentence/article pair was assigned to two judges
  Table columns: Section Acc (%), Paragraph Acc (%), Tree Dist (# of edges)
  Table rows: Avg accuracy, Mutual Agree
Baselines
  Straw baselines
    RandomIns: pick a random paragraph of an article
    FirstIns: pick the first paragraph of an article
    LastIns: pick the last paragraph of an article
  Pipeline
    Training: train two rankers for section selection and paragraph selection separately
    Decoding: first choose the best section, and then choose the best paragraph within the chosen section
  Flat
    Training: flat training
    Decoding: find the best path by aggregate score
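The straw baselines and pipeline decoding can be sketched as below. This is an illustrative reading of the slide: an article is reduced to a list of paragraph counts per section, RandomIns is taken to be uniform over paragraphs, and the two pipeline rankers are represented by two hypothetical sparse weight dicts.

```python
import random

def first_ins(shape):
    """FirstIns: the first paragraph of the article."""
    return (0, 0)

def last_ins(shape):
    """LastIns: the last paragraph of the article."""
    last_section = len(shape) - 1
    return (last_section, shape[last_section] - 1)

def random_ins(shape, rng):
    """RandomIns: a paragraph drawn uniformly from the whole article."""
    leaves = [(i, j) for i, count in enumerate(shape) for j in range(count)]
    return rng.choice(leaves)

def dot(weights, vec):
    """Dot product of a sparse weight dict and a sparse feature dict."""
    return sum(weights.get(name, 0.0) * value for name, value in vec.items())

def pipeline_decode(sec_weights, par_weights, section_vecs, paragraph_vecs):
    """Commit to the best section first, then pick the best paragraph inside it."""
    i = max(range(len(section_vecs)), key=lambda k: dot(sec_weights, section_vecs[k]))
    j = max(range(len(paragraph_vecs[i])), key=lambda k: dot(par_weights, paragraph_vecs[i][k]))
    return (i, j)

shape = [2, 1]  # two paragraphs in section 0, one in section 1
section_vecs = [{"sec:overlap": 0.8}, {"sec:overlap": 0.5}]
paragraph_vecs = [
    [{"par:overlap": 0.1}, {"par:overlap": 0.2}],
    [{"par:overlap": 0.5}],
]
choice = pipeline_decode({"sec:overlap": 1.0}, {"par:overlap": 2.0},
                         section_vecs, paragraph_vecs)
sampled = random_ins(shape, random.Random(0))
```

Because the pipeline commits to section 0 before looking at any paragraph, it can never reach the paragraph in section 1, even when that paragraph would have the highest combined score.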
Results
              Section Acc (%)   Paragraph Acc (%)   Tree Dist (# of edges)
  RandomIns   31.8*             13.4*               3.10*
  FirstIns    25.0*             13.6*               3.23*
  LastIns     30.6*             21.5*               3.00*
  Pipeline                                          2.18*
  Flat                                              2.18*
  Hierarchy
  Human
  * The diacritic indicates whether the difference in accuracy between the given model and Hierarchy is significant.
  LastIns outperforms RandomIns and FirstIns.
  Hierarchy outperforms all baselines.
  At the paragraph level, the gap between machine and human is reduced by 32%.
Sentence-level Evaluation
  Local model (Lapata, 2003; Bollegala et al., 2006)
    Input: a sequence of sentences
    Output: the best insertion point, found by examining the two sentences surrounding each candidate insertion point
    Method: standard ranking perceptron (Collins, 2002)
    Features: lexical, positional, and temporal
Sentence-level Evaluation Results
  Linear baseline: use the Local model to locate the sentence by simply treating an article as a sequence of sentences
    Accuracy: 24%
  Hierarchical method
    Step 1: use Hierarchy to find the best paragraph in which to place the sentence
    Step 2: use the Local model to locate the exact position within the chosen paragraph
    Accuracy: 35%
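Step 2 of the hierarchical method can be sketched as a search over the insertion points of the paragraph chosen in Step 1, scoring each point from its two surrounding sentences. The scorer here is a stand-in (shared word types), not the trained ranking perceptron, and the toy paragraph only loosely paraphrases the sample insertion shown earlier. All names are hypothetical.

```python
import re

def tokens(text):
    """Lowercased word types of a sentence."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def pair_score(a, b):
    """Stand-in local score: number of word types two sentences share."""
    if a is None or b is None:
        return 0
    return len(tokens(a) & tokens(b))

def best_position(sentence, paragraph):
    """Index k such that inserting before paragraph[k] scores highest (ties go to the earliest k)."""
    def score(k):
        prev_sent = paragraph[k - 1] if k > 0 else None
        next_sent = paragraph[k] if k < len(paragraph) else None
        return pair_score(prev_sent, sentence) + pair_score(sentence, next_sent)
    return max(range(len(paragraph) + 1), key=score)

# Paragraph assumed to have been selected by the hierarchical model in Step 1.
paragraph = [
    "He received his B.A. degree in 1983.",
    "Obama moved to Chicago in 1985.",
    "In 1990 he led the Harvard Law Review.",
]
sentence = "He entered Harvard Law School in 1988."
position = best_position(sentence, paragraph)
```

Restricting the search to one paragraph means the local model only has to discriminate among a handful of positions instead of every sentence boundary in the article.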
Conclusions & Future Work
  Conclusions
    Incremental text structuring presents a new perspective on text generation
    Hierarchical representation coupled with hierarchically sensitive training improves performance
  Future work
    Automatic update of Wikipedia web pages
    Combining structure induction with text structuring
  Code & Data: