Incremental Text Structuring with Hierarchical Ranking. Erdong Chen, Benjamin Snyder, Regina Barzilay.

Presentation transcript:

Incremental Text Structuring with Hierarchical Ranking
Erdong Chen, Benjamin Snyder, Regina Barzilay

Incremental Text Structuring
- Traditional approach: batch-mode generation. Text is viewed as a one-time creation.
- Alternative: incremental generation, as in newsfeeds and Wikipedia (3.8 million edits per month, 38 edits per article).

Barack Obama (Wikipedia Article)
Created in 2004 (5 sentences):
"Barack Obama is a Democratic politician from Illinois. He is currently running for the United States Senate, which would be the highest elected office he has held thus far.
Biography
Obama's father is Kenyan; his mother is from Kansas. He himself was born in Hawaii, where his mother and father met at the University of Hawaii. Obama's father left his family early on, and Obama was raised in Hawaii by his mother."

Barack Obama (Wikipedia Article)
5907 revisions up to 2007 (>400 sentences).

Generation Architecture
Content Selection → Structuring → Surface Realization
Our focus: Structuring.

Task Definition
- Input: a new sentence and an existing article whose text is organized hierarchically (sections and paragraphs).
- Output: an insertion point for the sentence.

Sample Insertion
Target paragraph:
"He received his B.A. degree in 1983, then worked for one year at Business International Corporation. In 1985, Obama moved to Chicago to direct a non-profit project assisting local churches to organize job training programs. In 1990, The New York Times reported his election as the Harvard Law Review's 'first black president in its 104-year history.'"
Sentence to insert: "He entered Harvard Law School in 1988."

Sample Features
- Topical features: word overlap with the section; word overlap with the paragraph.
- Positional features: last paragraph of the article or not; first section of the article or not.
- Temporal features: temporal order within the paragraph.
(On the slide, section-level features are marked in red and paragraph-level features in blue; a small sketch of such features follows.)
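
As a rough illustration only (not the authors' released feature extractor), here is a minimal sketch of how topical and positional features of this kind might be computed. It assumes an article is represented as a list of sections, each a list of paragraph strings; the helper names `word_overlap` and `extract_features` are hypothetical.

```python
def word_overlap(sentence, text_unit):
    """Fraction of the sentence's word types that also occur in the text unit."""
    s_words = set(sentence.lower().split())
    u_words = set(text_unit.lower().split())
    return len(s_words & u_words) / max(len(s_words), 1)

def extract_features(sentence, article, sec_idx, par_idx):
    """Topical and positional features for inserting `sentence` into paragraph
    `par_idx` of section `sec_idx`.  `article` is assumed to be a list of
    sections, each section a list of paragraph strings."""
    section_text = " ".join(article[sec_idx])
    paragraph_text = article[sec_idx][par_idx]
    return {
        "overlap_section":   word_overlap(sentence, section_text),    # topical, section level
        "overlap_paragraph": word_overlap(sentence, paragraph_text),  # topical, paragraph level
        "first_section":     float(sec_idx == 0),                     # positional
        "last_paragraph":    float(par_idx == len(article[sec_idx]) - 1),  # positional
    }
```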

Motivation for Hierarchical Model
- Two kinds of mistakes: section errors and paragraph errors.
- Goal: the model should be sensitive to the type of error. If a section has been predicted wrongly, then errors at the paragraph level should not be taken into account.

Hierarchical Decomposition of Features
- s: the insertion sentence.
- Each node (section or paragraph) on the path from the root to an insertion point contributes a local feature vector.
- The aggregate feature vector of an insertion point combines the section-level and paragraph-level local vectors along its root path.
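
One natural reading of this decomposition, and the one assumed in the sketches below, is that the aggregate feature vector is simply the sum of the local feature vectors along the root path. A minimal sketch under that assumption; `local_phi` and `root_path` are illustrative names.

```python
from collections import Counter

def aggregate_features(local_phi, root_path):
    """Aggregate feature vector: the sum of the local feature vectors phi(n)
    over every node n on the path from the root to the insertion point."""
    total = Counter()
    for node in root_path:
        total.update(local_phi(node))  # local_phi(node) returns a sparse dict: feature -> value
    return total
```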

Hierarchical Ranking Model: Decoding
- w: the feature weight vector.
- Each candidate insertion point is scored by its section feature score plus its paragraph feature score.
- The predicted solution is the candidate with the highest total score.
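
Under the same assumptions, decoding can be sketched as scoring every candidate leaf by summing w · phi(n) over the nodes on its root path and returning the best leaf; the function names and the `candidate_paths` structure are hypothetical.

```python
def dot(weights, features):
    """Dot product between a sparse weight dict and a sparse feature dict."""
    return sum(weights.get(f, 0.0) * v for f, v in features.items())

def decode(weights, candidate_paths, local_phi):
    """Return the candidate insertion point with the highest aggregate score.
    `candidate_paths` maps each candidate leaf to its root-to-leaf path, so the
    score of a leaf is its section feature score plus its paragraph feature score."""
    def path_score(path):
        return sum(dot(weights, local_phi(node)) for node in path)
    return max(candidate_paths, key=lambda leaf: path_score(candidate_paths[leaf]))
```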

Hierarchical Ranking Model: Training
- Only update the weights at the first divergent (green) layer, where a and b mark the divergent nodes of the reference and predicted solutions.

Flat Training vs. Hierarchical Training
- Φ: aggregate feature vector; φ: local feature vector.
- y: reference solution; ŷ: predicted solution.
- a: highest divergent node of the reference solution; b: highest divergent node of the predicted solution.
- Flat update: w ← w + Φ(s, y) − Φ(s, ŷ)
- Hierarchical update: w ← w + φ(s, a) − φ(s, b)
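
Read this way, the two perceptron-style updates differ only in which feature vectors they touch: the flat update uses the full aggregate vectors of the reference and predicted leaves, while the hierarchical update uses only the local vectors at the highest divergent nodes a and b. A hedged sketch; the sparse-dict representation and function names are assumptions, not the released code.

```python
def flat_update(weights, Phi_reference, Phi_predicted):
    """Flat perceptron update: w <- w + Phi(reference) - Phi(predicted)."""
    for f, v in Phi_reference.items():
        weights[f] = weights.get(f, 0.0) + v
    for f, v in Phi_predicted.items():
        weights[f] = weights.get(f, 0.0) - v

def hierarchical_update(weights, phi_a, phi_b):
    """Hierarchical update: w <- w + phi(a) - phi(b), where a and b are the
    highest divergent nodes of the reference and predicted root paths, so
    nothing below the first point of divergence affects the weights."""
    for f, v in phi_a.items():
        weights[f] = weights.get(f, 0.0) + v
    for f, v in phi_b.items():
        weights[f] = weights.get(f, 0.0) - v
```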

Previous Work on Text Structuring
- Corpus-based approaches (Lapata, 2003; Karamanis et al., 2004; Okazaki et al., 2004; Barzilay and Lapata, 2005; Bollegala et al., 2006; Elsner and Charniak, 2007): focus on relatively short texts; based on a flat text representation.
- Symbolic approaches (McKeown, 1985; Kasper, 1989; Reiter and Dale, 1990; Hovy, 1993; Maier and Hovy, 1993; Moore and Paris, 1993; Reiter and Dale, 1997): hand-crafted sentence planners; based on a tree-like text representation.

Previous Work on Hierarchical Learning
- Hierarchical classification (Cai and Hofmann, 2004; Dekel et al., 2004; Cesa-Bianchi et al., 2006a; Cesa-Bianchi et al., 2006b): input is a flat feature vector; output is a label from a fixed label hierarchy; model parameters are a separate weight vector for each label node.
- Hierarchical ranking (our method): input is a hierarchy with a fixed depth; output is a leaf node within the hierarchy; model parameter is a single weight vector.

Experimental Set-Up
- Task: sentence insertion.
- Domain: biographies, the "Living People" category from Wikipedia.
- Gold standard: insertion positions taken from the update logs of Wikipedia entries.
- Evaluation measures: section accuracy, paragraph accuracy, and tree distance (see the sketch below).
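
The slides do not define tree distance precisely; a common reading, assumed here, is the number of edges between the predicted and the reference insertion node in the article's section/paragraph tree, which matches the "# of edges" unit used later. A minimal sketch under that assumption:

```python
def tree_distance(predicted_path, reference_path):
    """Edges between two insertion points, each given as its root-to-leaf path,
    e.g. ["root", "section 2", "paragraph 5"].  The distance counts the edges
    up from the prediction to the lowest common ancestor, plus the edges down
    to the reference."""
    common = 0
    for p, r in zip(predicted_path, reference_path):
        if p != r:
            break
        common += 1
    return (len(predicted_path) - common) + (len(reference_path) - common)

# Example: same section, different paragraph -> distance 2.
assert tree_distance(["root", "sec1", "par2"], ["root", "sec1", "par4"]) == 2
```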

Corpus
- 4051 sentence/article pairs: training set 3240 pairs (80%), test set 811 pairs (20%).
- Corpus statistics: 32.9 sentences, 3.1 sections, and 10.9 paragraphs per article on average.

Human Evaluation
- Randomly selected 80 sentence/article pairs.
- Four judges; each judge took 40 pairs, and every sentence/article pair was assigned to two judges.
- Reported: average accuracy and mutual agreement, measured by section accuracy (%), paragraph accuracy (%), and tree distance (# of edges).

Baselines
- Straw baselines:
  - RandomIns: pick a random paragraph of the article.
  - FirstIns: pick the first paragraph of the article.
  - LastIns: pick the last paragraph of the article.
- Pipeline: train two rankers, one for section selection and one for paragraph selection; at decoding time, first choose the best section, then choose the best paragraph within the chosen section (sketched below).
- Flat: flat training; at decoding time, find the best path by aggregate score.
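
For contrast with the joint decoder above, a minimal sketch of the pipeline baseline's decoding step; the two ranker objects and their `best` methods are hypothetical stand-ins for separately trained section and paragraph rankers.

```python
def pipeline_decode(sentence, article, section_ranker, paragraph_ranker):
    """Pipeline baseline: commit to the best section first, then pick the best
    paragraph inside it.  A section error therefore cannot be corrected at the
    paragraph stage."""
    sec_idx = section_ranker.best(sentence, article)              # rank sections
    par_idx = paragraph_ranker.best(sentence, article[sec_idx])   # rank paragraphs in that section
    return sec_idx, par_idx
```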

Results
An asterisk (*) marks a statistically significant difference in accuracy between the given model and Hierarchy.

             Section Acc (%)   Paragraph Acc (%)   Tree Dist (# of edges)
  RandomIns       31.8*             13.4*                 3.10*
  FirstIns        25.0*             13.6*                 3.23*
  LastIns         30.6*             21.5*                 3.00*
  Pipeline                                                2.18*
  Flat                                                    2.18*
  Hierarchy
  Human

Observations:
- LastIns outperforms RandomIns and FirstIns.
- Hierarchy outperforms all baselines.
- At the paragraph level, the gap between machine and human performance is reduced by 32%.

Sentence-level Evaluation
- Local model (Lapata, 2003; Bollegala et al., 2006):
  - Input: a sequence of sentences.
  - Output: the best insertion point, found by examining the two sentences surrounding each candidate point.
  - Method: standard ranking perceptron (Collins, 2002).
  - Features: lexical, positional, and temporal.

Sentence-level Evaluation Results
- Linear baseline: use the local model to locate the sentence, simply treating the article as a flat sequence of sentences. Accuracy: 24%.
- Hierarchical method (sketched below): step 1, use Hierarchy to find the best paragraph for the sentence; step 2, use the local model to locate the exact position within the chosen paragraph. Accuracy: 35%.
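
The two-step procedure can be pictured as a thin wrapper that first asks the hierarchical ranker for a paragraph and then lets the local, sentence-level model choose the slot inside it; the model objects, their methods, and the list-of-sentences paragraph representation are all illustrative assumptions, not the released implementation.

```python
def insert_sentence(sentence, article, hierarchy_model, local_model):
    """Two-step insertion: (1) the hierarchical ranker picks a paragraph,
    (2) the local model picks a position between that paragraph's sentences."""
    sec_idx, par_idx = hierarchy_model.best_paragraph(sentence, article)
    paragraph = article[sec_idx][par_idx]             # assumed here: a list of sentence strings
    slot = local_model.best_slot(sentence, paragraph)
    paragraph.insert(slot, sentence)
    return sec_idx, par_idx, slot
```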

Conclusions & Future Work
- Conclusions:
  - Incremental text structuring offers a new perspective on text generation.
  - A hierarchical representation, coupled with hierarchically sensitive training, improves performance.
- Future work:
  - Automatic update of Wikipedia web pages.
  - Combining structure induction with text structuring.
- Code & data: