
Presentation transcript:

 Motivation & Previous Work  Sentence Compression Approach  Linguistically-motivated Heuristics  Word Significance  Compression Generation and Selection  Experiment Results  Conclusions & Future Work

 no Chinese parallel corpus  hard to create a sentence/compression parallel corpus

An example of system output
[Original] 第四种子乔科维奇退赛, 让原以三比六, 六比一, 四比一领先的第二种子纳达尔获胜过关.
  Fourth seed Djokovic withdrew from the game, and allowed second seed Nadal, who was leading 3-6, 6-1, 4-1, to claim the victory and progress through.
[Human] 乔科维奇退赛让纳达尔获胜过关.
  Djokovic withdrew from the game, and allowed Nadal to claim the victory and progress through.
[Approach 1] 乔科维奇退赛.
  Djokovic withdrew from the game.
[Approach 2] 乔科维奇退赛让种子纳达尔获胜过关.
  Djokovic withdrew from the game, and allowed seed Nadal to claim the victory and progress through.

[Figure: overview of previous work on sentence compression. Methods shown: Parse Tree Trimming (Dorr 2003); Sentence Scoring (Hori 2003; Clarke); Noisy Channel (Knight 2002; Turner 2005; Galley 2007); Decision Tree (Knight 2002; Nguyen 2004); Large Margin Learning (McDonald 2006; Cohn 2007; Cohn 2008); MaxEnt (Riezler 2003). Category labels: Supervised Learning, Unsupervised Learning, Non-Corpus-Based. Data labels: Headline Generation, Japanese Speech, Paraphrasing Corpus.]

Parse Tree Trimming (Dorr et al. 2003)
▪ linguistically motivated heuristics
▪ hand-made rules remove low-content components
▪ trim iteratively until the desired length is reached
▪ reduce the risk of deleting important information by applying rules in a fixed order: safe rules (DT, TIME), then more dangerous rules (CONJ), then the most dangerous rules (PP)
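The ordered-rule idea on this slide can be illustrated with a minimal sketch. It is not Dorr et al.'s implementation: it assumes a hypothetical flat list of `(text, label)` chunks instead of a real parse tree, and the function name `trim_in_order` is invented for illustration. Only the rule order (DT, TIME, CONJ, PP) comes from the slide.

```python
# Rule order from the slide: safest rules first, most dangerous last.
RULE_ORDER = ["DT", "TIME", "CONJ", "PP"]


def trim_in_order(chunks, max_len, rule_order=RULE_ORDER):
    """Apply trimming rules in order until at most max_len chunks remain.

    chunks is a hypothetical flat list of (text, label) pairs; a real
    system would operate on parse-tree constituents instead.
    """
    current = list(chunks)
    for label in rule_order:
        if len(current) <= max_len:
            break  # desired length reached, riskier rules are never applied
        # Applying a rule removes every chunk that carries its label.
        current = [chunk for chunk in current if chunk[1] != label]
    return current


chunks = [
    ("the", "DT"),
    ("president", "NN"),
    ("yesterday", "TIME"),
    ("spoke", "VB"),
    ("in Paris", "PP"),
]
trimmed = trim_in_order(chunks, max_len=3)
```

In this toy input the DT and TIME rules are enough to reach the target length, so the riskier PP rule is never applied.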

Parse Tree Trimming (Dorr et al. 2003)
▪ Pros:
  ▪ comparatively good performance
  ▪ retains grammaticality if the parse is correct
▪ Cons:
  ▪ requires considerable linguistic skill to produce proper rules in a proper order
  ▪ sensitive to POS and parsing errors
  ▪ inflexible, and not able to reliably preserve informative components

Sentence Scoring (Hori & Furui 2004)
▪ improved by Clarke & Lapata in 2006
▪ given an input sentence W = w_1, w_2, …, w_n, rank the possible compressions
▪ language model + word significance
▪ Score(C) = p1 * WordSignificanceScore(all words in C) + p2 * LanguageModelScore(C) + p3 * SubjectObjectVerbScore(all words in C), where C is a compressed sentence
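As a rough illustration of the weighted sum above, the sketch below ranks candidate compressions whose three component scores are assumed to be already computed. The function names, the tuple layout, and the example numbers are all hypothetical; the slide does not say how the component scores are obtained.

```python
def compression_score(significance, lm_score, sov_score, p1, p2, p3):
    """Weighted sum of the three component scores named on the slide."""
    return p1 * significance + p2 * lm_score + p3 * sov_score


def rank_candidates(candidates, weights):
    """Sort candidates from best to worst score.

    candidates: list of (text, significance, lm_score, sov_score) tuples,
    with component scores assumed to be precomputed elsewhere.
    weights: the (p1, p2, p3) weighting parameters.
    """
    p1, p2, p3 = weights
    return sorted(
        candidates,
        key=lambda c: compression_score(c[1], c[2], c[3], p1, p2, p3),
        reverse=True,
    )


# Made-up component scores for two candidate compressions.
candidates = [
    ("candidate A", 1.0, -4.0, 0.0),
    ("candidate B", 2.0, -2.0, 1.0),
]
ranked = rank_candidates(candidates, weights=(1.0, 0.5, 2.0))
```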

Sentence Scoring (Hori & Furui 2004)
▪ language model
▪ word significance
▪ Pros:
  ▪ does not rely heavily on a training corpus
▪ Cons:
  ▪ the weighting parameters are optimized experimentally or estimated from a parallel corpus
  ▪ uses only a language model to encourage compression and ensure grammaticality

Combine
▪ Linguistically-motivated Heuristics
  ▪ ensure grammaticality
  ▪ rules are easier to develop: they only identify possible low-content components instead of selecting specific constituents for removal
▪ Information Significance Scoring
  ▪ preserves the most important information
  ▪ improves tolerance to POS and parsing errors

Combined Approach: Heuristics + Information Significance
▪ use heuristics to determine potentially low-content constituents
▪ perform the actual deletion according to word significance

1. take a Chinese Treebank-style parse as input
2. use linguistically-motivated heuristics to determine potentially removable constituents
3. generate a series of candidate compressions by deleting removable nodes based on word significance
4. select the best compression according to information density

Combined Approach: Heuristics + Information Significance
▪ Used to determine potentially low-content constituents
▪ Basic: (same)
  ▪ parenthetical elements
  ▪ adverbs except negative
  ▪ adjectives
  ▪ DNPs (phrase + "的", modifiers of NP)
  ▪ DVPs (phrase + "地", modifiers of VP)
  ▪ noun coordination phrases
▪ Complex: (more relaxed, general)
  ▪ verb coordination phrases
  ▪ relative clauses
  ▪ appositive clauses
  ▪ prepositional phrases
  ▪ all children of NP nodes except the last noun word
  ▪ sentential coordination

Heuristics-only Approach
▪ Used to remove specific low-content constituents
▪ Basic: (same)
  ▪ parenthetical elements
  ▪ adverbs except negative
  ▪ adjectives
  ▪ DNPs (phrase + "的", modifiers of NP)
  ▪ DVPs (phrase + "地", modifiers of VP)
  ▪ noun coordination phrases
▪ Complex: (more strict, conservative)
  ▪ all children of NP nodes except temporal nouns, proper nouns, and the last noun word
  ▪ all simple clauses (IP) except the first one in sentential coordination
  ▪ prepositional phrases except those that may contain location or date information, according to a hand-made list of prepositions

An example of applying heuristics
▪ *: nodes labeled as removable by the combined approach
▪ #: nodes trimmed out by the heuristics-only approach
( (IP (NP (*NP (NR 韩国)) (#*ADJP (JJ 现代)) (NP (#*NN 汽车) (NN 公司))) (VP (VC 是) (NP (#*DNP (NP (NR 沃尔沃)) (DEG 的)) (#*ADJP (JJ 潜在)) (NP (NN 买家)))) (PU .)))
( (IP (NP (*NP (NR South Korean)) (#*ADJP (JJ Hyundai)) (NP (#*NN motor) (NN company))) (VP (VC is) (NP (#*DNP (NP (NR Volvo)) (DEG 's)) (#*ADJP (JJ potential)) (NP (NN buyer)))) (PU .)))
Callout on the slide: POS error
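The "*" marking in this example can be approximated with a short sketch. It is a simplified reading of the combined approach's heuristics, not the authors' code: it covers only a few of the basic rules plus "all children of NP nodes except the last noun word", it treats the last child of an NP as the one holding the last noun, and it ignores the exception for negative adverbs. The label set (PRN, ADVP, ADJP, DNP, DVP) is an assumed mapping onto Chinese Treebank phrase labels, and the helper names are hypothetical.

```python
# Assumed mapping of some "basic" heuristics onto Chinese Treebank labels.
BASIC_LABELS = {"PRN", "ADVP", "ADJP", "DNP", "DVP"}


def parse(bracketed):
    """Read a bracketed parse into (label, children) tuples; words stay str."""
    tokens = bracketed.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        pos += 1                      # skip "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1                      # skip ")"
        return (label, children)

    return read()


def words(node):
    """Concatenate the words under a node (no spaces, as in Chinese text)."""
    return "".join(c if isinstance(c, str) else words(c) for c in node[1])


def mark_removable(node, out=None):
    """Collect (label, text) for every node the simplified rules mark."""
    if out is None:
        out = []
    label, children = node
    subtrees = [c for c in children if not isinstance(c, str)]
    for i, child in enumerate(subtrees):
        is_last = i == len(subtrees) - 1
        if child[0] in BASIC_LABELS or (label == "NP" and not is_last):
            out.append((child[0], words(child)))   # marked: do not descend
        else:
            mark_removable(child, out)
    return out


TREE = ("(IP (NP (NP (NR 韩国)) (ADJP (JJ 现代)) (NP (NN 汽车) (NN 公司))) "
        "(VP (VC 是) (NP (DNP (NP (NR 沃尔沃)) (DEG 的)) (ADJP (JJ 潜在)) "
        "(NP (NN 买家)))) (PU .))")
marked = mark_removable(parse(TREE))
```

On the slide's tree this marks the same five nodes that carry "*" above: the NP over 韩国, the ADJP over 现代, the NN 汽车, the DNP 沃尔沃的, and the ADJP over 潜在.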


Event-based Word Significance Score
▪ verb or common noun: tf-idf
▪ proper noun: tf-idf + w
▪ otherwise: 0
▪ weighted parse tree
▪ depends on the word itself regardless of POS
▪ overcomes some POS errors
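The three scoring cases can be sketched as follows. The slide does not define tf-idf or the bonus w, so this sketch assumes a smoothed idf of log((1 + N) / (1 + df)), a hypothetical default bonus of 1.0, and Chinese Treebank POS tags (VA, VC, VE, VV for verbs, NN for common nouns, NR for proper nouns). The function and parameter names are illustrative only.

```python
import math

# Assumed Chinese Treebank tag groups; the slide only names the categories.
VERB_TAGS = {"VA", "VC", "VE", "VV"}
COMMON_NOUN_TAGS = {"NN"}
PROPER_NOUN_TAGS = {"NR"}


def tf_idf(word, doc_tokens, doc_freq, n_docs):
    """Term frequency times a smoothed inverse document frequency."""
    tf = doc_tokens.count(word)
    df = doc_freq.get(word, 0)
    return tf * math.log((1 + n_docs) / (1 + df))


def word_significance(word, tag, doc_tokens, doc_freq, n_docs, proper_bonus=1.0):
    """Score a word following the three cases on the slide.

    proper_bonus plays the role of w; its value here is a placeholder.
    """
    if tag in VERB_TAGS or tag in COMMON_NOUN_TAGS:
        return tf_idf(word, doc_tokens, doc_freq, n_docs)
    if tag in PROPER_NOUN_TAGS:
        return tf_idf(word, doc_tokens, doc_freq, n_docs) + proper_bonus
    return 0.0


# Toy document and made-up document frequencies over 7 documents.
doc = ["沃尔沃", "是", "买家", "沃尔沃"]
df = {"沃尔沃": 1, "买家": 3}
score_proper = word_significance("沃尔沃", "NR", doc, df, 7)
score_noun = word_significance("买家", "NN", doc, df, 7)
score_other = word_significance("的", "DEG", doc, df, 7)
```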

Generate a series of candidate compressions
▪ by repeatedly trimming the weighted parse tree
▪ greedy algorithm
  ▪ remove the removable node with the lowest weight to get a candidate compressed sentence
  ▪ update the weights of all ancestors of the removed node
  ▪ repeat until no node is removable
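The greedy loop above might look like the sketch below. It assumes, since the slide does not say, that an inner node's weight is the sum of its children's weights and that the root is never removable. The `Node` class, the helper names, and all weights in the example are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # identity comparison, so list.remove drops the exact node
class Node:
    label: str
    word: str = ""
    weight: float = 0.0            # for a leaf: its word significance
    removable: bool = False
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = None


def link(node):
    """Set parent pointers; inner weights become the sum of their children."""
    for child in node.children:
        child.parent = node
        link(child)
    if node.children:
        node.weight = sum(child.weight for child in node.children)
    return node


def text(node):
    return node.word + "".join(text(child) for child in node.children)


def removable_nodes(node):
    found = [node] if node.removable else []
    for child in node.children:
        found.extend(removable_nodes(child))
    return found


def greedy_candidates(root):
    """Repeatedly delete the lightest removable node, one candidate per step."""
    candidates = []
    while True:
        nodes = removable_nodes(root)
        if not nodes:
            break
        victim = min(nodes, key=lambda n: n.weight)
        ancestor = victim.parent
        while ancestor is not None:       # update all ancestors' weights
            ancestor.weight -= victim.weight
            ancestor = ancestor.parent
        node = victim
        while node.parent is not None:    # detach; also drop emptied parents
            node.parent.children.remove(node)
            if node.parent.children:
                break
            node = node.parent
        candidates.append(text(root))
    return candidates


def leaf(label, word, weight, removable=False):
    return Node(label, word=word, weight=weight, removable=removable)


# The Hyundai/Volvo parse with made-up significance weights.
root = link(Node("IP", children=[
    Node("NP", children=[
        Node("NP", removable=True, children=[leaf("NR", "韩国", 3.0)]),
        Node("ADJP", removable=True, children=[leaf("JJ", "现代", 2.0)]),
        Node("NP", children=[leaf("NN", "汽车", 1.0, True), leaf("NN", "公司", 1.5)]),
    ]),
    Node("VP", children=[
        leaf("VC", "是", 0.0),
        Node("NP", children=[
            Node("DNP", removable=True,
                 children=[leaf("NR", "沃尔沃", 4.0), leaf("DEG", "的", 0.0)]),
            Node("ADJP", removable=True, children=[leaf("JJ", "潜在", 0.5)]),
            Node("NP", children=[leaf("NN", "买家", 2.5)]),
        ]),
    ]),
]))
candidates = greedy_candidates(root)
```

With these invented weights the loop yields five candidates, from 韩国现代汽车公司是沃尔沃的买家 down to 公司是买家; a different weighting would change the removal order.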

 Information Density  used to select the best compression

Information Density
D(s) | Sentence
 | 韩国现代汽车公司是沃尔沃的潜在买家. The South Korean Hyundai Motor Company is a potential buyer of Volvo.
 | 韩国现代汽车公司是沃尔沃的买家. The South Korean Hyundai Motor Company is a buyer of Volvo.
 | 韩国现代公司是沃尔沃的买家. The South Korean Hyundai Company is a buyer of Volvo.
 | 韩国公司是沃尔沃的买家. The South Korean company is a buyer of Volvo.
 | 公司是沃尔沃的买家. The company is a buyer of Volvo.
0.0 | 公司是买家. The company is a buyer.

▪ 79 documents from Chinese newswires
▪ the first sentence of each news article
▪ challenging task
  ▪ headline-like compression
  ▪ average length: 61.5 characters
  ▪ a sentence often connects two or more self-complete sentences together

Human evaluation
System | Compression Rate | Grammaticality (1~5) | Informativeness (0~100%)
Human | 38.5% | |
Heuristics | 54.1% | |
Heu+Sig | 52.8% | 3.854 * | 68.8% **
Heu+Sig+L *** | 34.3% | |
* The combined approach sacrifices grammaticality to reduce the linguistic complexity of the heuristics
** Word significance improves on the heuristics in informativeness
*** With varying length constraints, depending on original sentence length

▪ compression with good grammar
▪ performs well in most cases
▪ performs terribly on about 20 cases out of all 76
  ▪ POS or parsing errors
  ▪ grammatically correct but semantically incorrect
[Table: columns Grammaticality (1~5), Number of Sentences, Informativeness (0~100%); rows Heuristics (>), Heuristics (>=), Heu+Sig (>), Heu+Sig (>=); thresholds and cell values are missing from the transcript.]

First attempt in Chinese
▪ heuristics: ensure grammaticality
▪ word significance: controls word deletion, balancing sentence length and information loss
▪ Pros:
  ▪ does not rely on a parallel corpus
  ▪ reduces the complexity of composing heuristics
  ▪ easily extended to other languages or domains
  ▪ overcomes some POS and parsing errors
  ▪ competitive with a finely-tuned heuristics-only approach

 applications in summarization, headline generation  keyword selection and weighting  language model  parallel corpus in Chinese  statistical, machine learning

A Parse-and-Trim Approach with Information Significance for Chinese Sentence Compression