Published by Valentine Harrington. Modified over 8 years ago.
1
Genre Distinctions in the Penn TreeBank
Bonnie Webber (2009). Proceedings of the 47th Annual Meeting of the ACL and the 4th IJCNLP of the AFNLP: 674-682
Presented by Todd Shore (tshore@coli.uni-sb.de)
Project Seminar: Language Processing for Different Domains and Genres
Caroline Sporleder & Ines Rehbein
Saarland University
2
29/09/2016 Todd Shore: Genre Distinctions in the Penn TreeBank (Bonnie Webber 2009)
Overview
● Intro: Genre and Language
  – Two Perspectives on Genre
  – Genres: Distinguishing Features
● Genres in the PTB/WSJ
● Consequences for Parsing
  – Discourse Relations
  – Genre and Discourse Relation Type Frequency
  – Statistical Ramifications
3
Intro: Genre and Language
● Ways of distinguishing genres:
  – Theme: fantasy, thriller, sci-fi...
  – Medium: novel, short story, letter...
  – Classical Greek (800 B.C.): drama, poetry, prose
4
Intro: Two Perspectives on Genre
● Exogenous: texts are categorised by their communicative purpose (Swales 1990)
● Endogenous: texts are grouped into "genres" by the features they share with other texts in the genre (Webber 2009)
● Kessler et al. (1997) tried to combine the two criteria, arguing that sharing common features is by itself insufficient to license the existence of a particular genre
5
Genres: Distinguishing Features
Example: Which (endogenous) features of each text distinguish it from the other?
genre1.txt (narrative essay)
Features:
● Generalising statives
● Events
genre2.txt (financial report)
Features:
● Events
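The endogenous comparison on this slide can be sketched as set operations over per-text feature inventories. This is a minimal illustration, assuming each text has already been reduced to a set of situation-entity labels; the label strings and the Jaccard overlap score are hypothetical choices for the sketch, not Webber's method.

```python
def distinguishing_features(a: set, b: set) -> set:
    """Features observed in text a but not in text b."""
    return a - b


def jaccard(a: set, b: set) -> float:
    """Share of features the two texts have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Hypothetical feature inventories transcribed from the slide.
genre1 = {"generalising_stative", "event"}  # narrative essay (wsj0676)
genre2 = {"event"}                          # financial report (wsj2420)

print(distinguishing_features(genre1, genre2))  # {'generalising_stative'}
print(distinguishing_features(genre2, genre1))  # set()
print(jaccard(genre1, genre2))                  # 0.5
```

Under these assumptions, generalising statives are what set the essay apart; the financial report has no feature the essay lacks.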
6
Genres: Within the PTB/WSJ
Bonnie Webber (2009: p675)
The Penn TreeBank (PTB) is based on the Wall Street Journal (WSJ): a single "literary" genre ("news")? In fact it contains, among others:
● "Op-Ed pieces and reviews ending with a byline"
● "Essays on topics commemorating the WSJ's centennial" (e.g. genre1.txt, wsj0676)
● "Daily summaries of financially significant events, ending with a summary of the day's market figures" (e.g. genre2.txt, wsj2420)
● "Summaries of recent SEC filings"
● "Weekly market summaries"
● "Letters to the editor"
● "Corrections"
● "Wit and short verse"
13 subgenres in total (Carlson et al. 2002)!
7
Consequences for Parsing: Discourse Relations
Schröder goes to the sauna with Putin.
  → Afterwards, he starts to work for Gazprom. (Temporal relation)
  → Merkel only goes to dinner with him. (Comparison relation)
● The simple present tense may denote either a single action or a habit; telling them apart requires additional contextual information
● To parse the correct situation entity, the relational link(s) between sentences must be available
8
Consequences for Parsing
Discourse Relations: Dimensions of Connectives
Explicitness:
● Explicit: Schröder goes to the sauna with Putin. However, Merkel only goes to dinner with him.
● Implicit: Schröder goes to the sauna with Putin. Merkel only goes to dinner with him.
(Legend: ARG1 connective ARG2)
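A naive way to operationalise the explicit/implicit distinction for adjacent sentences is to check whether the second sentence opens with a known connective. The sketch assumes a tiny hand-made lexicon (`CONNECTIVES`, hypothetical); a real system would use the full PDTB connective inventory and would also have to rule out non-discourse uses of the same words.

```python
# Hypothetical mini-lexicon; the PDTB's inventory of explicit connectives is far larger.
CONNECTIVES = ("however", "afterwards", "likewise", "because", "despite the fact that")


def explicitness(second_sentence: str) -> str:
    """Label a relation between two adjacent sentences as Explicit if the
    second one opens with a lexicon connective, otherwise Implicit."""
    text = second_sentence.strip().lower()
    for conn in CONNECTIVES:
        # Require a word boundary right after the connective.
        if text.startswith(conn) and (len(text) == len(conn) or not text[len(conn)].isalnum()):
            return "Explicit"
    return "Implicit"


print(explicitness("However, Merkel only goes to dinner with him."))  # Explicit
print(explicitness("Merkel only goes to dinner with him."))           # Implicit
```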
9
Consequences for Parsing
Discourse Relations: Dimensions of Connectives
Location:
● Inter-sentential: Schröder goes to the sauna with Putin. However, Merkel only goes to dinner with him.
● Intra-sentential: Despite the fact that Schröder goes to the sauna with Putin, Merkel only goes to dinner with him.
(Legend: ARG1 connective ARG2)
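Location can be read off once the text is split into sentences: if both arguments fall in the same sentence the relation is intra-sentential, otherwise inter-sentential. This sketch assumes a crude regex sentence splitter and argument spans supplied by hand; `split_sentences`, `sentence_index` and `location` are hypothetical helpers.

```python
import re


def split_sentences(text: str) -> list:
    """Crude splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def sentence_index(sentences: list, span: str) -> int:
    """Index of the first sentence containing the argument span."""
    for i, sentence in enumerate(sentences):
        if span in sentence:
            return i
    raise ValueError(f"span not found: {span!r}")


def location(text: str, arg1: str, arg2: str) -> str:
    """Intra-sentential if both argument spans lie in the same sentence."""
    sentences = split_sentences(text)
    same = sentence_index(sentences, arg1) == sentence_index(sentences, arg2)
    return "Intra-sentential" if same else "Inter-sentential"


SCHROEDER = "Schröder goes to the sauna with Putin"
MERKEL = "Merkel only goes to dinner with him"
inter = "Schröder goes to the sauna with Putin. However, Merkel only goes to dinner with him."
intra = ("Despite the fact that Schröder goes to the sauna with Putin, "
         "Merkel only goes to dinner with him.")
print(location(inter, SCHROEDER, MERKEL))  # Inter-sentential
print(location(intra, SCHROEDER, MERKEL))  # Intra-sentential
```

The splitter breaks on abbreviations such as "U.S." followed by a space, so it is only suitable for toy examples like these.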
10
Consequences for Parsing
Discourse Relations: Dimensions of Connectives
Sense:
● Temporal: Schröder goes to the sauna with Putin. Afterwards, he starts to work for Gazprom.
● Contingency: Because Schröder goes to the sauna with Putin, he starts to work for Gazprom next week.
● Comparison: Schröder goes to the sauna with Putin. However, Merkel only goes to dinner with him.
● Expansion: Schröder goes to the sauna with Putin. Likewise, Merkel goes to dinner with him.
(Legend: ARG1 connective ARG2)
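The four sense classes on this slide are the top-level classes of the PDTB sense hierarchy. A lookup table from connectives to the senses they can signal shows why sense is a separate dimension: some connectives map to more than one class. The table below is a hypothetical, heavily simplified fragment; `since` (temporal or causal) is a standard example of an ambiguous connective.

```python
# Hypothetical, simplified fragment of a connective-to-sense lexicon using the
# four PDTB top-level sense classes.
SENSES = {
    "afterwards": {"Temporal"},
    "because": {"Contingency"},
    "however": {"Comparison"},
    "likewise": {"Expansion"},
    "since": {"Temporal", "Contingency"},  # "since he left" vs. "since it rained"
}


def possible_senses(connective: str) -> set:
    """Senses the connective can signal; empty set if it is not in the lexicon."""
    return SENSES.get(connective.lower(), set())


def is_ambiguous(connective: str) -> bool:
    return len(possible_senses(connective)) > 1


for conn in ("However", "since"):
    print(conn, sorted(possible_senses(conn)), is_ambiguous(conn))
# However ['Comparison'] False
# since ['Contingency', 'Temporal'] True
```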
11
Consequences for Parsing
Genre and Discourse Relation Type Frequency
Webber (2009: p679): "Distribution of Explicit Inter-Sentential Connectives"
● Even within the WSJ, (sub-)genres differ in how frequently the different discourse relations occur
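Per-genre relative frequencies of the kind summarised in Webber's table can be computed by counting (genre, connective) pairs. This is a minimal sketch: the input below is invented toy data, not PDTB counts, and the genre labels are hypothetical.

```python
from collections import Counter, defaultdict


def relative_frequencies(pairs):
    """Map each genre to {connective: share of that genre's connective tokens}."""
    counts = defaultdict(Counter)
    for genre, connective in pairs:
        counts[genre][connective] += 1
    return {
        genre: {conn: n / sum(ctr.values()) for conn, n in ctr.items()}
        for genre, ctr in counts.items()
    }


# Invented toy tokens for illustration only.
toy = [
    ("essay", "but"), ("essay", "however"), ("essay", "but"), ("essay", "for example"),
    ("summary", "meanwhile"), ("summary", "meanwhile"), ("summary", "meanwhile"), ("summary", "but"),
]
freqs = relative_frequencies(toy)
print(freqs["essay"]["but"])          # 0.5
print(freqs["summary"]["meanwhile"])  # 0.75
```

Normalising within each genre, rather than over the whole corpus, keeps a large subgenre from swamping a small one.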
12
Consequences for Parsing
Statistical Ramifications
● The sense signalled by a connective is often ambiguous
● The type frequency of (all) connectives is highly genre-sensitive
Webber (2009: p680): "Implicit Connectives"
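One way to quantify how ambiguous a connective's sense is: take the entropy of its sense distribution. This gives 0 bits for a connective that always signals the same sense and higher values for ambiguous ones. The measure is an illustrative choice, not one used in the slides, and the annotated tokens below are invented.

```python
import math
from collections import Counter


def sense_distribution(annotations, connective):
    """Relative frequency of each sense annotated for `connective`."""
    counts = Counter(sense for conn, sense in annotations if conn == connective)
    total = sum(counts.values())
    return {sense: n / total for sense, n in counts.items()} if total else {}


def entropy(dist):
    """Shannon entropy of a probability distribution, in bits."""
    h = 0.0
    for p in dist.values():
        if p > 0:
            h -= p * math.log2(p)
    return h


# Invented (connective, sense) annotations for illustration only.
toy = [("since", "Temporal"), ("since", "Contingency"), ("since", "Temporal"),
       ("since", "Contingency"), ("because", "Contingency"), ("because", "Contingency")]
print(entropy(sense_distribution(toy, "since")))    # 1.0
print(entropy(sense_distribution(toy, "because")))  # 0.0
```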
13
Conclusion
● Distinct linguistic genres display different distributions of discourse relations
● Discourse relations are often ambiguous without contextual information
● In theory: genre is a factor affecting the probability of a given connective
● In practice: to parse discourse relations correctly (computationally), genre must be incorporated into one's model
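The conclusion that genre belongs in the model can be illustrated with a genre-conditioned estimate of P(sense | connective, genre). When a genre has no data for the connective, the estimate falls back on the genre-independent one. This sketches one simple back-off scheme over invented toy counts; it is not a model proposed by Webber (2009), and the genre labels are hypothetical.

```python
from collections import Counter


def sense_probs(data, connective, genre=None):
    """Estimate P(sense | connective[, genre]) from (genre, connective, sense) triples.

    If `genre` is given but has no tokens of `connective`, back off to the
    estimate pooled over all genres.
    """
    senses = [s for g, c, s in data if c == connective and (genre is None or g == genre)]
    if not senses and genre is not None:
        return sense_probs(data, connective)  # back-off: ignore genre
    counts = Counter(senses)
    return {s: n / len(senses) for s, n in counts.items()}


# Invented toy triples for illustration only.
toy = ([("essay", "since", "Contingency")] * 3 + [("essay", "since", "Temporal")]
       + [("summary", "since", "Temporal")] * 3 + [("summary", "since", "Contingency")])

print(sense_probs(toy, "since", "essay"))    # {'Contingency': 0.75, 'Temporal': 0.25}
print(sense_probs(toy, "since", "summary"))  # {'Temporal': 0.75, 'Contingency': 0.25}
print(sense_probs(toy, "since", "letter"))   # {'Contingency': 0.5, 'Temporal': 0.5}
```

In this toy setting, the pooled estimate cannot decide between the two senses of "since". The genre-conditioned estimates can.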
14
Resources
● PDTB API (Web start): http://www.seas.upenn.edu/~pdtb/PDTBAPI/index.html#browser
● Local PDTB resources (on the cluster machines/"cat"):
  – Rawroot = /proj/corpora/penntreebank/2.0/raw/wsj
  – Ptbroot = /proj/corpora/penntreebank/2.0/combined/wsj
  – Pdtbroot = /proj/corpora/penn_discourse_treebank/pdtb_v2/data
Works cited
● Kessler, Brett; Nunberg, Geoffrey & Schütze, Hinrich (1997). 'Automatic detection of text genre'. Proceedings of the 35th Annual Meeting of the ACL: 32-38.
● Penn Discourse TreeBank. Accessed 11 Nov 2009.
● Swales, John (1990). Genre Analysis. Cambridge: Cambridge University Press.
● Webber, Bonnie (2009). 'Genre Distinctions in the Penn TreeBank'. Proceedings of the 47th Annual Meeting of the ACL and the 4th IJCNLP of the AFNLP: 674-682.
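If you work with the local copies listed under Resources, a small helper can map a WSJ document ID such as wsj0676 to its file paths. This sketch assumes the usual PTB-2 layout: documents are grouped into two-digit section directories (wsj_0676 in section 06), and parsed files carry a .mrg suffix. Check the actual directory listing before relying on it. `wsj_paths` is a hypothetical helper.

```python
from pathlib import Path

# Local roots as listed on the Resources slide.
RAWROOT = Path("/proj/corpora/penntreebank/2.0/raw/wsj")
PTBROOT = Path("/proj/corpora/penntreebank/2.0/combined/wsj")


def wsj_paths(doc_id: str):
    """Return (raw text path, parsed .mrg path) for an ID like 'wsj0676' or 'wsj_0676'.

    Assumes the layout <root>/<section>/wsj_<NNNN>, where the section is the
    first two digits of the document number.
    """
    digits = "".join(ch for ch in doc_id if ch.isdigit())
    if len(digits) != 4:
        raise ValueError(f"expected a four-digit WSJ document number, got {doc_id!r}")
    name = f"wsj_{digits}"
    section = digits[:2]
    return RAWROOT / section / name, PTBROOT / section / f"{name}.mrg"


raw, mrg = wsj_paths("wsj0676")
print(raw)  # /proj/corpora/penntreebank/2.0/raw/wsj/06/wsj_0676
print(mrg)  # /proj/corpora/penntreebank/2.0/combined/wsj/06/wsj_0676.mrg
```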