Genre Distinctions in the Penn TreeBank. Bonnie Webber (2009). Proceedings of the 47th Annual Meeting of the ACL and the 4th IJCNLP of the AFNLP: 674-682.


1 Genre Distinctions in the Penn TreeBank
Bonnie Webber (2009). Proceedings of the 47th Annual Meeting of the ACL and the 4th IJCNLP of the AFNLP: 674-682
Presented by Todd Shore (tshore@coli.uni-sb.de)
Project Seminar: Language Processing for Different Domains and Genres (Caroline Sporleder & Ines Rehbein), Saarland University

2 29/09/2016 Todd Shore: Genre Distinctions in the Penn TreeBank (Bonnie Webber 2009)
Overview
● Intro: Genre and Language
– Two Perspectives on Genre
– Genres: Distinguishing Features
● Genres in the PTB/WSJ
● Consequences for Parsing
– Discourse Relations
– Genre and DR Type Frequency
– Statistical Ramifications

3 Intro: Genre and Language
● Genre distinction methods:
– Theme: Fantasy, thriller, sci-fi...
– Medium: Novel, short story, letter...
– Classical Greek (800 B.C.): Drama, poetry, prose

4 Intro: Two Perspectives on Genre
● Exogenous: Communicative purpose of a text is used for categorisation (Swales 1990)
● Endogenous: Texts are grouped into “genres” by common features they share with others in the genre (Webber 2009)
● Kessler et al. (1997) tried to combine these two criteria, stating that sharing common features is by itself insufficient to license the existence of a particular genre

5 Genres: Distinguishing Features
Example: What does each text (endogenously) feature that distinguishes it from the other?
genre1.txt: Narrative essay
Features:
● Generalising statives
● Events
genre2.txt: Financial report
Features:
● Events

6 Genres: Within the PTB/WSJ
Bonnie Webber (2009: p675)
Penn TreeBank (PTB) is based on Wall Street Journal (WSJ) – one “literary” genre (“news”)?
● “Op-Ed pieces and reviews ending with a byline”
● “Essays on topics commemorating the WSJ’s centennial” (e.g. genre1.txt – wsj0676)
● “Daily summaries of financially significant events, ending with a summary of the day’s market figures” (e.g. genre2.txt – wsj2420)
● “Summaries of recent SEC filings”
● “Weekly market summaries”
● “Letters to the editor”
● “Corrections”
● “Wit and short verse”
13 subgenres total (Carlson et al. 2002)!

7 Consequences for Parsing
Discourse Relations
Example: Schröder goes to the sauna with Putin.
– Afterwards, he starts to work for Gazprom. (Temporal relation)
– Merkel only goes to dinner with him. (Comparison relation)
● Simple present tense may denote a single action or a habit; disambiguation thereof requires additional contextual information
● In order to parse the (correct) situation entity, sentential relational link(s) must be present

8 Consequences for Parsing
Discourse Relations: Dimensions of Connectives
Explicitness:
● Explicit: Schröder goes to the sauna with Putin. However, Merkel only goes to dinner with him.
● Implicit: Schröder goes to the sauna with Putin. Merkel only goes to dinner with him.
(Legend: ARG1 connective ARG2)

9 Consequences for Parsing
Discourse Relations: Dimensions of Connectives
Location:
● Inter-sentential: Schröder goes to the sauna with Putin. However, Merkel only goes to dinner with him.
● Intra-sentential: Although Schröder goes to the sauna with Putin, Merkel only goes to dinner with him.
(Legend: ARG1 connective ARG2)

10 Consequences for Parsing
Discourse Relations: Dimensions of Connectives
Sense:
● Temporal: Schröder goes to the sauna with Putin. Afterwards, he starts to work for Gazprom.
● Contingency: Because Schröder goes to the sauna with Putin, he starts to work for Gazprom next week.
● Comparison: Schröder goes to the sauna with Putin. However, Merkel only goes to dinner with him.
● Expansion: Schröder goes to the sauna with Putin. Likewise, Merkel goes to dinner with him.
(Legend: ARG1 connective ARG2)
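The three dimensions above (explicitness, location, sense) can be pictured as fields of one record. The following is only a minimal sketch: the class and field names are invented for illustration and are not the PDTB file format or the PDTB API, and the sense labels are restricted to the four top-level classes named on the slide.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Sense(Enum):
    """The four top-level sense classes listed on the slide."""
    TEMPORAL = "Temporal"
    CONTINGENCY = "Contingency"
    COMPARISON = "Comparison"
    EXPANSION = "Expansion"


@dataclass(frozen=True)
class DiscourseRelation:
    """Hypothetical record for one relation; not the PDTB data format."""
    arg1: str
    arg2: str
    sense: Sense
    connective: Optional[str] = None  # None stands for an implicit relation
    same_sentence: bool = False       # True when both arguments share a sentence

    @property
    def explicitness(self) -> str:
        return "Explicit" if self.connective else "Implicit"

    @property
    def location(self) -> str:
        return "Intra-sentential" if self.same_sentence else "Inter-sentential"


# The slide examples, encoded by hand.
temporal = DiscourseRelation(
    arg1="Schröder goes to the sauna with Putin.",
    arg2="he starts to work for Gazprom.",
    sense=Sense.TEMPORAL,
    connective="Afterwards",
)
implicit_comparison = DiscourseRelation(
    arg1="Schröder goes to the sauna with Putin.",
    arg2="Merkel only goes to dinner with him.",
    sense=Sense.COMPARISON,
)
# Here ARG2 is taken to be the clause the connective attaches to.
contingency = DiscourseRelation(
    arg1="he starts to work for Gazprom next week.",
    arg2="Schröder goes to the sauna with Putin,",
    sense=Sense.CONTINGENCY,
    connective="Because",
    same_sentence=True,
)

for rel in (temporal, implicit_comparison, contingency):
    print(rel.sense.value, rel.explicitness, rel.location)
```

Keeping the connective optional means the explicit/implicit distinction falls out of the data rather than being stored twice.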

11 Consequences for Parsing
Genre and Discourse Relation Type Frequency
[Figure: Webber (2009: p679), “Distribution of Explicit Inter-Sentential Connectives”]
Even inside the WSJ, there is a distinction between (sub-)genres in the varying frequencies of different discourse relations
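A comparison of this kind boils down to relative frequencies of relation senses per (sub-)genre. The sketch below shows one way to compute them; the function name, the genre labels and the toy counts are all invented for illustration and are not figures from Webber (2009).

```python
from collections import Counter, defaultdict


def sense_distribution(relations):
    """Map each genre to the relative frequency of each sense.

    `relations` is an iterable of (genre, sense) pairs.
    """
    counts = defaultdict(Counter)
    for genre, sense in relations:
        counts[genre][sense] += 1
    return {
        genre: {sense: n / sum(c.values()) for sense, n in c.items()}
        for genre, c in counts.items()
    }


# Toy data, made up purely to exercise the function.
toy_relations = [
    ("essay", "Comparison"), ("essay", "Comparison"),
    ("essay", "Temporal"), ("essay", "Expansion"),
    ("summary", "Temporal"), ("summary", "Temporal"),
    ("summary", "Temporal"), ("summary", "Expansion"),
]

dist = sense_distribution(toy_relations)
for genre, senses in dist.items():
    print(genre, senses)
```

With real annotations, the (genre, sense) pairs would come from a genre label per WSJ file joined with that file's annotated relations.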

12 Consequences for Parsing
Statistical Ramifications
● The sense signalled by a connective is often ambiguous
● The type frequency of (all) connectives is highly genre-sensitive
[Figure: Webber (2009: p680), “Implicit Connectives”]

13 Conclusion
● Distinct linguistic genres display different distributions of discourse relations
● Discourse relations are often ambiguous without contextual information
● In theory: Genre is a factor affecting the probability of a given connective
● In application: In order to (computationally) parse discourse (relations) correctly, genre must be incorporated into one's model
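One simple way to picture "incorporating genre into one's model" is to condition the sense estimate on genre as well as on the connective. The sketch below does this with add-one smoothed relative frequencies. It is not the method of Webber (2009); the class name, the genre labels and the toy observations are assumptions made for illustration. The connective "since" is used because it can signal either a temporal or a causal (Contingency) relation.

```python
from collections import Counter

SENSES = ("Temporal", "Contingency", "Comparison", "Expansion")


class GenreAwareSenseModel:
    """Hypothetical estimator of P(sense | connective, genre)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # additive smoothing constant
        self.counts = {}    # (genre, connective) -> Counter of senses

    def observe(self, genre, connective, sense):
        key = (genre, connective.lower())
        self.counts.setdefault(key, Counter())[sense] += 1

    def prob(self, sense, connective, genre):
        c = self.counts.get((genre, connective.lower()), Counter())
        total = sum(c.values())
        return (c[sense] + self.alpha) / (total + self.alpha * len(SENSES))

    def predict(self, connective, genre):
        return max(SENSES, key=lambda s: self.prob(s, connective, genre))


# Toy observations, invented for illustration only.
model = GenreAwareSenseModel()
for _ in range(3):
    model.observe("essay", "since", "Contingency")
    model.observe("summary", "since", "Temporal")
model.observe("essay", "since", "Temporal")
model.observe("summary", "since", "Contingency")

print(model.predict("since", "essay"))
print(model.predict("since", "summary"))
```

The same connective receives a different most-probable sense in each genre, which is the practical consequence of the conclusion above.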

14 Resources
● PDTB API (Web start): http://www.seas.upenn.edu/~pdtb/PDTBAPI/index.html#browser
● Local PDTB Resources (on cluster machines/“cat”):
– Rawroot = /proj/corpora/penntreebank/2.0/raw/wsj
– Ptbroot = /proj/corpora/penntreebank/2.0/combined/wsj
– Pdtbroot = /proj/corpora/penn_discourse_treebank/pdtb_v2/data
Works cited
● Kessler, Brett; Nunberg, Geoffrey & Schütze, Hinrich (1997) 'Automatic detection of text genre'. Proceedings of the 35th Annual Meeting of the ACL: 32–38.
● Penn Discourse TreeBank. Accessed 11 Nov 2009.
● Swales, John (1990) Genre Analysis. Cambridge: Cambridge University Press.
● Webber, Bonnie (2009) 'Genre Distinctions in the Penn TreeBank'. Proceedings of the 47th Annual Meeting of the ACL and the 4th IJCNLP of the AFNLP: 674-682.

