
UAM CorpusTool: An Overview Debopam Das Discourse Research Group Department of Linguistics Simon Fraser University Feb 5, 2014.


1 UAM CorpusTool: An Overview Debopam Das Discourse Research Group Department of Linguistics Simon Fraser University Feb 5, 2014

2 Outline
- UAM CorpusTool (O’Donnell, 2008)
  - Tool description
  - A short tutorial
- Annotating signals of coherence relations with UAM CorpusTool
Feb 5, 2014 Discourse Research Group 2

3 UAM CorpusTool
- Created by Mick O’Donnell in 2008
- Replaces the earlier software Systemic Coder, which allowed coding of only single documents at a single layer
- Available at http://www.wagsoft.com/CorpusTool/
- Runs on Windows and Mac OS
- “… primarily aimed at the linguist or computational linguist who does not program, and would rather spend their time annotating text than learning how to use the system.” (O’Donnell, 2008: 13)

4 UAM CorpusTool
- Annotate documents: text type, writer characteristics, register, etc.
- Annotate segments:
  - Tagging sections of a text by function (abstract, introduction, body, conclusion)
  - Tagging sentences (active/passive; simple/complex) or clauses (relative/imperative/non-finite)
  - Semantic or pragmatic annotation (synonymy/antonymy; speech acts)
  - Tagging parts of speech (nouns, verbs, adjectives)
- Automatic grammatical analysis (English only) using the Stanford Parser
- Rhetorical structure annotation
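Segment-level annotation of the kind listed above can be pictured as standoff markup: each segment is a pair of character offsets into the unchanged source text, carrying one or more features. The Python sketch below illustrates only that idea; the `Segment` class, the `find` helper and the feature names are hypothetical and are not UAM CorpusTool's internal format.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    """A hypothetical standoff segment: character offsets plus features."""
    start: int  # offset of the first character
    end: int    # offset one past the last character
    features: set = field(default_factory=set)

    def text(self, source: str) -> str:
        """Recover the covered text from the untouched source string."""
        return source[self.start:self.end]


def find(segments, feature):
    """Return the segments tagged with a given feature, in offset order."""
    return sorted((s for s in segments if feature in s.features),
                  key=lambda s: s.start)


source = "John is tall, but Mary is short."
segments = [
    Segment(0, 32, {"sentence"}),
    Segment(0, 12, {"clause"}),
    Segment(18, 31, {"clause"}),
    Segment(14, 17, {"discourse-marker"}),
]

for seg in find(segments, "clause"):
    print(seg.text(source))
```

Because the source string is never modified, several layers (for example a clause layer and a speech-act layer) can point at overlapping spans of the same text.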

5 Annotation in UAM CorpusTool
Main steps:
- Start a new project
- Add one or more annotation layers
  - You can use a pre-built annotation scheme or design your own
- Add files: import .txt files and incorporate them into the project
- Annotate

6 Annotation in UAM CorpusTool
[Screenshot: main window]

7 Annotation in UAM CorpusTool
[Screenshots: annotation scheme]

8 Annotation in UAM CorpusTool
[Screenshot: document coding]

9 Annotation in UAM CorpusTool
[Screenshot: segment coding]

10 Other Components
- Search
- Autocode
- Statistics
- Explore
- Options
- Help

11 Annotating Signals of Coherence Relations
Goal: annotate signals of coherence relations
Signals of coherence relations:
- E.g., John is tall, but Mary is short.
  - One straightforward signal: the discourse marker ‘but’
  - Two further signals:
    - Antonyms (tall ~ short)
    - Parallel syntactic constructions (subject – copula – adjective)
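The example sentence can be made concrete with a toy detector. This is only an illustration of what the first two signals are, not a description of how signals were identified in this project. The sketch assumes two tiny hand-written lexicons (`DISCOURSE_MARKERS` and `ANTONYMS`, both hypothetical); it flags the discourse marker and the antonym pair, but not the parallel syntactic construction, which would need a syntactic parse.

```python
import re

# Hypothetical miniature lexicons; real signal annotation needs far richer resources.
DISCOURSE_MARKERS = {"but", "because", "although", "however"}
ANTONYMS = {frozenset({"tall", "short"}), frozenset({"hot", "cold"})}


def find_signals(sentence: str) -> list:
    """Return (signal type, evidence) pairs found in one sentence."""
    words = re.findall(r"[a-z]+", sentence.lower())
    signals = []
    for w in words:
        if w in DISCOURSE_MARKERS:
            signals.append(("discourse marker", w))
    # Check every unordered pair of distinct words against the antonym list.
    vocab = sorted(set(words))
    for i, a in enumerate(vocab):
        for b in vocab[i + 1:]:
            if frozenset({a, b}) in ANTONYMS:
                signals.append(("antonyms", f"{a} ~ {b}"))
    return signals


print(find_signals("John is tall, but Mary is short."))
# → [('discourse marker', 'but'), ('antonyms', 'short ~ tall')]
```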

12 Annotating Signals of Coherence Relations  Annotate the RST Discourse Treebank (Carlson et al., 2002) Contains 385 documents from The Wall Street Journal articles Texts in those articles are annotated already for rhetorical (coherence) relations Approx. 22,000 discourse units and 17,000 relations in total Feb 5, 2014 Discourse Research Group 12

13 Annotating Signals of Coherence Relations  Requirements from an annotation tool Importability  Relevant data to be imported into the tool Annotation Scheme  Support for three-level hierarchical taxonomy Customizability  Easy access to the annotation scheme for editing Multiple Annotations  Two or more tags for a single element Convertibility  XML output Simplicity  No advanced computational knowledge  Graphical interface Feb 5, 2014 Discourse Research Group 13

14 Signalling Annotation by UAM CorpusTool  Problem with Importing data UAM CorpusTool supports RST annotation and can directly import RST files However, it cannot provide layered annotation on top of the RST-level structure  Solution to the problem Convert RST base files from LISP to text format Import the converted files This retains discourse structures and all relational information Feb 5, 2014 Discourse Research Group 14

15 Signalling Annotation by UAM CorpusTool  How did we do the rest? Feb 5, 2014 Discourse Research Group 15

16 Signalling Annotation by UAM CorpusTool  Annotation Scheme Screenshot Feb 5, 2014 Discourse Research Group 16

17 Signalling Annotation by UAM CorpusTool  Annotation Window Screenshot Feb 5, 2014 Discourse Research Group 17

18 References
Carlson, L., Marcu, D., & Okurowski, M. E. (2002). RST Discourse Treebank, LDC2002T07 [Corpus]. Philadelphia, PA: Linguistic Data Consortium.
O’Donnell, M. (2008). The UAM CorpusTool: Software for corpus annotation and exploration. Paper presented at the XXVI Congreso de AESLA, Almería, Spain.

19 Thank You!

