Published by Dorcas Peters. Modified over 9 years ago.
1
Jan Christoph Meister, University of Hamburg, www.catma.de
2
CATMA - an integrated textual markup and analysis tool
CLARIN's Turn Towards The Literary Text, 29.10.2012
3
Text vs. sentence, or: What's so different about processing texts?
structural complexity: min TEXT > 2 (SENTENCE)
structural activity: TEXT processing actualizes paradigmatic cross-reference across sentences
structural dynamic: TEXT processing represents & simulates cognitive and empirical processes
TEXT yields more INTERPRETATIONS than SENTENCE
+CONTINGENCY: The more complex & dynamic structure, when activated during processing, results in a higher degree of contingency in functional "outcome"
4
The what and why of MarkUp: procedural, descriptive & discursive function
discursive markup: enables human readers to interpret a text and to explore its hermeneutic potential in collaboration. "What might this text mean to us?"
declarative markup: informs a human reader how to process a text as a communicative device. "How is this text put together and how does it function in its communicative universe?"
procedural markup: instructs a (natural or artificial) text processor how to handle a text as a structured character string. "What is the correct operation to perform on this input?"
Diagram labels: performative function, discursive function
5
Hermeneutic "must haves" of discursive markup
facilitate collaboration & non-deterministic annotation
allow for multiple markup
allow for overlap
allow for concurrent tagging
conceptualize markup as dynamic & recursive
allow for extensibility
allow for multiple (and even contradictory) markup
seamlessly integrate markup and analysis & support the hermeneutic loop
6
MarkUp types & data models
Example sentence, shown once per markup type: "There is no such thing as 'no-mark up'." (Coombs, Renear, DeRose 1987)
opaque: implicit
linear: inline, deterministic
nested: inline, deterministic, sequential
relational: stand off, descriptive
network: stand off, discursive
7
Implementation in CATMA (www.catma.de)
8
The CATMA/CLÉA approach to markup
text range based model: a tag references a text range with a start and an end offset
external standoff markup: markup is stored in external files or databases to facilitate tagging and exchange of markup by multiple users
markup is stored in a standoff manner to allow overlapping markup
tolerates non-deterministic tagging & supports analytical operations that exploit semantic ambiguity
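The range-based standoff model on this slide can be sketched in a few lines of Python. This is only an illustration, not CATMA's actual data structures: the class `TagReference`, its fields, the half-open offset convention, and the sample tags by `user_a` and `user_b` are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TagReference:
    """One standoff annotation: a tag name plus a character range (hypothetical)."""
    tag: str
    start: int   # offset of the first character (inclusive)
    end: int     # offset just past the last character (exclusive)
    author: str  # who created the markup


def overlaps(a: TagReference, b: TagReference) -> bool:
    """True if the two ranges share at least one character."""
    return a.start < b.end and b.start < a.end


def covered_text(text: str, ref: TagReference) -> str:
    """Resolve a reference against the untouched source text."""
    return text[ref.start:ref.end]


# The source text is never modified; the markup lives beside it.
text = "Jan Christoph Meister, University of Hamburg"
speaker = TagReference("speaker", 0, 21, "user_a")
affiliation = TagReference("affiliation", 23, 44, "user_b")
both = TagReference("Keynote_speaker&affiliation", 14, 33, "user_b")

for ref in (speaker, affiliation, both):
    print(ref.tag, "->", repr(covered_text(text, ref)))
```

Because each reference only stores offsets, the third tag can straddle the other two without any nesting constraint, which is what inline markup would forbid.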
9
Example of overlapping markup in CATMA
(NB: In CATMA, tag sets can be imported/exported; tags can be created/manipulated ad hoc during markup)
10
TEI feature structure tag declaration & overlapping markup: Keynote_speaker&affiliation
11
Question 1: How can we model a collaborative markup practice?
12
Answer 1: CATMA's "n meta-data sets to 1 object-data instance" model
Diagram labels: TEXT, 0, A, user markup 1..n, meta-data (procedural, declarative, hermeneutic), object-data, Tagsets
13
Question 2: But how, on top of that, can we also model the recursive routines that characterize the humanistic workflow?
14
Example of recursion: a simple query across the object data/meta-data divide
Step 1: object data query
Step 2: refinement by adding an additional meta-data constraint
15
... which is why (reg="\b\S*\Qez\E(?=\W)") where (tag="Keynote_speaker&affiliation") generates this:
[screenshot of the query result]
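The two-step query on this slide combines an object-data constraint (a regular expression over the text) with a meta-data constraint (a tag). A rough Python sketch of that idea follows. It is not CATMA's query engine: the function `query_where_tag`, the sample text, and the rule that a hit must lie completely inside a tagged range are assumptions. Python's `re` module has no `\Q...\E` quoting, so `re.escape` stands in for it.

```python
import re

# \Qez\E quotes the literal "ez"; re.escape plays that role in Python.
PATTERN = re.compile(r"\b\S*" + re.escape("ez") + r"(?=\W)")


def query_where_tag(text, tag_ranges, tag):
    """Step 1: regex hits on the text. Step 2: keep hits inside ranges carrying `tag`."""
    hits = []
    for match in PATTERN.finditer(text):
        for name, start, end in tag_ranges:
            # Assumed semantics: the hit must be fully contained in the range.
            if name == tag and start <= match.start() and match.end() <= end:
                hits.append(match.group())
                break
    return hits


# Invented sample text and one invented tagged range.
text = "Keynote: Ana Gomez (Hamburg). Respondent: Luis Perez (Madrid)."
start = text.index("Ana")
end = text.index(")") + 1
tag_ranges = [("Keynote_speaker&affiliation", start, end)]

all_hits = [m.group() for m in PATTERN.finditer(text)]
tagged_hits = query_where_tag(text, tag_ranges, "Keynote_speaker&affiliation")
print(all_hits, tagged_hits)
```

The regex alone finds every word ending in "ez"; adding the tag constraint narrows the result to hits that some annotator has already marked up, which is the recursion across the object-data/meta-data divide.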
16
Answer 2: CATMA's dynamic data model, e.g. (n meta-data set to 1 object instance) > n+1
Diagram labels: TEXT, 0, A, markup 1..n, meta-data (procedural, declarative, hermeneutic), object-data; TEXT, 0, A, markup 1..n, object-data; Tagsets
17
Question 3: How can we implement this practice in a system?
18
Answer 3: Call the big sister – CLÉA!
CLÉA Data Base Model
19
CATMA/CLÉA: User and resource administration
20
Manage corpora & source documents, markup collections and tag libraries
21
Annotate texts or corpora using pre-defined or ready-made tags
22
Build and execute queries on source text & tags, or any combination thereof
23
Visualize results
24
What's in it for CLARIN?
Import any text or corpus into CATMA/CLÉA
Run standard analytical procedures automatically or interactively on upload (indexing, POS tagging etc.)
Annotate and analyse texts or corpora collaboratively
Share and export markup from the CATMA/CLÉA database in multiple formats
CLÉA = Collaborative Literature Éxploration and Annotation
25
Mille grazie to my CATMA/CLÉA development team: Evelyn Gius, Malte Meister, Marco Petris, Lena Schüch
and to our funders: University of Hamburg (2009), Google DH Awards (2010-2013), BMBF (2013-2016)
26
Tag definition
each Tag can have additional user defined properties
each Tag has a type
each Tag has a color
27
Tag instance
a Tag instance can have individual values for the user defined properties
each Tag instance is of a type
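The relation between a tag definition and its instances, as described on these two slides, might be modelled roughly as follows. The class names, field names, and the rule that an instance falls back to the definition's default value are assumptions for illustration, not CATMA's TEI feature structure encoding.

```python
from dataclasses import dataclass, field


@dataclass
class TagDefinition:
    """A tag as declared in a tag set (hypothetical model)."""
    name: str
    tag_type: str
    color: str
    # User defined properties: property name -> default value.
    properties: dict = field(default_factory=dict)


@dataclass
class TagInstance:
    """One application of a tag; it may override property values."""
    definition: TagDefinition
    values: dict = field(default_factory=dict)

    def get(self, prop):
        """Instance value if set, otherwise the definition's default (assumed rule)."""
        if prop not in self.definition.properties:
            raise KeyError(prop)
        return self.values.get(prop, self.definition.properties[prop])


speaker_tag = TagDefinition(
    name="Keynote_speaker&affiliation",
    tag_type="person",
    color="#ff0000",
    properties={"certainty": "high"},
)
first = TagInstance(speaker_tag)
second = TagInstance(speaker_tag, values={"certainty": "low"})
print(first.get("certainty"), second.get("certainty"))
```

Keeping the definition separate from its instances lets two annotators apply the same tag with different property values, which fits the slides' emphasis on non-deterministic markup.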
28
Tag referencing
The content of a range is referenced by a pointer to an external entity.
The URI follows RFC 5147, which defines fragment identifiers for pointing into plain text.
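RFC 5147 fragments such as `#char=0,21` address a character range in a plain-text resource. A minimal sketch of resolving such a pointer is shown below. The function names and the `example.org` URI are made up for the example, only the `char=` scheme is handled, and the optional integrity checks (`length=`, `md5=`) are skipped rather than verified.

```python
from urllib.parse import urlsplit


def parse_char_fragment(fragment, text_length):
    """Return (start, end) for an RFC 5147 'char=' fragment.

    Positions count the gaps between characters, starting at 0, so they
    line up with Python slice indices. Integrity checks after ';' are
    ignored in this sketch.
    """
    scheme = fragment.split(";", 1)[0]
    if not scheme.startswith("char="):
        raise ValueError("only the char= scheme is handled here")
    spec = scheme[len("char="):]
    if "," not in spec:
        pos = int(spec)  # a single position, i.e. an empty range
        return pos, pos
    first, last = spec.split(",", 1)
    if not first and not last:
        raise ValueError("a range needs at least one position")
    start = int(first) if first else 0
    end = int(last) if last else text_length
    return start, end


def resolve(uri, text):
    """Return the text range addressed by the URI's fragment."""
    start, end = parse_char_fragment(urlsplit(uri).fragment, len(text))
    return text[start:end]


text = "Jan Christoph Meister, University of Hamburg"
base = "http://example.org/keynote.txt"  # hypothetical document URI
print(resolve(base + "#char=0,21", text))
```

Since positions sit between characters, a range `char=a,b` corresponds directly to the slice `text[a:b]`, and open-ended forms like `char=23,` run to the end of the text.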
29
Potential problems and possible solutions
Problem: range references based on character offsets are vulnerable to modifications of the content
Possible solution: automated adjustments with checksums and context information, plus version tracking and a revision history in the source document header
Problem: the encoding of the tags is machine readable but not interoperable out of the box
Possible solution: defining the feature structure encoding of tags in terms of the open annotation framework
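The first proposed solution, adjusting offsets with checksums and context information, could work along the following lines. This is a sketch under assumptions, not the CATMA implementation: the anchor layout, the choice of SHA-256, the 10-character context window, and the brute-force window search are all illustrative.

```python
import hashlib


def _digest(s):
    return hashlib.sha256(s.encode("utf-8")).hexdigest()


def make_anchor(text, start, end, context=10):
    """Store offsets plus a checksum of the range and its surrounding context."""
    return {
        "start": start,
        "end": end,
        "checksum": _digest(text[start:end]),
        "before": text[max(0, start - context):start],
        "after": text[end:end + context],
    }


def reanchor(text, anchor):
    """Return (start, end) in a possibly edited text, or None if the range is lost."""
    start, end = anchor["start"], anchor["end"]
    if _digest(text[start:end]) == anchor["checksum"]:
        return start, end  # offsets still valid
    length = end - start
    # Slide a window of the same length and compare checksums.
    candidates = [
        i for i in range(len(text) - length + 1)
        if _digest(text[i:i + length]) == anchor["checksum"]
    ]
    # Prefer a candidate whose neighbourhood matches the stored context.
    for i in candidates:
        if text[:i].endswith(anchor["before"]) and text[i + length:].startswith(anchor["after"]):
            return i, i + length
    if len(candidates) == 1:
        return candidates[0], candidates[0] + length
    return None  # missing or ambiguous: needs manual review


original = "Jan Christoph Meister, University of Hamburg"
anchor = make_anchor(original, 14, 21)  # the range covering "Meister"
edited = "Prof. " + original            # an insertion shifts every offset by 6
print(reanchor(edited, anchor))
```

The checksum detects that the stored offsets no longer point at the same content, and the context strings disambiguate when the tagged string occurs more than once; a range whose content was itself edited still falls through to manual review.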