Jan Christoph Meister University of Hamburg www.catma.de.

CATMA: an integrated textual markup and analysis tool
CLARIN's Turn Towards The Literary Text

Text vs. sentence, or: What's so different about processing texts?
- Structural complexity: a minimal TEXT comprises more than two SENTENCEs (min TEXT > 2 SENTENCE).
- Structural activity: TEXT processing actualizes paradigmatic cross-references across sentences.
- Structural dynamic: TEXT processing represents and simulates cognitive and empirical processes.
Result: a TEXT yields more INTERPRETATIONS than a SENTENCE.
+ Contingency: when activated during processing, the more complex and dynamic structure of a TEXT results in a higher degree of contingency in its functional "outcome".

The what and why of markup: procedural, declarative (descriptive) and discursive function
- Discursive markup enables human readers to interpret a text and to explore its hermeneutic potential in collaboration: "What might this text mean to us?"
- Declarative markup informs a human reader how to process a text as a communicative device: "How is this text put together, and how does it function in its communicative universe?"
- Procedural markup instructs a (natural or artificial) text processor how to handle a text as a structured character string: "What is the correct operation to perform on this input?"
(Diagram labels: performative function; discursive function.)

Hermeneutic "must-haves" of discursive markup
- Facilitate collaboration and non-deterministic annotation: allow for multiple markup, for overlap and for concurrent tagging.
- Conceptualize markup as dynamic and recursive: allow for extensibility and for multiple (and even contradictory) markup.
- Seamlessly integrate markup and analysis, and support the hermeneutic loop.

Markup types and data models
Example sentence for every type: There is no such thing as “no-mark up”. (Coombs, Renear, DeRose 1987)
- opaque: implicit
- linear: inline, deterministic
- nested: inline, deterministic, sequential
- relational: standoff, descriptive
- network: standoff, discursive

Implementation in CATMA

The CATMA/CLÉA approach to markup
- Text-range-based model: a tag references a text range with a start and an end offset.
- External standoff markup: markup is stored in external files or databases, which facilitates tagging and the exchange of markup by multiple users; because it is stored in a standoff manner, markup can overlap.
- Markup tolerates non-deterministic tagging and supports analytical operations that exploit semantic ambiguity.
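To illustrate why a text-range model permits overlap, here is a minimal Python sketch. It assumes half-open character offsets (start inclusive, end exclusive); the class `TagReference`, the helper `covered_text` and the sample sentence and tags are all hypothetical and do not reflect CATMA's internal data structures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TagReference:
    """Hypothetical standoff annotation: a tag name plus a character range.

    Offsets are half-open like Python slices: start inclusive, end exclusive.
    """
    tag: str
    start: int
    end: int

    def overlaps(self, other: "TagReference") -> bool:
        # Two half-open ranges overlap if each starts before the other ends.
        return self.start < other.end and other.start < self.end


def covered_text(ref: TagReference, source: str) -> str:
    """Resolve a standoff reference against the unmodified source text."""
    return source[ref.start:ref.end]


text = "He walked to the sea and wept."

# The markup lives outside the text, so these two ranges may overlap,
# something that properly nested inline XML elements cannot express directly.
motion = TagReference("motion", 3, 20)
emotion = TagReference("emotion", 13, 29)

print(covered_text(motion, text))   # walked to the sea
print(covered_text(emotion, text))  # the sea and wept
print(motion.overlaps(emotion))     # True
```

Because the source string is never modified, any number of such references can coexist, including ones that cross each other's boundaries.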

Example of overlapping markup in CATMA (NB: In CATMA, tag sets can be imported and exported, and tags can be created or modified ad hoc during markup.)

TEI feature structure tag declaration and overlapping markup (tag: Keynote_speaker&affiliation)
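The slide shows a tag declared as a TEI feature structure. As a rough illustration, assuming the TEI `<fs type>`, `<f name>` and `<string>` elements and omitting the TEI namespace for brevity, a tag and its property values could be serialized with Python's standard ElementTree as below. The helper `tag_to_fs` and the property names and values are hypothetical; CATMA's actual export may be structured differently.

```python
import xml.etree.ElementTree as ET


def tag_to_fs(tag_type: str, values: dict[str, str]) -> ET.Element:
    """Serialize a tag and its property values as a TEI-style feature structure.

    Hypothetical helper: uses <fs type>, <f name> and <string> as in TEI,
    but leaves out the TEI namespace.
    """
    fs = ET.Element("fs", {"type": tag_type})
    for name, value in values.items():
        f = ET.SubElement(fs, "f", {"name": name})
        ET.SubElement(f, "string").text = value
    return fs


fs = tag_to_fs("Keynote_speaker&affiliation",
               {"name": "Ana Lopez", "institution": "Madrid"})
xml = ET.tostring(fs, encoding="unicode")
print(xml)
# The "&" in the tag name is escaped as "&amp;" in the serialized attribute.
```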

Question 1: How can we model a collaborative markup practice?

Answer 1: CATMA's "n meta-data sets to 1 object-data instance" model
(Diagram labels: TEXT 0 A; user markup 1..n; meta-data: procedural, declarative, hermeneutic; object-data; Tagsets.)
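A minimal sketch of the n:1 relationship, assuming that each user's markup lives in its own collection and points at a single immutable source text only by an identifier. The classes `SourceDocument` and `MarkupCollection`, the function `pool_markup`, and the user names are hypothetical, not CATMA's schema. Pooling by range keeps contradictory readings side by side instead of forcing one of them to win.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SourceDocument:
    """The single object-data instance: an immutable text (hypothetical class)."""
    doc_id: str
    text: str


@dataclass
class MarkupCollection:
    """One of n meta-data sets; it refers to its source only via doc_id."""
    owner: str
    doc_id: str
    # Each annotation is (tag, start, end) with half-open character offsets.
    annotations: list[tuple[str, int, int]] = field(default_factory=list)


def pool_markup(doc: SourceDocument, collections: list[MarkupCollection]):
    """Group all annotations on doc by range, keeping who assigned which tag."""
    pooled = defaultdict(list)
    for coll in collections:
        if coll.doc_id != doc.doc_id:
            continue  # this collection annotates a different text
        for tag, start, end in coll.annotations:
            pooled[(start, end)].append((coll.owner, tag))
    return dict(pooled)


doc = SourceDocument("A", "He walked to the sea and wept.")
anna = MarkupCollection("anna", "A", [("grief", 25, 29)])
ben = MarkupCollection("ben", "A", [("relief", 25, 29), ("motion", 3, 20)])

# Two users tag "wept" differently; both readings are kept.
print(pool_markup(doc, [anna, ben]))
# {(25, 29): [('anna', 'grief'), ('ben', 'relief')], (3, 20): [('ben', 'motion')]}
```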

Question 2: But how, on top of that, can we also model the recursive routines that characterize the humanistic workflow?

Example of recursion: a simple query across the object-data/meta-data divide. Step 1: object-data query. Step 2: refinement by adding a meta-data constraint.

... which is why the query (reg="\b\S*\Qez\E(?=\W)") where (tag="Keynote_speaker&affiliation"), which finds words ending in "ez" and restricts them to text tagged Keynote_speaker&affiliation, generates this:
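The two-step query can be approximated in Python as a sketch. Two assumptions apply. First, Python's `re` module does not support `\Q...\E` quoting, so the literal `ez` is passed through `re.escape`. Second, the `where (tag=...)` constraint is read here as "the hit lies entirely inside a range carrying that tag", which may not match CATMA's exact semantics. The sample text, the markup and the function names are invented for illustration.

```python
import re

# Invented sample text and markup, for illustration only.
text = "Keynote: Ana Lopez (Madrid). Chair: Luis Perez (Rome)."
span = "Ana Lopez (Madrid)"
start = text.index(span)
markup = [("Keynote_speaker&affiliation", start, start + len(span))]

# Python's re module has no \Q...\E quoting, so the literal "ez" is escaped
# with re.escape instead; otherwise the pattern mirrors the slide's regex.
pattern = re.compile(r"\b\S*" + re.escape("ez") + r"(?=\W)")


def object_data_query(text: str, pattern: re.Pattern) -> list[tuple[int, int]]:
    """Step 1: match the pattern against the plain text only."""
    return [m.span() for m in pattern.finditer(text)]


def restrict_to_tag(hits, markup, tag):
    """Step 2: keep hits lying entirely inside a range carrying the given tag.

    Containment is an assumption; CATMA's 'where' semantics may differ.
    """
    ranges = [(lo, hi) for name, lo, hi in markup if name == tag]
    return [(s, e) for s, e in hits if any(lo <= s and e <= hi for lo, hi in ranges)]


hits = object_data_query(text, pattern)
print([text[s:e] for s, e in hits])  # ['Lopez', 'Perez']
refined = restrict_to_tag(hits, markup, "Keynote_speaker&affiliation")
print([text[s:e] for s, e in refined])  # ['Lopez']
```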

Answer 2: CATMA's dynamic data model, e.g. (n meta-data sets to 1 object instance)>n
(Diagram labels: TEXT 0 A; markup 1..n; meta-data: procedural, declarative, hermeneutic; object-data; TEXT 0 A; markup 1..n; object-data; Tagsets.)
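One way to read the diagram is that markup can itself become object data for further markup. The following minimal sketch follows that reading and is an interpretation, not a description of CATMA's implementation. It lets an annotation target either a text range or another annotation. The classes `TextRange` and `Annotation`, the function `grounding`, and the tag names are hypothetical.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class TextRange:
    """Object data: a character range in a source document."""
    doc_id: str
    start: int
    end: int


@dataclass(frozen=True)
class Annotation:
    """Meta data whose target may be text or another annotation (hypothetical)."""
    ann_id: str
    tag: str
    target: Union[TextRange, "Annotation"]


def grounding(ann: Annotation) -> tuple[TextRange, int]:
    """Follow the chain of targets down to the text; also return its depth."""
    depth, target = 1, ann.target
    while isinstance(target, Annotation):
        depth, target = depth + 1, target.target
    return target, depth


first = Annotation("a1", "grief", TextRange("A", 25, 29))
# A second-order annotation: the first reading itself becomes object data.
second = Annotation("a2", "contested_reading", first)
print(grounding(second))  # (TextRange(doc_id='A', start=25, end=29), 2)
```

Because every chain eventually bottoms out in a text range, higher-order markup can still be located in the source text, however many rounds of interpretation have been layered on top.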

Question 3: How can we implement this practice in a system?

Answer 3: Call the big sister, CLÉA! (Figure: CLÉA database model)

CATMA/CLÉA: user and resource administration

Manage corpora and source documents, markup collections and tag libraries

Annotate texts or corpora using pre-defined or ready-made tags

Build and execute queries on source text and tags, or any combination thereof

Visualize results

What's in it for CLARIN?
- Import any text or corpus into CATMA/CLÉA.
- Run standard analytical procedures (indexing, POS tagging etc.) automatically on upload or interactively.
- Annotate and analyse texts or corpora collaboratively.
- Share and export markup from the CATMA/CLÉA database in multiple formats.
CLÉA = Collaborative Literature Éxploration and Annotation

Many thanks (mille grazie) to my CATMA/CLÉA development team: Evelyn Gius, Malte Meister, Marco Petris, Lena Schüch; and to our funders: University of Hamburg (2009), Google DH Awards ( ), BMBF ( ).

Tag definition
- Each tag has a type.
- Each tag has a color.
- Each tag can have additional user-defined properties.

Tag instance
- Each tag instance is of a type.
- A tag instance can have individual values for the user-defined properties.
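The split between a tag definition, which holds the type, color and declared properties, and a tag instance, which holds a range and its own property values, can be sketched with two dataclasses. This is a minimal illustration under assumptions. The class and field names, the hex color format, and the rule that instances may only fill declared properties are choices made for this sketch, not CATMA's documented behavior.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class TagDefinition:
    """A tag in a tag set: type, color and user-defined property names."""
    tag_type: str
    color: str  # e.g. "#3366cc"; the color format is an assumption
    properties: tuple[str, ...] = ()


@dataclass
class TagInstance:
    """One application of a tag to a text range, with its own property values."""
    definition: TagDefinition
    start: int
    end: int
    values: dict[str, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Assumed rule: only properties declared in the definition get values.
        unknown = set(self.values) - set(self.definition.properties)
        if unknown:
            raise ValueError(f"undefined properties: {sorted(unknown)}")

    @property
    def tag_type(self) -> str:
        return self.definition.tag_type


speaker = TagDefinition("Keynote_speaker&affiliation", "#3366cc",
                        ("name", "institution"))
a = TagInstance(speaker, 9, 27, {"name": "Ana Lopez", "institution": "Madrid"})
b = TagInstance(speaker, 41, 53, {"name": "Luis Perez"})  # values may differ
```

Sharing one frozen definition object among many instances means that a change of color or property set in the tag library applies to every use of the tag. Meanwhile, each instance keeps its own values.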

Tag referencing: The content of a range is referenced by a pointer to an external entity. The URI is based on RFC 5147, which defines fragment identifiers for plain text.
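RFC 5147 lets a URI select a character range with a fragment such as `#char=3,20`. Positions count from 0 and lie between characters, so the fragment corresponds to a half-open slice. The following minimal sketch builds and parses only the closed `start,end` form. It ignores any `;`-separated integrity checks and rejects the other RFC 5147 forms. The function names and the `example.org` base URI are hypothetical.

```python
from urllib.parse import urldefrag


def make_char_uri(base: str, start: int, end: int) -> str:
    """Build a URI whose fragment selects a character range (RFC 5147 'char=')."""
    if not 0 <= start <= end:
        raise ValueError("expected 0 <= start <= end")
    return f"{base}#char={start},{end}"


def parse_char_uri(uri: str) -> tuple[str, int, int]:
    """Return (base, start, end) for a URI ending in '#char=start,end'.

    Only the closed 'start,end' form is accepted; single positions, open
    ranges and line= fragments are rejected, and any ';'-separated
    integrity checks are ignored.
    """
    base, fragment = urldefrag(uri)
    scheme = fragment.split(";", 1)[0]  # drop integrity checks, if any
    if not scheme.startswith("char="):
        raise ValueError(f"not a char= fragment: {fragment!r}")
    start_s, _, end_s = scheme[len("char="):].partition(",")
    if not (start_s.isdigit() and end_s.isdigit()):
        raise ValueError(f"unsupported char= range: {scheme!r}")
    return base, int(start_s), int(end_s)


text = "He walked to the sea and wept."
uri = make_char_uri("http://example.org/texts/sea.txt", 3, 20)
print(uri)  # http://example.org/texts/sea.txt#char=3,20
base, start, end = parse_char_uri(uri)
print(text[start:end])  # walked to the sea
```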

Potential problems and possible solutions
- Problem: references to ranges based on character offsets are vulnerable to modifications of the content. Possible solution: automated adjustments using checksums and context information, plus version tracking and a revision history in the source document header.
- Problem: the encoding of the tags is machine-readable but not interoperable out of the box. Possible solution: define the feature-structure encoding of tags in terms of the Open Annotation framework.
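The checksum-plus-context idea can be sketched as follows. For each range, store a hash of its content and a few characters of left and right context. If the stored offsets no longer match after an edit, search for the left context and accept the same-length candidate whose hash and right context both match. This is a minimal illustration under stated assumptions. The window size `CONTEXT`, the use of SHA-256 and all names are hypothetical, and the sketch only recovers ranges whose own content is unchanged.

```python
import hashlib
from dataclasses import dataclass

CONTEXT = 8  # hypothetical size of the stored left/right context


@dataclass(frozen=True)
class Anchor:
    """A range plus a checksum of its content and some surrounding context."""
    start: int
    end: int
    digest: str
    before: str
    after: str


def checksum(s: str) -> str:
    return hashlib.sha256(s.encode("utf-8")).hexdigest()


def make_anchor(text: str, start: int, end: int) -> Anchor:
    return Anchor(start, end, checksum(text[start:end]),
                  text[max(0, start - CONTEXT):start], text[end:end + CONTEXT])


def resolve(anchor: Anchor, text: str):
    """Return the (possibly shifted) range in an edited text, or None."""
    if checksum(text[anchor.start:anchor.end]) == anchor.digest:
        return anchor.start, anchor.end  # content still at the old offsets
    length = anchor.end - anchor.start
    pos = text.find(anchor.before)
    while pos != -1:
        # Candidate: a same-length range right after the left context.
        start = pos + len(anchor.before)
        end = start + length
        if (checksum(text[start:end]) == anchor.digest
                and text.startswith(anchor.after, end)):
            return start, end
        pos = text.find(anchor.before, pos + 1)
    return None  # content changed or context lost: needs manual review


old = "He walked to the sea and wept."
anchor = make_anchor(old, 13, 20)  # "the sea"
new = "Slowly, he walked to the sea and wept."
print(resolve(anchor, new))  # (21, 28)
```

Whenever `resolve` returns `None`, the adjustment has failed. In that case, the version history in the document header would be needed to reconcile the markup by hand.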