Trends in Concept Modelling
Turning Issues into Solutions: How to Discipline a Cat
Sue Ellen Wright, Kent State University

Herding ISOcats
Neeri Conference, Helsinki

What is ISOcat? – 1
– The implementation of the ISO TC 37 Data Category Registry (DCR), a metadata registry (MDR)
– A knowledge resource containing, among other things, definitions of the data category concepts used to annotate language resources
– A (potentially) authoritative concept database that can be used to anchor relations in external Relation Registries (RRs) and other knowledge resources

Semantic Issues for DCs
– What data element names occur in these data?
– What does the content of these DCs “mean” in a semantic sense?
– Can I use these data in my environment, especially across the boundaries between communities of practice?

Goals of ISOcat
– Supporting the reusability, integratability, and interoperability of data by defining data categories (data element concepts) as an instantiation of ISO
– Trusting data in a climate of different communities of practice
– Community collaboration in defining the data categories used in language resources

Metadata Registries in
An ISO metadata registry consists of a hierarchy of "concepts" with associated properties for each concept. Concepts are similar to classes in object-oriented programming, but without the behavioral elements; properties are similar to class attributes. ISO standards require that each concept and property have a precisely worded data element definition.
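The concept/property analogy above can be sketched with Python dataclasses. This is a minimal illustration under stated assumptions, not ISOcat's actual data model: the classes `Concept` and `Property`, the `parent` link used to express the hierarchy, and the `missing_definitions` check (mirroring the requirement that every concept and property carry a definition) are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Property:
    """A concept property, comparable to a class attribute."""
    name: str
    definition: str


@dataclass
class Concept:
    """A registry concept, comparable to a class with no behavior (methods)."""
    name: str
    definition: str
    parent: Optional[Concept] = None  # link upward in the concept hierarchy
    properties: list[Property] = field(default_factory=list)

    def ancestors(self) -> list[str]:
        """Names of all broader concepts, nearest first."""
        names: list[str] = []
        node = self.parent
        while node is not None:
            names.append(node.name)
            node = node.parent
        return names


def missing_definitions(concept: Concept) -> list[str]:
    """Names of the concept and its properties that still lack a definition."""
    missing = [] if concept.definition.strip() else [concept.name]
    missing += [f"{concept.name}.{p.name}" for p in concept.properties
                if not p.definition.strip()]
    return missing


# Hypothetical entries; the definitions are placeholders, not ISOcat content.
grammatical = Concept("grammatical category", "(illustrative definition)")
case = Concept("case", "(illustrative definition)", parent=grammatical,
               properties=[Property("value domain", "")])

print(case.ancestors())           # ['grammatical category']
print(missing_definitions(case))  # ['case.value domain']
```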

What is ISOcat? – 2
– A social network designed to facilitate the creation of data category specifications for use in linguistic annotation schemes
– A forum for achieving consensus on data category names, definitions, permissible instances, and data category selections for work groups and thematic domains
– A framework for standardizing a subset of these data categories and data category selections (e.g., tagsets)

ISOcat as an MDR with a history
“Authority” and credibility are hampered by:
– Incorrect spellings, definitions, and examples
– Narrow perspectives that ignore the specifics of individual languages
– Failure to observe moderately uniform conventions
– Failure in the past to accommodate consensus among experts

Knowledge Resource Issues
– Flawed legacy data from the Syntax pilot DCR:
  – Lack of stable guidelines for attributes such as definitions and some names
  – Introduction of data refinements, which require updating virtually all DCs
  – Technical glitches that resulted in missing DCs
  – Inevitable errors, coupled with the inability to edit entries
  – Lack of an efficient group consensus mechanism
– Danger of similar issues in the future

Examples: Flawed SALT Input
– Critical standardized items were not imported.
– They were recreated, but not marked as standard.
For example, /part of speech/ is standardized as per ISO 12620:1999, but shows up as private and candidate.

Uncorrected Data Errors
– Incorrect definition for some languages
In languages that actually use the accusative case, it is frequently used for other purposes as well, particularly for the objects of certain prepositions or to use a noun adverbially. (Alas, even Crystal can be wrong on occasion.)

Uncorrected Data Errors
– Discursive, rambling definitions; inappropriate example
Again, reliance on a single source and the lack of feedback from speakers of other languages result in misinformation. A better example: She fixed him a nice lunch.

Language-Related Issues
Although one can speak of the accusative case in English, English does not actually have an accusative case: it conflates (at least) the dative and the accusative into an objective case. Yet there is no entry for the objective case.

Conflicting Definitions
The aim is to define appellative nouns used as components of proper nouns.

Uncorrected Data Errors
– Tautological definition, confusing note, lack of a clarifying example
Definitions should not simply restate the elements of the data category name.
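The rule against tautological definitions lends itself to a rough automated screen. The sketch below is a hypothetical heuristic, not an ISOcat feature: it flags a definition when most of its content words merely repeat words from the data category name. The stop-word list and the 0.8 threshold are arbitrary assumptions, and such a check could only support expert review, not replace it.

```python
import re

# Hypothetical list of function words ignored when comparing name and definition.
STOP_WORDS = {"a", "an", "the", "of", "in", "to", "that", "is",
              "which", "used", "for", "and", "or"}


def content_words(text: str) -> set[str]:
    """Lower-cased words of a text, with function words removed."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS}


def looks_tautological(name: str, definition: str, threshold: float = 0.8) -> bool:
    """Flag a definition whose content words mostly repeat the DC name."""
    words = content_words(definition)
    if not words:
        return True  # an empty definition is no better than a tautology
    overlap = len(words & content_words(name)) / len(words)
    return overlap >= threshold


# Illustrative definitions, written for this example.
print(looks_tautological("participial adjective",
                         "An adjective that is participial."))  # True
print(looks_tautological("participial adjective",
                         "An adjective derived from the participle form of a verb."))  # False
```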

Uncorrected Data Errors
– Incorrect “language-independent” name (incorrect English name)
The correct name is /participial adjective/.

Solutions
– Data must be reliable and adhere to consistent rules in order to contribute to trust on the Web.
– DCR Guidelines (unavailable during the Syntax phase) are now in place.

Solutions
– Provide an appropriate social networking environment and technical features, designed to avoid and correct similar discrepancies in the future
– Rationale:
  – Individual experts can make mistakes.
  – Group consensus verifies form and content.
  – Multiple mother tongues contribute to broader understanding.
  – Even non-standardized DCs benefit from consensus.

Roles in ISOcat – Individuals
– Guests:
  – Access, select, and output data
– Experts:
  – All of the above, plus save DCs and DCSs
  – Create new DCs
  – Share DCs
  – Create and serve on Ad Hoc Groups and TDGs
  – Coordinate Ad Hoc Groups, chair TDGs
  – Submit DCs for standardization
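The cumulative rights of the two individual roles can be modelled as permission sets, with the expert set a strict superset of the guest set. The `Action` names, the `ROLES` table, and the `may` helper are hypothetical labels for the actions listed on the slide, not ISOcat's access-control API.

```python
from enum import Enum, auto


class Action(Enum):
    ACCESS = auto()
    SELECT = auto()
    OUTPUT = auto()
    SAVE = auto()                # save DCs and DCSs
    CREATE_DC = auto()
    SHARE_DC = auto()
    JOIN_GROUP = auto()          # create and serve on Ad Hoc Groups and TDGs
    COORDINATE_GROUP = auto()    # coordinate Ad Hoc Groups, chair TDGs
    SUBMIT_FOR_STANDARDIZATION = auto()


GUEST = frozenset({Action.ACCESS, Action.SELECT, Action.OUTPUT})

# Experts can do everything guests can, plus the additional actions.
EXPERT = GUEST | {
    Action.SAVE,
    Action.CREATE_DC,
    Action.SHARE_DC,
    Action.JOIN_GROUP,
    Action.COORDINATE_GROUP,
    Action.SUBMIT_FOR_STANDARDIZATION,
}

ROLES = {"guest": GUEST, "expert": EXPERT}


def may(role: str, action: Action) -> bool:
    """Return True if the given role is allowed to perform the action."""
    return action in ROLES[role]


print(may("guest", Action.OUTPUT))     # True
print(may("guest", Action.CREATE_DC))  # False
```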

Roles in ISOcat
– Ad hoc groups: informal groups of experts assigned by individual experts
  – Comment on and reach consensus on DCs in a shared space
  – Submit DCs for standardization (if desired)
– Semi-formal ad hoc groups:
  – LISA/OSCAR group, others (DITA? DARWIN? XLIFF?)
  – CLARIN work groups
  – ISOcat work groups for other major projects
– Informal, truly ad hoc groups

Roles in ISOcat
– Thematic Domain Groups (TDGs):
  – Formal groups appointed by ISO TC 37 P members and TDG Chairs
– TDG Chairs:
  – Manage the DC evaluation process per ISO
– DCR Board members:
  – Conduct the DC validation and harmonization process

QA Scenario 1, Phase 1
– Expert A spots a perceived error in a DC belonging to Expert B or to a TDG.
– Expert A clones the currently locked DC into his/her own workspace.
– There, s/he can propose editorial corrections in the cloned DC.
– Expert A invites Expert B to share the clone.
– Expert A informs the TDG and shares the clone with it.
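The clone-and-share steps of Scenario 1 can be sketched as follows, assuming a simple in-memory model. `DataCategory`, `clone_for_review`, `propose_correction`, `share`, and the identifiers are all hypothetical; the point illustrated is that the locked original is never edited, while the clone records which DC it was derived from.

```python
from __future__ import annotations

import copy
from dataclasses import dataclass, field


@dataclass
class DataCategory:
    identifier: str
    owner: str
    definition: str
    locked: bool = False
    cloned_from: str | None = None  # provenance link back to the original DC
    shared_with: set[str] = field(default_factory=set)


def clone_for_review(dc: DataCategory, expert: str, new_id: str) -> DataCategory:
    """Copy a (possibly locked) DC into the reviewing expert's own workspace."""
    clone = copy.deepcopy(dc)
    clone.identifier = new_id
    clone.owner = expert
    clone.locked = False               # the clone is editable...
    clone.cloned_from = dc.identifier  # ...while the original stays untouched
    clone.shared_with = set()
    return clone


def propose_correction(dc: DataCategory, editor: str, definition: str) -> None:
    """Only the owner of an unlocked DC may edit it."""
    if dc.locked or dc.owner != editor:
        raise PermissionError(f"{editor} may not edit {dc.identifier}")
    dc.definition = definition


def share(dc: DataCategory, *parties: str) -> None:
    """Give other experts or groups access to the DC."""
    dc.shared_with.update(parties)


original = DataCategory("DC-x1", "Expert B", "Original definition.", locked=True)
clone = clone_for_review(original, "Expert A", "DC-x1-clone")
propose_correction(clone, "Expert A", "Corrected definition.")
share(clone, "Expert B", "TDG")
print(original.definition)  # Original definition.
```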

QA Scenario 2, Phase 1
– A TDG chair or Ad Hoc Group leader creates a DCS for review:
  – Selection from current DCs, or
  – Creation of new DCs
  – Profile change management
– The DCS (with its DCs) is assigned to a predefined group (ad hoc or TDG).

Profiles versus DCSs
– Profile membership is part of the DC specification:
  – The profile indicates the thematic domain of the DC.
  – The profile view in the UI is created by a query.
  – There is a limited number of profiles.
– A DCS is a collection of DCs:
  – hand-picked by a user for a specific purpose
  – can contain DCs from various profiles
  – there can be an unlimited number of DCSs
– There isn’t (yet) a profile-specific view on a DCS.
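The contrast between profiles and DCSs can be illustrated in a few lines: a profile view is computed by a query over an attribute stored in each DC, whereas a DCS is simply a hand-picked list. The `DC` class, the `profile_view` function, and the entries and profile assignments below are illustrative assumptions, not registry content.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DC:
    name: str
    profiles: frozenset[str]  # thematic domains; part of the DC specification


# Illustrative entries only.
REGISTRY = [
    DC("shared example DC", frozenset({"Morphosyntax", "Terminology"})),
    DC("grammatical gender", frozenset({"Morphosyntax"})),
    DC("subject field", frozenset({"Terminology"})),
]


def profile_view(registry: list[DC], profile: str) -> list[str]:
    """A profile view is derived by a query over the profile attribute."""
    return [dc.name for dc in registry if profile in dc.profiles]


# A DCS, by contrast, is hand-picked and may mix DCs from several profiles.
my_dcs = [REGISTRY[1], REGISTRY[2]]

print(profile_view(REGISTRY, "Terminology"))  # ['shared example DC', 'subject field']
```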

QA Scenario 3, Phase 1
– Any role (individual expert, group, DCR Board member) identifies the need to harmonize DCs between TDGs, or across communities of practice within a TDG.
– The DC or DCs are collected and cloned.
– A discussion group is assigned and notified.
– Multiple ad hoc or thematic domain groups may have to interact.

All QA Scenarios, Phase 2
– Wiki- and/or forum-based discussion
– Informal consensus or formal balloting
– Correction according to the above decision
– Harmonization issues:
  – Option 1: DCs merged into one, perhaps with multiple profile options
  – Option 2: Multiple DCs, with a clear indication of the differences that justify the doublettes
  – Option 3: Build an external RR linking the doublettes
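Options 1 and 3 differ in where the harmonization decision lives: merging changes the DCR itself, while an external RR only records links and leaves both DCs intact. The sketch assumes hypothetical identifiers, a made-up relation label, and the helper names `merge`, `link`, and `linked_to`; it does not reproduce the schema of any actual relation registry.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DC:
    identifier: str
    name: str
    profiles: frozenset[str]


def merge(keep: DC, drop: DC) -> DC:
    """Option 1: fold a doublette into one DC that carries both profiles."""
    return DC(keep.identifier, keep.name, keep.profiles | drop.profiles)


def link(rr: dict[tuple[str, str], str], a: DC, b: DC, relation: str) -> None:
    """Option 3: record a link in an external relation registry (RR).
    Neither DC is modified."""
    rr[(a.identifier, b.identifier)] = relation


def linked_to(rr: dict[tuple[str, str], str], identifier: str) -> set[str]:
    """Identifiers of all DCs linked to `identifier`, in either direction."""
    return {b if a == identifier else a for (a, b) in rr if identifier in (a, b)}


# Hypothetical doublettes from two profiles.
dc1 = DC("DC-x1", "example DC", frozenset({"Terminology"}))
dc2 = DC("DC-x2", "example DC", frozenset({"Morphosyntax"}))

merged = merge(dc1, dc2)            # Option 1
rr: dict[tuple[str, str], str] = {}
link(rr, dc1, dc2, "related")       # Option 3 ("related" is a made-up label)

print(sorted(merged.profiles))      # ['Morphosyntax', 'Terminology']
print(linked_to(rr, "DC-x2"))       # {'DC-x1'}
```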

Justified Doublettes – Terminology
The value domain for /part of speech/ in Terminology is very short.

Doublettes – Morphosyntax PoS
The value domain for /part of speech/ in Morphosyntax is extremely large.

Tasks
– Enable the retention of historical data category specifications while facilitating ongoing revisions
– Integrate a variety of social networking features:
  – Wiki and forum add-ons
  – Internal messaging system
  – Link to external mail
– Creation, consensus, revision, harmonization
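One common way to retain historical specifications while still allowing revisions is an append-only version history, sketched here under that assumption. `Version`, `VersionedDC`, and its methods are hypothetical; they do not describe how ISOcat actually stores revisions.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Version:
    number: int
    definition: str
    note: str  # e.g. the decision that motivated the revision


@dataclass
class VersionedDC:
    identifier: str
    history: list[Version] = field(default_factory=list)

    def revise(self, definition: str, note: str) -> Version:
        """Append a new version; earlier versions are never modified."""
        version = Version(len(self.history) + 1, definition, note)
        self.history.append(version)
        return version

    @property
    def current(self) -> Version:
        return self.history[-1]

    def as_of(self, number: int) -> Version:
        """Return a historical version, e.g. one cited by older annotations."""
        if not 1 <= number <= len(self.history):
            raise ValueError(f"{self.identifier} has no version {number}")
        return self.history[number - 1]


dc = VersionedDC("DC-x1")  # hypothetical identifier
dc.revise("First definition.", "initial entry")
dc.revise("Corrected definition.", "consensus correction")
print(dc.current.number)       # 2
print(dc.as_of(1).definition)  # First definition.
```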

Thanks for your attention! Come play with the cat!