Multi-language CASCOT
Margaret Birch and Ritva Ellison, Institute for Employment Research


CASCOT: Computer Assisted Structured Coding Tool
- Software tool for coding text automatically or manually
- Developed at the Institute for Employment Research (IER), University of Warwick
- Used by over 100 organisations in the UK and abroad

IER was contracted under the DASISH project to develop a multilingual version of CASCOT for coding job titles to ISCO-08.
- It is a large task with limited resources, so this is a pilot project
- The 8 selected languages:
  - Dutch (Netherlands, Flemish-Belgium)
  - English
  - Finnish
  - French (France, Walloon-Belgium, Switzerland)
  - German (Germany, Austria, Switzerland)
  - Italian
  - Slovak
  - Spanish

Key tasks
- Translating Cascot user interface texts
- Constructing national-language versions of the ISCO-08 structure for Cascot
- Indexing job titles in the selected languages to ISCO-08
  - some supplied by NSIs (national statistical institutes) or other partners
  - some found by exploring relevant national websites
- Validating the software using raw data files from the European Social Survey (ESS) Round 6
- Testing Cascot multilingual software
- Developing language-based coding rules
- Using Cascot Performance Tool to fine-tune the software

Coding with Cascot
- Enter text (it could come from a file)
- Cascot recommends a code, but the user can change it
- Output can be directed to a file
- Screenshot label: Selected classification
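The coding loop on this slide (text in, recommended code out, optional user override, output to a file) can be sketched in Python. This is a minimal illustration, not Cascot's implementation: the toy index, the `recommend` and `code_batch` functions and the word-overlap score on a 0 to 100 scale are all hypothetical assumptions made for the example.

```python
import csv
import io

# Hypothetical toy index: lower-case job title -> ISCO-08 unit group code.
TOY_INDEX = {
    "software developer": "2512",
    "primary school teacher": "2341",
    "waiter": "5131",
}


def recommend(text: str) -> tuple[str, int]:
    """Return (code, score) for the index title sharing the most words with `text`.

    The score is the percentage of the index title's words found in the input
    (0-100). This scoring rule is an illustrative assumption, not Cascot's.
    """
    words = set(text.lower().split())
    best_code, best_score = "", 0
    for title, code in TOY_INDEX.items():
        title_words = set(title.split())
        score = round(100 * len(words & title_words) / len(title_words))
        if score > best_score:
            best_code, best_score = code, score
    return best_code, best_score


def code_batch(lines, overrides=None) -> str:
    """Code each input line and return the results as CSV text.

    `overrides` maps a line position to a code chosen by the user, standing in
    for the user changing Cascot's recommendation.
    """
    overrides = overrides or {}
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["text", "code", "score"])
    for i, text in enumerate(lines):
        code, score = recommend(text)
        code = overrides.get(i, code)  # the user's choice wins over the recommendation
        writer.writerow([text, code, score])
    return out.getvalue()
```

For example, `recommend("head waiter")` returns `("5131", 100)`; passing `overrides={0: ...}` to `code_batch` mimics the user replacing that recommendation before the output is written.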

Multi-language Cascot
- 8 languages available: Dutch, English, Finnish, French, German, Italian, Slovak and Spanish
- Cascot detects the language automatically, but it can be changed from the menu
- An ISCO-08 classification exists for each country (some with national codes)
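Cascot's actual language-detection method is not described in these slides. As one hypothetical approach, a detector could count how many input words occur in each language's job-title vocabulary and pick the language with the most hits, while still letting the user override the choice (as the menu does). The vocabularies and the `detect_language` function below are illustrative assumptions.

```python
# Hypothetical per-language vocabularies, e.g. words taken from each job-title index.
VOCAB = {
    "English": {"teacher", "nurse", "driver", "school", "primary"},
    "German": {"lehrer", "krankenschwester", "fahrer", "schule"},
    "French": {"enseignant", "infirmière", "chauffeur", "école"},
    "Dutch": {"leraar", "verpleegkundige", "chauffeur", "school"},
}


def detect_language(text, vocab=VOCAB, override=None):
    """Guess the language of `text`; an explicit `override` always wins.

    Returns None when no input word is found in any vocabulary.
    """
    if override is not None:
        return override  # the user's menu choice
    words = text.lower().split()
    hits = {lang: sum(w in v for w in words) for lang, v in vocab.items()}
    best = max(hits, key=hits.get)  # ties go to the first language listed
    return best if hits[best] > 0 else None
```

Ties go to whichever language is listed first, so a word shared by two languages (such as "chauffeur" in French and Dutch here) is not decisive on its own; a real detector would need more evidence than a single word.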

Coding in Dutch

Finnish

French

German (the index is © Federal Employment Agency)

Italian

Slovak

Spanish

A test of multi-language Cascot
- Comparison of European Social Survey (ESS) Round 6 codes with automatic Cascot codes (ISCO-08)
- Data available from DE, ES, GB and NL
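ISCO-08 is hierarchical: the first digit of a four-digit code is the major group, the first two digits the sub-major group, the first three the minor group and all four the unit group. A comparison like the ESS test can therefore report agreement at each level. The sketch below is a hypothetical helper, assuming both codes are four-digit ISCO-08 strings; it does not reproduce the test's actual results.

```python
def agreement_by_level(pairs):
    """Share of (reference, cascot) code pairs that agree at each ISCO-08 level.

    Level 1 compares the first digit (major group), level 2 the first two digits
    (sub-major group), level 3 the first three (minor group) and level 4 the
    full code (unit group). Both codes are assumed to be 4-digit strings.
    """
    pairs = list(pairs)
    if not pairs:
        raise ValueError("no code pairs to compare")
    return {
        level: sum(ref[:level] == cas[:level] for ref, cas in pairs) / len(pairs)
        for level in (1, 2, 3, 4)
    }
```

Agreement can only stay the same or fall as the level gets more detailed, so the gap between level 1 and level 4 shows how many disagreements are near misses within the same broad group.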

Cascot Performance Tool
- Allows the user to analyse Cascot's performance by comparing manually coded data with the codes Cascot produces for the same data
- Requires a delimited results file containing a reference code, the Cascot code and the Cascot score for each record
- Shows a Performance Results Display window with a Performance Graph, Summary, Statistics and Key
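A minimal sketch of reading such a results file and summarising it, assuming a tab-delimited file with no header row and the columns in the order reference code, Cascot code, Cascot score. The real file layout is defined by the Performance Tool, so treat the delimiter, the column order and the `read_results`/`summary` names as assumptions.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Result:
    reference: str  # manually assigned (reference) code
    cascot: str  # code produced by Cascot
    score: int  # Cascot score for that code


def read_results(stream, delimiter="\t"):
    """Parse rows of reference code, Cascot code, Cascot score (assumed order, no header)."""
    results = []
    for row in csv.reader(stream, delimiter=delimiter):
        if not row or not row[0].strip():
            continue  # skip blank lines
        ref, cas, score = row[:3]
        results.append(Result(ref.strip(), cas.strip(), int(score)))
    return results


def summary(results):
    """Record count, share of exact code matches and mean Cascot score."""
    n = len(results)
    if n == 0:
        return {"records": 0, "agreement": 0.0, "mean_score": 0.0}
    agree = sum(r.reference == r.cascot for r in results)
    return {
        "records": n,
        "agreement": agree / n,
        "mean_score": sum(r.score for r in results) / n,
    }


if __name__ == "__main__":
    sample = io.StringIO("2512\t2512\t90\n2341\t2342\t40\n")
    print(summary(read_results(sample)))
```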

Opening a results file

Performance Results Display
- The longer the green line stays high, the better
- The further to the right the purple/blue lines are, the better
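The exact definition of the lines in the Performance Graph is given in the Cascot documentation, not on this slide. As an assumption about the kind of trade-off such a graph summarises, the sketch below sweeps a score threshold: records scoring at or above it count as coded automatically (coverage), and accuracy is the share of those records whose code matches the reference. The `threshold_sweep` function is hypothetical.

```python
def threshold_sweep(results, thresholds=range(0, 101, 10)):
    """Coverage and accuracy of automatic coding at each score threshold.

    `results` holds (reference_code, cascot_code, score) tuples. Coverage is the
    share of records with score >= threshold; accuracy is the share of those
    accepted records whose Cascot code equals the reference code (None if no
    record is accepted).
    """
    results = list(results)
    n = len(results)
    rows = []
    for t in thresholds:
        accepted = [(ref, cas) for ref, cas, score in results if score >= t]
        coverage = len(accepted) / n if n else 0.0
        accuracy = (
            sum(ref == cas for ref, cas in accepted) / len(accepted) if accepted else None
        )
        rows.append((t, coverage, accuracy))
    return rows
```

Raising the threshold generally lowers coverage and raises accuracy; the threshold at which accuracy becomes acceptable suggests which records could be left to Cascot and which should be checked by a human coder.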

Fine-tuning multi-language Cascot
- The versions in different languages could be improved by developing coding rules
- Contributions are needed from experts who know the language
- Rules are developed with Cascot Editor

Cascot Editor
- Classification files for Cascot are created and modified with the Editor
- Each classification has a Structure, an Index and Rules for coding
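As an illustration only, a classification with these three parts could be held in memory as below. The `Classification` class, its fields and the prefix-based `path` method are hypothetical; `path` relies on ISCO-08 codes nesting by prefix, which may not hold for national codes.

```python
from dataclasses import dataclass, field


@dataclass
class Classification:
    """Hypothetical in-memory model of a classification: structure, index, rules."""

    structure: dict[str, str]  # group code -> group title
    index: dict[str, str]  # lower-case job title -> code
    rules: dict[str, dict[str, str]] = field(default_factory=dict)  # rule type -> table

    def path(self, code: str) -> list[tuple[str, str]]:
        """(code, title) for every level from the major group down to `code`.

        Assumes ISCO-style codes in which each extra digit is one level deeper.
        """
        prefixes = (code[:i] for i in range(1, len(code) + 1))
        return [(p, self.structure[p]) for p in prefixes if p in self.structure]

    def lookup(self, title: str):
        """Return the hierarchy path for an indexed job title, or None if unknown."""
        code = self.index.get(title.strip().lower())
        return None if code is None else self.path(code)
```

For example, with ISCO-08 groups 2 (Professionals), 25 (Information and communications technology professionals), 251 (Software and applications developers and analysts) and 2512 (Software developers) in `structure`, looking up an indexed "software developer" returns all four levels.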

Cascot Editor Rules
- Downgraded words: words considered significantly less important than other words, e.g. deputy, junior, person
- Equivalent word ends: wait|er, wait|ress
- Abbreviations: asst → assistant, fe → further education
- Replacement words: taylor → tailor, tesco → supermarket
  - Omitting noise words, e.g. replace 'part-time' with nothing
- Input modifications: used when the rule absolutely cannot be made elsewhere
- Word alternatives: words and phrases that should also be tried as possible solution candidates
- Conclusions: retired → cannot conclude, agent → ambiguous (score 39)
- Default coding: a set of words and phrases that should be scored as though they were a different word or phrase
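Several of the rule types above can be imitated with a small text-normalisation pipeline. This is a hypothetical sketch, not Cascot's rule engine or file format: the tables hold only the slide's examples, the order in which rules are applied is an assumption, and the 0.5 weight for downgraded words is an arbitrary illustrative value.

```python
import re

# Hypothetical rule tables built from the slide's examples (not Cascot's file format).
ABBREVIATIONS = {"asst": "assistant", "fe": "further education"}
REPLACEMENTS = {"taylor": "tailor", "tesco": "supermarket", "part-time": ""}  # "" drops a noise word
EQUIVALENT_ENDS = {"wait": ("er", "ress")}  # wait|er and wait|ress are treated alike
DOWNGRADED = {"deputy", "junior", "person"}
CONCLUSIONS = {"retired": "cannot conclude"}
DOWNGRADE_WEIGHT = 0.5  # arbitrary illustrative value


def apply_rules(text):
    """Return (weighted_tokens, conclusion) after applying the rule tables in order."""
    tokens = re.findall(r"[\w-]+", text.lower())  # \w also matches accented letters
    for word in tokens:
        if word in CONCLUSIONS:
            return [], CONCLUSIONS[word]  # stop: no code should be assigned
    expanded = []
    for word in tokens:
        word = ABBREVIATIONS.get(word, word)
        word = REPLACEMENTS.get(word, word)
        expanded.extend(word.split())  # "" vanishes; multi-word expansions split
    weighted = []
    for word in expanded:
        for stem, ends in EQUIVALENT_ENDS.items():
            if any(word == stem + end for end in ends):
                word = stem + ends[0]  # map every variant to the first ending
        weight = DOWNGRADE_WEIGHT if word in DOWNGRADED else 1.0
        weighted.append((word, weight))
    return weighted, None
```

For example, "Part-time waitress" reduces to the single token "waiter" (the noise word is dropped and the equivalent ending is unified), while "retired teacher" stops with the conclusion "cannot conclude".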

Example of a new rule - English
The problem:
Add two new Replacement Words rules:
The result:

Potential for rules - German
- German occupational titles were coded fully automatically with Cascot, and the results were compared with approved codes.
- The examples on this slide show where rules would improve Cascot's coding performance.
- It is helpful to have “gold standard” files containing a large number of real-life job titles to which experts have assigned the correct codes.
- Cascot's coding results can be compared with the “gold standard” to find areas for improvement.
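A comparison with a gold-standard file is most useful when the disagreements are grouped, so that the most frequent confusions (and the job titles behind them) point to where a new rule would help most. A minimal sketch, assuming records of (job title, gold code, Cascot code); `confusion_report` is a hypothetical helper, not part of Cascot.

```python
from collections import Counter, defaultdict


def confusion_report(records, top=3):
    """Most frequent (gold_code, cascot_code) disagreements with example titles.

    `records` holds (job_title, gold_code, cascot_code) tuples. Returns up to
    `top` entries of ((gold, cascot), count, titles), most frequent first.
    """
    counts = Counter()
    examples = defaultdict(list)
    for title, gold, cascot in records:
        if gold != cascot:
            counts[(gold, cascot)] += 1
            examples[(gold, cascot)].append(title)
    return [(pair, n, examples[pair]) for pair, n in counts.most_common(top)]
```

A language expert can then read the titles listed under each frequent confusion and decide whether a replacement word, abbreviation, downgraded word or other rule type would fix them.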