Published by Juliet Lindsey. Modified over 9 years ago.
1
Multi-language CASCOT
Margaret Birch and Ritva Ellison
Institute for Employment Research
2
Computer Assisted Structured Coding Tool (CASCOT)
- Software tool for coding text automatically or manually
- Developed at the Institute for Employment Research at Warwick University, 1993 onwards
- Used by over 100 organisations in the UK and abroad
3
- IER contracted under the DASISH project to develop a multilingual version of CASCOT to code job titles to ISCO-08
- A large task and limited resources, so this is a pilot project
- The 8 selected languages:
  - Dutch (Netherlands, Flemish-Belgium)
  - English
  - Finnish
  - French (France, Walloon-Belgium, Switzerland)
  - German (Germany, Austria, Switzerland)
  - Italian
  - Slovak
  - Spanish
4
Key Tasks
- Translating Cascot user interface texts
- Constructing national language versions of the ISCO-08 structure for Cascot
- Indexing job titles in the selected languages to ISCO-08
  - Some supplied by NSIs or other partners
  - Some found by exploring relevant national websites
- Validating the software using raw data files from the European Social Survey (ESS) Round 6
- Testing Cascot multilingual software
- Developing language-based coding rules
- Using Cascot Performance Tool to fine-tune the software
5
Coding with Cascot
- Enter text (could be from a file)
- Cascot provides a recommendation for the code, but the user can change it
- Output can be directed to a file
- Selected classification
6
Multi-language Cascot
- 8 languages available: Dutch, English, Finnish, French, German, Italian, Slovak and Spanish
- Cascot detects the language automatically, but it can be changed from the menu
- An ISCO-08 classification exists for each country (some with a national code)
7
Coding in Dutch
8
Finnish
9
French
10
German*
* The index is © Federal Employment Agency
11
Italian
12
Slovak
13
Spanish
14
A test of multi-language Cascot
- Comparison of European Social Survey Round 6 code and automatic Cascot code
- Data available from DE, ES, GB and NL
- ISCO-08
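A comparison like this can be made at several levels of detail, because ISCO-08 codes are hierarchical: the first digit is the major group and the full four digits identify the unit group. The following is only a minimal sketch of such a comparison, not the method used in the test. The function name `agreement_by_level` and the sample code pairs are invented for illustration.

```python
def agreement_by_level(pairs):
    """Return {digits: share of pairs whose codes share the first `digits` characters}.

    `pairs` holds (reference_code, cascot_code) tuples of 4-digit ISCO-08 strings.
    """
    pairs = list(pairs)
    if not pairs:
        return {}
    result = {}
    for digits in (1, 2, 3, 4):
        matches = sum(1 for ref, auto in pairs if ref[:digits] == auto[:digits])
        result[digits] = matches / len(pairs)
    return result


# Invented (reference, Cascot) code pairs, for illustration only.
sample = [
    ("2221", "2221"),  # identical at all four digits
    ("2221", "2222"),  # agree on the first three digits only
    ("5131", "5120"),  # agree on the first two digits only
    ("7112", "9313"),  # differ from the first digit
]
levels = agreement_by_level(sample)
```

Reporting agreement per digit level separates near misses (same minor group, different unit group) from codes that land in a different major group altogether.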
15
Cascot Performance Tool
- Allows the user to analyse the performance of Cascot by comparing manually coded data with the code produced by Cascot for the same data.
- A delimited results file is needed that contains a reference code, a Cascot code and a Cascot score.
- The Tool shows a Performance Results Display window with a Performance Graph, Summary, Statistics and Key.
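To make the results-file idea concrete, here is a rough sketch of the kind of analysis such a file allows. It is not the Performance Tool itself. The tab delimiter, the column order (reference code, Cascot code, Cascot score), the integer scores, the function names and the sample rows are all assumptions made for illustration.

```python
import csv
import io


def read_results(text, delimiter="\t"):
    """Parse delimited text into (reference_code, cascot_code, score) tuples.

    Assumed layout: three columns per row, in that order, with no header row.
    """
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    return [(ref, auto, int(score)) for ref, auto, score in reader]


def performance_at_thresholds(rows, thresholds):
    """For each threshold return (share of records kept, share of kept records that match)."""
    rows = list(rows)
    out = {}
    for t in thresholds:
        kept = [r for r in rows if r[2] >= t]
        coded_share = len(kept) / len(rows) if rows else 0.0
        correct = sum(1 for ref, auto, _ in kept if ref == auto)
        accuracy = correct / len(kept) if kept else 0.0
        out[t] = (coded_share, accuracy)
    return out


# Invented sample data, for illustration only.
sample_file = "2221\t2221\t95\n2221\t2222\t60\n5131\t5131\t72\n7112\t9313\t39\n"
rows = read_results(sample_file)
perf = performance_at_thresholds(rows, [40, 70])
```

Raising the threshold keeps fewer records but usually a larger share of correct ones, which is the trade-off a user would weigh when deciding which automatic codes to accept without manual review.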
16
Opening a results file
17
Performance Results Display
- The longer the green line stays high, the better
- The further to the right the purple/blue lines are, the better
18
Fine-tuning multi-language Cascot
- The versions in different languages could be improved by developing coding rules
- Contribution needed from experts who know the language
- Rules are developed with Cascot Editor
19
Cascot Editor
- Classification files for Cascot are created and modified with the Editor
- Each classification has:
  - Structure
  - Index
  - Rules for coding
20
Cascot Editor Rules
- Downgraded words: words that are considered to be significantly less important than other words, e.g. deputy, junior, person
- Equivalent word ends: wait|er, wait|ress
- Abbreviations: asst -> assistant, fe -> further education
- Replacement words: taylor -> tailor, tesco -> supermarket
  - Omitting noise words, e.g. replace ‘part-time’ with nothing
- Input modifications: used when the rule absolutely cannot be made elsewhere
- Word alternatives: words and phrases that should also be tried as possible solution candidates
- Conclusions: ‘retired’ cannot conclude, ‘agent’ is ambiguous (score 39)
- Default coding: a set of words and phrases that should be scored as though they were a different word or phrase
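As an illustration of how three of these rule types could act on an input text, here is a small sketch. It does not reproduce Cascot's actual rule engine or file format. The dictionaries reuse the examples from the slide, while the function names, the order in which rules are applied and the weight of 0.2 for downgraded words are assumptions.

```python
# Rule tables built from the slide's examples.
ABBREVIATIONS = {"asst": "assistant", "fe": "further education"}
REPLACEMENTS = {"taylor": "tailor", "tesco": "supermarket", "part-time": ""}
DOWNGRADED = {"deputy", "junior", "person"}


def normalise(text):
    """Lower-case the text, expand abbreviations, then apply replacement words.

    A replacement with an empty string drops the word (a noise word).
    """
    words = []
    for word in text.lower().split():
        word = ABBREVIATIONS.get(word, word)
        word = REPLACEMENTS.get(word, word)
        words.extend(word.split())  # handles multi-word and empty results
    return words


def weight_words(words, low=0.2):
    """Attach a weight to each word; downgraded words get the assumed lower weight."""
    return [(w, low if w in DOWNGRADED else 1.0) for w in words]


tokens = normalise("Part-time junior asst at Tesco")
weighted = weight_words(tokens)
```

Here `tokens` is `["junior", "assistant", "at", "supermarket"]`: the noise word is dropped, the abbreviation is expanded and the employer name is replaced, while `junior` is kept but carries less weight when matching against the index.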
21
Example of a new rule - English
- The problem:
- Add two new Replacement Words rules:
- The result:
22
Potential for rules - German
- German occupational titles were coded fully automatically with Cascot and the result was compared with an approved code.
- Above are some examples where rules would improve Cascot coding performance.
- It is helpful to have “gold standard” files with a large number of real-life job titles for which experts have assigned correct codes.
- Cascot coding results can be compared with the “gold standard” to find areas for improvement.
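One way to turn a gold-standard comparison into candidate rules is to list the most frequent disagreements together with the titles that caused them. The sketch below assumes records of the form (job title, gold-standard code, Cascot code). The function name, the record layout, the placeholder titles and the codes are hypothetical.

```python
from collections import Counter, defaultdict


def frequent_disagreements(records, top=3):
    """Return the most common (gold_code, cascot_code) mismatches with their titles.

    Each item in the result is ((gold_code, cascot_code), count, [titles]).
    """
    counts = Counter()
    examples = defaultdict(list)
    for title, gold, auto in records:
        if gold != auto:
            counts[(gold, auto)] += 1
            examples[(gold, auto)].append(title)
    return [(pair, n, examples[pair]) for pair, n in counts.most_common(top)]


# Placeholder titles and codes, for illustration only.
records = [
    ("title A", "1111", "2222"),
    ("title B", "3333", "3333"),  # agreement, ignored
    ("title C", "1111", "2222"),
    ("title D", "4444", "5555"),
]
worst = frequent_disagreements(records)
```

A language expert could then read the titles behind each frequent mismatch and decide whether a replacement word, an abbreviation or a downgraded word would fix it.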