Capitalize on your data: Best Practices for the future; Open Issues on how to contribute data

Capitalize on your data
Best Practices for the future
Open Issues on how to contribute data
To share with you what we have learnt so far from the training workshops (and a little from our own experience as a data center)
Khalid Choukri (ELDA)

Overview
- We have seen the importance of data for Automated Translation
  - Data-driven paradigm
  - Data is needed in all languages
- Where can we discover data?
  - Public sector players
    - Visible data, e.g. the Web (HTML pages, reports, etc.)
    - Invisible data: archives, hidden web, internal repositories
  - Through Language Service Providers (LSPs)
- What can be done in the future to capitalize on data assets
- Our experience with Data Management Plans
- Sustainable sharing

Illustration of data packaging workflow (existing data)
Technical issues to start with:
- Identification of sources; identification and selection of data sets (raw data)
- Privacy and ethics management from the start (e.g. anonymization when required, accepting or rejecting data)
- Documentation with basic identification elements (languages, domains, year, …)
- Choice of medium and data formats for the transfer of the “raw” data (preference for the ELRC ad hoc platform)
[Figure: value chain for existing data LRs (Language Resources). Activities: Identification & Selection of Data; Basic documentation; Cleaning & Conversion (content, container); Validation; Processing of LRs (e.g. alignment); Description & Storage of LRs; Legal status determination (PSI vs licensing); Privacy handling and acceptance (i.e. anonymization); Upload data to a sustainable repository & boost sharing. Supporting inputs: market knowledge, industry network. Actors: Public Sector Body (PSB), PSB/ELRC/EC partnership, ELRC.]
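
The “basic identification elements” in the list above can be kept as a small structured record so that gaps are caught before the data is transferred. The sketch below is a minimal illustration only; the class name `BasicDocumentation`, its fields and the ISO 639 code check are assumptions made for this example, not the ELRC repository's metadata schema.

```python
import re
from dataclasses import dataclass

# Hypothetical check: lowercase two- or three-letter codes (ISO 639-1 / 639-3 style).
LANG_CODE = re.compile(r"^[a-z]{2,3}$")


@dataclass
class BasicDocumentation:
    """Illustrative record of the basic identification elements of a dataset."""
    name: str
    languages: list[str]
    domains: list[str]
    year: int
    notes: str = ""

    def problems(self) -> list[str]:
        """Return human-readable problems; an empty list means the record looks complete."""
        issues = []
        if not self.name.strip():
            issues.append("missing dataset name")
        if not self.languages:
            issues.append("no languages listed")
        for code in self.languages:
            if not LANG_CODE.match(code):
                issues.append(f"language code not in ISO 639 form: {code!r}")
        if not self.domains:
            issues.append("no domains listed")
        if not 1900 <= self.year <= 2100:
            issues.append(f"implausible year: {self.year}")
        return issues


record = BasicDocumentation("Press releases", ["en", "fr"], ["public administration"], 2016)
print(record.problems())  # an empty list for this complete record
```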

If new data: some best practices for data management
1) Analyse all phases of data development
2) Based on 1), create a data management plan (DMP)
  - Legal aspects, data workflow, formats, publication as PSI, …
  - Relations with subcontractors and other partners
- Consider data sustainability
  - Data specification, production, validation, sharing & distribution, maintenance & preservation
- Use the Web as an additional publication channel (see how ELRC can help)
- Get all the data from the LSP (incl. translation memories, source and target texts)
- Store on the national open portal (at least) and share
- Make sure these fit under PSI
- If you want to go beyond, adopt a DMP
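
One way to keep the data development phases listed above in view is to treat the DMP as a checklist with one entry per phase. This is a hypothetical sketch: the phase keys and the `missing_phases` helper are names chosen for illustration, and a real DMP also has to cover the legal, format and PSI publication points on the slide.

```python
# Phases named on the slide; the key spellings are illustrative assumptions.
DMP_PHASES = (
    "specification",
    "production",
    "validation",
    "sharing_and_distribution",
    "maintenance_and_preservation",
)


def missing_phases(plan: dict[str, str]) -> list[str]:
    """Return the phases the plan does not describe yet (absent or blank entries)."""
    return [phase for phase in DMP_PHASES if not plan.get(phase, "").strip()]


plan = {
    "specification": "Describe the source documents and our needs (languages, domains).",
    "production": "Outsourced to an LSP; TMX and Word deliverables required by contract.",
    "validation": "",
}
print(missing_phases(plan))
# ['validation', 'sharing_and_distribution', 'maintenance_and_preservation']
```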

Illustration of data packaging workflow (new data)
Technical issues to start with:
- Identification of sources; identification and selection of data sets (raw data)
- Privacy and ethics management from the start (e.g. anonymization when required, accepting or rejecting data)
- Documentation with basic identification elements (languages, domains, year, …)
- Choice of medium and data formats for the transfer of the “raw” data (preference for the ELRC ad hoc platform)
[Figure: value chain for new data LRs (Language Resources). Activities: Requirements and Needs; Data Specification; Production phase; Validation; Processing of LRs (e.g. alignment); Description & Documentation; Plans for future legal status (restricted vs PSI vs licensing); Plans for an embargo period before releasing; Upload data to the repository & sharing; Sustainable storage. Supporting inputs: market knowledge, industry network. Actors: public partner, ELRC/EC, partnership with ELRC. This can be part of the data management plan (DMP).]
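
Anonymization “when required” can start with automatic masking before a human accept/reject review. The following is only a minimal sketch, assuming that e-mail addresses and phone-number-like digit runs are the personal data to remove; the regular expressions and the `[EMAIL]`/`[PHONE]` placeholder tokens are illustrative choices, and real anonymization must also cover names, addresses, identifiers and more.

```python
import re

# Assumed patterns: e-mail addresses, and runs of 9+ digits/separators that look like phone numbers.
EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def mask_personal_data(text: str) -> str:
    """Replace e-mail addresses and phone-like numbers with placeholder tokens."""
    text = EMAIL.sub("[EMAIL]", text)
    return PHONE.sub("[PHONE]", text)


print(mask_personal_data("Contact jane.doe@example.org or +33 1 23 45 67 89."))
# Contact [EMAIL] or [PHONE].
```

Because the phone pattern matches any run of nine or more digits and separators, it can also mask non-personal numbers, so a manual review pass remains necessary.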

Concerns in creating a DMP
- Anticipate all potential legal issues
  - Ensure that your data IPRs are cleared
  - Ensure that the producing parties respect your rights (“ownership”), e.g. in relations with LSPs, ensure you keep all rights
  - Ensure that all intermediary documents produced are yours (e.g. translation memories)
  - Check privacy issues in advance and plan for anonymization if necessary
- Define your management plan with respect to the task
  - It has to account for the main goal (e.g. document writing, document translation, etc.)
  - Plan for repurposing (from documentation to LRs)
  - Request data in a usable format (not only PDF but also TMX/Word/XML/TXT)
  - Make sure your data uses an up-to-date medium (no CDs?)
- Plan for future publication and sharing as Public Sector Information (PSI)
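
Requesting TMX rather than only PDF pays off because aligned segments can then be read directly. A minimal sketch using Python's standard `xml.etree.ElementTree`, assuming one `<seg>` per `<tuv>` and languages identified by `xml:lang` (TMX 1.4) or the older `lang` attribute; the function name `read_tmx_pairs` is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Union

# ElementTree exposes xml:lang under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def read_tmx_pairs(tmx_data: Union[str, bytes], src: str, tgt: str) -> list[tuple[str, str]]:
    """Return (source, target) segment pairs for one language pair (lowercase base codes)."""
    root = ET.fromstring(tmx_data)
    pairs = []
    for tu in root.iter("tu"):
        segs = {}
        for tuv in tu.iter("tuv"):
            # TMX 1.4 uses xml:lang; older files may carry a plain "lang" attribute.
            lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
            seg = tuv.find("seg")
            if seg is not None:
                # itertext() also collects text inside inline tags such as <ph>.
                segs[lang.split("-")[0]] = "".join(seg.itertext()).strip()
        # Keep only translation units that contain both requested languages.
        if src in segs and tgt in segs:
            pairs.append((segs[src], segs[tgt]))
    return pairs
```

For a file on disk, passing `Path("memory.tmx").read_bytes()` lets ElementTree honour the file's own encoding declaration.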

Key elements of a Data Management Plan

Key elements of a Data Management Plan
Specifications
- Ensure that the original documents are described
- Ensure that your needs are described
- Anticipate what you can get as valuable resources (a side effect)
Production
- Whether internal or outsourced, check that the tools used are compatible with your needs and beyond (e.g. CAT, MT, etc.)
- Ask for the list of tools and production software
- Check whether you can get the texts in the multiple languages aligned to each other
- Keep clear documentation of the data being produced (metadata)
(Slides only for questions)

Key elements of a Data Management Plan
Validation
- In addition to your quality control, you may want to use validation tools (lexical coherence, syntactic analysis, etc.)
Sharing/distribution
- Ensure your data falls within the PSI Directive as transposed in your country
- If not, foresee an open and permissive licence
- If privacy is an issue, plan the necessary procedures to handle it
Maintenance/preservation
- The best option is often to partner with a data centre
- See how ELRC can assist you
- There is also the “option” of a national open data portal
- Only “putting” data on the web is not a sufficient option (referencing?)
Note: the term “option” is certainly not the most appropriate here, because in some countries and contexts this is mandatory.
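
Alongside dedicated lexical or syntactic validation tools, a few cheap heuristics can catch obviously broken sentence pairs before deeper checks. This sketch is an illustration under stated assumptions: the checks (empty segment, identical source and target, character-length ratio) and the threshold of 3.0 are common rules of thumb, not tools named in the presentation, and the ratio should be tuned per language pair.

```python
from collections.abc import Iterator, Sequence


def flag_suspicious_pairs(
    pairs: Sequence[tuple[str, str]], max_ratio: float = 3.0
) -> Iterator[tuple[int, str]]:
    """Yield (index, reason) for sentence pairs that look misaligned or unusable."""
    for i, (src, tgt) in enumerate(pairs):
        s, t = src.strip(), tgt.strip()
        if not s or not t:
            yield i, "empty segment"
        elif s == t:
            yield i, "source identical to target (untranslated?)"
        else:
            # Very different character lengths often indicate a misalignment.
            ratio = max(len(s), len(t)) / min(len(s), len(t))
            if ratio > max_ratio:
                yield i, f"length ratio {ratio:.1f} exceeds {max_ratio}"


pairs = [
    ("Hello world", "Bonjour le monde"),
    ("", "Vide"),
    ("EU", "EU"),
    ("Yes", "Oui, absolument, sans aucun doute"),
]
for index, reason in flag_suspicious_pairs(pairs):
    print(index, reason)
```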

Cooperation actions
Identification of sources; identification and selection of data sets (raw data)
- Data can be obtained from visible sources (e.g. harvested from the web)
- Data can be handed over by public sector players
- Public sector players can boost the identification of visible sources
- The processing indicated above can be carried out in cooperation by ELRC and the data provider

Legal elements
- Procedural issues (data requests vs. open by default, e.g. PSI)
- Licensing
ELRC can help with the procedures:
- Model licensing agreements
- Government open licenses
- Standard re-use licenses
- License interoperability
The legal path is an easy one, as you have seen in the legal session.

We need your involvement
- You know your data, visible vs. invisible
- Access to archives, the deep web, etc. is often not possible for outsiders
- Not all data is already under PSI or a permissive license
- Access to derived forms (e.g. PDF) is less efficient than access to internal source content repositories

Conclusions
- Repurposing existing data (human translations) is the best way to improve Automated Translation quality
- Data-driven paradigms provide an efficient way to leverage value from existing resources
- ELRC can help review data for suitability (at any phase)
- Do not underestimate the value of your language resources; foresee a Data Management Plan

Helpdesk and Support
http://cef-at-sources.elda.org/add_source/

ELRC Portal
www.lr-coordination.eu
[Screenshot]

ELRC Portal: Helpdesk
[Screenshot]

ELRC Portal: Web Forum
[Screenshot]
- Handles logistics and assists participants
- Provides for: technical issues, legal issues, other
- Clears legal and related issues
- Identifies and clears all technical issues
- Provides secretariat and legal team

The resources part

How to Contribute Language Resources (1/7)
- Go to the ELRC Repository (through the RESOURCE link)
- Click the Register button to get an account

How to Contribute Data (2/7)
- Fill in the info
- Read the Terms of Service and click Accept if you agree
- Click the Create Account button

How to Contribute Data (3/7)
- Your request is acknowledged and an activation email is sent to the address you indicated
- Check your email and click the activation link

How to Contribute Data (5/7)
- Fill in the details of the dataset

How to Contribute Data (6/7)
- Browse your computer for the .zip file containing your data
- Click Submit
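
Before uploading, the data files can be bundled into the single .zip file the form asks for. A minimal sketch with Python's standard `zipfile` module; `package_dataset` is a hypothetical helper name, and adding a `metadata.json` inside the archive is an assumption of this example (the repository collects dataset details through its own form).

```python
import json
import zipfile
from pathlib import Path


def package_dataset(files: list[Path], metadata: dict, out_zip: Path) -> Path:
    """Bundle data files plus a metadata.json into one compressed .zip for upload."""
    with zipfile.ZipFile(out_zip, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for f in files:
            # Flat layout: files are stored under their base names, so names must be unique.
            zf.write(f, arcname=f.name)
        zf.writestr("metadata.json", json.dumps(metadata, ensure_ascii=False, indent=2))
    return out_zip
```

A flat archive keeps the upload simple; if the dataset has many files per language, a folder per language inside the .zip is an equally reasonable layout.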

How to Suggest SOURCES of data
- Submit Sources: enter the URL (http:/………………………………………) and click Submit URL
- An automatic check tests whether the URL is already in the database; if not, proceed with the submission
- Source submission form: Source name, Languages, Provider name, …, Contact name, Email; then click Submit Source
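
An automatic “already in database” check works best when URLs are compared in a normalized form, since the same source can be written several ways. This sketch assumes that the scheme (http/https), a leading “www.”, a trailing slash and any #fragment do not distinguish sources; the set `known_keys` stands in for the database, and nothing here reflects how the ELRC form actually performs its check.

```python
from urllib.parse import urlsplit


def url_key(url: str) -> str:
    """Reduce a URL to a comparison key: lowercased host without "www.",
    path without trailing slash, query kept, scheme and #fragment dropped."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[len("www."):]
    key = host + parts.path.rstrip("/")
    if parts.query:
        key += "?" + parts.query
    return key


def is_new_source(url: str, known_keys: set[str]) -> bool:
    """Return True and remember the URL if it is not known yet; otherwise False."""
    key = url_key(url)
    if key in known_keys:
        return False
    known_keys.add(key)
    return True
```

Dropping the scheme and “www.” can occasionally merge two genuinely different sites; a stricter key is easy to derive by keeping those parts.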