UK DATA ARCHIVE-NLP COLLABORATION Louise Corti and Claire Grover UK Data Archive University of Essex Colchester, Essex CO4 3SQ

SMART QUALITATIVE DATA: METHODS AND COMMUNITY TOOLS FOR DATA MARK-UP
quads.esds.ac.uk/squad

CONTACT
Tel: +44 (0)
URL: quads.esds.ac.uk/squad

WHAT IS SQUAD?
Main aim: to explore methodological and technical solutions for 'exposing' digital qualitative data to make them fully shareable and exploitable.

Main objectives:
- specify, test and propose an eXtensible Markup Language (XML) schema for storing and marking up qualitative data
- investigate requirements for contextualising qualitative data and developing standards for data documentation
- develop semi-automated tools, using natural language processing, for preparing marked-up qualitative data for sharing
- research tools for publishing and interrogating data via the web: Qualitative Data Mark-Up Tools (QDMT)

Collaboration between:
- UK Data Archive, University of Essex (lead partner)
- Language Technology Group, Human Communication Research Centre, School of Informatics, University of Edinburgh

18 months duration, 1 March 2005 – 31 October 2006

UK DATA ARCHIVE-NLP COLLABORATION
- new partnerships created – new methods, tools and jargon to learn
- new area of application for NLP to social science data
- growing interest in the UK in applying NLP and text mining to social science texts – data and research outputs such as publications' abstracts

WHAT FEATURES DO WE NEED TO MARK-UP AND WHY?
Spoken interview texts provide the clearest and most common example of the types of encoding features needed. There are three basic groups of structural features:
- utterance, specific turn taker, defining idiosyncrasies in transcription
- links to analytic annotation and other data types (e.g. thematic codes, concepts, audio or video links, researcher annotations)
- identifying information such as real names, company names, place names, occupations, temporal information

METADATA STANDARDS
The XML schema will specify a 'reduced' set of Text Encoding Initiative (TEI) elements:
- core tag set for transcription
- names, numbers, dates
- links and cross references
- notes and annotations
- text structure unique to spoken texts
- linking, segmentation and alignment
- advanced pointing - XPointer framework
- text and AV synchronisation
- contextual information (participants, setting, text)

DATA EXCHANGE STANDARDS
A uniform format for richly encoding qualitative research is necessary as it:
- enables preservation and re-use of metadata, data and annotation
- ensures consistency of presentation and description of data
- supports the development of common web-based publishing and search tools
- facilitates data interchange (e.g. CAQDAS packages) and comparison among datasets

Progress:
- limited formal definition of a common XML vocabulary and Document Type Definition (DTD) based on the Text Encoding Initiative (TEI)
- testing of a new Qualitative Data Interchange Format (QDIF)

CAPTURING AND DEFINING DATA CONTEXT
Rich context enables informed re-use of data. But defining how to provide context for raw data to make it more 'usable' is complex. ESDS Qualidata has spent ten years working in the area of sharing qualitative data, and has done much to establish informal ways of documenting raw data.

Both micro and macro level features should be considered, including: how the research question was framed, the research application process, project progress, fieldwork situations and analysis processes. Fieldwork observations are useful, as are timelines and political chronologies. Equally, when undertaking a replication or restudy, detailed information on sampling procedures, fieldwork approaches and question guides will be essential.

Archiving and exposure of qualitative data in a way that faithfully represents its origins and context is important. Linking qualitative data to other distributed data sources, such as audio-visual or geo-coded sources (for example maps), can afford creative and exciting ways of visualising data.

SQUAD has identified a minimal generic set of elements that represent a baseline for contextualising data. QUADS has produced an edited collection on this issue as a special edition of the journal Methodological Innovations Online. sirius.soc.plymouth.ac.uk/~andyp/.

USING NLP TOOLS
ESDS Qualidata is applying semi-automated mark-up to some components of its data collections, using natural language processing (NLP) and information extraction.

Information Extraction (IE) is a sub-field of NLP which aims to identify key pieces of information in texts using 'shallow' analysis techniques. A typical IE system will perform Named Entity Recognition, where particular kinds of proper names and terms are identified, classified and marked up.

Identify atomic elements of information in text:
- personal names
- company/organisation names
- locations
- dates
- times
- percentages
- occupations
- monetary amounts

Example: Italy's business world was rocked by the announcement last Thursday that Mr. Verdi would leave his job as vice-president of Music Masters of Milan, Inc to become operations director of Arthur Andersen.

ANNOTATION TOOL - ANONYMISE
This tool imports marked up data from the CME NLP system. Named entities are highlighted and co-reference chains (e.g. numerous references to a single person) are identified. Names can be anonymised with chosen pseudonyms. The mapping from names to pseudonyms is saved.

Annotations are explored in an XML format in the NITE NXT model. NXT uses 'stand off' annotation, where annotation is linked to or referenced by words. This is a means of annotating documents with semantic metadata, enabling resource discovery and data exploration. The Java interface tool developed in SQUAD is called CME.

AUDIOVISUAL ARCHIVING
The formalised and systematic archiving and sharing of digital audio-visual data from qualitative research is fairly new. SQUAD is helping to explore XML representation and display of audio-visual data.

TOOLS PROGRESS
- defined header metadata for a standardised transcript
- defined and tested generic XML models for qualitative data
- tested and refined NLP tools for qualitative data
- built front end to NLP named entity tools
- chosen software to enable annotation of data
- explored data export formats for longer-term archiving
- investigated powerful XML based indexing tools for searching and retrieving data
- investigated web display of multimedia data and pointers to other resources using XML - extending the functionality of ESDS Qualidata

From Autumn 2006:
- formalising data exchange standard
- key word extraction systems to help conceptually index qualitative data – text mining collaboration
- exploring grid-enabling data: e-social science collaboration

[Panel: XML: enabling a standardised format for interview transcripts]

There's just one or two factual things first of all do you mind my asking how old you are? 49. And what schools did you go to? - King Street, Woodside and Hilton. Uh-huh.. and how old were you when you left the school? 14. And you work at the moment? What sort of work do you do? - Well I've gone back to get shorter hours, I've went back to domestic, which I dinna really care for. But then I used to be in the pharmacy department at ARI... just pharmacy assistant

Information about interviewee
Date of birth: 1930
Gender: female
Marital status: married
Occupation: pharmacy assistant
Geographic region: Scotland

LP: There's just one or two factual things first of all do you mind my asking how old you are?
G24: 49.
LP: And what schools did you go to?
G24: King Street, Woodside and Hilton.
LP: Uh-huh.. and how old were you when you left the school?
G24: 14.
LP: And you work at the moment? What sort of work do you do?
G24: Well I've gone back to get shorter hours, I've went back to domestic, which I dinna really care for. But then I used to be in the pharmacy department at ARI... just pharmacy assistant. At least it was better than cleanin'! But then they've nae part-time workers there so..
LP: And did you work in the pharmacy long?

[Panel: interview text with XML tags embedded]
[Panel: XML: enabling web-enabled display, search and browse]
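
The step from the speaker-prefixed interview excerpt above (the LP: and G24: turns) to a transcript with embedded XML tags can be illustrated with a minimal Python sketch. This is not the SQUAD schema or the project's tooling: the element names `transcript`, `header`, `item` and `body` are hypothetical, and `u` with a `who` attribute only loosely follows the TEI convention for spoken utterances. The speaker-label pattern (capital letters, optional digits, then a colon) is likewise an assumption drawn from this one excerpt.

```python
import re
import xml.etree.ElementTree as ET

# Assumed label shape: capitals plus optional digits before a colon ("LP:", "G24:").
TURN = re.compile(r"^(?P<who>[A-Z]+\d*):\s*(?P<text>.*)$")


def transcript_to_xml(lines, interviewee_info=None):
    """Wrap speaker-prefixed lines in <u who="..."> elements (hypothetical layout)."""
    root = ET.Element("transcript")
    if interviewee_info:
        # Header metadata about the interviewee, one <item> per field.
        header = ET.SubElement(root, "header")
        for key, value in interviewee_info.items():
            ET.SubElement(header, "item", name=key).text = value
    body = ET.SubElement(root, "body")
    last = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        match = TURN.match(line)
        if match:
            # A new turn: record who is speaking and what was said.
            last = ET.SubElement(body, "u", who=match.group("who"))
            last.text = match.group("text")
        elif last is not None:
            # No speaker label: treat the line as a continuation of the previous turn.
            last.text = f"{last.text} {line}"
    return root


sample = [
    "LP:And what schools did you go to?",
    "G24:King Street, Woodside and Hilton.",
    "LP:Uh-huh.. and how old were you when you left the school?",
    "G24:14.",
]
root = transcript_to_xml(sample, {"Gender": "female", "Geographic region": "Scotland"})
print(ET.tostring(root, encoding="unicode"))
```

Once each turn is its own element, display, search and browse tools can address turns by speaker rather than scanning free text.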