Powerful access to qualitative data: What’s behind the UK QualiBank

Darren Bell, Data & Services Developer, UK Data Archive
IASSIST, Toronto, June 2014

QualiBank project: rationale & aims
Provide enhanced access to key qualitative data through online browsing and exploration: the UK QualiBank. Build on existing metadata schemas and known technologies. Offer a mechanism for reliably citing data located in the system. Include large-scale digitisation of precious, previously undigitised materials. Maximise the impact of existing research and resource investments, and demonstrate re-use.

UK Data Service and its own needs
We hold one of the largest collections of qualitative data: over 350 data collections. A proportion of these have been digitised from older paper sources. Currently, users find and download these from our website. The data are not easy to find, although the study documentation is good. There is no searching within collections, and no file manifest is shown until download, so choosing data can be a bit of guesswork! Collections have DataCite DOIs, but parts of the data cannot be cited reliably.

Finding & accessing qualitative data
Search for “health” in our data catalogue, Discover. Retrieve a catalogue record, e.g. SN 6124: Being a Doctor: a Sociological Analysis; DDI 2.5 is very limited for describing file content. View the limited user guide. Download from the web as an RTF bundle (46 transcripts).

Data listing

Download ZIP of data and documentation

Complex data collections
SN 5801: Concepts of Healthy Eating Food Research: Phases I and II. 293 interview transcripts, 73 diaries and 6 observation field notes. A collection like this is poorly represented in a DDI 2.x catalogue.

Metadata demands for UK QualiBank
Explore data through a data journey: find a relevant extract, examine it in context, cite it. Link data to still and moving images and to other related research outputs. Some collections are completely open. This demands highly structured, consistently marked-up data. Qualitative data require object-level (file-level) descriptive metadata, e.g. for interviews, audio-visual files and images. Using common metadata elements enables federated catalogues across providers and borders.

Description below the collection
DDI 2.5 is used for catalogue metadata and the QuDEx schema for file-level description. QuDEx allows detailed identification of data objects, such as an interview transcript or audio recording; descriptive categories at the object level, e.g. MIME type, interview characteristics, interview setting; relationships to another data object or part of the data; and rich annotation of parts of the data (e.g. an extract). It is based on the published QuDEx model, already in use (schema at: …). Object-level description means a lot of manual work! TEI is used, in a limited way, for mark-up of textual data items.
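To make object-level description concrete, here is a minimal Python sketch of the kind of information recorded for a single data object: a GUID, a MIME type, descriptive categories and annotations on a run of paragraphs. The class and field names are hypothetical simplifications for illustration; they are not the element names of the QuDEx schema, and the example values are invented.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Annotation:
    """A note attached to a run of paragraphs (an extract) in a data object."""
    start_para: int
    end_para: int
    text: str


@dataclass
class DataObject:
    """Hypothetical, simplified object-level record; not the QuDEx schema itself."""
    title: str
    mime_type: str
    # Descriptive categories, e.g. interview characteristics or setting
    categories: dict[str, str] = field(default_factory=dict)
    annotations: list[Annotation] = field(default_factory=list)
    # A GUID per object, generated when the record is created
    object_id: str = field(default_factory=lambda: str(uuid.uuid4()))

    def annotate(self, start_para: int, end_para: int, text: str) -> None:
        """Attach an annotation to paragraphs start_para..end_para inclusive."""
        if start_para < 1 or end_para < start_para:
            raise ValueError("invalid paragraph range")
        self.annotations.append(Annotation(start_para, end_para, text))


transcript = DataObject(
    title="Interview 12 transcript",  # invented example values
    mime_type="text/xml",
    categories={"interview type": "semi-structured", "setting": "respondent's home"},
)
transcript.annotate(3, 5, "Respondent describes leaving school")
```

Annotations are kept as paragraph ranges here because paragraphs are also the unit of citation in the system.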

User expectations: search and browse for data
Search and faceted browse of data: text, image/PDF, audio. Browse: faceted browse by categories, at collection level (title, date and openness) and collection-object level (data type, interview characteristics, location). Search: display the number of hits and minimal item metadata, showing the word in its paragraph, a thumbnail for images/PDFs, or a link for audio-visual files. Context: other related objects, within the system or external. Access the full object: view the data, key metadata and all related files and links. Get a citation for part of the data.
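Faceted browsing combines two operations: filter records by the facet values a user has selected, then count the remaining values of each facet to show as options. The sketch below shows that logic in memory with Python; in the real system this work falls to the search engine, and the field names and records here are hypothetical.

```python
from collections import Counter


def apply_filters(records, filters):
    """Keep only records whose fields equal every selected facet value."""
    return [r for r in records if all(r.get(k) == v for k, v in filters.items())]


def facet_counts(records, facet_fields):
    """Count how many records carry each value of each facet field."""
    return {f: Counter(r[f] for r in records if f in r) for f in facet_fields}


# Invented example records; field names are hypothetical
records = [
    {"id": "a", "data_type": "interview transcript", "location": "Kent", "openness": "open"},
    {"id": "b", "data_type": "diary", "location": "Kent", "openness": "closed"},
    {"id": "c", "data_type": "interview transcript", "location": "Essex", "openness": "open"},
]

open_records = apply_filters(records, {"openness": "open"})
counts = facet_counts(open_records, ["data_type", "location"])
```

After selecting "open", the data-type facet would offer "interview transcript" with two hits, and the location facet Kent and Essex with one each.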

System assumptions
BaseX for metadata storage, Java for loading, Solr for search. Data must be fully prepared before loading/publishing to the system; data are not 'managed' within the system. Mark-up, metadata and relationships are all pre-defined. Pre-defined GUIDs are used for citation (DOI + drill-down). Audio-visual content cannot be searched. A simple QuDEx metadata entry tool was created using SharePoint. The user interface uses existing in-house technologies (.NET). No download of a data collection or subset: users are routed to the UK Data Service. Citation of a selected text extract; user annotation possible.
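Since search runs on Solr, a faceted keyword query reduces to a request against a Solr select handler. The Python sketch below builds such a URL using Solr's standard request parameters (`q`, `df`, `fq`, `facet`, `facet.field`, `hl`, `rows`, `wt`). The host, core name (`qualibank`) and field names are hypothetical assumptions; the actual QualiBank index layout is not described here.

```python
from urllib.parse import urlencode

# Hypothetical Solr core URL; the real deployment details are not given
SOLR_SELECT = "http://localhost:8983/solr/qualibank/select"


def _phrase(value):
    """Quote a value as a Solr phrase, escaping backslashes and double quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def build_search_url(text, facet_fields, filters=None, rows=10):
    """Build a Solr select URL for a keyword search with facets and filters."""
    params = {
        "q": text,
        "df": "text",  # hypothetical default field holding paragraph text
        "fq": [f"{k}:{_phrase(v)}" for k, v in (filters or {}).items()],
        "facet": "true",
        "facet.field": list(facet_fields),
        "hl": "true",  # ask Solr to highlight matched words within paragraphs
        "rows": rows,
        "wt": "json",
    }
    # doseq=True repeats the key for list values (fq, facet.field)
    return SOLR_SELECT + "?" + urlencode(params, doseq=True)


url = build_search_url("health", ["data_type", "location"], {"openness": "open"})
```

Putting the facet selections in `fq` rather than `q` keeps them out of relevance scoring, which is the usual choice for browse filters.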

UK QualiBank Dataflow

Digitisation of key data sources
Selectively digitise paper-based materials: original survey questionnaires, open-ended questions, transcribed interviews, handwritten field notes and essays, diagrams, photographs. Destination formats: all text files treated as XML; image files (photos and text) as PDF; audio as MP3.
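Because data must be fully prepared before loading, one simple pre-load step is to map each digitised item to its destination format and flag anything not yet converted. This Python sketch encodes only the three destination formats listed above; the function names and file names are hypothetical.

```python
from pathlib import PurePosixPath

# Destination formats listed on the slide: text as XML, images as PDF, audio as MP3
DESTINATION_SUFFIX = {"text": ".xml", "image": ".pdf", "audio": ".mp3"}


def destination_name(source_name: str, kind: str) -> str:
    """Return the file name a digitised item of the given kind is stored under."""
    if kind not in DESTINATION_SUFFIX:
        raise ValueError(f"no destination format defined for {kind!r}")
    return str(PurePosixPath(source_name).with_suffix(DESTINATION_SUFFIX[kind]))


def not_ready_for_loading(file_names: list[str]) -> list[str]:
    """List files that are not yet in one of the destination formats."""
    allowed = set(DESTINATION_SUFFIX.values())
    return [n for n in file_names if PurePosixPath(n).suffix.lower() not in allowed]
```

For example, `not_ready_for_loading(["a.xml", "b.rtf", "c.MP3"])` flags only the RTF file.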

QuDEx collection level metadata

Objects in collection metadata

Object relationships
A rich set of verbs is available to define relationships between all objects. Converse verbs are generated automatically.
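Generating converse verbs means the person entering metadata records a relationship once, and the system derives the reverse direction. A minimal Python sketch follows, treating relationships as (subject, verb, object) triples; the verb pairs are hypothetical examples, not QuDEx's actual relationship vocabulary.

```python
# Hypothetical verb pairs for illustration; not QuDEx's actual relationship vocabulary
VERB_PAIRS = [
    ("isPartOf", "hasPart"),
    ("isTranscriptOf", "hasTranscript"),
    ("references", "isReferencedBy"),
]

# Build a lookup in both directions so every verb knows its converse
CONVERSE = {}
for verb, converse in VERB_PAIRS:
    CONVERSE[verb] = converse
    CONVERSE[converse] = verb


def with_converses(relations):
    """Return the (subject, verb, object) triples plus the converse of each."""
    result = set(relations)
    for subject, verb, obj in relations:
        result.add((obj, CONVERSE[verb], subject))
    return result


entered = {("transcript-01", "isTranscriptOf", "audio-01")}
complete = with_converses(entered)
```

Because each verb's converse maps back to the original verb, applying the function twice gives the same result as applying it once.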

QuDEx Category Schemes

Use of Text Encoding Initiative (TEI)
Minimal use of TEI tags, out of a massive profile, to denote structural mark-up: headers, turn-takers, paragraphs, corrections and errors. Unique GUIDs are used for all QuDEx IDs: collection, files, paragraphs.
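The sketch below shows what minimal structural TEI with a GUID on every unit might look like, generated with Python's standard `xml.etree.ElementTree`. Using TEI's `<u>` (utterance) element with a `who` attribute for each speaker turn is an assumption made for illustration, as is the `p_` prefix on IDs; the header is deliberately incomplete and the transcript content is invented.

```python
import uuid
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"  # serialised as xml:id
ET.register_namespace("", TEI_NS)  # write TEI elements without a prefix


def tei(name):
    """Qualify an element name with the TEI namespace."""
    return f"{{{TEI_NS}}}{name}"


def tei_transcript(title, turns):
    """Build minimal TEI: a header with a title, then one <u> per speaker turn.

    The header is deliberately incomplete (a schema-valid TEI fileDesc also
    needs publicationStmt and sourceDesc).
    """
    root = ET.Element(tei("TEI"))
    file_desc = ET.SubElement(ET.SubElement(root, tei("teiHeader")), tei("fileDesc"))
    title_stmt = ET.SubElement(file_desc, tei("titleStmt"))
    ET.SubElement(title_stmt, tei("title")).text = title
    body = ET.SubElement(ET.SubElement(root, tei("text")), tei("body"))
    for speaker, words in turns:
        u = ET.SubElement(body, tei("u"), {"who": f"#{speaker}"})
        # xml:id must not start with a digit, so the GUID gets a prefix
        u.set(XML_ID, f"p_{uuid.uuid4()}")
        u.text = words
    return ET.tostring(root, encoding="unicode")


xml_text = tei_transcript(
    "Interview 1",  # invented example content
    [("interviewer", "When did you leave school?"), ("respondent", "At sixteen.")],
)
```

Assigning the GUIDs when the mark-up is produced, rather than inside the system, fits the assumption that identifiers are pre-defined before loading.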

School Leavers on the Isle of Sheppey

TEI XML: School Leavers on the Isle of Sheppey

Search interface - hits

Target page for an interview

Target references

Audio file target page

Citation mechanism
The system allows extract/quotation-level citation of one or more consecutive paragraphs. The citation object and citation format are created on the fly, using GUIDs and the system URI. The URI resolves directly to the data extract. Some more sensitive collections are closed, so their URIs cannot resolve to the data without login. Citations are related to our collection-level DOIs, e.g. /UKDA-SN
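Creating a citation on the fly from paragraph GUIDs involves two checks: the selected paragraphs must be consecutive, and the URI must identify the first and last of them. The Python sketch below assumes a hypothetical base URI and path pattern; the real QualiBank URI scheme is not given here, and `p1` to `p4` stand in for paragraph GUIDs.

```python
# Hypothetical base URI; the real QualiBank URI pattern is not given here
BASE_URI = "https://example.org/qualibank"


def cite_extract(collection_id, paragraph_ids, selected):
    """Return a citation URI for one or more consecutive paragraphs.

    paragraph_ids is the ordered list of paragraph GUIDs in a document;
    selected holds the paragraph GUIDs the user highlighted.
    """
    positions = sorted({paragraph_ids.index(p) for p in selected})
    if not positions:
        raise ValueError("select at least one paragraph")
    if positions != list(range(positions[0], positions[-1] + 1)):
        raise ValueError("selected paragraphs must be consecutive")
    first = paragraph_ids[positions[0]]
    last = paragraph_ids[positions[-1]]
    # A single paragraph is cited by its own GUID; a run by its first and last
    suffix = first if first == last else f"{first}/{last}"
    return f"{BASE_URI}/{collection_id}/{suffix}"


paras = ["p1", "p2", "p3", "p4"]  # stand-ins for paragraph GUIDs
```

Selecting `p3` and `p2` in either order yields one URI ending in `/p2/p3`, while a gap such as `p1` with `p3` is rejected.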

Contact details
Darren Bell, dbell@essex.ac.uk
Louise Corti, Agustina Martinez
