Powerful access to qualitative data: What’s behind the UK QualiBank


1 Powerful access to qualitative data: What’s behind the UK QualiBank
Darren Bell, Data & Services Developer, UK Data Archive
IASSIST, Toronto, June 2014

2 QualiBank project: rationale & aims
Provide enhanced access to key qualitative data via online data browsing and exploration: UK QualiBank
Based on existing metadata schemas and known technologies
Offer a mechanism for reliably citing data located in the system
Project includes large-scale digitisation of precious and undigitised materials
Maximise the impact from existing research and resource investments: demonstrate re-use

3 UK Data Service and its own needs
We have one of the largest qualitative data collections: over 350 data collections
A proportion of these have been digitised from older paper sources
Currently users find and download these from our website
Not so easy to find, but study documentation good
No searching within collections
No file manifest shown until download
It can be a bit of guesswork!
Have DataCite DOIs; cannot reliably cite parts of data

4 Finding & accessing qualitative data
Search for “health” in our data catalogue, Discover
Retrieve catalogue record, e.g. SN 6124: Being a Doctor: a Sociological Analysis
DDI 2.5 very limited for describing file content
View limited user guide
Web download as RTF bundle (46 transcripts)

5 Data listing

6 Download Zip of data and doc

7 Complex data collections
SN 5801: Concepts of Healthy Eating Food Research: Phases I and II
293 interview transcripts; 73 diaries; 6 observation field notes
Not represented well at all in a DDI 2.X catalogue

8 Metadata demands for UK QualiBank
Explore data through a data journey
Find relevant extract, examine in context, cite
Link data to still and moving images, and other related research outputs
Some collections completely open
Demands highly structured and consistently marked-up data
Qualitative data requires object (file-level) descriptive metadata, e.g. interviews, audio-visual files, images
Use of common metadata elements enables federated catalogues across providers and borders

9 Description below the collection
DDI 2.5 for catalogue metadata
QuDEx schema for file-level description: allows detailed identification of data objects:
  Interview transcript or audio recording etc.
  Descriptive categories at the object level, e.g. MIME type, interview characteristics, interview setting
  Relationship to another data object or part of data
  Capacity to capture rich annotation of parts of data (e.g. an extract)
Based on published QuDEx model in use (Schema at:
Object-level description = a lot of manual work!
Limited use of TEI schema for mark-up of textual data items

10 User expectations
Search/browse for data
  Search / faceted browse of data: text; image/PDF; audio
Browse
  Faceted browse by categories:
  Collection level: title, date and openness
  Collection object: data type, interview characteristics, location
Search
  Display no. of hits and minimal item metadata
  Word in paragraph; thumbnail image/PDF; AV link
Context: other related objects, within system or external
Access full object
  View data, key metadata and all related files and links
  Get citation for part of data
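The search and faceted browse listed on this slide could be expressed as a Solr request along the following lines. This is a minimal sketch, not the QualiBank implementation: the host, the core name `qualibank` and the field names (`text`, `data_type`, `location`) are assumptions made for illustration. Only `q`, `fq`, `rows`, `facet`, `facet.field`, `hl` and `hl.fl` are standard Solr request parameters.

```python
from urllib.parse import urlencode

# Hypothetical Solr endpoint; the real host and core name are not given in the slides.
SOLR_SELECT = "http://localhost:8983/solr/qualibank/select"


def build_search_url(term, filters=None, facet_fields=("data_type", "location")):
    """Build a Solr select URL that returns hits, facet counts and highlights.

    Assumes `term` is a single plain word; real input would need Solr
    query-syntax escaping before being placed in `q`.
    """
    params = [
        ("q", f"text:{term}"),  # full-text search on an assumed `text` field
        ("rows", "10"),         # number of hits per page
        ("facet", "true"),      # ask Solr to return facet counts
        ("hl", "true"),         # highlight the matching word in its paragraph
        ("hl.fl", "text"),
    ]
    # One facet.field parameter per category the user can browse by.
    params += [("facet.field", field) for field in facet_fields]
    # Each facet value the user has selected narrows the results via a filter query.
    for field, value in (filters or {}).items():
        params.append(("fq", f'{field}:"{value}"'))
    return SOLR_SELECT + "?" + urlencode(params)


url = build_search_url("health", filters={"data_type": "interview transcript"})
print(url)
```

Filter queries (`fq`) are kept separate from the main query so that selecting a facet narrows the result set without changing how hits are scored.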

11 System assumptions
BaseX for metadata storage; Java loading; Solr search
Data must be fully prepared on loading/publishing to the system
Data not ‘managed’ within the system
Mark-up, metadata, relationships all pre-defined
Pre-defined GUIDs to be used for citation (DOI + drilldown)
Cannot search audio-visual data content
Simple QuDEx metadata data entry tool created using SharePoint
Technologies for user interface use existing in-house systems, .NET
No download of data collection/subset: route to the UK Data Service
Citation of selected extract of text; user-annotation possible

12 UK QualiBank Dataflow

13 Digitisation of key data sources
Selectively digitise paper-based materials:
  Original survey questionnaires
  Open-ended questions
  Transcribed interviews
  Handwritten field notes, essays
  Diagrams
  Photographs
Destination formats:
  All text files treated as XML
  Image files (photos and text) as PDF
  Audio as MP3

14 QuDEx collection level metadata

15 Objects in collection metadata

16 Object relationships
Rich set of verbs available to define relationships between all objects
Converse verbs generated automatically
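Automatic generation of converse verbs can be sketched as a lookup table applied to every stated relationship. The verb pairs and object identifiers below are illustrative assumptions; the slide does not list the actual QuDEx relationship vocabulary.

```python
# Hypothetical verb pairs; the real QuDEx relationship vocabulary is not shown in the slides.
CONVERSE = {
    "isTranscriptOf": "hasTranscript",
    "isPartOf": "hasPart",
    "refersTo": "isReferredToBy",
}
# Make the table symmetric so a verb can be looked up from either direction.
CONVERSE.update({converse: verb for verb, converse in list(CONVERSE.items())})


def with_converses(relationships):
    """Return the stated (source, verb, target) triples plus their converses."""
    result = set(relationships)
    for source, verb, target in relationships:
        # The converse swaps source and target and uses the paired verb.
        result.add((target, CONVERSE[verb], source))
    return result


# Only one direction is entered by hand; the other is derived.
stated = [("transcript-001", "isTranscriptOf", "audio-001")]
all_relationships = with_converses(stated)
```

Deriving the converse at load time means the person entering metadata records each relationship once, which matters when object-level description is already a lot of manual work.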

17 QuDEx Category Schemes

18 Use of Text Encoding Initiative (TEI)
Minimal use of TEI tags, out of a massive profile
To denote structural mark-up:
  Headers, turn takers, paragraphs
  Corrections, errors
Use of unique GUIDs to identify all QuDEx IDs: Collection, Files, Paragraphs
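The combination of minimal structural mark-up and GUIDs might look like the sketch below, which builds a small TEI-style transcript with one identified element per speaker turn. The choice of the TEI `u` (utterance) element with a `who` attribute, the GUID format and the sample dialogue are all assumptions; the slide does not show which tags QualiBank uses. The TEI namespace and header are left out to keep the example short.

```python
import uuid
import xml.etree.ElementTree as ET

# Attribute name in ElementTree's namespace notation; it serialises as xml:id.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def new_guid():
    # An xml:id may not start with a digit, so the UUID gets a letter prefix
    # (an assumed convention, not taken from the slides).
    return "g" + uuid.uuid4().hex


def build_transcript(turns):
    """Build a minimal TEI-style document with one <u> element per speaker turn."""
    tei = ET.Element("TEI")
    body = ET.SubElement(ET.SubElement(tei, "text"), "body")
    for speaker, words in turns:
        # Each turn carries its own GUID so it can be cited individually.
        u = ET.SubElement(body, "u", {"who": "#" + speaker, XML_ID: new_guid()})
        u.text = words
    return tei


# Invented dialogue, for illustration only.
tei = build_transcript([
    ("interviewer", "What did you do after leaving school?"),
    ("respondent", "I went straight to work."),
])
xml_text = ET.tostring(tei, encoding="unicode")
print(xml_text)
```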

19 School Leavers on the Isle of Sheppey

20 TEI XML: School Leavers on the Isle of Sheppey

21 Search interface - hits

22 Target page for an interview

23 Target references

24 Audio file target page

25 Citation mechanism
System allows extract/quotation-level citation; 1 or more consecutive paragraphs
Citation object and citation format created on the fly, using GUIDs and system URI
URI resolves directly to the data extract
Some more sensitive collections are closed, so cannot resolve to data without login
Is related to our collection-level DOIs, e.g. /UKDA-SN
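Creating a citation on the fly for a run of consecutive paragraphs can be sketched as follows. Everything specific here is hypothetical: the resolver address `https://example.org/qualibank/cite`, the `from`/`to` query parameters, the placeholder DOI and the short identifiers are stand-ins, since the slides do not give the real URI pattern.

```python
from dataclasses import dataclass

# Hypothetical resolver base; the real QualiBank URI pattern is not given in the slides.
BASE_URI = "https://example.org/qualibank/cite"


@dataclass(frozen=True)
class Citation:
    collection_doi: str   # collection-level DOI the extract belongs to
    object_guid: str      # GUID of the file (e.g. an interview transcript)
    first_paragraph: str  # GUID of the first cited paragraph
    last_paragraph: str   # GUID of the last cited paragraph

    @property
    def uri(self):
        """URI intended to resolve directly to the cited extract."""
        return (f"{BASE_URI}/{self.object_guid}"
                f"?from={self.first_paragraph}&to={self.last_paragraph}")


def cite_extract(collection_doi, object_guid, paragraph_guids, selected):
    """Create a citation for one or more consecutive paragraphs of an object.

    `paragraph_guids` is the object's paragraph GUIDs in document order;
    `selected` is the subset the user highlighted, in any order.
    """
    positions = sorted({paragraph_guids.index(guid) for guid in selected})
    if not positions:
        raise ValueError("select at least one paragraph")
    # Consecutive means the sorted positions form an unbroken run.
    if positions != list(range(positions[0], positions[-1] + 1)):
        raise ValueError("paragraphs must be consecutive")
    return Citation(collection_doi, object_guid,
                    paragraph_guids[positions[0]],
                    paragraph_guids[positions[-1]])


# Placeholder DOI and GUIDs, for illustration only.
citation = cite_extract("10.1234/EXAMPLE", "obj-1",
                        ["p1", "p2", "p3", "p4"], ["p3", "p2"])
print(citation.uri)
```

Storing only the first and last paragraph GUIDs is enough to identify the extract because the selection is required to be consecutive.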

26 Contact details
Darren Bell dbell@essex.ac.uk
Louise Corti
Agustina Martinez

