Google Refine for Data Quality / Integrity. Context BioVeL Data Refinement Workflow Synonym Expansion / Occurrence Retrieval Data Selection Data Quality.


1 Google Refine for Data Quality / Integrity

2 Context BioVeL Data Refinement Workflow Synonym Expansion / Occurrence Retrieval Data Selection Data Quality / Integrity

4 In Google’s Own Words “Google Refine is a power tool for - working with messy data, - cleaning it up, - transforming it from one format into another, - extending it with web services, - and linking it to databases”

5 In Google’s Own Words “Google Refine is a power tool for - working with messy data, - cleaning it up, - transforming it from one format into another, - extending it with web services, - and linking it to databases” … and can be run in isolation

6 Installation - Download zip file from http://code.google.com/p/google-refine/wiki/Downloads - Extract file - Run google-refine.exe

7 Features Clustering / Grouping use case: group taxon names and merge similar groups
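
The clustering idea on this slide can be illustrated with a small key-collision sketch: each name is reduced to a normalized "fingerprint", and names that share a fingerprint become candidates for merging. This is a simplified illustration and not Refine's actual implementation; the function names and sample taxon names are assumptions made for the example.

```python
import string
import unicodedata
from collections import defaultdict


def fingerprint(name: str) -> str:
    """Build a normalized key: ASCII-fold, lowercase, strip punctuation, sort unique tokens."""
    folded = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    cleaned = folded.strip().lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(sorted(set(cleaned.split())))


def cluster_names(names):
    """Group names by fingerprint and keep only groups with more than one distinct spelling."""
    groups = defaultdict(set)
    for name in names:
        groups[fingerprint(name)].add(name)
    return {key: sorted(variants) for key, variants in groups.items() if len(variants) > 1}


# Hypothetical taxon names with inconsistent case, spacing, and punctuation.
taxa = ["Mytilus edulis", "mytilus  edulis", "Mytilus edulis.", "Gadus morhua"]
clusters = cluster_names(taxa)
```

Because tokens are sorted, word order is ignored as well; whether a proposed cluster should really be merged stays a decision for the user.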

8 Features Filtering use case: filter out records that do not have ‘museum’ / ‘university’ / ‘marine’ in the data provider name
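
Outside the Refine user interface, the same filter could be sketched in a few lines of Python. The record layout, field names, and provider names below are hypothetical; only the three keywords come from the slide.

```python
# Keywords from the slide; matching is case-insensitive substring matching.
KEYWORDS = ("museum", "university", "marine")


def matches_provider_keyword(provider_name: str) -> bool:
    """Return True if the provider name contains at least one keyword."""
    lowered = provider_name.lower()
    return any(keyword in lowered for keyword in KEYWORDS)


# Hypothetical occurrence records.
records = [
    {"taxon": "Mytilus edulis", "provider": "Example Natural History Museum"},
    {"taxon": "Gadus morhua", "provider": "Example Sightings Portal"},
    {"taxon": "Asterias rubens", "provider": "Example Marine Institute"},
]

# Keep only records whose data provider name contains a keyword.
kept = [record for record in records if matches_provider_keyword(record["provider"])]
```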

9 Features Data Exclusion use case: exclude records that have been faceted / filtered

10 Features Extending Data use case: add ISO country code column use case: add column(s) by parsing taxon name
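
Both use cases on this slide amount to deriving new columns from existing ones. A minimal sketch, assuming records are plain dictionaries with hypothetical `country` and `taxon` fields, a deliberately tiny country lookup, and a naive whitespace split for the name (real names with authorship or subgenera need a proper name parser):

```python
# Tiny illustrative lookup; a real workflow would use a complete ISO 3166-1 table.
ISO_ALPHA2 = {"belgium": "BE", "netherlands": "NL", "united kingdom": "GB"}


def add_country_code(record):
    """Return a copy of the record with a country_code column (empty if unknown)."""
    country = record.get("country", "").strip().lower()
    return {**record, "country_code": ISO_ALPHA2.get(country, "")}


def split_taxon_name(record):
    """Return a copy of the record with genus and specific_epithet columns.

    Naive assumption: the first two whitespace-separated tokens are genus and epithet.
    """
    parts = record.get("taxon", "").split()
    genus = parts[0] if parts else ""
    epithet = parts[1] if len(parts) > 1 else ""
    return {**record, "genus": genus, "specific_epithet": epithet}


# Hypothetical input record.
row = split_taxon_name(add_country_code({"taxon": "Mytilus edulis", "country": " Belgium "}))
```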

11 Features Reconciling Data use case: retrieve associated names from ‘WoRMS’

12 Features Save / Replay User Actions use case: extract scientific names from name labels
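
The save / replay idea can be pictured as storing user actions as data and re-applying them to a new dataset. The sketch below uses a made-up step format (it is not Refine's own history format), and the regular expression assumes a simple "Genus epithet" pattern at the start of the label.

```python
import re

# Hypothetical recorded steps; NOT Refine's own history format.
HISTORY = [
    {"op": "trim", "column": "label"},
    {
        "op": "extract",
        "column": "label",
        "new_column": "scientific_name",
        "pattern": r"^([A-Z][a-z]+ [a-z]+)",
    },
]


def apply_step(row, step):
    """Apply one recorded step to a copy of the row."""
    row = dict(row)
    if step["op"] == "trim":
        row[step["column"]] = row[step["column"]].strip()
    elif step["op"] == "extract":
        match = re.search(step["pattern"], row[step["column"]])
        row[step["new_column"]] = match.group(1) if match else ""
    else:
        raise ValueError(f"unknown operation: {step['op']}")
    return row


def replay(rows, history):
    """Re-apply every recorded step, in order, to all rows."""
    for step in history:
        rows = [apply_step(row, step) for row in rows]
    return rows


# Hypothetical name labels.
labels = [{"label": "  Mytilus edulis Linnaeus, 1758 "}, {"label": "unidentified specimen"}]
result = replay(labels, HISTORY)
```

Keeping the steps as plain data is what makes them shareable: the same list can be saved to a file and replayed on another dataset.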

13 Features Build Extensions use case: BioVeL Extension - interaction with Taverna - add additional functionality specific to the BioVeL context (e.g. ECAT Name Parser)

14 Future Possibilities remote server: could be deployed as a remote server with the possibility to use shared resources (extensions, data, action history)

15 Future Possibilities integration with existing applications, either as a module or using REST API calls

16 Future Possibilities central application which can be used to run scripts, call web services and even interact with software applications

17 Thanks Questions / Suggestions / Comments

