1
Tame Your Data with OpenRefine
GIL User Group Meeting, May 14th, 2015
Tricia Clayton, tclayton3@gsu.edu
Collection Services Librarian, Georgia State University Library
2
Main functions
- Explore
- Extend & Reconcile
- Clean & Transform
3
Getting OpenRefine
- Download at http://openrefine.org
- Platform independent: based on the Java environment
- Google Refine 2.5: latest stable version
- OpenRefine 2.6: development version
4
Comparison to other tools
OpenRefine
- Can batch edit rows and columns
- Excellent for exploring & transforming data
- No schema needed
- Data is always visible
Spreadsheets
- Edit one cell at a time
- Excellent for data entry, functions, calculations
- No schema needed
- Data is always visible
Databases
- Schema and scripting language needed for editing
- Data is mostly out of sight unless programming is used to run queries or build views
5
Getting help
- The OpenRefine wiki is housed on GitHub: https://github.com/OpenRefine/OpenRefine/wiki. It includes installation instructions, documentation, tutorials, recipes, etc.
- Using OpenRefine, by Ruben Verborgh and Max De Wilde (2013)
6
Getting started (on Windows)
1. Download the .zip file.
2. Extract it to a folder of your choosing.
3. Click the .exe file to run it.
4. A command window opens and keeps running in the background. [Pressing Ctrl-C in this window safely exits OpenRefine.]
7
Runs in your default browser: http://localhost:3333 or http://127.0.0.1:3333
8
Create project
Create a new project, open an existing one, or import one from another OpenRefine instance.
Supported file formats include TSV, CSV, *SV, Excel (.xls and .xlsx), JSON, XML, RDF as XML, and Google Docs.
9
Create project
- Name the project.
- Edit the import options if necessary; the options vary by file type.
10
Basic navigation [annotated screenshot, callouts 1-4]
12
The “All” column
Contains features that let you perform operations on all columns at once:
- reorder
- remove
- collapse or expand
View: collapse/expand columns
Edit columns: re-order/remove columns
13
The other columns
Most operations in OpenRefine act on a single column and are started from that column’s menu. The “Edit column” dropdown menu contains options to rename or remove the column, and provides limited options for moving it (to the beginning, to the end, or one position in either direction). The “View” dropdown provides additional collapsing options.
15
Project history: Undo / Redo
- undo some (or all) of the changes made to your project
- extract/save parts of your project history
- apply (import) steps from another project
16
Export options: the Export menu
17
Explore your data
OpenRefine offers multiple ways to facet your data:
- text
- number
- timeline
- blank
- error
- and more!
Demo: http://www.screencast.com/t/h1v130ltDl
Image source: International Space Station Above Earth, by NASA, https://www.flickr.com/photos/nasamarshall/9070896398/ (CC BY-NC 2.0)
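A text facet groups a column's cells by their distinct values and counts how many rows hold each value. The Python sketch below reproduces that idea on a plain list. The `text_facet` name, the "(blank)" label and the sample publisher names are illustrative assumptions, not OpenRefine internals.

```python
from collections import Counter


def text_facet(values):
    """Count how often each distinct cell value occurs, roughly like a text facet.

    Blank cells (None or "") are tallied under a "(blank)" label chosen for
    this sketch. Choices are returned most frequent first, ties by name.
    """
    counts = Counter("(blank)" if v in (None, "") else v for v in values)
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical publisher column.
publishers = ["Wiley", "Elsevier", "Wiley", "", "Springer", "Wiley", None]
for choice, count in text_facet(publishers):
    print(f"{choice}: {count}")
```

In OpenRefine itself, you can sort facet choices by name or by count. Clicking a choice limits the view to the matching rows, so any later operation applies only to those rows.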
18
Filtering
A text filter matches cells that contain a given string or regular expression.
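OpenRefine's text filter has options for case sensitivity and for treating the query as a regular expression. Here is a minimal Python sketch of that matching rule, with a hypothetical `text_filter` function that returns the indexes of the matching rows.

```python
import re


def text_filter(values, query, regex=False, case_sensitive=False):
    """Return the indexes of cells containing `query`, like a text filter.

    A plain query is escaped so it matches literally; with regex=True it is
    compiled as a regular expression. Blank (None) cells never match.
    """
    flags = 0 if case_sensitive else re.IGNORECASE
    pattern = re.compile(query if regex else re.escape(query), flags)
    return [i for i, v in enumerate(values) if v is not None and pattern.search(v)]


titles = ["Journal of Chemistry", "Chemical Reviews", "Physics Today", None]
print(text_filter(titles, "chem"))                # [0, 1]
print(text_filter(titles, r"^Chem", regex=True))  # [1]
```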
19
Sorting
Sorting in OpenRefine is somewhat special…
Demo: http://www.screencast.com/t/mEUVANxYz
Image source: Lego Sorting, by jwhittenburg, https://www.flickr.com/photos/jaydubya_rulez/207321782/ (CC BY-NC-ND 2.0)
20
Blank down / Fill down
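Fill down copies the nearest non-blank value above into each blank cell. Blank down does the reverse: it blanks any cell that repeats the value directly above it. The Python sketch below applies both to a single column. Treating both None and "" as blank is an assumption of the sketch.

```python
def fill_down(column):
    """Replace each blank cell with the nearest non-blank value above it."""
    filled, last = [], None
    for v in column:
        if v in (None, ""):
            filled.append(last)  # leading blanks stay None
        else:
            last = v
            filled.append(v)
    return filled


def blank_down(column):
    """Blank out (set to None) any cell equal to the original cell directly above it."""
    result, previous = [], object()  # sentinel that equals no cell value
    for v in column:
        result.append(None if v == previous else v)
        previous = v
    return result


publisher = ["Wiley", None, "", "Elsevier", None]
full = fill_down(publisher)
print(full)              # ['Wiley', 'Wiley', 'Wiley', 'Elsevier', 'Elsevier']
print(blank_down(full))  # ['Wiley', None, None, 'Elsevier', None]
```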
21
Rows vs. records
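In rows mode, each row is counted and filtered on its own. In records mode, OpenRefine groups consecutive rows into records: a row with a non-blank value in the first (key) column starts a new record, and the rows beneath it with a blank key belong to that record. The simplified Python sketch below shows this grouping. The `to_records` function, its `key` parameter and the sample data are hypothetical.

```python
def to_records(rows, key):
    """Group rows into records the way records mode does (simplified).

    A row with a non-blank `key` cell starts a new record; rows whose `key`
    cell is blank are attached to the record above. A leading row with a
    blank key still starts a record so that no row is lost.
    """
    records = []
    for row in rows:
        if row.get(key) not in (None, "") or not records:
            records.append([row])
        else:
            records[-1].append(row)
    return records


rows = [
    {"BibID": "101", "Author": "Smith"},
    {"BibID": "", "Author": "Jones"},
    {"BibID": "102", "Author": "Lee"},
]
for record in to_records(rows, "BibID"):
    print(record[0]["BibID"], [r["Author"] for r in record])
```

Three rows become two records here. This is why a facet in records mode can report fewer matches than the same facet in rows mode.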
22
Clean & transform
General transformation tips:
- Think in patterns: what are the common characteristics of the cells/rows/columns you want to change?
- Use facets and filters to isolate those cells, then use a single command to change the whole set.
23
Common transforms
24
Splitting cells & transposing
Problem: You used the TITLE field from the BIB_TEXT table in your Voyager Access query; now you want to separate the title and author information.
Solution: Use some of the Edit cells and Transpose options. Demo: http://www.screencast.com/t/AxX3pOg6U
[Screenshots: original cell format; after splitting multi-valued cells; after transposing cells in rows into columns]
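The Python sketch below outlines the two steps on a list of row dictionaries. First, a multi-valued cell is split so each piece lands on its own row. Then those cells are transposed back into named columns. It assumes the title and author are separated by " / ", and uses made-up BibIDs and titles. The function names are hypothetical and simplify what OpenRefine's menu commands do.

```python
def split_multivalued(rows, column, sep):
    """Split one column's cells on `sep`, one piece per row (simplified).

    The first piece stays on the original row; each extra piece goes on a new
    row whose other cells are blank, forming a multi-row record.
    """
    out = []
    for row in rows:
        for i, piece in enumerate(row[column].split(sep)):
            new_row = dict(row) if i == 0 else {k: "" for k in row}
            new_row[column] = piece.strip()
            out.append(new_row)
    return out


def transpose_to_columns(values, names):
    """Regroup a flat list of cells into rows, one cell per name in `names`."""
    n = len(names)
    return [dict(zip(names, values[i:i + n])) for i in range(0, len(values), n)]


bib = [
    {"BibID": "1", "TITLE": "Moby Dick / Herman Melville"},
    {"BibID": "2", "TITLE": "Walden / Henry David Thoreau"},
]
split = split_multivalued(bib, "TITLE", " / ")
table = transpose_to_columns([r["TITLE"] for r in split], ["Title", "Author"])
print(table)
```

The sketch drops the other columns during transposition to stay short. In OpenRefine, those columns remain part of the project.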
25
Splitting columns
ILLiad LDAP conversion project: deriving campus IDs from patron email addresses.
[Screenshots: ILLiad user data; Edit column menu]
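Splitting a column at a separator yields new columns from each part. The Python sketch below assumes the campus ID is the part of the email address before the "@". The addresses and the column names `CampusID` and `Domain` are hypothetical, and `split_column` is an illustration, not an OpenRefine API.

```python
def split_column(rows, column, sep, new_names):
    """Split `column` into new columns at `sep` (at most len(new_names) pieces).

    The source column is kept; pieces are written to the columns in `new_names`.
    """
    out = []
    for row in rows:
        parts = row[column].split(sep, len(new_names) - 1)
        new_row = dict(row)
        new_row.update(zip(new_names, parts))
        out.append(new_row)
    return out


patrons = [{"Email": "jdoe1@example.edu"}, {"Email": "asmith2@student.example.edu"}]
result = split_column(patrons, "Email", "@", ["CampusID", "Domain"])
for row in result:
    print(row["CampusID"], row["Domain"])
```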
26
Splitting columns
27
Clustering is magical
Publisher data in Voyager can be messy. This video shows how clustering can be used to merge variations of the same publisher: http://screencast.com/t/dMYQsusXj
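OpenRefine's default clustering method is key collision with a "fingerprint" keying function. Values whose normalized keys are identical are proposed as one cluster. Below is a simplified Python approximation of that keying:

- trim and lowercase the value;
- strip accents and punctuation;
- sort and de-duplicate the tokens.

The actual implementation differs in details, and the publisher strings are hypothetical.

```python
import re
import unicodedata
from collections import defaultdict


def fingerprint(value):
    """Simplified fingerprint key: similar spellings collapse to the same key."""
    value = value.strip().lower()
    # Decompose accented characters and drop anything non-ASCII (e.g. é -> e).
    value = unicodedata.normalize("NFKD", value).encode("ascii", "ignore").decode("ascii")
    # Remove punctuation, then sort and de-duplicate the remaining tokens.
    value = re.sub(r"[^\w\s]", "", value)
    return " ".join(sorted(set(value.split())))


def key_collision_clusters(values):
    """Group values by fingerprint; keep only groups with 2+ distinct spellings."""
    groups = defaultdict(list)
    for v in values:
        groups[fingerprint(v)].append(v)
    return [vs for vs in groups.values() if len(set(vs)) > 1]


publishers = ["John Wiley & Sons", "Wiley, John & Sons", "JOHN WILEY & SONS", "Elsevier"]
print(key_collision_clusters(publishers))
```

Sorting the tokens is what lets word-order variants such as "Wiley, John" and "John Wiley" collide.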
28
GREL
^ is the symbol for "starts with": in a regular expression, it anchors the match to the start of the cell value.
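The effect of the ^ anchor is the same in Python's `re` module, so a short comparison may help. The publisher values are hypothetical. For a plain prefix test without a regex, GREL also offers `startsWith()`.

```python
import re

values = ["Wiley-Blackwell", "John Wiley & Sons", "Wiley"]

# ^ anchors the pattern at the start: only values that *begin* with "Wiley".
starts_with = [v for v in values if re.search(r"^Wiley", v)]
# Without the anchor, "Wiley" anywhere in the value matches.
contains = [v for v in values if re.search(r"Wiley", v)]

print(starts_with)  # ['Wiley-Blackwell', 'Wiley']
print(contains)     # ['Wiley-Blackwell', 'John Wiley & Sons', 'Wiley']
```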
29
Transforming with GREL
Menu option: result
- Edit cells > Transform…: the expression transforms the cells in the active column.
- Edit column > Add column based on this column…: the expression is run against the active column, but the results go into a new column.
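The practical difference between the two menu options is whether the source values survive. The Python sketch below shows this on row dictionaries. Both function names are hypothetical, and the ISSN values are made up.

```python
def transform(rows, column, fn):
    """Like Edit cells > Transform: overwrite the active column with fn(value)."""
    for row in rows:
        row[column] = fn(row[column])
    return rows


def add_column_based_on(rows, column, new_column, fn):
    """Like Edit column > Add column based on this column: source stays intact."""
    for row in rows:
        row[new_column] = fn(row[column])
    return rows


rows = [{"ISSN": " 1234-5678 "}, {"ISSN": "2345-6789"}]
add_column_based_on(rows, "ISSN", "ISSN (no hyphen)", lambda v: v.strip().replace("-", ""))
transform(rows, "ISSN", str.strip)
print(rows)
```

Adding a column first keeps the original values on hand for checking a transformation before committing to it.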
30
GREL
31
GREL: replacing
The preview shows that the “c” and the “.” have been replaced with nothing (an empty string). In each replace, the first set of quotes contains the string to replace; the second set contains what to replace it with. This is two expressions chained together, not one: they are joined by the period that precedes the second “replace.”
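Python's `str.replace` chains the same way, so it gives a compact illustration of the slide's two chained replacements. The sample value "c2015." is a hypothetical copyright date.

```python
value = "c2015."

# GREL equivalent: value.replace("c", "").replace(".", "")
# Each call replaces every occurrence of the first string with the second.
cleaned = value.replace("c", "").replace(".", "")
print(cleaned)  # 2015
```

Both replacements apply to every occurrence, so a value containing other "c" characters would lose those too. Faceting first to isolate the affected cells avoids that.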
32
History and favorites The History tab stores expressions used previously in current AND other projects. The Starred tab stores those you have marked as favorites.
33
A couple favorites
The cell.cross function pulls data from one project into another, based on a matching column (ISSN, BibID, title, etc.):
syntax: cell.cross("Name of the source project", "Name of the reference column").cells["Name of the column you want to import"].value[0]
Example: you're working with the title list for your Wiley package renewal. From the column containing the ISSN info, add a new column using the following expression. It matches against the Print ISSN column in the Wiley COUNTER report and pulls in the full-text downloads:
cell.cross("Wiley 2013 JR1", "Print ISSN").cells["Reporting Period Total"].value[0]
Transform display call numbers into normalized call numbers:
1) Remove periods: value.replace(".", "")
2) Separate letter groups followed by numbers (with a space): value.replace(/(\p{IsAlphabetic})(?=\d)/,'$1 ')
3) Separate number groups followed by letters: value.replace(/(\d)(?=[A-Z])/,'$1 ')
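Both favorites can be approximated in Python to make their logic explicit. `cross` is a hypothetical lookup that mimics what cell.cross does here: it finds the first row in another table whose match column equals the current value and returns one of its cells. `normalize_call_number` mirrors the three GREL steps. Python's `re` has no `\p{IsAlphabetic}`, so `[^\W\d_]` (any Unicode letter) stands in for it. The ISSN, download count and call number are made-up sample data.

```python
import re


def cross(value, reference_rows, match_column, wanted_column):
    """Return wanted_column from the first reference row whose match_column equals value."""
    for row in reference_rows:
        if row.get(match_column) == value:
            return row.get(wanted_column)
    return None  # no matching row


def normalize_call_number(call_number):
    s = call_number.replace(".", "")             # 1) remove periods
    s = re.sub(r"([^\W\d_])(?=\d)", r"\1 ", s)   # 2) space after a letter followed by a digit
    s = re.sub(r"(\d)(?=[A-Z])", r"\1 ", s)      # 3) space after a digit followed by an uppercase letter
    return s


jr1 = [{"Print ISSN": "1234-5678", "Reporting Period Total": 42}]
print(cross("1234-5678", jr1, "Print ISSN", "Reporting Period Total"))  # 42
print(normalize_call_number("PS3545.I345 Z46 1999"))  # PS 3545 I 345 Z 46 1999
```

Like cell.cross, the lookup only succeeds when the two ISSN columns are formatted identically. It is worth normalizing both columns, for example hyphens and spacing, before joining.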
34
Extend & reconcile
Image source: Map of the OpenRefine Ecosystem, by Martin Magdinier, @magdmartin, http://openrefine.org/2015/01/26/Mapping-OpenRefine-ecosystem.html
35
Questions?
Additional image credits:
- Broom icon, by Alberto Guerra Quintanilla, from the Noun Project, https://thenounproject.com/term/broom/30688/ (CC BY 3.0 US)
- Bucket icon, by Alberto Guerra Quintanilla, from the Noun Project, https://thenounproject.com/term/bucket/30690/ (CC BY 3.0 US)
- Bullfighting icon, by Paulo Volkova, from the Noun Project, https://thenounproject.com/term/bullfighting/3835/ (public domain)
- Magic-Wand icon, by Mister Pixel, from the Noun Project, https://thenounproject.com/term/magic-wand/34626/ (CC BY 3.0 US)