Tame Your Data with OpenRefine GIL User Group Meeting May 14th, 2015 Tricia Clayton Collection Services Librarian Georgia State University.


1 Tame Your Data with OpenRefine GIL User Group Meeting May 14th, 2015 Tricia Clayton tclayton3@gsu.edu Collection Services Librarian Georgia State University Library

2 Main Functions Explore Extend & Reconcile Clean & Transform

3 Getting OpenRefine Download at http://openrefine.org. Platform independent: based on the Java environment. Google Refine 2.5: latest stable version. OpenRefine 2.6: development version.

4 Comparison to other tools
OpenRefine: Can batch edit rows and columns. Excellent for exploring & transforming data. No schema needed. Data is always visible.
Spreadsheets: Edit one cell at a time. Excellent for data entry, functions, calculations. No schema needed. Data is always visible.
Databases: Schema and scripting language needed for editing. Data is mostly out of sight unless programming is used to run queries or build views.

5 Getting help The OpenRefine wiki is housed on GitHub: https://github.com/OpenRefine/OpenRefine/wiki - includes installation instructions, documentation, tutorials, recipes, etc. Using OpenRefine by Ruben Verborgh and Max De Wilde, 2013

6 Getting started (on Windows) Download the .zip file. Extract to a folder of your choosing. Click the .exe file to run. The Command window opens and will run in the background. [Ctrl-C in this window safely exits OpenRefine]

7 Runs in your default browser http://localhost:3333 or http://127.0.0.1:3333

8 Create project Create a new project, open an existing one, or import from another OpenRefine instance. Supported file formats include: TSV, CSV, *SV, Excel (.xls and .xlsx), JSON, XML, RDF as XML, and Google docs

9 Create project Name the project Edit import options if necessary; options vary by file type.

10 Basic navigation [screenshot with numbered callouts 1 to 4]

11

12 The “All” column Contains some features that let you perform operations on all columns at once: - reorder - remove - collapse or expand View – Collapse/Expand columns Edit columns – Re-order/remove columns

13 The other columns Most operations in OpenRefine act on a single column, and are initiated from that column’s menu. The “Edit column” dropdown menu contains options to rename or remove the column, and provides limited options for moving the column (to the beginning, end, or one over in either direction). The “View” dropdown provides additional collapsing options

14

15 Project history: Undo / Redo
- undo some (or all) of your project
- extract/save parts of your project history
- apply (import) steps from another project

16 Export options Export Menu

17 Explore your data OpenRefine offers multiple ways to facet your data: – text – number – timeline – blank – error – and more! Demo: http://www.screencast.com/t/h1v130ltDl Image source: International Space Station Above Earth, by NASA, https://www.flickr.com/photos/nasamarshall/9070896398/ (CC BY-NC 2.0)

18 Filtering Text filtering matches cells that contain a string or regular expression.
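As a rough illustration of what a text filter does, here is a minimal Python sketch (not OpenRefine's own code). The function name `text_filter`, its parameters, and the sample publisher names are assumptions made up for this example. It keeps the cells that contain a plain string, or that match a regular expression when the regex option is on.

```python
import re

def text_filter(cells, query, regex=False, case_sensitive=False):
    """Return the cells that contain `query` (plain string or regex)."""
    flags = 0 if case_sensitive else re.IGNORECASE
    # A plain string is escaped so that characters like "." are taken literally.
    pattern = re.compile(query if regex else re.escape(query), flags)
    return [cell for cell in cells if pattern.search(cell)]

# Hypothetical sample column
publishers = ["Wiley", "John Wiley & Sons", "Elsevier", "Wiley-Blackwell"]

plain = text_filter(publishers, "wiley")                  # substring match
anchored = text_filter(publishers, r"^wiley", regex=True)  # must start with "wiley"
```

The plain filter keeps every cell that mentions Wiley anywhere, while the anchored regular expression keeps only cells that begin with it.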

19 Sorting Sorting in OpenRefine is somewhat special… Demo: http://www.screencast.com/t/mEUVANxYz Image source: Lego Sorting, by jwhittenburg, https://www.flickr.com/photos/jaydubya_rulez/207321782/ (CC-BY-NC-ND 2.0).

20 Blank down / Fill down

21 Rows vs. records

22 Clean & transform General transformation tips: Think in patterns: what are the common characteristics of the cells/rows/columns you want to change? Use facets and filters to isolate them, then use a single command to change the set.

23 Common transforms

24 Splitting cells & transposing Problem: You used the TITLE field from the BIB_TEXT table in your Voyager Access query; now you want to separate the title and author information. Solution: Use some of the Edit Cells and Transpose options. Demo: http://www.screencast.com/t/AxX3pOg6U Screenshot labels: original cell format; after splitting multi-valued cells; after transposing cells in rows into columns.

25 Splitting columns ILLiad LDAP conversion project: deriving campus IDs from patron email addresses. Screenshot labels: ILLiad user data; Edit column menu.

26 Splitting columns
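The idea behind the column split can be sketched in Python. This assumes the campus ID is the part of the email address before the "@", which the slides imply but do not state. The addresses and the helper name `split_email` are invented for illustration.

```python
def split_email(email):
    """Split an address at the first "@" into (local part, domain)."""
    local, _sep, domain = email.partition("@")
    return local, domain

# Hypothetical patron email addresses
emails = ["jdoe1@example.edu", "asmith22@example.edu"]

# Assumption: the campus ID is the local part of the address
campus_ids = [split_email(e)[0] for e in emails]
```

Splitting on a separator this way mirrors turning one column into two: the piece before the separator and the piece after it.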

27 Clustering is magical Publisher data in Voyager can be messy. This video shows how clustering can be used to merge variations of the same publisher together. http://screencast.com/t/dMYQsusXj
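To show why clustering can find variant spellings, here is a simplified Python sketch of a "fingerprint" style key-collision approach. The slide does not say which clustering method the video uses, so this is an assumption, and the code is an approximation rather than OpenRefine's implementation. The publisher strings are made up.

```python
import string
import unicodedata
from collections import defaultdict

def fingerprint(value):
    """Build a normalized key: lowercase, ASCII-only, no punctuation, sorted unique words."""
    s = value.strip().lower()
    # Decompose accented characters, then drop anything that is not ASCII.
    s = unicodedata.normalize("NFKD", s).encode("ascii", "ignore").decode("ascii")
    s = s.translate(str.maketrans("", "", string.punctuation))
    return " ".join(sorted(set(s.split())))

def cluster(values):
    """Group values that share a fingerprint; keep groups with more than one distinct value."""
    groups = defaultdict(list)
    for v in values:
        groups[fingerprint(v)].append(v)
    return [g for g in groups.values() if len(set(g)) > 1]

# Hypothetical publisher column
publishers = ["Wiley, John & Sons", "John Wiley & Sons.", "john wiley & sons", "Elsevier"]
clusters = cluster(publishers)
```

Values that differ only in case, punctuation, or word order end up with the same key, so they can be reviewed and merged into one preferred form.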

28 GREL ^ is the regular-expression symbol for “starts with.”

29 Transforming with GREL
Menu option: Edit cells: Transform… Result: The GREL expression transforms the cells in the active column.
Menu option: Edit column: Add column based on this column. Result: The GREL expression is run against the active column, but creates a new column.

30 GREL

31 GREL: replacing The preview shows that the “c” and the “.” have been replaced with “nothing.” The first set of quotation marks contains the string to replace; the second set contains what to replace it with. This is two expressions chained together, not one. They are combined with the period that precedes the second “replace.”
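The same chaining idea can be tried outside OpenRefine. This Python sketch uses Python's string `replace` method, which behaves much like the two chained calls described here. The sample value "c1999." is a hypothetical cell, since the slide's screenshot is not part of this transcript.

```python
def strip_c_and_period(value):
    """Remove every "c" and then every "." by chaining two replace calls."""
    # The second replace runs on the result of the first one.
    return value.replace("c", "").replace(".", "")

cleaned = strip_c_and_period("c1999.")
```

Note that each call removes every occurrence, not just the first, so a value containing other "c" characters would lose those as well.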

32 History and favorites The History tab stores expressions used previously in current AND other projects. The Starred tab stores those you have marked as favorites.

33 A couple of favorites
The cell.cross function pulls data from one project into another (based on a matching column: ISSN, BibID, title, etc.).
Syntax: cell.cross("Name of the source project", "name of the reference column").cells["Name of the column you want to import"].value[0]
Example: you’re working with the title list for your Wiley package renewal. From the column containing the ISSN info, add a new column using the following expression. It matches against the ISSN column in the Wiley COUNTER report and pulls in the full-text downloads:
cell.cross("Wiley 2013 JR1", "Print ISSN").cells["Reporting Period Total"].value[0]
Transform display call numbers into normalized call numbers:
1) Remove periods: value.replace(".", "")
2) Separate letter groups followed by numbers (with a space): value.replace(/(\p{IsAlphabetic})(?=\d)/,'$1 ')
3) Separate number groups followed by letters: value.replace(/(\d)(?=[A-Z])/,'$1 ')
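Both favorites can be approximated in Python to see what they do. This is a sketch under stated assumptions, not OpenRefine code: `cross_lookup` and `normalize_call_number` are hypothetical helper names, the ISSNs and totals are invented, and Python's `re` module has no `\p{IsAlphabetic}`, so `[^\W\d_]` stands in for "any letter."

```python
import re

def cross_lookup(key, source_rows, key_column, value_column):
    """Return value_column from the first source row whose key_column equals key."""
    for row in source_rows:
        if row[key_column] == key:
            return row[value_column]
    return None  # this sketch returns None when nothing matches

def normalize_call_number(value):
    """Apply the three call number steps in order."""
    value = value.replace(".", "")                      # 1) remove periods
    value = re.sub(r"([^\W\d_])(?=\d)", r"\1 ", value)  # 2) letter followed by digit
    value = re.sub(r"(\d)(?=[A-Z])", r"\1 ", value)     # 3) digit followed by letter
    return value

# Hypothetical stand-in for the "Wiley 2013 JR1" project
jr1 = [
    {"Print ISSN": "1234-5678", "Reporting Period Total": 420},
    {"Print ISSN": "8765-4321", "Reporting Period Total": 17},
]
downloads = cross_lookup("8765-4321", jr1, "Print ISSN", "Reporting Period Total")
normalized = normalize_call_number("PS3515.E37")
```

The lookup takes the first matching row, in the same spirit as the `[0]` at the end of the cell.cross expression.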

34 Extend & reconcile Image source: Map of the OpenRefine Ecosystem, by Martin Magdinier, @magdmartin, http://openrefine.org/2015/01/26/Mapping-OpenRefine-ecosystem.html

35 Questions? Additional image credits: Broom icon, by Alberto Guerra Quintanilla, from the Noun Project, https://thenounproject.com/term/broom/30688/ (CC BY 3.0 US). Bucket icon, by Alberto Guerra Quintanilla, from the Noun Project, https://thenounproject.com/term/bucket/30690/ (CC BY 3.0 US). Bullfighting icon, by Paulo Volkova, from the Noun Project, https://thenounproject.com/term/bullfighting/3835/, public domain. Magic-Wand icon, by Mister Pixel, from the Noun Project, https://thenounproject.com/term/magic-wand/34626/ (CC BY 3.0 US).

