Tame Your Data with OpenRefine. GIL User Group Meeting, May 14th, 2015. Tricia Clayton, Collection Services Librarian, Georgia State University.


Tame Your Data with OpenRefine
GIL User Group Meeting, May 14th, 2015
Tricia Clayton, Collection Services Librarian, Georgia State University Library

Main functions
- Explore
- Extend & reconcile
- Clean & transform

Getting OpenRefine
- Available as a download
- Platform independent: based on the Java environment
- Google Refine 2.5: latest stable version
- OpenRefine 2.6: development version

Comparison to other tools
OpenRefine:
- Can batch edit rows and columns
- Excellent for exploring & transforming data
- No schema needed
- Data is always visible
Spreadsheets:
- Edit one cell at a time
- Excellent for data entry, functions, calculations
- No schema needed
- Data is always visible
Databases:
- Schema and scripting language needed for editing
- Data is mostly out of sight unless programming is used to run queries or build views

Getting help
- The OpenRefine wiki is housed on GitHub and includes installation instructions, documentation, tutorials, recipes, etc.
- Using OpenRefine, by Ruben Verborgh and Max De Wilde, 2013

Getting started (on Windows)
- Download the .zip file
- Extract to a folder of your choosing
- Click the .exe file to run
- The Command window opens and will run in the background [Ctrl-C in this window safely exits OpenRefine]

Runs in your default browser or

Create project
Create a new project, open an existing one, or import from another OpenRefine instance. Supported file formats include: TSV, CSV, *SV, Excel (.xls and .xlsx), JSON, XML, RDF as XML, and Google Docs.

Create project
- Name the project
- Edit import options if necessary; options vary by file type

Basic navigation

The “All” column
Contains some features that let you perform operations on all columns at once:
- reorder
- remove
- collapse or expand
Menu paths: View: Collapse/Expand columns. Edit columns: Re-order/remove columns.

The other columns
Most operations in OpenRefine act on a single column, and are initiated from that column’s menu. The “Edit column” dropdown menu contains options to rename or remove the column, and provides limited options for moving the column (to the beginning, end, or one over in either direction). The “View” dropdown provides additional collapsing options.

Project history: Undo / Redo
- undo some (or all) of your project
- extract/save parts of your project history
- apply (import) steps from another project

Export options Export Menu

Explore your data
OpenRefine offers multiple ways to facet your data:
- text
- number
- timeline
- blank
- error
- and more!
Image source: International Space Station Above Earth, by NASA (CC BY-NC 2.0)
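As a rough illustration of what a text facet reports, here is a small Python sketch that counts how often each distinct value occurs in a column. It is an analogy only, not OpenRefine's implementation; the function name `text_facet`, the "(blank)" label, and the sample publisher values are made up for this example.

```python
from collections import Counter

def text_facet(rows, column):
    """Count occurrences of each distinct value in one column.

    Empty or missing cells are grouped under a "(blank)" label
    (a naming choice made for this sketch).
    """
    return Counter(row.get(column) or "(blank)" for row in rows)

# Invented sample data: the same publisher entered inconsistently.
rows = [
    {"publisher": "Wiley"},
    {"publisher": "wiley"},
    {"publisher": "Wiley"},
    {"publisher": ""},
]

facet = text_facet(rows, "publisher")
for value, count in facet.most_common():
    print(value, count)
```

Seeing "Wiley" and "wiley" listed as separate values is the kind of inconsistency a facet makes visible at a glance.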

Filtering
Text filtering matches cells that contain a string or regular expression.
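The idea of a text filter can be sketched in Python as follows: keep only the rows whose cell contains a plain string or matches a regular expression. This is an analogy under assumed names (`text_filter`, the `regex` and `case_sensitive` switches) with invented sample titles, not OpenRefine's own code.

```python
import re

def text_filter(rows, column, query, regex=False, case_sensitive=False):
    """Return rows whose cell in `column` contains `query`.

    With regex=False the query is treated as a literal string;
    with regex=True it is compiled as a regular expression.
    """
    flags = 0 if case_sensitive else re.IGNORECASE
    pattern = re.compile(query if regex else re.escape(query), flags)
    return [row for row in rows if pattern.search(str(row.get(column) or ""))]

# Invented sample data.
rows = [
    {"title": "Data cleaning"},
    {"title": "Big data"},
    {"title": "Cataloging"},
]

contains_data = text_filter(rows, "title", "data")
starts_with_data = text_filter(rows, "title", "^data", regex=True)
```

The second call uses `^`, which anchors the match to the start of the cell, so only titles that begin with "data" are kept.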

Sorting
Sorting in OpenRefine is somewhat special…
Demo: /t/mEUVANxYz
Image source: Lego Sorting, by jwhittenburg (CC-BY-NC-ND 2.0)

Blank down / Fill down
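A minimal Python sketch of the two operations named on this slide, assuming the usual meaning: fill down copies the last non-blank value into the blank cells beneath it, and blank down empties cells that repeat the value directly above. The function names and sample values are illustrative, and this is not OpenRefine's implementation.

```python
def fill_down(values):
    """Replace blank cells with the nearest non-blank value above them.

    Blank cells before the first non-blank value stay as None.
    """
    filled, last = [], None
    for value in values:
        if value in (None, ""):
            filled.append(last)
        else:
            filled.append(value)
            last = value
    return filled

def blank_down(values):
    """Blank out cells that repeat the value in the cell directly above."""
    blanked, previous = [], object()  # sentinel never equals a real cell
    for value in values:
        blanked.append("" if value == previous else value)
        previous = value
    return blanked

sparse = ["A", "", "", "B", ""]
dense = fill_down(sparse)
round_trip = blank_down(dense)
```

In this sample the two operations are inverses of each other: filling down and then blanking down returns the original sparse column.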

Rows vs. records

Clean & transform
General transformation tips:
- Think in patterns: what are the common characteristics of the cells/rows/columns you want to change?
- Use facets and filters to isolate, then use a single command to change the set.

Common transforms

Splitting cells & transposing
Problem: You used the TITLE field from the BIB_TEXT table in your Voyager Access query; now you want to separate the title and author information.
Solution: Use some of the Edit Cells and Transpose options.
Figure panels: original cell format; after splitting multi-valued cells; after transposing cells in rows into columns.
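The two steps in the figure can be approximated in Python: first split each multi-valued cell into separate cells (one per row), then regroup a fixed number of consecutive cells into columns. The " / " separator, the helper names, and the sample titles are assumptions made for this sketch; the actual separator in the Voyager data is not given in the slide.

```python
def split_multivalued(cells, separator):
    """Split each cell on `separator`, yielding one value per output row."""
    out = []
    for cell in cells:
        out.extend(part.strip() for part in cell.split(separator))
    return out

def transpose_rows_to_columns(cells, width):
    """Group every `width` consecutive cells into one row of columns."""
    return [cells[i:i + width] for i in range(0, len(cells), width)]

# Invented sample cells in an assumed "title / author" format.
title_cells = ["Moby Dick / Herman Melville", "Emma / Jane Austen"]

one_per_row = split_multivalued(title_cells, " / ")
title_author = transpose_rows_to_columns(one_per_row, 2)
```

After the second step each inner list holds a title and its author side by side, which corresponds to the last panel of the figure.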

Splitting columns
ILLiad LDAP conversion project: deriving campus IDs from patron addresses.
Figure labels: ILLiad user data; Edit column menu.

Splitting columns
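Splitting a column by a separator can be sketched in Python as below. The slide does not show the address format or the separator used in the ILLiad project, so the "@" separator, the `split_column` helper, and the sample address are purely hypothetical.

```python
def split_column(value, separator, max_columns=None):
    """Split one cell into several column values on `separator`.

    If max_columns is given, the last column keeps the unsplit remainder.
    """
    if max_columns is None:
        return value.split(separator)
    return value.split(separator, max_columns - 1)

# Hypothetical patron address; the real data is not shown in the slides.
campus_id, domain = split_column("jdoe1@example.edu", "@")
limited = split_column("a,b,c", ",", max_columns=2)
```

Here the first piece plays the role of the derived campus ID, and the `max_columns` argument shows how a cap on the number of new columns leaves the remainder intact.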

Clustering is magical
Publisher data in Voyager can be messy. This video shows how clustering can be used to merge variations of the same publisher together.
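To give a sense of how clustering can group variant spellings, here is a Python sketch of a key-collision approach: each value is reduced to a normalized "fingerprint", and values that share a fingerprint form a cluster. This is similar in spirit to fingerprint-style clustering but is not OpenRefine's code, and the slide does not say which method the video uses. The publisher strings are invented.

```python
import string
import unicodedata
from collections import defaultdict

def fingerprint(value):
    """Build a normalized key: trim, lowercase, strip accents and
    punctuation, then sort and de-duplicate the remaining tokens."""
    text = value.strip().lower()
    text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    text = text.translate(str.maketrans("", "", string.punctuation))
    return " ".join(sorted(set(text.split())))

def cluster(values):
    """Group values whose fingerprints collide; keep groups of 2 or more."""
    groups = defaultdict(set)
    for value in values:
        groups[fingerprint(value)].add(value)
    return [sorted(group) for group in groups.values() if len(group) > 1]

# Invented sample data.
publishers = ["Wiley & Sons", "wiley & sons.", "Sons, Wiley &", "Elsevier"]
clusters = cluster(publishers)
```

Once a cluster is found, every member can be rewritten to a single preferred form, which is the merge step the slide describes.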

GREL
In a regular expression, ^ is the symbol for “starts with”: it anchors the match to the beginning of the value.

Transforming with GREL
Menu option: Edit cells: Transform…
Result: The expression transforms the cells in the active column.
Menu option: Edit column: Add column based on this column
Result: The expression is run against the active column, but creates a new column.
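The difference between the two menu options can be pictured in Python: one overwrites the active column, the other leaves it alone and writes the result to a new column. The helper names and sample row are invented, and `str.strip` merely stands in for whatever expression you would write in GREL.

```python
def transform_in_place(rows, column, fn):
    """Analog of "Edit cells: Transform": overwrite the active column."""
    for row in rows:
        row[column] = fn(row[column])

def add_column_based_on(rows, source, new_column, fn):
    """Analog of "Add column based on this column": keep the source
    column unchanged and store the result under a new name."""
    for row in rows:
        row[new_column] = fn(row[source])

# Two copies of the same invented row, one for each option.
rows_a = [{"title": "  Moby Dick "}]
rows_b = [{"title": "  Moby Dick "}]

transform_in_place(rows_a, "title", str.strip)
add_column_based_on(rows_b, "title", "title_clean", str.strip)
```

Adding a new column is the safer choice while you are still testing an expression, since the original values remain available for comparison.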

GREL

GREL: replacing
The preview shows that the “c” and the “.” have been replaced with nothing. In each replace, the first set of quotation marks contains the string to replace; the second set contains what to replace it with. This is two expressions chained together, not one. They are combined with the period that precedes the second “replace.”
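Python strings chain `replace` calls the same way, so the idea can be tried outside OpenRefine. The sample value "c1999." is an assumption (the slide's screenshot is not in the transcript), and the function name is made up for this sketch.

```python
def strip_c_and_period(value):
    """Chain two replacements: remove every "c", then every "."."""
    return value.replace("c", "").replace(".", "")

# Assumed sample value: a copyright-style date with a trailing period.
cleaned = strip_c_and_period("c1999.")
print(cleaned)
```

Each `replace` returns a new string, and the next `replace` after the period operates on that result, which is why the order of the chain can matter.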

History and favorites
The History tab stores expressions used previously in current AND other projects. The Starred tab stores those you have marked as favorites.

A couple favorites
The cell.cross function pulls data from one project into another (based on a matching column: ISSN, BibID, title, etc.).
Syntax: cell.cross("Name of the source project", "name of the reference column").cells["Name of the column you want to import"].value[0]
Example: you’re working with the title list for your Wiley package renewal. From the column containing the ISSN info, add a new column using the following expression. It matches against the ISSN column in the Wiley COUNTER report and pulls in the full-text downloads:
cell.cross("Wiley 2013 JR1", "Print ISSN").cells["Reporting Period Total"].value[0]
Transform display call numbers into normalized call numbers:
1) Remove periods: value.replace(".", "")
2) Separate letter groups followed by numbers (with a space): value.replace(/(\p{IsAlphabetic})(?=\d)/,'$1 ')
3) Separate number groups followed by letters (with a space): value.replace(/(\d)(?=[A-Z])/,'$1 ')
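Both favorites can be approximated in Python to check the logic before running it in OpenRefine. This is a sketch, not GREL: `cross_lookup` mimics the lookup with a plain list of dictionaries, the ISSN and download count are invented, and because Python's `re` module does not support `\p{IsAlphabetic}`, the character class `[^\W\d_]` (any Unicode letter) is substituted.

```python
import re

def cross_lookup(key, source_rows, key_column, value_column):
    """Return `value_column` from the first source row whose
    `key_column` equals `key`, or None if nothing matches."""
    for row in source_rows:
        if row.get(key_column) == key:
            return row.get(value_column)
    return None

def normalize_call_number(value):
    """Apply the three steps from the slide in order."""
    value = value.replace(".", "")                       # 1) remove periods
    value = re.sub(r"([^\W\d_])(?=\d)", r"\1 ", value)   # 2) letter before digit
    value = re.sub(r"(\d)(?=[A-Z])", r"\1 ", value)      # 3) digit before letter
    return value

# Invented stand-in for the "Wiley 2013 JR1" project.
report = [{"Print ISSN": "1234-5678", "Reporting Period Total": 42}]

downloads = cross_lookup("1234-5678", report, "Print ISSN", "Reporting Period Total")
normalized = normalize_call_number("QA76.9.D3")
```

Note that step 1 removes every period, including decimal points inside the class number, so "76.9" becomes "769" in the normalized form.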

Extend & reconcile Image source: Map of the OpenRefine Ecosystem, by Martin

Questions?
Additional image credits:
- Broom icon, by Alberto Guerra Quintanilla, from the Noun Project (CC BY 3.0 US)
- Bucket icon, by Alberto Guerra Quintanilla, from the Noun Project (CC BY 3.0 US)
- Bullfighting icon, by Paulo Volkova, from the Noun Project, public domain
- Magic-Wand icon, by Mister Pixel, from the Noun Project (CC BY 3.0 US)