IDigBio is funded by a grant from the National Science Foundation’s Advancing Digitization of Biodiversity Collections Program (Cooperative Agreement EF-1115210).

Slides:



Advertisements
Similar presentations
Cornell University January 2015 Sponsored by Cornell Statistical Consulting Unit Instructors Emily Davenport (Cornell University) Erika Mudrak (CSCU) Jeramia.
Advertisements

IDigBio is funded by a grant from the National Science Foundation’s Advancing Digitization of Biodiversity Collections Program (Cooperative Agreement EF ).
Google Refine Tutorial April, Sathishwaran.R - 10BM60079 Vijaya Prabhu - 10BM60097 Vinod Gupta School of Management, IIT Kharagpur This Tutorial.
OpenRefine for Metadata Cleanup Danielle Cunniff Plumer dcplumer associates
IDigBio is funded by a grant from the National Science Foundation’s Advancing Digitization of Biodiversity Collections Program (Cooperative Agreement EF ).
Text Editing Kim Shepherd Digital Development Team The University of Auckland Library Tools, tips, tricks LIANZA ITSIG webinar.
IDigBio is funded by a grant from the National Science Foundation’s Advancing Digitization of Biodiversity Collections Program (Cooperative Agreement EF ).
IDigBio is funded by a grant from the National Science Foundation’s Advancing Digitization of Biodiversity Collections Program (Cooperative Agreement EF ).
Data Cleaning, Validation and Enhancement iDigBio Wet Collections Digitization Workshop March 4 – 6, 2013 KU Biodiversity Institute, University of Kansas.
IDigBio is funded by a grant from the National Science Foundation’s Advancing Digitization of Biodiversity Collections Program (Cooperative Agreement EF ).
Open Data at the World Bank. Open Data at the World Bank Open about what we do Open about what we.
This material is based upon work supported by the National Science Foundation under Cooperative Agreement EF Any opinions, findings, and conclusions.
 How can you harness the power of the Web to sharpen your Science Research skills?
ORGANIZING AND STRUCTURING DATA FOR DIGITAL PROJECTS Suzanne Huffman Digital Resources Librarian Simpson Library.
Cleaning Validating and Enhancing Data with Open Refine iDigBio – University of Florida 2 nd Train-the-Trainers Georeferencing Workshop Deborah Paul, iDigInfo.
IDigBio is funded by a grant from the National Science Foundation’s Advancing Digitization of Biodiversity Collections Program (Cooperative Agreement EF ).
Tame Your Data with OpenRefine GIL User Group Meeting May 14 th, 2015 Tricia Clayton Collection Services Librarian Georgia State University.
IDigBio is funded by a grant from the National Science Foundation’s Advancing Digitization of Biodiversity Collections Program (Cooperative Agreement EF ).
IDigBio is funded by a grant from the National Science Foundation’s Advancing Digitization of Biodiversity Collections Program (Cooperative Agreement EF ).
Build a Free Website1 Build A Website For Free 2 ND Edition By Mark Bell.
The Macroalgal Digitization Project Chris Neefus, Department of Biological Sciences University of New Hampshire, Durham, New Hampshire.
DM_PPT_NP_v01 SESIP_0715_AJ HDF Product Designer Aleksandar Jelenak, H. Joe Lee, Ted Habermann Gerd Heber, John Readey, Joel Plutchak The HDF Group HDF.
Automated Georeferencing of Natural History Museum Data Nelson E. Rios Discussion The Tulane University Fish Collection, with 7.1 million fluid-preserved.
IDigBio is funded by a grant from the National Science Foundation’s Advancing Digitization of Biodiversity Collections Program (Cooperative Agreement EF ).
University of Florida Florida State University
Data Wrangling and Interoperability Andrea Denton Research and Data Services Manager Claude Moore Health Sciences Library Ricky Patterson.
Raw Data Cleaning, Validation and Enhancement The Field Museum - Chicago, Illinois iDigBio Entomology Digitization Workshop Deborah Paul, iDigBio April.
© Trustees of Indiana University Released under Creative Commons 3.0 unported license; license terms on last slide. The IQ-Table & Collection Viewer A.
HDF Converting between HDF4 and HDF5 MuQun Yang, Robert E. McGrath, Mike Folk National Center for Supercomputing Applications University of Illinois,
IDigBio is funded by a grant from the National Science Foundation’s Advancing Digitization of Biodiversity Collections Program (Cooperative Agreement EF ).
How to import and export ADMS-AP conform metadata of interoperability solutions on Joinup 1.
IDigBio is funded by a grant from the National Science Foundation’s Advancing Digitization of Biodiversity Collections Program (Cooperative Agreement EF ).
Computer Aided Design By Brian Nettleton This material is based upon work supported by the National Science Foundation under Grant No Any opinions,
This material is based upon work supported by the National Science Foundation under Grant No. ANT Any opinions, findings, and conclusions or recommendations.
IDigBio is funded by a grant from the National Science Foundation’s Advancing Digitization of Biodiversity Collections Program (Cooperative Agreement EF ).
IDigBio is funded by a grant from the National Science Foundation’s Advancing Digitization of Biodiversity Collections Program (Cooperative Agreement EF ).
IDigBio is funded by a grant from the National Science Foundation’s Advancing Digitization of Biodiversity Collections Program (Cooperative Agreement EF ).
Mike Bolam Metadata Librarian Digital Scholarship Services University Library System //
Geographic data validation. Index Basic concepts Why do we need validation? How to assess geographic data Initial checks Intermediate checks Advanced.
GOOGLE FUSION TABLES: WEB- CENTERED DATA MANAGEMENT AND COLLABORATION HectorGonzalez, et al. Google Inc. Presented by Donald Cha December 2, 2015.
Data Organization Quality Assurance and Transformations.
Product Project Prototype Standards 2P, Q 9F, H This material is based upon work supported by the National Science Foundation under Grant No
This material is based upon work supported by the National Science Foundation under Cooperative Agreement EF Any opinions, findings, and conclusions.
Leader Interviews Name, PhD Title, Organization University This project is funded by the National Science Foundation (NSF) under award numbers ANT
1 James N. Bellinger University of Wisconsin-Madison 13-August-2010 Endcap Processing Notes James N. Bellinger 13-Aug-2010.
Google maps engine and language presentation Ibrahim Motala.
This material is based upon work supported by the National Science Foundation under Cooperative Agreement EF Any opinions, findings, and conclusions.
Lecture 11 Introduction to R and Accessing USGS Data from Web Services Jeffery S. Horsburgh Hydroinformatics Fall 2013 This work was funded by National.
Cornell University June 2016 Sponsored by Cornell Statistical Consulting Unit Instructors Emily Davenport (Cornell University) Erika Mudrak (CSCU) Lynn.
Advanced BIML topics Be a W.I.S.E. A.S.S. Me ! Self-employed BI consultant Author Trainer MCT
Unit 2: Lesson 11 & 12 Making Data Visualizations
Data Cleaning with Open Refine:
The Binary Number System
Fearless Transformation: Applying OpenRefine to Digital Collections
Discussion and Conclusion
In Search of Useful Collection Metadata
Written by: Jennifer Doherty, Cornelia Harris, Laurel Hartley
Data Management: The Data Repatriation Re-integration Step or …
Unit 2: Lesson 11 & 12 Making Data Visualizations
Preparing your Data using Python
Nanotechnology & Society
Preparing your Data using Python
People Who Did the Study Universities they are affiliated with
USING OPENREFINE FOR DATA-DRIVEN DECISION-MAKING
Agenda About Excel/Calc Spreadsheets Key Features
Introduction to Matlab
The Life-Changing Magic of OpenRefine
Designing, Implementing, and Benefiting from a Collections Attribution Channel: the view from iDigBio and the ADBC Alex Thompson, Deborah L. Paul, Gil.
This material is based upon work supported by the National Science Foundation under Grant #XXXXXX. Any opinions, findings, and conclusions or recommendations.
Images from When whale I sea you again?
Presentation transcript:

iDigBio is funded by a grant from the National Science Foundation’s Advancing Digitization of Biodiversity Collections Program (Cooperative Agreement EF ). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation. All images used with permission or are free from copyright. Got (messy) data? Clean it, enhance it, have fun with it Get Open Refine openrefine.org iDigBio Mobilizing Small Herbaria Digitization Workshop December 9 – 11, 2013 Tallahassee, Florida Deborah Paul, iDigInfo, iDigBio #smallherb

“You just saved me a month!” “I feel like I just came out of the dark ages.” “Our data ready now in 3 years, instead of 7.”

OpenRefine An open-source power tool for working with messy data. Got Data in a Spreadsheet,…? TSV, CSV, *SV, Excel (.xls and.xlsx), JSON, XML, RDF as XML, Wiki markup, and Google Data documents are all supported. the software tool formerly known as GoogleRefine

works on your own computer (Mac* or PC) small to medium-sized data sets (not millions)

What data issues can you find / fix? some data issues use open refine because unknown the unknown non-standard data dates people places languages countries typos filename issues taxonomic errors identifier / guid error mapping to standard terms formatting missing data missing metadata inspect dataset to find / fix errors inspect to reveal patterns easy to repeat steps easy to undo changes easy to enhance data

What to expect? clustering algorithms fingerprint, soundex, levenshtein, … scatter plots excel, csv, xml, … scripts (for repetitive cleaning) created automatically created automatically save to re-use regex (you can do this!) enhance data call a service (what?) call GEOLocate (just like Georeference Me!) reconcile names again a taxonomic name service

Ready?, Demo anyone?... Tutorial Cleaning, Validating, and Enhancing Data with Open RefineCleaning, Validating, and Enhancing Data with Open Refine and sample CSV dataset for above tutorialCSV OpenRefine videos and tutorials Join Google+ Open Refine CommunityGoogle+ Open Refine Community Google Fusion Tables Check out Google Fusion Tables too! Teach others about these power tools Pay-it-forward! Data that is “fit-for-research-use” & fun

Happy Holidays 2013! Happy Parsing, Sorting,Faceting, Enhancing, … an easy gift idea too!