Published by Frauke Berger. Modified over 5 years ago
1
RSA 2019, Toronto. Preconference day, March 16, 2019, 11 AM-1 PM
Data Organization and Visualization for Beginners
Jodi Cranston, Catherine Walsh, Angela Dressen
2
Program
11:00-11:05 – Introduction to the session and presenters
PRESENTATION OF PROJECTS
11:05-11:20 – Jodi: Mapping Titian, Mapping Paintings
11:20-11:35 – Catherine: Mapping Sculpture
PRESENTATION OF TOOLS
11:35-12:05 – Angela: OpenRefine, TimelineJS
12:05-12:35 – Catherine: Palladio, CARTO
Hands-on
3
OpenRefine
4
OpenRefine
Cleaning up messy data from a spreadsheet:
- Spelling errors
- Making data uniform
- Removing whitespace
- Splitting columns
- Enriching data from external sources
- Etc.
You won't be working through your data row by row, but in groups and sets. This makes the application well suited to very large data sets.
5
OpenRefine
Apart from cleaning data, you can also use OpenRefine for other purposes:
- Word counts in sets
- Combining sheets
- Enriching reconciled data, e.g. importing data from Wikidata or VIAF
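A text facet in OpenRefine lists every distinct value of a column together with the number of rows that contain it, and word counts follow the same idea one level down. The minimal Python sketch below illustrates that counting with collections.Counter; it is a stand-in for the concept, not OpenRefine's own implementation, and the sample column is hypothetical.

```python
from collections import Counter

def text_facet(values: list[str]) -> list[tuple[str, int]]:
    """Count each distinct cell value, most frequent first (ties keep first-seen order)."""
    return Counter(values).most_common()

def word_counts(values: list[str]) -> Counter:
    """Count individual words across all cells of a column."""
    counts = Counter()
    for cell in values:
        counts.update(cell.split())
    return counts

column = ["Florence", "Venice", "Florence", "Rome"]
print(text_facet(column))  # [('Florence', 2), ('Venice', 1), ('Rome', 1)]
print(word_counts(["Santa Maria Novella", "Santa Croce"])["Santa"])  # 2
```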
6
OpenRefine
- Free, open-source software
- Works best with Google Chrome (less well with Safari and Internet Explorer)
- Written in Java; requires a Java Runtime Environment (JRE)
- It is an Interactive Data Transformation tool (IDT), which lets you change a large data set all at once. It is similar to a spreadsheet, but has more functionality.
- Runs locally as a desktop application that you use in your browser. It does not change your original file, so export your results to keep them!
- Can be used in several browser tabs at the same time.
- On Windows, the .exe file opens a terminal window in which a small local web server runs. This window needs to remain open.
- Works offline, since the server runs on your own computer.
- Use it to clean up your own accumulated data or data gathered from the web.
- Works with algorithms (for example, to cluster similar values).
7
OpenRefine
Choose a project file and upload it.
- Rename the project (and remember to export your work later: OpenRefine does not write your changes back to the original file!)
- Use UTF-8 encoding.
- Configure your data: you will be shown a preview of your data.
- In the lower blue field, make sure "Parse data as" is set to "CSV / TSV / separator-based files".
- Where it says "Character encoding", click in the blank field next to it and select UTF-8 from the pop-up list of encodings.
- Make sure the first row with your column headers is recognized as headers (boldfaced) and not as data. If it is not recognized automatically, tick the box "Parse next '1' line(s) as column headers".
- Since our exercise file is a CSV, activate the radio button "commas (CSV)" as the separator.
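To make the import settings above concrete, here is a minimal Python sketch of what they amount to: a comma separator and the first line read as column headers. It uses only the standard csv module; the sample rows are hypothetical and this is not how OpenRefine parses files internally.

```python
import csv
import io

def load_csv(text: str, separator: str = ",") -> list[dict[str, str]]:
    """Parse CSV text, using the first line as column headers."""
    # DictReader takes the first row as field names, much like
    # "Parse next '1' line(s) as column headers" in the import dialog.
    reader = csv.DictReader(io.StringIO(text), delimiter=separator)
    return list(reader)

# Hypothetical sample data standing in for an exercise file.
sample = "Name,City\nSandro Botticelli,Florence\nAndrea Mantegna,Mantua\n"
for row in load_csv(sample):
    print(row["Name"], "-", row["City"])
```

With a real file, the encoding is chosen where the file is opened, e.g. open("artists.csv", encoding="utf-8", newline="") with a hypothetical filename, which corresponds to selecting UTF-8 in the import dialog.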
8
OpenRefine – basic clean up
- Text facet -> cluster
- Get rid of whitespace: «Edit cells» -> «Common transforms» -> «Trim leading and trailing whitespace» / «Collapse consecutive whitespace»
- Divide columns: «Edit column» -> «Split into several columns…»
- Reorder columns
- Cluster: «Edit cells» -> «Cluster and edit…» (works only on entire clusters to be merged; individual values within a cluster cannot be selected)
- Replace: «Edit cells» -> «Replace»
- Undo/redo: step-by-step history in the «Undo / Redo» tab
- Deleting rows: in a text facet, choose the values to eliminate and star those rows; then, in the «All» column menu, choose «Facet» -> «Facet by star» and select «true»; finally choose «All» -> «Edit rows» -> «Remove all matching rows»
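The whitespace trimming, column splitting and clustering steps above can be sketched in Python. The fingerprint function is a simplified version in the spirit of OpenRefine's "fingerprint" key-collision method (strip accents, lowercase, drop punctuation, sort unique tokens); it is an approximation for illustration, not OpenRefine's exact algorithm, and the sample names are hypothetical.

```python
import re
import string
import unicodedata
from collections import defaultdict

def trim_and_collapse(value: str) -> str:
    """Trim leading/trailing whitespace and collapse inner runs to one space."""
    return re.sub(r"\s+", " ", value).strip()

def split_column(value: str, separator: str) -> list[str]:
    """Split one cell into parts, like 'Split into several columns...'."""
    return [part.strip() for part in value.split(separator)]

def fingerprint(value: str) -> str:
    """Simplified clustering key: same key means 'probably the same value'."""
    # Decompose accented characters, then drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", value)
    plain = "".join(c for c in decomposed if not unicodedata.combining(c))
    # Lowercase, remove punctuation, then sort the unique tokens.
    cleaned = plain.lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(sorted(set(cleaned.split())))

def cluster(values: list[str]) -> list[list[str]]:
    """Group distinct values whose keys collide; keep clusters of 2 or more."""
    groups: dict[str, list[str]] = defaultdict(list)
    for value in dict.fromkeys(values):  # distinct values, original order
        groups[fingerprint(value)].append(value)
    return [group for group in groups.values() if len(group) > 1]

names = ["Botticelli, Sandro", "sandro botticelli", "Sandro  Botticelli ", "Lippi, Filippo"]
print(trim_and_collapse("  Sandro   Botticelli "))  # Sandro Botticelli
print(cluster(names))
```

As in the «Cluster and edit…» dialog, a cluster is a group of variants you would then merge into one chosen value.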
9
OpenRefine - transform
- Change values: «Edit cells» -> «Transform…» -> write a GREL expression that transforms the value
- Replace: value.replace('xx', 'x')
- Add characters to a column: "prefix" + value
- Clean up a date to show only the year: datePart(value, 'year') (the cell must contain a date; convert text first, e.g. datePart(value.toDate(), 'year'))
- GREL: General Refine Expression Language (documented on GitHub)
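For readers more familiar with a general-purpose language, the three GREL expressions above roughly correspond to the Python functions below. This is only an analogy: the year function assumes the cell holds an ISO date string like "1482-05-17", whereas GREL's datePart works on a date value.

```python
from datetime import date

def replace_value(value: str, old: str, new: str) -> str:
    # GREL: value.replace('xx', 'x')
    return value.replace(old, new)

def add_prefix(value: str, prefix: str) -> str:
    # GREL: "prefix" + value
    return prefix + value

def year_only(value: str) -> int:
    # GREL: datePart(value, 'year'); assumes an ISO date string here
    return date.fromisoformat(value).year

print(replace_value("Boxxer", "xx", "x"))     # Boxer
print(add_prefix("Botticelli", "Sandro "))    # Sandro Botticelli
print(year_only("1482-05-17"))                # 1482
```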
10
OpenRefine – example from Wikipedia – Italian artists
- Download a table from Wikipedia. You want to separate names and years.
- Add column based on this column.
- «Edit cells» -> «Replace»: change the opening bracket into a colon, to be used later as a separator.
- «Edit column» -> «Split into several columns…» (use the colon as separator).
- Replace ")" with nothing.
- Combine columns: value + ", " + cells["mycell"].value
- Separate first and last names: «Edit column» -> «Add column based on this column…» -> value.split(" ")[1] (index 1 = last name, index 0 = first name)
- Put last name and first name together: value + ", " + cells["Firstname"].value
- Another option: split cells via «Edit cells» -> «Split multi-valued cells…», entering '|' as the value separator.
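The sequence of steps above can be traced in a short Python sketch. It assumes, hypothetically, that each Wikipedia entry looks like "First Last (born-died)"; the real table may be formatted differently. Like value.split(" ")[1], it breaks on names with particles such as "da", which would need manual correction.

```python
def parse_entry(entry: str) -> dict[str, str]:
    """Split a hypothetical 'First Last (born-died)' entry into its parts."""
    # 1. Replace the opening bracket with a colon to use as a separator.
    with_colon = entry.replace("(", ":")
    # 2. Split into name and years (like "Split into several columns...").
    name, _, years = with_colon.partition(":")
    name = name.strip()
    # 3. Remove the closing bracket (replace ")" with nothing).
    years = years.replace(")", "").strip()
    # 4. Like value.split(" ")[0] / [1]: index 0 = first name, 1 = last name.
    parts = name.split(" ")
    first = parts[0]
    last = parts[1] if len(parts) > 1 else ""
    # 5. Like value + ", " + cells["Firstname"].value: "Last, First".
    sorted_name = f"{last}, {first}" if last else first
    return {"first": first, "last": last, "years": years, "sorted_name": sorted_name}

print(parse_entry("Sandro Botticelli (1445-1510)"))
```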
11
OpenRefine for Data enrichment (using Linked Open Data)
- Fetch URLs using OpenRefine
- Construct URL queries to retrieve information from a simple web API
- Use query services such as Wikidata, the Google Maps API, VIAF (Virtual International Authority File), etc.
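"Constructing a URL query" simply means assembling an address with parameters. The Python sketch below builds such a URL for the Wikidata API without sending it; choosing the wbgetclaims module, and the entity ID Q42, are illustrative assumptions, not necessarily what the presenters used. In OpenRefine, the equivalent string would be written as a GREL expression in «Add column by fetching URLs…».

```python
from urllib.parse import urlencode

WIKIDATA_API = "https://www.wikidata.org/w/api.php"

def claims_url(entity_id: str, property_id: str) -> str:
    """Build a Wikidata API URL asking for one property of one entity."""
    params = {
        "action": "wbgetclaims",   # assumed API module for this example
        "entity": entity_id,       # e.g. a Q-identifier from a Wikidata_id column
        "property": property_id,   # e.g. P569 = date of birth
        "format": "json",
    }
    return WIKIDATA_API + "?" + urlencode(params)

print(claims_url("Q42", "P569"))
# https://www.wikidata.org/w/api.php?action=wbgetclaims&entity=Q42&property=P569&format=json
```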
12
Retrieving data from Wikidata
- You need a column Wikidata_uri.
- Create a column Wikidata_id: «Edit column» -> «Add column based on this column…»; to extract the ID, enter an expression that removes the URI prefix, e.g. value.replace("http://www.wikidata.org/entity/", "")
- On the Wikidata_id column: «Edit column» -> «Add column by fetching URLs…». To query birth dates, build the URL with the property «P569» (date of birth) and name the new column «date_of_birth_Wikidata».
- The result is in JSON. Clean the data: «Edit cells» -> «Transform…» and enter forEach(value.parseJson().values,v,v).join(";")
- Clean up a date to show only the year: datePart(value,'year')
Wikidata provides an endpoint for querying data as a URL. Once you know the property you would like to retrieve, the objective is to use OpenRefine to build a query string and retrieve the data you want from that endpoint.
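The ID extraction and JSON clean-up described above can be sketched in Python. Two assumptions, stated plainly: the ID is taken as the last path segment of the URI (instead of a fixed-prefix replace), and the response is assumed to follow the shape of Wikidata's wbgetclaims output (claims -> P569 -> mainsnak -> datavalue -> value -> time). The sample response is abbreviated and hypothetical; the GREL expression on the slide may target a different shape.

```python
import json

def wikidata_id(uri: str) -> str:
    """Keep only the last path segment of an entity URI, e.g. 'Q42'."""
    return uri.rstrip("/").rsplit("/", 1)[-1]

def birth_years(response_text: str) -> list[int]:
    """Extract the year of every P569 (date of birth) claim in a response."""
    data = json.loads(response_text)
    years = []
    for claim in data.get("claims", {}).get("P569", []):
        datavalue = claim.get("mainsnak", {}).get("datavalue")
        if datavalue is None:  # e.g. "unknown value" claims carry no date
            continue
        time = datavalue["value"]["time"]  # e.g. "+1445-00-00T00:00:00Z"
        sign, rest = time[0], time[1:]
        year = int(rest.split("-")[0])
        years.append(-year if sign == "-" else year)
    return years

# Abbreviated, hypothetical response in the assumed wbgetclaims shape.
sample = '{"claims": {"P569": [{"mainsnak": {"datavalue": {"value": {"time": "+1445-00-00T00:00:00Z", "precision": 9}}}}]}}'
print(wikidata_id("http://www.wikidata.org/entity/Q42"))  # Q42
print(";".join(str(y) for y in birth_years(sample)))       # 1445
```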
13
Retrieving data from Wikidata
- Reconcile (very simple!): «Reconcile» -> «Start reconciling…»
- Choose the source: Wikidata (if needed, include other columns too)
- Start reconciling: records will be linked to Wikidata automatically (the remaining ones have to be matched manually)
- Use values as identifiers
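Reconciliation matches each cell against candidate entities and accepts confident matches automatically, leaving the rest for manual review. In OpenRefine the matching and scoring are done by the reconciliation service (here Wikidata's); the Python sketch below swaps in difflib's string similarity purely to illustrate the "match automatically if confident, otherwise check by hand" idea. The candidate IDs and the 0.9 threshold are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Optional

def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def reconcile(value: str, candidates: dict[str, str],
              threshold: float = 0.9) -> Optional[tuple[str, float]]:
    """Return (id, score) for a confident match, or None for manual review."""
    if not candidates:
        return None
    best_id = max(candidates, key=lambda cid: similarity(value, candidates[cid]))
    best_score = similarity(value, candidates[best_id])
    return (best_id, best_score) if best_score >= threshold else None

# Hypothetical candidates; the IDs are placeholders, not real Wikidata IDs.
candidates = {"ID-1": "Sandro Botticelli", "ID-2": "Filippo Lippi"}
print(reconcile("sandro botticelli", candidates))  # ('ID-1', 1.0)
print(reconcile("Botticelli", candidates))         # None: match manually
```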
14
OpenRefine - export
At the end: export your data set! (OpenRefine does not change your original data set.)
- Single column export: facet -> choose facet -> export as CSV
- Full sheet export: «Export» -> «Comma-separated value»
- It is also possible to export only parts of your sheet.
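Exporting only parts of a sheet means choosing which columns to write and which rows to keep (in OpenRefine, the rows matching the current facets). A minimal Python sketch of that idea with the standard csv module follows; the rows and the filter are hypothetical.

```python
import csv
import io

def export_csv(rows, columns, keep=lambda row: True) -> str:
    """Write the chosen columns of the rows that pass `keep` as CSV text."""
    buffer = io.StringIO()
    # extrasaction="ignore" drops columns that were not selected for export.
    writer = csv.DictWriter(buffer, fieldnames=columns,
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    for row in rows:
        if keep(row):  # plays the role of a facet selection
            writer.writerow(row)
    return buffer.getvalue()

rows = [
    {"Name": "Sandro Botticelli", "City": "Florence"},
    {"Name": "Andrea Mantegna", "City": "Mantua"},
]
print(export_csv(rows, ["Name"], keep=lambda r: r["City"] == "Florence"))
```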
15
OpenRefine tutorials: http://openrefine.org/
Retrieving data from Wikidata or VIAF
There are many more!
16
Timeline JS
17
Timelines (selection)
- Timeline JS (Northwestern University), with examples and a spreadsheet
- Neatline, for Omeka
- Google Timeline
- Office Timelines (for Excel or PowerPoint)
18
TimelineJS
With Google Chrome and Google Spreadsheets
Advantages:
- Easy to use for a chronological visualization
- Incorporates maps and images from the web
- Can be embedded in websites and PowerPoint presentations
Disadvantages:
- Limited interactivity
- Only uses images published on the web, not images from your own collection
19
TimelineJS With Google Chrome
Botticelli spreadsheet: Botticelli timeline (embedded link to a website or presentation)
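TimelineJS turns each spreadsheet row into one event on the timeline; it can also read the same information as JSON. The Python sketch below converts spreadsheet-like rows into that kind of JSON. The column names (Year, Month, Day, Headline, Text, Media, Media Caption) and JSON keys are modeled on the TimelineJS template and JSON format but should be treated as assumptions and checked against the current TimelineJS documentation; the two sample rows are illustrative.

```python
import json

def row_to_event(row: dict[str, str]) -> dict:
    """Convert one spreadsheet row into one TimelineJS-style event."""
    start_date = {"year": row["Year"]}
    for column in ("Month", "Day"):
        if row.get(column):  # optional columns: skip when empty
            start_date[column.lower()] = row[column]
    event = {
        "start_date": start_date,
        "text": {"headline": row.get("Headline", ""), "text": row.get("Text", "")},
    }
    if row.get("Media"):  # TimelineJS needs a web URL, not a local file
        event["media"] = {"url": row["Media"], "caption": row.get("Media Caption", "")}
    return event

def build_timeline(title: str, rows: list[dict[str, str]]) -> str:
    """Assemble the title and all events into one JSON document."""
    timeline = {
        "title": {"text": {"headline": title}},
        "events": [row_to_event(row) for row in rows],
    }
    return json.dumps(timeline, indent=2)

rows = [
    {"Year": "1445", "Headline": "Sandro Botticelli born (c. 1445)"},
    {"Year": "1510", "Headline": "Death of Botticelli"},
]
print(build_timeline("Botticelli", rows))
```

The rule that media must be a web URL mirrors the disadvantage noted above: images from your own collection have to be published online first.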
20
Thank you!
Dr. Angela Dressen
Villa I Tatti, The Harvard University Center for Italian Renaissance Studies / Florence, Italy
Discipline Representative for Digital Humanities at the Renaissance Society of America (RSA)