I / O: Care & Feeding of Your EMu. Larry Gall, Computer Systems Office, Peabody Museum of Natural History, Yale University.

predictive text?

an I/O bottleneck

Brief Peabody I/O
EMus Expand Exponentially
Slashing EMu Before It’s Too Late
Boost Your Performance/Nightlife

~14 million specimens

Anthropology, Botany, Entomology, Invertebrate Paleontology, Invertebrate Zoology, Mineralogy & Meteoritics, Paleobotany, Scientific Instruments, Vertebrate Paleontology, Vertebrate Zoology

Peabody Collections: Current Digital Snapshot

Collection                    Count      Cataloguing unit
Anthropology                  325,000    Lot
Botany                        400,000    Individual
Entomology                    450,000    Lot / Individual
Invertebrate Paleontology     350,000    Lot
Invertebrate Zoology          350,000    Lot
Mineralogy & Meteoritics       35,000    Individual
Paleobotany                   150,000    Individual
Scientific Instruments          5,000    Individual
Vertebrate Paleontology       125,000    Individual
Vertebrate Zoology            185,000    Lot / Individual

Slide legend: > 80%, > 50%, < 50%

Items with an electronic record available (25 years effort): 64%
~14 million items => ~2.7 million databaseable units
295,921 digital assets, mostly JPG & TIF, variety of other MIME types

EMus Expand Exponentially

New records are entered
Existing records expand: more fields filled in, more links established
Records acquire new features: Darwin Core fields, GUIDs (unique identifiers)
New capabilities add records: Audit module, Statistics module
“Pork” may hide in plain sight: SummaryData/ExtendedData, AdmOriginalData, remote SummaryData copies

SLASH

Slashing EMu Before It’s Too Late

porky EMus can be surly, and they will bite you

slashing : Halloween

Jason

Leatherface

Chucky

Freddy Krueger

Freddy EMuger

Slash that EMu beast !

eparties

EMu beast hiding in plain sight: AdmOriginalData, SummaryData/ExtendedData, remote SummaryData copies

AdmOriginalData: null data rows

Slashed by 31%

Freddy says why stop there ?

ecollectionevents: sites – round 2

constant data
lengthy labels
prefixes for temporary use during migration

ecatalogue [chart: data, rec, seg]

Crunch 2: delete nulls from AdmOriginalData
Crunch 3: shorten labels on AdmOriginalData
Crunch 4: delete prefixes on AdmOriginalData

Slashed by 55%
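The three crunches can be pictured with a short sketch. The slides do not show how AdmOriginalData is actually laid out, so the "Label: value" row format, the `MIG_` prefix, and the label abbreviations below are all hypothetical, invented purely for illustration. The real work was done server-side with texexport and texload.

```python
# Hypothetical illustration only: the real AdmOriginalData layout is not
# shown in the slides. Rows are assumed to look like "Label: value".

# Assumed mapping of lengthy labels to shorter ones (invented for this sketch).
SHORT_LABELS = {
    "Original Collector Name": "Coll",
    "Original Locality Description": "Loc",
}

# Assumed temporary prefix left over from migration (invented for this sketch).
MIGRATION_PREFIX = "MIG_"


def crunch(rows):
    """Apply the three crunches to a list of 'Label: value' strings."""
    kept = []
    for row in rows:
        label, _, value = row.partition(":")
        label, value = label.strip(), value.strip()
        # Crunch 2: delete null rows (a label with no value).
        if not value:
            continue
        # Crunch 4: delete prefixes that were only needed during migration.
        if label.startswith(MIGRATION_PREFIX):
            label = label[len(MIGRATION_PREFIX):]
        # Crunch 3: shorten lengthy labels.
        label = SHORT_LABELS.get(label, label)
        kept.append(f"{label}: {value}")
    return kept


if __name__ == "__main__":
    before = [
        "MIG_Original Collector Name: J. Smith",
        "MIG_Original Locality Description:",
        "Original Locality Description: New Haven",
    ]
    after = crunch(before)
    saved = 1 - sum(map(len, after)) / sum(map(len, before))
    print(after)
    print(f"characters saved: {saved:.0%}")
```

The point of the sketch is only that each crunch is a simple, mechanical text transformation; the savings come from applying it across millions of records.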

The slashing allowed adding in Darwin Core data, with a net disk space reduction

methodologies used during the first pass slashings

Boring, repetitive, nothing very fancy:
Iterative server-side scripting (texexport, texload)
Several million record updates were involved
Manually tweaked nightly cron jobs to accommodate
Conducted during evenings over a six month period
Watched closely to avoid taxing server performance

Now we could do the following every night:
Compact maintenance gets run on all modules (3.5 hours)
Cron-ed plain text data dumps for all modules (3.5 hours):
  generate small, portable gzipped backups of all EMu data
  fully reinstantiate SQL database feeding local search portal
  fully reinstantiate SQL database feeding DiGIR/IPT services

rather brutish gladiator-style slashing, needs operator intervention
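As a rough illustration of the "small, portable gzipped backups" step, here is a minimal sketch that compresses plain text dump files once they exist. The directory names and the `.txt` suffix are assumptions made for this example, and producing the dumps themselves (for instance with texexport) is not shown.

```python
# Sketch only: compress plain-text dump files into gzipped copies.
# The directory layout and the ".txt" suffix are assumptions for illustration;
# producing the dumps themselves (e.g. with texexport) is not shown.
import gzip
import shutil
from pathlib import Path


def gzip_dumps(dump_dir, backup_dir):
    """Write a .gz copy of every *.txt file in dump_dir into backup_dir."""
    dump_dir, backup_dir = Path(dump_dir), Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for src in sorted(dump_dir.glob("*.txt")):
        dst = backup_dir / (src.name + ".gz")
        with open(src, "rb") as fin, gzip.open(dst, "wb") as fout:
            # Stream the file so a large dump never has to fit in memory.
            shutil.copyfileobj(fin, fout)
        written.append(dst)
    return written
```

A nightly cron job could call something like `gzip_dumps("dumps", "backups")` after the dumps finish; both path names here are placeholders.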

how about more subtle slashing ?

something a little bit more insidious, and automated

Nurse Ratched

shots and pills

Nurse RatchEMu

catalogue – round 2 [charts: data, rec, seg; BEFORE and AFTER]

SummaryData, ExtendedData

Slashed by 29%

texadmin – insert the slasher pills into validation segments

emureindex: a Perl script in your ~emu/bin directory
system("texdesign -R $dbname /dev/null 2>&1");

slasher pills are reversible !
slasher pills work great on “visible” fields: (anything you see on screen and feel like slashing)
slasher pills work great on “invisible” fields: (remote SummaryData strings copied from linked records)

ecatalogue                                change
Records:        986,361     1,557,        %
Disk use:       10.4 gB     6.3 gB        -39.4%
Record size:    11.1 kB     4.3 kB        -61.8%
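The arithmetic behind the change column can be sketched as follows. This assumes that the first value in each row is the "before" figure and the second the "after" figure, and that gB and kB are binary (1024-based) units; neither is stated on the slide.

```python
# Sketch of the arithmetic behind the "change" column. Inputs are the rounded
# figures from the slide; gB and kB are assumed to be binary units (1024-based).

def percent_change(before, after):
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100.0


def mean_record_size_kb(disk_gb, n_records):
    """Average disk footprint per record in kB, assuming 1 gB = 1024**3 bytes."""
    return disk_gb * 1024**3 / n_records / 1024


if __name__ == "__main__":
    print(f"disk use change: {percent_change(10.4, 6.3):.1f}%")
    print(f"record size before: {mean_record_size_kb(10.4, 986_361):.1f} kB")
```

With the rounded slide values this reproduces the -39.4% disk use change and the 11.1 kB starting record size. Feeding the rounded record sizes (11.1 kB and 4.3 kB) through `percent_change` gives about -61.3% rather than the slide's -61.8%, presumably because the slide was computed from unrounded figures.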

Boost Your Performance/Nightlife

Now we could do the following every night:
Compact maintenance gets run on all modules (3.5 hours)
Cron-ed plain text data dumps for all modules (3.5 hours):
  generate small, portable gzipped backups of all EMu data
  fully reinstantiate SQL database feeding local search portal
  fully reinstantiate SQL database feeding DiGIR/IPT services

2014 every night:
Compact maintenance gets run on all modules (1.4 hours)
Cron-ed plain text data dumps for all modules (2.3 hours):
  generate small, portable gzipped backups of all EMu data
  fully reinstantiate SQL database feeding local search portal
  fully reinstantiate SQL database feeding DiGIR/IPT services
1. Pushing newly created multimedia files to Yale DAM
2. Pushing metadata updates to extant multimedia files to Yale DAM
3. OAI-PMH record harvesting by Yale Cross Collections search
4. Updating archives fonds (EAD) in Yale Finding Aid Database

n=18

1. output of the command "texlist -s"
2. time to run compact maintenance on all modules
3. time to run compact maintenance on just catalogue

diff emureindex emureindex.ypm
288c288
< echo " Compacting database..."
---
> echo " Compacting database... `/bin/date`"
301c301
< echo " Reconfiguring database..."
---
> echo " Reconfiguring database... `/bin/date`"

Time to complete compact maintenance on all modules (hours)

Number of records (x) and disk occupancy of records (y) among 18 KE clients

156 million, ~1 TB
553 million records, 2.7 TB! *
* 434 million are eaudit and estatistics, “only” 119 million for all other modules combined
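To put the footnote in proportion, a small sketch of the share calculation, using only the totals quoted on the slide (in millions of records):

```python
# Sketch: share of all records that belong to eaudit and estatistics,
# using the totals quoted on the slide (in millions of records).

def share_percent(part, total):
    """Return part as a percentage of total."""
    if total <= 0:
        raise ValueError("total must be positive")
    return part / total * 100.0


if __name__ == "__main__":
    audit_and_stats = 434   # eaudit + estatistics, millions
    other_modules = 119     # all other modules combined, millions
    total = audit_and_stats + other_modules  # 553, matching the slide
    print(f"eaudit + estatistics: {share_percent(audit_and_stats, total):.1f}% of records")
```

For this client, roughly four out of every five records are audit or statistics records.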

29 million ecatalogue, 320 gB

Percent of records that are eaudit and estatistics among 18 KE clients

Number of records (x) and disk occupancy (y) for emultimedia among 18 KE clients: somewhat greater range of variability

Slashed by 82%; all EXIF / XMP metadata remains in image headers

Yale DAM infrastructure

ALT-TUD it

emultimedia
EMu records:     32,252     142,350    295,921
EMu disk use:    22 gB      83 gB      52 gB
DAM disk use:    n.a.       125 gB     14,336 gB

Know thyself, and thine own EMu

Slash early, slash often

as has become traditional…

We saw this slide already, you say
It’s a trio of hackers holding sway
Out of Melbourne came a fightin’
A text database known as Titan
Which would morph into EMu one day

That brand EMu is used for many things
Just Google it and see what that brings
An assortment of oils and gels
Practically anything that sells
To calm dry skin, bad rashes, and stings

Peabody’s EMu morphs often on screen
Through the years how many have you seen?
Is Photoshopping like this a sign
Of some malady unfortunately mine
Has my daughter Jen inherited this gene?

In fact, I’d gotten it directly from Jim
My late grandfather, who would spout it on a whim
At family occasions when we did gather
Or in longhand letters when he’d rather
Write his brother-in-law from Omaha named Slim

In horror movies they slash, scream, and maul
Everything in their paths, big and small
Yet Freddy and Chucky don’t seem so gritty
When adorned on a pooch or a kitty
Maybe that’s worse – I can’t say, your call

That Swedish connection was definitely clear
When John Doolan was bending our ear
KE staff and Abba merged together
In white satin, boots and leather
Just like these EMus of pop fame and endear

I/O, I/O, it’s off to Axiell we go
To a new computing frontier
And there’s nothing to fear
So they say, hope it’s so,
Hope it’s so, I dunno

In yonder eras Liza was a catch
To her entourage young men would attach
Oh, here’s another famous actor
A comedian, and no detractor
Were I to say that these four are a match

Now here is a fanciful sight
KE staff dressed in royal delight
Evening will be beckoning soon
And will bring laughs, drinks, and a tune
Let's party at the reception tonight

We've finally come to the end
Of the doggerel, my fine feathered friend
It was all I could do not to faint
When revealed in body paint
Are Aussie EMus so gaudily penned