I / O: Care & Feeding of Your EMu Larry Gall Computer Systems Office Peabody Museum of Natural History Yale University
predictive text?
an I/O bottleneck
Brief Peabody I/O
EMus Expand Exponentially
Slashing EMu Before It’s Too Late
Boost Your Performance/Nightlife
~14 million specimens
Anthropology
Botany
Entomology
Invertebrate Paleontology
Invertebrate Zoology
Mineralogy & Meteoritics
Paleobotany
Scientific Instruments
Vertebrate Paleontology
Vertebrate Zoology
Peabody Collections Current Digital Snapshot

Collection                    Count      Unit
Anthropology                  325,000    Lot
Botany                        400,000    Individual
Entomology                    450,000    Lot / Individual
Invertebrate Paleontology     350,000    Lot
Invertebrate Zoology          350,000    Lot
Mineralogy & Meteoritics       35,000    Individual
Paleobotany                   150,000    Individual
Scientific Instruments          5,000    Individual
Vertebrate Paleontology       125,000    Individual
Vertebrate Zoology            185,000    Lot / Individual

Legend: > 80%, > 50%, < 50%
Items with an electronic record available (25 years effort): 64%
~14 million items => ~2.7 million databaseable units
295,921 digital assets, mostly JPG & TIF, variety of other MIME types
EMus Expand Exponentially

New records are entered
Existing records expand:
  more fields filled in
  more links established
Records acquire new features:
  Darwin Core fields
  GUIDs (unique identifiers)
New capabilities add records:
  Audit module
  Statistics module
“Pork” may hide in plain sight:
  SummaryData/ExtendedData
  AdmOriginalData
  remote SummaryData copies

SLASH
Slashing EMu Before It’s Too Late

Porky EMus can be surly, and they will bite you
slashing : Halloween
Jason
Leatherface
Chucky
Freddy Krueger
Freddy EMuger
Slash that EMu beast!
eparties

EMu beast hiding in plain sight:
  AdmOriginalData
  SummaryData/ExtendedData
  remote SummaryData copies

AdmOriginalData: null data rows
Slashed by 31%
Freddie says: why stop there?
ecollectionevents (sites - round 2)
  constant data
  lengthy labels
  prefixes for temporary use during migration
ecatalogue (data, rec, seg)
  Crunch 2: delete nulls from AdmOriginalData
  Crunch 3: shorten labels on AdmOriginalData
  Crunch 4: delete prefixes on AdmOriginalData
  Slashed by 55%
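The three crunches (delete nulls, shorten labels, delete prefixes) can be pictured with a small sketch. The slides do not show the actual AdmOriginalData layout, so this assumes one "label: value" pair per line; the prefix `legacy_`, the label names, and the sample values are all invented for illustration, and the real work was done with server-side Texpress tools rather than Python.

```python
# Hypothetical layout: AdmOriginalData as one "label: value" pair per line.
# The prefix and the label mapping below are assumptions, not from the slides.
MIGRATION_PREFIX = "legacy_"
SHORT_LABELS = {
    "Original Collection Locality Description": "Locality",
    "Original Collector Name As Entered": "Collector",
}


def crunch(original_data: str) -> str:
    """Apply the three crunches to one AdmOriginalData value."""
    kept = []
    for line in original_data.splitlines():
        label, sep, value = line.partition(":")
        label, value = label.strip(), value.strip()
        if not sep or not value:
            continue  # crunch: drop rows that carry no value
        if label.startswith(MIGRATION_PREFIX):
            label = label[len(MIGRATION_PREFIX):]  # crunch: delete prefix
        label = SHORT_LABELS.get(label, label)     # crunch: shorten label
        kept.append(f"{label}: {value}")
    return "\n".join(kept)


def percent_saved(before: str, after: str) -> float:
    """Character savings as a percentage of the original length."""
    return 100.0 * (len(before) - len(after)) / len(before)


sample = (
    "legacy_Original Collector Name As Entered: J. Doe\n"
    "legacy_Remarks:\n"
    "legacy_Original Collection Locality Description: New Haven, CT\n"
    "legacy_Depth:   "
)
slim = crunch(sample)
print(slim)
print(f"saved {percent_saved(sample, slim):.0f}% of characters")
```

The savings on this toy string say nothing about the 55% figure above; that number comes from the real ecatalogue files.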
Slashing EMu allowed adding in Darwin Core data, with a net disk space reduction
Methodologies used during the first pass slashings

Boring, repetitive, nothing very fancy:
  Iterative server-side scripting (texexport, texload)
  Several million record updates were involved
  Manually tweaked nightly cron jobs to accommodate
  Conducted during evenings over a six-month period
  Watched closely to avoid taxing server performance

Now we could do the following every night:
  Compact maintenance gets run on all modules (3.5 hours)
  Cron-ed plain text data dumps for all modules (3.5 hours):
    generate small, portable gzipped backups of all EMu data
    fully reinstantiate SQL database feeding local search portal
    fully reinstantiate SQL database feeding DiGIR/IPT services

Rather brutish gladiator-style slashing, needs operator intervention
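The "small, portable gzipped backups" step can be sketched as follows. This is only an illustration of compressing a plain text dump once it exists; the file name is hypothetical, and the slides do not say which tool Peabody's cron jobs actually use for compression.

```python
import gzip
import shutil
from pathlib import Path


def gzip_dump(dump_path: Path) -> Path:
    """Compress a plain text dump to <name>.gz next to the original."""
    gz_path = dump_path.with_name(dump_path.name + ".gz")
    # Stream the copy so a multi-gigabyte dump is never held in memory.
    with open(dump_path, "rb") as src, gzip.open(gz_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return gz_path
```

A nightly job could call `gzip_dump(Path("ecatalogue.txt"))` after each module's dump finishes; the path here is an assumed example.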
How about more subtle slashing?
Something a little bit more insidious, and automated
Nurse Ratched: shots and pills
Nurse Ratched
Nurse RatchEMu
catalogue - round 2: SummaryData, ExtendedData
  data, rec, seg: BEFORE and AFTER
  Slashed by 29%
texadmin - insert the slasher pills into validation segments

emureindex: a Perl script in your ~emu/bin directory
  system("texdesign -R $dbname /dev/null 2>&1");

Slasher pills are reversible!
Slasher pills work great on “visible” fields (anything you see on screen and feel like slashing)
Slasher pills work great on “invisible” fields (remote SummaryData strings copied from linked records)

ecatalogue                               change
Records:        986,361     1,557,       %
Disk use:       10.4 GB     6.3 GB       -39.4%
Record size:    11.1 kB     4.3 kB       -61.8%
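The ecatalogue figures can be sanity-checked with a few lines of arithmetic. This sketch assumes that "record size" is disk use divided by record count and that the units are binary (1 GB = 1024^3 bytes, 1 kB = 1024 bytes); neither assumption is stated on the slide. Only the first column is checked for record size, because the second record count is incomplete in the source.

```python
GIB = 1024 ** 3  # assumed binary gigabyte
KIB = 1024       # assumed binary kilobyte


def pct_change(before: float, after: float) -> float:
    """Signed percentage change from before to after."""
    return 100.0 * (after - before) / before


def avg_record_kib(disk_gib: float, records: int) -> float:
    """Average record size in KiB, assuming size = disk use / record count."""
    return disk_gib * GIB / records / KIB


disk_change = pct_change(10.4, 6.3)           # disk use, first vs second column
size_before = avg_record_kib(10.4, 986_361)   # first column only

print(f"disk use change: {disk_change:.1f}%")
print(f"average record size (first column): {size_before:.1f} kB")
```

Under these assumptions both results agree with the slide's -39.4% and 11.1 kB.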
Boost Your Performance/Nightlife
Now we could do the following every night:
  Compact maintenance gets run on all modules (3.5 hours)
  Cron-ed plain text data dumps for all modules (3.5 hours):
    generate small, portable gzipped backups of all EMu data
    fully reinstantiate SQL database feeding local search portal
    fully reinstantiate SQL database feeding DiGIR/IPT services

2014, every night:
  Compact maintenance gets run on all modules (1.4 hours)
  Cron-ed plain text data dumps for all modules (2.3 hours):
    generate small, portable gzipped backups of all EMu data
    fully reinstantiate SQL database feeding local search portal
    fully reinstantiate SQL database feeding DiGIR/IPT services
  1. Pushing newly created multimedia files to Yale DAM
  2. Pushing metadata updates to extant multimedia files to Yale DAM
  3. OAI-PMH record harvesting by Yale Cross Collections search
  4. Updating archives fonds (EAD) in Yale Finding Aid Database
n=18
1. output of the command "texlist -s"
2. time to run compact maintenance on all modules
3. time to run compact maintenance on just catalogue

diff emureindex emureindex.ypm
288c288
< echo " Compacting database..."
---
> echo " Compacting database... `/bin/date`"
301c301
< echo " Reconfiguring database..."
---
> echo " Reconfiguring database... `/bin/date`"
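With the two `/bin/date` stamps in place, the compact time can be read off a maintenance log. The sketch below is one way to do that, under several assumptions that are not in the slides: the log contains the echoed lines verbatim, `date` prints its default English format (for example `Tue Oct 14 02:00:00 EDT 2014`), both stamps share a time zone, and the gap between the "Compacting" and "Reconfiguring" stamps approximates the compact step. The sample log is invented.

```python
import re
from datetime import datetime

# Matches e.g. "Compacting database... Tue Oct 14 02:00:00 EDT 2014".
# The time zone token is matched but ignored (assumed identical on both lines).
STAMP = re.compile(
    r"(?P<step>Compacting|Reconfiguring) database\.\.\. "
    r"\w{3} (?P<mon>\w{3}) +(?P<day>\d{1,2}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}) \w+ (?P<year>\d{4})"
)


def parse_steps(log_text):
    """Return (step, datetime) pairs in the order they appear in the log."""
    steps = []
    for m in STAMP.finditer(log_text):
        stamp = f"{m['mon']} {m['day']} {m['time']} {m['year']}"
        steps.append((m["step"], datetime.strptime(stamp, "%b %d %H:%M:%S %Y")))
    return steps


def compact_minutes(log_text):
    """Minutes from each 'Compacting' stamp to the next 'Reconfiguring' stamp."""
    durations, started = [], None
    for step, when in parse_steps(log_text):
        if step == "Compacting":
            started = when
        elif started is not None:
            durations.append((when - started).total_seconds() / 60)
            started = None
    return durations


# Hypothetical log excerpt for one module.
sample_log = (
    " Compacting database... Tue Oct 14 02:00:00 EDT 2014\n"
    " Reconfiguring database... Tue Oct 14 02:45:30 EDT 2014\n"
)
print(compact_minutes(sample_log))
```

Summing the list over all modules would give a figure comparable to item 2; filtering to the catalogue run would give item 3.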
Time to complete compact maintenance on all modules (hours)
Number of records (x) and disk occupancy of records (y) among 18 KE clients
  156 million records, ~1 TB
  553 million records, 2.7 TB! *
  * 434 million are eaudit and estatistics, “only” 119 million for all other modules combined
  29 million ecatalogue, 320 GB
Percent of records that are eaudit and estatistics among 18 KE clients
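For the largest client, the eaudit and estatistics share follows directly from the counts quoted earlier (553 million records in total, 434 million of them eaudit and estatistics, 119 million in all other modules). The average bytes per record is a rougher figure that assumes 1 TB = 10^12 bytes, which the slides do not specify.

```python
total_records = 553_000_000
audit_stats_records = 434_000_000   # eaudit + estatistics
other_records = 119_000_000         # all other modules combined
total_tb = 2.7

# Share of all records that are audit or statistics records.
audit_stats_share = 100.0 * audit_stats_records / total_records

# Average footprint per record, assuming decimal terabytes.
avg_bytes_per_record = total_tb * 1e12 / total_records

print(f"eaudit + estatistics: {audit_stats_share:.1f}% of records")
print(f"average footprint: {avg_bytes_per_record:,.0f} bytes per record")
```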
Number of records (x) and disk occupancy (y) for emultimedia among 18 KE clients
  somewhat greater range of variability

Slashed by 82%
All EXIF / XMP metadata remains in image headers
Yale DAM infrastructure

ALT-TUD it

emultimedia
EMu records:     32,252     142,350     295,921
EMu disk use:    22 GB      83 GB       52 GB
DAM disk use:    n.a.       125 GB      14,336 GB
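One way to read the emultimedia table is disk use per record in each of the three columns. The sketch below does that division for the EMu rows only, assuming 1 GB = 1024 MB; the slides do not label the columns, so they are simply treated as three successive snapshots.

```python
# (EMu records, EMu disk use in GB) for the three table columns, left to right.
snapshots = [
    (32_252, 22),
    (142_350, 83),
    (295_921, 52),
]


def mb_per_record(records: int, disk_gb: float) -> float:
    """Average EMu disk use per multimedia record, in MB (1 GB = 1024 MB assumed)."""
    return disk_gb * 1024 / records


per_record = [mb_per_record(records, gb) for records, gb in snapshots]
for (records, gb), mb in zip(snapshots, per_record):
    print(f"{records:>9,} records, {gb:>3} GB -> {mb:.2f} MB per record")
```

On these figures the per-record footprint inside EMu shrinks from column to column even as the record count grows, while the DAM row carries the bulk of the storage.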
Know thyself, and thine own EMu
Slash early, slash often
as has become traditional…
We saw this slide already, you say
It’s a trio of hackers holding sway
Out of Melbourne came a fightin’
A text database known as Titan
Which would morph into EMu one day

That brand EMu is used for many things
Just Google it and see what that brings
An assortment of oils and gels
Practically anything that sells
To calm dry skin, bad rashes, and stings

Peabody’s EMu morphs often on screen
Through the years how many have you seen?
Is Photoshopping like this a sign
Of some malady unfortunately mine
Has my daughter Jen inherited this gene?

In fact, I’d gotten it directly from Jim
My late grandfather, who would spout it on a whim
At family occasions when we did gather
Or in longhand letters when he’d rather
Write his brother-in-law from Omaha named Slim

In horror movies they slash, scream, and maul
Everything in their paths, big and small
Yet Freddy and Chucky don’t seem so gritty
When adorned on a pooch or a kitty
Maybe that’s worse - I can’t say, your call

That Swedish connection was definitely clear
When John Doolan was bending our ear
KE staff and Abba merged together
In white satin, boots and leather
Just like these EMus of pop fame and endear

I/O, I/O, it’s off to Axiell we go
To a new computing frontier
And there’s nothing to fear
So they say, hope it’s so,
Hope it’s so, I dunno

In yonder eras Liza was a catch
To her entourage young men would attach
Oh, here’s another famous actor
A comedian, and no detractor
Were I to say that these four are a match

Now here is a fanciful sight
KE staff dressed in royal delight
Evening will be beckoning soon
And will bring laughs, drinks, and a tune
Let’s party at the reception tonight

We’ve finally come to the end
Of the doggerel, my fine feathered friend
It was all I could do not to faint
When revealed in body paint
Are Aussie EMus so gaudily penned