Published by Sandra Bains. Modified over 9 years ago.
2
I / O: Care & Feeding of Your EMu Larry Gall Computer Systems Office Peabody Museum of Natural History Yale University
3
I / O: Care & Feeding of Your EMu
4
I / O: Care & Feeding of Your EMu
5
I / O: Care & Feeding of Your EMu
6
I / O: Care & Feeding of Your EMu
7
I / O: Care & Feeding of Your EMu predictive text?
8
I / O: Care & Feeding of Your EMu
9
I / O: Care & Feeding of Your EMu
10
I / O: Care & Feeding of Your EMu an I/O bottleneck
11
I / O: Care & Feeding of Your EMu an I/O bottleneck
12
I / O: Care & Feeding of Your EMu an I/O bottleneck
13
I / O: Care & Feeding of Your EMu
14
I / O: Care & Feeding of Your EMu
18
Brief Peabody I/O
EMus Expand Exponentially
Slashing EMu Before It’s Too Late
Boost Your Performance/Nightlife
I I O
~14 million specimens

Anthropology
Botany
Entomology
Invertebrate Paleontology
Invertebrate Zoology
Mineralogy & Meteoritics
Paleobotany
Scientific Instruments
Vertebrate Paleontology
Vertebrate Zoology

Peabody Collections: Current Digital Snapshot

Collection                   Count      Unit
Anthropology                 325,000    Lot
Botany                       400,000    Individual
Entomology                   450,000    Lot / Individual
Invertebrate Paleontology    350,000    Lot
Invertebrate Zoology         350,000    Lot
Mineralogy & Meteoritics      35,000    Individual
Paleobotany                  150,000    Individual
Scientific Instruments         5,000    Individual
Vertebrate Paleontology      125,000    Individual
Vertebrate Zoology           185,000    Lot / Individual

Legend: > 80%, > 50%, < 50%
Items with an electronic record available (25 years effort): 64%
~14 million items => ~2.7 million databaseable units
295,921 digital assets, mostly JPG & TIF, variety of other MIME types
EMus Expand Exponentially
New records are entered
Existing records expand:
  more fields filled in
  more links established
Records acquire new features:
  Darwin Core fields
  GUIDs (unique identifiers)
New capabilities add records:
  Audit module
  Statistics module
“Pork” may hide in plain sight:
  SummaryData/ExtendedData
  AdmOriginalData
  remote SummaryData copies

SLASH
Slashing EMu Before It’s Too Late
Porky EMus can be surly, and they will bite you
Slashing: Halloween
Jason
Leatherface
Chucky
Freddy Krueger
Freddy EMuger
Slash that EMu beast!
eparties
EMu beast hiding in plain sight:
  AdmOriginalData
  SummaryData/ExtendedData
  remote SummaryData copies
AdmOriginalData: null data rows
Slashed by 31%
Freddy says: why stop there?
ecollectionevents: sites, round 2
  constant data
  lengthy labels
  prefixes for temporary use during migration
ecatalogue: data, rec, seg
  Crunch 2: delete nulls from AdmOriginalData
  Crunch 3: shorten labels on AdmOriginalData
  Crunch 4: delete prefixes on AdmOriginalData
Slashed by 55%
Slashing allowed adding in Darwin Core data, with a net disk space reduction.

Methodologies used during the first-pass slashings
Boring, repetitive, nothing very fancy:
  Iterative server-side scripting (texexport, texload)
  Several million record updates were involved
  Manually tweaked nightly cron jobs to accommodate
  Conducted during evenings over a six-month period
  Watched closely to avoid taxing server performance
Now we could do the following every night:
  Compact maintenance gets run on all modules (3.5 hours)
  Cron-ed plain text data dumps for all modules (3.5 hours):
    generate small, portable gzipped backups of all EMu data
    fully reinstantiate SQL database feeding local search portal
    fully reinstantiate SQL database feeding DiGIR/IPT services
Rather brutish gladiator-style slashing; needs operator intervention
How about more subtle slashing?
Something a little bit more insidious, and automated
Nurse Ratched: shots and pills
Nurse Ratched, Nurse RatchEMu
catalogue, round 2: data, rec, seg
  BEFORE and AFTER: SummaryData, ExtendedData
  Slashed by 29%
texadmin: insert the slasher pills into validation segments

emureindex: a Perl script in your ~emu/bin directory
  system("texdesign -R $dbname > /dev/null 2>&1");

Slasher pills are reversible!
Slasher pills work great on “visible” fields:
  (anything you see on screen and feel like slashing)
Slasher pills work great on “invisible” fields:
  (remote SummaryData strings copied from linked records)

ecatalogue      2007       2014        change
Records:        986,361    1,557,808   +57.9%
Disk use:       10.4 GB    6.3 GB      -39.4%
Record size:    11.1 kB    4.3 kB      -61.8%
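The change column for ecatalogue is a plain percent-change calculation. Here is a minimal sketch in Python using the rounded figures from the slide (the function name `pct_change` is only for illustration). Because the record sizes on the slide are rounded, recomputing from 11.1 and 4.3 gives a value slightly different from the -61.8% shown, which was presumably derived from unrounded numbers.

```python
def pct_change(before: float, after: float) -> float:
    """Return the percent change from `before` to `after`."""
    return (after - before) / before * 100.0


# Rounded ecatalogue figures from the slide (2007 vs 2014).
records = pct_change(986_361, 1_557_808)
disk_gb = pct_change(10.4, 6.3)
record_kb = pct_change(11.1, 4.3)

print(f"Records:     {records:+.1f}%")
print(f"Disk use:    {disk_gb:+.1f}%")
print(f"Record size: {record_kb:+.1f}%")
```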
Boost Your Performance/Nightlife

2014, every night:
  Compact maintenance gets run on all modules (1.4 hours)
  Cron-ed plain text data dumps for all modules (2.3 hours):
    generate small, portable gzipped backups of all EMu data
    fully reinstantiate SQL database feeding local search portal
    fully reinstantiate SQL database feeding DiGIR/IPT services
  1. Pushing newly created multimedia files to Yale DAM
  2. Pushing metadata updates to extant multimedia files to Yale DAM
  3. OAI-PMH record harvesting by Yale Cross Collections search
  4. Updating archives fonds (EAD) in Yale Finding Aid Database
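The "small, portable gzipped backups" step could be sketched roughly as follows. This is only an illustration, assuming the nightly cron job has already written one plain-text dump file per module into a directory. The directory layout, the `.txt` suffix and the function name `gzip_dumps` are hypothetical and are not the Peabody's actual scripts.

```python
import gzip
import shutil
from pathlib import Path


def gzip_dumps(dump_dir: Path, suffix: str = ".txt") -> list[Path]:
    """Compress every plain-text dump in dump_dir and return the .gz paths.

    The layout (one dump file per module, ending in `suffix`) is an
    assumption made for this sketch.
    """
    compressed = []
    for dump in sorted(dump_dir.glob(f"*{suffix}")):
        target = dump.with_name(dump.name + ".gz")
        with dump.open("rb") as src, gzip.open(target, "wb") as dst:
            # Stream the copy so large dumps never sit fully in memory.
            shutil.copyfileobj(src, dst)
        dump.unlink()  # keep only the compressed copy
        compressed.append(target)
    return compressed
```

Calling `gzip_dumps(Path("/some/dump/dir"))` from the end of the nightly job would leave one `.gz` file per module, which is easy to copy off the server.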
n=18
1. output of the command "texlist -s"
2. time to run compact maintenance on all modules
3. time to run compact maintenance on just catalogue

emu-ypmnhlive@emu1[127] diff emureindex emureindex.ypm
288c288
< echo " Compacting database..."
---
> echo " Compacting database... `/bin/date`"
301c301
< echo " Reconfiguring database..."
---
> echo " Reconfiguring database... `/bin/date`"
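With `/bin/date` stamps appended to the echo lines, the time spent in each phase can be recovered from the job's output. Here is a rough sketch, assuming the log lines carry default `date` output after the message. The sample lines, the function names and the decision to ignore the timezone field are all assumptions made for illustration.

```python
from datetime import datetime, timedelta


def parse_stamp(line: str) -> datetime:
    """Extract the trailing `date` stamp from a stamped echo line.

    Assumes default `date` output, e.g. 'Thu Oct  9 02:00:01 EDT 2014'.
    The weekday and timezone tokens are dropped, so both stamps being
    compared must share a timezone.
    """
    stamp = line.split("...", 1)[1].split()
    _weekday, month, day, clock, _tz, year = stamp
    return datetime.strptime(f"{month} {day} {clock} {year}", "%b %d %H:%M:%S %Y")


def elapsed(start_line: str, end_line: str) -> timedelta:
    """Time between two stamped log lines."""
    return parse_stamp(end_line) - parse_stamp(start_line)


# Hypothetical log lines, not real Peabody output.
log = [
    " Compacting database... Thu Oct  9 02:00:01 EDT 2014",
    " Reconfiguring database... Thu Oct  9 02:47:31 EDT 2014",
]
print("compaction took", elapsed(log[0], log[1]))
```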
Time to complete compact maintenance on all modules (hours)

Number of records (x) and disk occupancy of records (y) among 18 KE clients
  156 million, ~1 TB
  553 million records, 2.7 TB! *
  * 434 million are eaudit and estatistics, “only” 119 million for all other modules combined
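The footnote's split can be checked with simple arithmetic. Here is a small sketch using the rounded counts quoted on the slide, in millions (the variable names are illustrative).

```python
# Record counts quoted on the slide, in millions (rounded).
total = 553
audit_and_stats = 434  # eaudit + estatistics
other_modules = 119    # all other modules combined

# The two parts quoted in the footnote add up to the total.
assert audit_and_stats + other_modules == total

share = audit_and_stats / total * 100
print(f"eaudit + estatistics share of records: {share:.1f}%")
```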
  29 million ecatalogue, 320 GB

Percent of records that are eaudit and estatistics among 18 KE clients

Number of records (x) and disk occupancy (y) for emultimedia among 18 KE clients
  somewhat greater range of variability
Slashed by 82%
All EXIF / XMP metadata remains in image headers
Yale DAM infrastructure
ALT-TUD it

emultimedia      2007      2011       2014
EMu records:     32,252    142,350    295,921
EMu disk use:    22 GB     83 GB      52 GB
DAM disk use:    n.a.      125 GB     14,336 GB
Know thyself, and thine own EMu
Slash early, slash often
As has become traditional…

We saw this slide already, you say
It’s a trio of hackers holding sway
Out of Melbourne came a fightin’
A text database known as Titan
Which would morph into EMu one day

That brand EMu is used for many things
Just Google it and see what that brings
An assortment of oils and gels
Practically anything that sells
To calm dry skin, bad rashes, and stings

Peabody’s EMu morphs often on screen
Through the years how many have you seen?
Is Photoshopping like this a sign
Of some malady unfortunately mine
Has my daughter Jen inherited this gene?

In fact, I’d gotten it directly from Jim
My late grandfather, who would spout it on a whim
At family occasions when we did gather
Or in longhand letters when he’d rather
Write his brother-in-law from Omaha named Slim

In horror movies they slash, scream, and maul
Everything in their paths, big and small
Yet Freddy and Chucky don’t seem so gritty
When adorned on a pooch or a kitty
Maybe that’s worse – I can’t say, your call

That Swedish connection was definitely clear
When John Doolan was bending our ear
KE staff and Abba merged together
In white satin, boots and leather
Just like these EMus of pop fame and endear

I/O I/O …

I/O, I/O, it’s off to Axiell we go
To a new computing frontier
And there’s nothing to fear
So they say, hope it’s so,
Hope it’s so, I dunno

In yonder eras Liza was a catch
To her entourage young men would attach
Oh, here’s another famous actor
A comedian, and no detractor
Were I to say that these four are a match

Now here is a fanciful sight
KE staff dressed in royal delight
Evening will be beckoning soon
And will bring laughs, drinks, and a tune
Let's party at the reception tonight

We've finally come to the end
Of the doggerel, my fine feathered friend
It was all I could do not to faint
When revealed in body paint
Are Aussie EMus so gaudily penned