2
EMu on a Diet
3
Yale campus
5
Peabody Collections: Counts & Functional Cataloguing Unit
Anthropology: 325,000 (Lot)
Botany: 350,000 (Individual)
Entomology: 1,000,000 (Lot)
Invertebrate Paleontology: 300,000 (Lot)
Invertebrate Zoology: 300,000 (Lot)
Mineralogy: 35,000 (Individual)
Paleobotany: 150,000 (Individual)
Scientific Instruments: 2,000 (Individual)
Vertebrate Paleontology: 125,000 (Individual)
Vertebrate Zoology: 185,000 (Lot / Individual)
2.7 million database-able units => ~11 million items
6
Peabody Collections: Functional Units & % Databased
Anthropology: 325,000 (90 %)
Botany: 350,000 (1 %)
Entomology: 1,000,000 (3 %)
Invertebrate Paleontology: 300,000 (60 %)
Invertebrate Zoology: 300,000 (25 %)
Mineralogy: 35,000 (85 %)
Paleobotany: 150,000 (60 %)
Scientific Instruments: 2,000 (100 %)
Vertebrate Paleontology: 125,000 (60 %)
Vertebrate Zoology: 185,000 (95 %)
990,000 of 2.7 million => 37 % overall
7
The four YPM buildings: Peabody (YPM), Environmental Science Center (ESC), Geology / Geophysics (KGL), 175 Whitney (Anthropology)
8
VZ Kristof Zyskowski (Vert. Zool. - ESC) Greg Watkins-Colwell (Vert. Zool. - ESC)
9
HSI Shae Trewin (Scientific Instruments – KGL )
10
VP Mary Ann Turner (Vert. Paleo. – KGL / YPM)
11
ANT Maureen DaRos (Anthro. - YPM / 175 Whitney)
12
% Databased vs. Collection Size (in 1000s of items)
13
% Databased vs. Collection Size (in 1000s of items): Botany, Entomology, Invertebrate Paleontology, Invertebrate Zoology highlighted
14
1991 Systems Office created & staffed Peabody Collections Approximate Digital Timeline
15
1991 Systems Office created & staffed 1992 Argus collections databasing initiative started Peabody Collections Approximate Digital Timeline
16
1991 Systems Office created & staffed 1992 Argus collections databasing initiative started 1994 Gopher services launched for collections data Peabody Collections Approximate Digital Timeline
17
1991 Systems Office created & staffed 1992 Argus collections databasing initiative started 1994 Gopher services launched for collections data 1997 Gopher mothballed, Web / HTTP services launched Peabody Collections Approximate Digital Timeline
18
1991 Systems Office created & staffed 1992 Argus collections databasing initiative started 1994 Gopher services launched for collections data 1997 Gopher mothballed, Web / HTTP services launched 1998 Physical move of many collections begins 2002 Physical move of many collections ends Peabody Collections Approximate Digital Timeline
19
1991 Systems Office created & staffed 1992 Argus collections databasing initiative started 1994 Gopher services launched for collections data 1997 Gopher mothballed, Web / HTTP services launched 1998 Physical move of many collections begins 2002 Physical move of many collections ends 2003 Search for Argus successor commences 2003 Informatics Office created & staffed Peabody Collections Approximate Digital Timeline
20
Peabody Collections Approximate Digital Timeline
1991 Systems Office created & staffed
1992 Argus collections databasing initiative started
1994 Gopher services launched for collections data
1997 Gopher mothballed, Web / HTTP services launched
1998 Physical move of many collections begins
2002 Physical move of many collections ends
2003 Search for Argus successor commences
2003 Informatics Office created & staffed
2004 KE EMu to succeed Argus, data migration begins
2005 Argus data migration ends, go-live in KE EMu
21
Big events: EMu migration in '05 (all disciplines went live simultaneously); physical move in '98-'02 (primarily neontological disciplines)
23
What do you do …
24
… when your EMu is out of shape & sluggish ?
25
What do you do … … when your EMu is out of shape & sluggish ?
32
The Peabody Museum Presents
33
What clued us in that we should put our EMu on a diet ? The Peabody Museum Presents
34
Area of Server Occupied by Catalogue: 980 megabytes in Argus vs. 10,400 megabytes in EMu
35
980 megabytes in Argus vs. 10,400 megabytes in EMu ?
36
Default EMu cron maintenance job schedule (weekly grid, Mo Tu We Th Fr Sa Su, with late night / workday / evening rows; legend: emulutsrebuild, emumaintenance batch, emumaintenance compact)
40
Three Fabulously Easy Steps !
41
1. The Legacy Data Burnoff ( best quick loss plan ever ! )
42
Three Fabulously Easy Steps ! 1. The Legacy Data Burnoff ( best quick loss plan ever ! ) 2. The Darwin Core Binge & Purge ( eat the big enchilada and still end up thin ! )
43
Three Fabulously Easy Steps ! 1. The Legacy Data Burnoff ( best quick loss plan ever ! ) 2. The Darwin Core Binge & Purge ( eat the big enchilada and still end up thin ! ) 3. The Validation Code SlimDing ( your Texpress metabolism is your friend ! )
44
1. The Legacy Data Burnoff
Anatomy of the ecatalogue database:
~/emu/data/ecatalogue/data : the actual data
~/emu/data/ecatalogue/rec : indexing (part)
~/emu/data/ecatalogue/seg : indexing (part)
The combined size of these was 10.4 GB -- 4 GB in data and 3 GB in each of rec and seg -- i.e. the 10,400 MB in EMu, versus 980 MB in Argus.
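To see where a table's disk footprint actually sits, it is enough to size the files in its data directory. A minimal sketch, assuming the ~/emu/data/ecatalogue layout shown above (adjust the path to your own installation):

    import os

    # Report the on-disk size of each file in the ecatalogue table directory.
    # The path follows the layout on the slide and is an assumption.
    table_dir = os.path.expanduser("~/emu/data/ecatalogue")
    total = 0.0
    for name in sorted(os.listdir(table_dir)):
        path = os.path.join(table_dir, name)
        if os.path.isfile(path):
            size_mb = os.path.getsize(path) / (1024 * 1024)
            total += size_mb
            print(f"{name:8s} {size_mb:10.1f} MB")
    print(f"{'total':8s} {total:10.1f} MB")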
45
The ecatalogue database was a rate limiter (a typical EMu data directory: 23 files, 2 subdirs)
46
Closer Assessment of Legacy Data
In 2005, we had initially adopted many of the existing formats for data elements from the USNM's EMu client, to allow for rapid development of the Peabody's modules by KE prior to migration -- Legacy Data fields were among them
48
Closer Assessment of Legacy Data
49
sites – round 2: constant data, lengthy prefixes
50
sites – round 2: data of temporary use in migration
51
catalogue – round 2 (chart of data / rec / seg file sizes)
52
How did we do the Legacy Data Burnoff in 2005 ?
Repetitive scripting of texexport & texload jobs
Conducted around a million updates of records
Manually adjusted cron jobs to accommodate
Did the work at night over a six-month-long period
Watched process closely to keep from filling server disks
53
How did we do the Legacy Data Burnoff in 2005 ?
Repetitive scripting of texexport & texload jobs
Conducted around a million updates of records
Manually adjusted nightly cron jobs to accommodate
Did the work at night over a six-month-long period
Watched process closely to keep from filling server disks
54
ecatalogue (chart of data / rec / seg file sizes)
55
Crunch 2: delete nulls from AdmOriginalData (ecatalogue chart of data / rec / seg sizes)
56
Crunch 3: delete nulls from AdmOriginalData; shorten labels on AdmOriginalData (ecatalogue chart of data / rec / seg sizes)
57
Crunch 4: delete nulls from AdmOriginalData; shorten labels on AdmOriginalData; delete prefixes on AdmOriginalData (ecatalogue chart of data / rec / seg sizes)
58
Crunch 4: delete nulls from AdmOriginalData; shorten labels on AdmOriginalData; delete prefixes on AdmOriginalData (ecatalogue chart of data / rec / seg sizes). Wow ! 55 % reduction !
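The three crunches amount to simple string surgery on the AdmOriginalData values carried over from Argus. The sketch below is only illustrative: the real work was done with repeated texexport / texload jobs against the Texpress files, and the label abbreviations and prefix pattern shown here are invented stand-ins, not the actual Argus legacy formats.

    import re

    # Hypothetical label abbreviations and prefix pattern (assumptions).
    LABEL_MAP = {"Original Collector Name:": "OrigColl:",
                 "Original Locality Description:": "OrigLoc:"}
    PREFIX = re.compile(r"^ARGUS-LEGACY-\d+\s*")

    def crunch(legacy_lines):
        """Apply the three crunches to one record's AdmOriginalData lines."""
        out = []
        for line in legacy_lines:
            if line.strip().lower() in ("", "null", "none"):    # Crunch 2: drop nulls
                continue
            for label, short in LABEL_MAP.items():              # Crunch 3: shorten labels
                line = line.replace(label, short)
            line = PREFIX.sub("", line)                         # Crunch 4: drop prefixes
            out.append(line)
        return out

    print(crunch(["ARGUS-LEGACY-0042 Original Collector Name: O. C. Marsh",
                  "null",
                  "Original Locality Description: Como Bluff, WY"]))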
59
2. The Darwin Core Binge & Purge Charles Darwin, 1809-1882
60
DwC: Natural History Metadata Standard
Affords interoperability of different database systems
Widely used in collaborative informatics initiatives
Circa 40-50 fields depending on particular version
Directly analogous to the Dublin Core standard
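Populating DwC at the upgrade essentially copies (and lightly reshapes) values the catalogue already holds into a parallel set of flat Darwin Core fields, which is why the character count balloons. A toy sketch of that duplication; the field names on both sides are illustrative assumptions, not EMu's actual column names:

    # Illustrative mapping from existing catalogue fields to flat DwC fields.
    DWC_MAP = {
        "DarInstitutionCode": lambda rec: "YPM",
        "DarCatalogNumber":   lambda rec: rec.get("CatNumber", ""),
        "DarScientificName":  lambda rec: " ".join(
            filter(None, [rec.get("Genus"), rec.get("Species")])),
        "DarCountry":         lambda rec: rec.get("Country", ""),
        "DarLocality":        lambda rec: rec.get("Locality", ""),
    }

    def populate_dwc(rec):
        """Return the new DwC key/values this record would gain (all duplicated characters)."""
        return {key: fn(rec) for key, fn in DWC_MAP.items()}

    rec = {"CatNumber": "IZ 12345", "Genus": "Mercenaria", "Species": "mercenaria",
           "Country": "USA", "Locality": "Long Island Sound"}
    new = populate_dwc(rec)
    print(new)
    print("new characters added to this record:", sum(len(v) for v in new.values()))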
65
Populate DwC fields at 3.2.02 upgrade in 2006… so what ?
66
IZ Department: total characters existing data 43,941,006
67
Populate DwC fields at 3.2.02 upgrade in 2006… so what ? IZ Department: total characters existing data 43,941,006 IZ Department: est. new DwC characters 20,000,000
68
Populate DwC fields at 3.2.02 upgrade in 2006… so what ? IZ Department: total characters existing data 43,941,006 IZ Department: est. new DwC characters 20,000,000 IZ Department: est. expansion factor 45 %
69
We're about to gain back most of the pounds we just lost in the Legacy Data Burnoff !
70
catalogue – round 2 (chart of data / rec / seg file sizes)
71
catalogue – round 2: action in ecollectionevents (chart of data / rec / seg sizes)
72
catalogue – round 2: action in eparties (chart of data / rec / seg sizes)
73
catalogue – round 2: action in ecatalogue (chart of data / rec / seg sizes)
74
catalogue – round 2: before actions (chart of data / rec / seg sizes)
75
catalogue – round 2: after actions (chart of data / rec / seg sizes)
77
ExtendedData
78
SummaryData
79
ExtendedData / SummaryData: the ExtendedData field is a full duplication of the IRN + SummaryData fields… delete the ExtendedData field, use SummaryData when in thumbnail mode on records
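Because ExtendedData just repeats IRN + SummaryData, the savings from dropping it can be estimated directly from a record dump. A minimal sketch, assuming a hypothetical CSV export of the catalogue (the export format and file name are assumptions; the field names come from the slide):

    import csv

    # Estimate characters saved by dropping ExtendedData across an export.
    saved = kept = 0
    with open("ecatalogue_export.csv", newline="") as fh:
        for row in csv.DictReader(fh):
            saved += len(row.get("ExtendedData") or "")   # characters we could drop
            kept  += len(row.get("SummaryData") or "")    # what thumbnail mode still needs
    print(f"characters removable by dropping ExtendedData: {saved:,}")
    print(f"characters retained in SummaryData:            {kept:,}")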
80
Populate DwC fields at 3.2.02 upgrade… so what ?
IZ Department: total characters existing data 43,941,006
IZ Department: est. new DwC characters 20,000,000
IZ Department: est. expansion factor 45 %
81
Populate DwC fields at 3.2.02 upgrade… so what ?
IZ Department: total characters modified data 43,707,277
IZ Department: total new DwC characters 22,358,461
IZ Department: actual expansion factor -0.1 %
82
Populate DwC fields at 3.2.02 upgrade… so what ?
IZ Department: total characters modified data 43,707,277
IZ Department: total new DwC characters 22,358,461
IZ Department: actual expansion factor -0.1 %
Some pain, but NO weight gain !
83
3. The Validation Code SlimDing
We've taken off the easiest pounds… any other fields to trim ?
Some sneakily subversive Texpress tricks
84
3. The Validation Code SlimDing Can history of query behavior by users help identify some EMu soft spots ?
85
3. The Validation Code SlimDing Can history of query behavior by users help identify some EMu soft spots ? If so, can we slip EMu a dynamic diet pill into its computer code ?
86
3. The Validation Code SlimDing Can history of query behavior by users help identify some EMu soft spots ? If so, can we slip EMu a dynamic diet pill into its computer code ? texadmin
87
EMu actions in the background you don't see:
…you make certain common types of changes to any record in any EMu module
…and automatic changes then propagate via emuload to numerous records in linked modules
…those linked modules can grow a lot and slow EMu significantly between maintenance runs
90
Why not harness EMu's continuously ravenous appetite for pushing local copies of linked fields into remote modules… and put it to work slimming for us !
91
Why not harness EMu's continuously ravenous appetite for pushing local copies of linked fields into remote modules… and put it to work slimming for us ! Need to first understand how different EMu queries work
92
Drag and Drop Query
93
checks the link field
94
Straight Text Entry Query instead checks a local copy of the SummaryData from the linked record that has been inserted into the catalogue
95
EMu's audit log - a gigantic activity trail. How often do users employ these two very different query strategies, on what fields, and are there distinctly divergent patterns ?
96
catalogue audit: In this one-week sample, only 7 of 52 queries for accessions from inside the catalogue module used text queries; the other 45 were drag & drops
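Getting that 7-versus-45 split is just a matter of classifying each audited query as a link-field (drag & drop) search or a keyword search against the local text copy. A rough sketch of the tally, assuming the queries have already been exported one per line; the file name, and the idea that a drag & drop shows up as an irn equality, are assumptions about the audit data rather than EMu's actual log schema:

    import re
    from collections import Counter

    # Classify a week of catalogue-to-accessions queries (hypothetical export).
    tally = Counter()
    with open("catalogue_accession_queries.txt") as fh:
        for line in fh:
            if re.search(r"\birn\s*=\s*\d+", line):    # link field used -> drag & drop
                tally["drag_and_drop"] += 1
            else:                                      # keyword against the local text copy
                tally["text_query"] += 1
    print(tally)   # e.g. Counter({'drag_and_drop': 45, 'text_query': 7})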
97
Of those 7 text queries, every one asked for a primary id number for the accession, or the numeric piece of that number, but not for any other type of data from within those accessions
98
Over a full year of catalogue audit data, less than 1% of all the queries into accessions used other than the primary id of the accession record as the keyword(s).
99
Over a full year of catalogue audit data, less than 1% of all the queries into accessions used other than the primary id of the accession record as the keyword(s). This is where we gain our SlimDing advantage !
100
Over a full year of catalogue audit data, less than 1% of all the queries into accessions used other than the primary id of the accession record as the keyword(s). This is where we gain our SlimDing advantage ! We don't need more than the primary id of the accession record in the local copy of the accession module data stored in the catalogue module.
101
Over a full year of catalogue audit data, less than 1% of all the queries into accessions used other than the primary id of the accession record as the keyword(s). This is where we gain our SlimDing advantage ! We don't need more than the primary id of the accession record in the local copy of the accession module data stored in the catalogue module. This pattern also held true for queries launched from the catalogue against the bibliography and loans modules !
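The SlimDing itself lives in Texpress validation code, so that EMu's own propagation keeps the trimmed copy up to date; the Python sketch below only illustrates the transformation applied to the locally stored text. The accession-number pattern and the sample SummaryData string are invented for illustration:

    import re

    # Invented accession-number pattern; the real trimming is done server-side
    # in Texpress validation code so emuload maintains it automatically.
    ACC_NO = re.compile(r"\bYPM\s*ACC\.?\s*[\w.\-]+", re.IGNORECASE)

    def slim_local_copy(summary_text):
        """Keep only the accession's primary id in the catalogue's local copy."""
        m = ACC_NO.search(summary_text)
        return m.group(0) if m else summary_text   # fall back to the full copy

    print(slim_local_copy("YPM ACC. 1998-042: Marsh expedition, Como Bluff, gift of ..."))
    # -> "YPM ACC. 1998-042"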
103
Catalogue Database
107
Catalogue module lost another 19% of its bulk over a couple of months !
108
Internal Movements Database
109
Internal movements dropped from 550 mbytes down to 200 mbytes… 65% reduction !
110
Internal Movements Database
112
Default EMu cron maintenance job schedule (weekly grid, Mo Tu We Th Fr Sa Su, with late night / workday / evening rows; legend: emulutsrebuild, emumaintenance batch, emumaintenance compact)
113
Default EMu cron maintenance job schedule (weekly grid, Mo Tu We Th Fr Sa Su, with late night / workday / evening rows; legend: emulutsrebuild, emumaintenance batch, emumaintenance compact) * * *
116
Quick backup
117
A Happy EMu Means Happy Campers
118
finis