NLM Digital Repository Server Architecture
January 18, 2011
Design Considerations
–Consistency with NLM architecture and processes
–Remove single points of failure
–Data redundancy for preservation
–Availability
–Scalability
–Ingest ease, speed
Single Server Architecture
–Applications: NWU BookViewer, Flash Video Player with Search, Muradora 1.4b
–Services: Fedora, Solr, GSearch, Djatoka
–Database: MySQL 5.0, Tomcat
–Storage: Fedora managed storage, external storage, Solr index, resource index
–OS: CentOS
–HW: virtual server, 3 CPU, 24 GB RAM
(Diagram: application server, database server, and file server roles on a single host)
Content and code
–Fedora managed content
–Fedora database
–Fedora Resource Index
–Solr Index
–External content
–Application code
Can and should these items be shared across Fedora servers?
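One way to frame the sharing question: only the authoritative stores need to be copied byte-for-byte between servers, while the derived stores can be rebuilt from the managed content on each server. A minimal sketch in Python; the labels and classification below are our illustration of that distinction, not Fedora configuration or API:

```python
# Classify each per-server store: "authoritative" data must be replicated
# verbatim; "derived" stores can be rebuilt from the authoritative copies.
# (Hypothetical labels for illustration only.)
STORES = {
    "fedora_managed_content": "authoritative",  # source of record
    "external_content": "authoritative",
    "application_code": "authoritative",
    "fedora_database": "derived",   # rebuildable from managed content
    "resource_index": "derived",    # rebuildable from managed content
    "solr_index": "derived",        # re-indexable from the repository
}

def must_copy(stores):
    """Return the stores that have to be replicated byte-for-byte."""
    return sorted(name for name, kind in stores.items()
                  if kind == "authoritative")

print(must_copy(STORES))
# → ['application_code', 'external_content', 'fedora_managed_content']
```

Everything else can be regenerated locally, which is what makes the rebuild-based replication strategies on the later slides possible.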
Data Center Environment
Two locations with two virtual servers each
–Primary: NLM data center
–Backup: Contingency operations data center
–Active/Active – both locations always in use
–Each virtual server has 3 CPU, 24 GB RAM
System tools
–3DNS – wide-area load balancing
–BIG-IP – local load balancing
–Server monitoring, automatic failover
–SnapMirror – NetApp filesystem replication
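The two-tier balancing scheme can be sketched as follows: a wide-area tier picks a data center with at least one healthy server, then a local tier picks a server within it. This is a simplified stand-in for what 3DNS and BIG-IP do, with hypothetical server names; it is not their actual configuration:

```python
# Two data centers, two servers each (names are illustrative).
CENTERS = {
    "primary": ["primary-1", "primary-2"],
    "backup": ["backup-1", "backup-2"],
}

def route(healthy, counter):
    """Pick the server for the next request.

    `healthy` is the set of servers currently passing monitoring;
    `counter` is a request counter used for round-robin spreading.
    Raises RuntimeError only when every server is down.
    """
    # Wide-area tier (3DNS role): pick among centers that have capacity.
    centers_up = [c for c, servers in CENTERS.items()
                  if any(s in healthy for s in servers)]
    if not centers_up:
        raise RuntimeError("all servers down")
    center = centers_up[counter % len(centers_up)]
    # Local tier (BIG-IP role): round-robin the center's healthy servers.
    up = [s for s in CENTERS[center] if s in healthy]
    return up[counter % len(up)]
```

If one server or an entire data center fails its health checks, requests automatically flow to the remaining capacity, which is the "remove single points of failure" goal from the design considerations.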
System Architecture
(Diagram: browsers reach the system through 3DNS, which directs traffic to the BIG-IP load balancer at the primary or backup data center. Each BIG-IP fronts two Fedora servers; each server has its own Fedora DB, managed storage, Solr index, and resource index, and server #1 in each center also hosts the external storage.)
Ingest considerations
–Our Fedora system is read-only with controlled periodic batch content updates
–System is available during updates – use one data center while updating the other
–Code and content should be identical across servers
–Reduce time to ingest to all servers in system; a full re-ingest takes approx. 10 hours
Content replication
Content replication strategies
1. Fedora journaling (ingest to master, master-slave, messaging)
2. Ingest to master, copy managed content to slave, rebuild slave DB and resource index from managed content (rebuild is faster than full ingest)
3. Ingest to master, use system tools (NetApp SnapMirror) to copy all resources to slaves
4. Ingest to each server independently
Our approach
–Turn off primary data center; use backup data center to serve the public
–Ingest to Primary 1; copy managed content to Primary 2; rebuild Primary 2's DB and resource index
–Turn off backup data center; use primary data center to serve the public
–Use SnapMirror to copy all resources from Primary 1 and 2 to Backup 1 and 2
–Turn on backup data center; both data centers available to serve the public
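The update sequence above can be written out as a checklist with the availability goal made explicit: at every step, at least one data center is still serving the public. A minimal sketch; the step labels are ours, and the real work at each step is done by Fedora ingest, the Fedora rebuild process, and NetApp SnapMirror, which this sketch only names:

```python
# Each step pairs a description with the set of data centers
# serving the public while that step runs.
STEPS = [
    ("disable primary; backup serves public",        {"backup"}),
    ("ingest batch to Primary 1",                    {"backup"}),
    ("copy managed content to Primary 2; rebuild",   {"backup"}),
    ("disable backup; primary serves public",        {"primary"}),
    ("SnapMirror: copy primary 1,2 to backup 1,2",   {"primary"}),
    ("enable backup; both centers serve public",     {"primary", "backup"}),
]

def always_available(steps):
    """True if some data center serves the public during every step."""
    return all(serving for _, serving in steps)

assert always_available(STEPS)
```

The same check fails for a naive sequence that takes both centers offline at once, which is why the procedure alternates which center is updated.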
NLM Content Replication
(Diagram: batch ingest goes to Fedora Primary #1; its managed content is copied to Fedora Primary #2, whose DB and indexes are then rebuilt; SnapMirror then replicates everything from the primary servers to Fedora Backup #1 and #2. Each server holds its own Fedora DB, managed storage, Solr index, and resource index, and server #1 in each center also hosts the external storage.)