OCLC Online Computer Library Center
Parallel Text Searching on a Beowulf Cluster using SRW
Ralph LeVan
OCLC Research

Goal
Demonstrate 100 searches/second on our 50-million-record WorldCat database residing on a small Beowulf cluster

Beowulf Cluster
24 nodes
– 2 2.8 GHz Xeon CPUs
– 4 GB of memory
80 GB of disk on 23 application nodes
130 GB of disk on root node

Database
50 million records
69 partitions (~700,000 records each)
– 3 partitions per application node
Partitioned by popularity
Searched using OCLC Research's Open Source Gwen and Pears toolkits

Architecture
1 Tomcat on each application node
3 SRW/U databases configured for each Tomcat
1 client application on the root node

Trial #1
SRW client searching 69 databases
Result: 2 searches/second (437 ms/search)
Ganglia Cluster Report shows the root node glowing red and the application nodes a peaceful blue

Trial #2
SRU client with scanned response searching 69 databases
Result: 25 searches/second (40 ms/search)
Ganglia Cluster Report still shows the root node glowing red and the application nodes a peaceful blue
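
The SRU approach in Trial #2 can be sketched roughly as follows. This is a minimal illustration, not the client used in the trials: it assumes that "scanned response" means pulling the hit count out of the raw response text instead of running a full XML parse. The parameter names follow the SRU 1.1 searchRetrieve operation; the function names and the host and path in the usage note are hypothetical.

```python
import re
from urllib.parse import urlencode

def build_sru_url(base_url, cql_query, max_records=0):
    """Build an SRU searchRetrieve URL (plain HTTP GET, no SOAP envelope)."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": cql_query,
        "maximumRecords": str(max_records),
    }
    return base_url + "?" + urlencode(params)

# Matches <numberOfRecords>N</numberOfRecords> with or without a namespace prefix.
_COUNT = re.compile(r"<(?:\w+:)?numberOfRecords>(\d+)</(?:\w+:)?numberOfRecords>")

def scan_record_count(response_text):
    """Scan the raw response for the hit count; return None if it is absent."""
    match = _COUNT.search(response_text)
    return int(match.group(1)) if match else None
```

For example, `build_sru_url("http://node01:8080/SRW/search/part1", "dog")` yields a URL that can be fetched with any HTTP client, and `scan_record_count` can be applied to the body that comes back.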

Trial #3
SRW client with hand-built XML and scanned response searching 69 databases
Result: 21 searches/second (46 ms/search)
Ganglia Cluster Report still shows the root node glowing red and the application nodes a peaceful blue
SRW dropped

Rearchitecture
Problem: Ganglia Reports indicate that the client is the bottleneck
Solution: Put a 3-way federator on each Tomcat (a virtual database for the client) and have the client search 23 databases instead of 69
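
A federator of this kind can be sketched as a virtual database that fans a query out to its member databases and combines the answers. The slides do not say how results were merged, so summing hit counts is an assumption here, and the class and parameter names are hypothetical.

```python
class FederatedDatabase:
    """A virtual database that forwards each query to several member databases.

    Each member is any callable taking a query string and returning a hit count.
    """

    def __init__(self, members):
        self.members = list(members)

    def search(self, query):
        # Assumption: the combined result is the sum of the members' hit counts.
        return sum(member(query) for member in self.members)


# Usage with three stand-in partitions on one node (stubs, not real searches).
partitions = [lambda q: 10, lambda q: 4, lambda q: 0]
node_database = FederatedDatabase(partitions)
total_hits = node_database.search("dog")
```

With one such virtual database per Tomcat, the client issues 23 requests per search instead of 69, and the fan-out to the three local partitions happens on the application node.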

Result
SRU client: 71 searches/second (14 ms)
Hand-built SRW client: 33 searches/second (30 ms)
Original SRW client: 6 searches/second (164 ms)
Ganglia cluster report still shows root node red, but application nodes are now green and yellow
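
The throughput and latency figures reported so far are related by simple arithmetic, assuming the client issues one search at a time (an assumption; the slides do not state it). Under that assumption, searches per second is 1000 divided by the milliseconds per search:

```python
def searches_per_second(ms_per_search):
    """Throughput of a client that runs one search at a time."""
    return 1000.0 / ms_per_search

# Latencies reported on the slides, in milliseconds per search.
reported_ms = {
    "Trial #1, SRW client": 437,
    "Trial #2, SRU client": 40,
    "Trial #3, hand-built SRW client": 46,
    "Federated, SRU client": 14,
    "Federated, hand-built SRW client": 30,
    "Federated, original SRW client": 164,
}

for label, ms in reported_ms.items():
    print(f"{label}: {searches_per_second(ms):.1f} searches/second")
```

Truncated to whole numbers, these values match the reported rates of 2, 25, 21, 71, 33, and 6 searches/second.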

Rearchitecture
Create a virtual 23-way database on each Tomcat that will federate searches from the 23 virtual 3-way databases
Put one of these on each Tomcat
Create a new client that sends searches on threads to each available 23-way database
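
The threaded client described above might look something like this sketch: one worker thread per 23-way database, each pulling queries from a shared queue. The `search` callable, the endpoint names, and the result layout are hypothetical stand-ins, not the actual client.

```python
import queue
import threading

def run_client(queries, endpoints, search):
    """Run every query once, using one worker thread per endpoint.

    `search(endpoint, query)` is a caller-supplied function returning a hit count.
    Returns a list of (query, endpoint, hits) tuples in completion order.
    """
    work = queue.Queue()
    for q in queries:
        work.put(q)

    results = []
    lock = threading.Lock()

    def worker(endpoint):
        while True:
            try:
                q = work.get_nowait()
            except queue.Empty:
                return  # no work left for this thread
            hits = search(endpoint, q)
            with lock:
                results.append((q, endpoint, hits))

    threads = [threading.Thread(target=worker, args=(e,)) for e in endpoints]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Because every node hosts its own 23-way database, any thread can send a whole search to any node, and the federation work is spread across the cluster rather than concentrated on the root node.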

Result
With 23 threads, 172 searches/second
– Average response time of 170 ms
The Ganglia report showed all nodes running red