OCLC Online Computer Library Center Parallel Text Searching on a Beowulf Cluster using SRW Ralph LeVan OCLC Research.

Presentation transcript:

OCLC Online Computer Library Center Parallel Text Searching on a Beowulf Cluster using SRW Ralph LeVan OCLC Research

Goal
Demonstrate 100 searches/second on our 50-million-record WorldCat database residing on a small Beowulf cluster

Beowulf Cluster
24 nodes
– 2 × 2.8 GHz Xeon CPUs
– 4 GB of memory
80 GB of disk on 23 application nodes
130 GB of disk on root node

Database
50 million records
69 partitions (~700,000 records each)
– 3 partitions per application node
Partitioned by popularity
Searched using OCLC Research's Open Source Gwen and Pears toolkits

Architecture
1 Tomcat on each application node
3 SRW/U databases configured for each Tomcat
1 client application on the root node
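As an illustration of this layout, the sketch below enumerates the 69 search targets (23 application nodes × 3 databases) as SRU searchRetrieve URLs. The host names, port, URL path, and the choice of SRU version 1.1 are assumptions made for the example; the slides do not give them. Only the node and database counts come from the slides.

```python
from urllib.parse import urlencode

APP_NODES = 23      # application nodes (from the slides)
DBS_PER_NODE = 3    # SRW/U databases per Tomcat (from the slides)


def sru_search_url(node, partition, cql_query, max_records=10):
    """Build an SRU searchRetrieve URL for one partition.

    The host name, port, and path are hypothetical placeholders.
    """
    base = f"http://node{node:02d}.example.org:8080/SRW/search/part{partition}"
    params = {
        "operation": "searchRetrieve",  # standard SRU operation name
        "version": "1.1",               # assumed SRU version
        "query": cql_query,             # CQL query string
        "maximumRecords": max_records,
    }
    return f"{base}?{urlencode(params)}"


def all_search_urls(cql_query):
    """One URL per database: 23 nodes x 3 partitions = 69 targets."""
    return [
        sru_search_url(node, part, cql_query)
        for node in range(1, APP_NODES + 1)
        for part in range(1, DBS_PER_NODE + 1)
    ]
```

In the first trials, a single client on the root node would have to issue all 69 of these requests for every search.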

Trial #1
SRW client searching 69 databases
Result: 2 searches/second (437 ms/search)
Ganglia Cluster Report shows the root node glowing red and the application nodes a peaceful blue

Trial #2
SRU client with scanned response searching 69 databases
Result: 25 searches/second (40 ms/search)
Ganglia Cluster Report still shows the root node glowing red and the application nodes a peaceful blue
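The slides do not say what "scanned response" means in detail. One plausible reading is that the client scans the response text for the values it needs instead of building a full XML parse tree. A minimal sketch under that assumption, pulling the hit count out of an SRU response (the function name and sample response are made up for the example):

```python
import re

# Matches <numberOfRecords>N</numberOfRecords>, with or without a namespace prefix.
_HIT_COUNT = re.compile(
    r"<(?:\w+:)?numberOfRecords>(\d+)</(?:\w+:)?numberOfRecords>"
)


def scan_hit_count(response_text):
    """Return the hit count found by scanning the raw response, or None."""
    match = _HIT_COUNT.search(response_text)
    return int(match.group(1)) if match else None


# Hypothetical, heavily trimmed response body.
sample = (
    '<srw:searchRetrieveResponse xmlns:srw="http://www.loc.gov/zing/srw/">'
    "<srw:version>1.1</srw:version>"
    "<srw:numberOfRecords>42</srw:numberOfRecords>"
    "</srw:searchRetrieveResponse>"
)
hits = scan_hit_count(sample)
```

Scanning like this trades robustness for less client-side work per response, which matters when the client is the saturated component.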

Trial #3
SRW client with hand-built XML and scanned response searching 69 databases
Result: 21 searches/second (46 ms/search)
Ganglia Cluster Report still shows the root node glowing red and the application nodes a peaceful blue
SRW dropped

Rearchitecture
Problem: Ganglia Reports indicate that the client is the bottleneck
Solution: Put a 3-way federator on each Tomcat (a virtual database for the client) and have the client search 23 databases instead of 69
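The federator idea can be sketched as a function that fans one query out to several backend databases and merges what comes back, so the caller sees a single virtual database. This is an illustrative sketch, not OCLC's implementation: the backend interface (a callable returning a hit count and a list of records) and the merge policy (sum the counts, concatenate records in backend order) are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor


def federated_search(query, backends):
    """Search every backend in parallel and merge the results.

    `backends` is a list of callables, each taking a query and returning
    (hit_count, records). This interface is hypothetical.
    """
    with ThreadPoolExecutor(max_workers=len(backends)) as pool:
        # pool.map preserves backend order in its results.
        results = list(pool.map(lambda backend: backend(query), backends))
    total_hits = sum(count for count, _ in results)
    merged = [record for _, records in results for record in records]
    return total_hits, merged


def make_fake_partition(name, count):
    """Stand-in for one local partition; returns canned results."""
    def search(query):
        return count, [f"{name}:{query}:{i}" for i in range(count)]
    return search


three_way = [make_fake_partition(f"part{i}", i) for i in (1, 2, 3)]
total, records = federated_search("dog", three_way)
```

With a 3-way federator on each Tomcat, the root-node client sends 23 requests per search instead of 69, and the merge work moves onto the application nodes.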

Result
SRU client: 71 searches/second (14 ms/search)
Hand-built SRW client: 33 searches/second (30 ms/search)
Original SRW client: 6 searches/second (164 ms/search)
Ganglia cluster report still shows root node red, but application nodes are now green and yellow
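The rate and latency figures on these slides are related by a simple reciprocal, assuming the client issues searches one at a time (an assumption; the slides do not state it, but the numbers are consistent with it). A short check of the three reported pairs:

```python
def ms_per_search(searches_per_second):
    """Latency implied by a sequential client running at the given rate."""
    return 1000.0 / searches_per_second


# (searches/second, reported ms/search) as given on the slide.
reported = {
    "SRU client": (71, 14),
    "Hand-built SRW client": (33, 30),
    "Original SRW client": (6, 164),
}

for name, (rate, slide_ms) in reported.items():
    implied = ms_per_search(rate)
    print(f"{name}: {rate}/s implies {implied:.1f} ms (slide reports {slide_ms} ms)")
```

The small differences are what rounding the rates to whole searches per second would produce.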

Rearchitecture
Create a virtual 23-way database that federates searches across the 23 virtual 3-way databases
Put one of these on each Tomcat
Create a new client that sends searches on threads to each available 23-way database
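A threaded client of this kind might look like the sketch below: a pool with as many worker threads as there are 23-way databases, with queries spread round-robin across the endpoints and overall throughput measured at the end. The `search` callable, the endpoint names, and the round-robin policy are assumptions for illustration; the slides only say that searches are sent on threads to each available 23-way database.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_load(queries, endpoints, search):
    """Run all queries across the endpoints; return (results, searches/second).

    `search(endpoint, query)` is a hypothetical callable that performs one
    search against one 23-way database.
    """
    def worker(indexed_query):
        index, query = indexed_query
        # Round-robin: spread queries evenly over the endpoints.
        endpoint = endpoints[index % len(endpoints)]
        return search(endpoint, query)

    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=len(endpoints)) as pool:
        results = list(pool.map(worker, enumerate(queries)))
    elapsed = time.perf_counter() - start
    return results, len(results) / elapsed


def fake_search(endpoint, query):
    """Stand-in for a real SRU request; sleeps briefly to mimic latency."""
    time.sleep(0.001)
    return endpoint, query


endpoints = [f"node{n:02d}" for n in range(1, 24)]   # 23 hypothetical endpoints
queries = [f"q{i}" for i in range(46)]
results, rate = run_load(queries, endpoints, fake_search)
```

Because every Tomcat can now accept a full search, no single node has to fan out and merge every request, which is consistent with the Ganglia report showing load on all nodes.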

Result
With 23 threads, 172 searches/second
– Average response time of 170 ms
The Ganglia report showed all nodes running red