Distributed Protein Structure Analysis
By Jeremy S. Brown and Travis E. Brown

The Problem
• An exhaustive search of proteins against a known database
• Each string is between 400 and 600 characters long
• Comparing 10,000 strings against 1,000,000 random strings would take 44 days on a 1 GHz processor
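The figures on this slide imply a per-comparison cost, which the following back-of-the-envelope sketch works out. It uses only the numbers quoted above; the variable names are illustrative, and the cycle count simply assumes one cycle per nanosecond at 1 GHz.

```python
# Back-of-the-envelope check of the 44-day estimate quoted on the slide.
QUERIES = 10_000        # search strings (from the slide)
DATABASE = 1_000_000    # known strings (from the slide)
DAYS = 44               # quoted single-machine run time
CLOCK_HZ = 1_000_000_000  # 1 GHz processor (from the slide)

SECONDS_PER_DAY = 24 * 60 * 60

total_comparisons = QUERIES * DATABASE
total_seconds = DAYS * SECONDS_PER_DAY
comparisons_per_second = total_comparisons / total_seconds
microseconds_per_comparison = 1e6 / comparisons_per_second
cycles_per_comparison = CLOCK_HZ / comparisons_per_second

print(f"{total_comparisons:.1e} comparisons in {total_seconds:,} s")
print(f"{comparisons_per_second:,.0f} comparisons per second")
print(f"{microseconds_per_comparison:.1f} microseconds per comparison")
print(f"{cycles_per_comparison:,.0f} clock cycles per comparison")
```

With ten billion pairwise comparisons, even a sub-millisecond comparison adds up to weeks on one machine, which is the motivation for distributing the work.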

Why an exhaustive search?
• Initial intent was to analyze proteins to determine three-dimensional structure
• An exhaustive search is required to ensure that the match found is the best match

Solution
• Distribute the search among many PCs to obtain an answer faster
• However, this solution raises further problems

How to Distribute
• Distributing search strings is not enough
• Also distribute the search space
• Must find an efficient way to distribute 1 GB of data without duplication
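One simple way to split a search space without duplication is to hand each client a contiguous, non-overlapping index range. The sketch below is only an illustration of that idea; the slides do not say how the original program partitioned its data, and the function name `partition` is hypothetical.

```python
def partition(total_items: int, num_clients: int) -> list[range]:
    """Split indices 0..total_items-1 into contiguous, non-overlapping ranges.

    Hypothetical helper: sizes differ by at most one item, and every index
    appears in exactly one range, so no data is sent twice.
    """
    if num_clients <= 0:
        raise ValueError("num_clients must be positive")
    base, extra = divmod(total_items, num_clients)
    chunks = []
    start = 0
    for i in range(num_clients):
        # The first `extra` clients each take one additional item.
        size = base + (1 if i < extra else 0)
        chunks.append(range(start, start + size))
        start += size
    return chunks


chunks = partition(1_000_000, 3)
for client_id, chunk in enumerate(chunks):
    print(f"client {client_id}: items {chunk.start}..{chunk.stop - 1}")
```

Because each client keeps its range for the whole run, the expensive transfer happens once per client and later work can reuse data that is already in place.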

Program Details
• Client/Server architecture
• Uses a proprietary protocol over TCP/IP to distribute data
• Server uses an SQL database to store a list of ‘jobs’ and ‘known’ sequences

Program Details
• Server issues a single ‘job’ to each client upon request
• Client may also request a batch of data for comparison
• Server marks which data has been sent to clients and avoids resending that data to new clients
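The bookkeeping described on this slide can be sketched in a few lines. This is an in-memory illustration only, not the original server: the class `JobServer`, its method names, and the fixed batch size are all assumptions, and the slides do not specify how jobs are paired with batches or what the wire protocol looks like.

```python
from collections import deque


class JobServer:
    """Hypothetical in-memory stand-in for the server's job/batch tracking."""

    def __init__(self, jobs, known_sequences, batch_size):
        self._pending_jobs = deque(jobs)
        self._known = list(known_sequences)
        self._batch_size = batch_size
        self._next_unsent = 0          # everything before this index was sent
        self.assignments = {}          # client_id -> list of (start, stop)

    def request_job(self, client_id):
        """Hand out a single job, or None when no jobs remain."""
        return self._pending_jobs.popleft() if self._pending_jobs else None

    def request_batch(self, client_id):
        """Return the next unsent batch and record who received it."""
        start = self._next_unsent
        stop = min(start + self._batch_size, len(self._known))
        if start == stop:
            return []                  # all known sequences already sent
        self._next_unsent = stop
        self.assignments.setdefault(client_id, []).append((start, stop))
        return self._known[start:stop]


server = JobServer(jobs=["q1", "q2"],
                   known_sequences=["a", "b", "c", "d", "e"],
                   batch_size=2)
first = server.request_batch("client-1")
second = server.request_batch("client-2")   # never repeats client-1's data
third = server.request_batch("client-1")
```

Recording the assignment per client is what lets the server avoid resending data: a new client always starts from the first index nobody has received yet.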

The Server

The Client

Problems
• Server is slow at updating its database
• However, this cost is incurred only once per client

Performance Analysis
• 1 client = 44 days (best algorithm)
• 2 clients = 26 days
• 3 clients = 19 days
• Adding clients increases speed almost linearly, though distribution is expensive
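The run times above can be turned into speedup and parallel efficiency using the usual definitions (speedup = single-client time / n-client time; efficiency = speedup / n). The sketch below uses only the three measurements from this slide; the function names are illustrative.

```python
# Run times in days, as quoted on the slide.
measured_days = {1: 44, 2: 26, 3: 19}
baseline = measured_days[1]


def speedup(clients: int) -> float:
    """How many times faster than the single-client run."""
    return baseline / measured_days[clients]


def efficiency(clients: int) -> float:
    """Fraction of ideal linear speedup actually achieved."""
    return speedup(clients) / clients


for n in sorted(measured_days):
    print(f"{n} client(s): speedup {speedup(n):.2f}, "
          f"efficiency {efficiency(n):.0%}")
```

The gap between the measured speedup and the ideal value of n is one way to quantify the distribution overhead mentioned in the last bullet.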

Graphs

Notes about the graphs
• Graphs do not include initial distribution, since this is only done once per client
• If search-data distribution were included, efficiency would start at about 70% and rise to ~90% over time

Verifying Data Accuracy
• Add an entry into the search table with a known score
• Ensure that the result returned by the client is the known entry in the database
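The check described above amounts to planting an entry whose score is known in advance and confirming that the search returns it. A minimal sketch follows; the scoring function is a placeholder chosen so that an exact copy of the query is the unique best match, not the comparison algorithm used in the original program, and all names are hypothetical.

```python
def placeholder_score(a: str, b: str) -> int:
    """Placeholder score: matching positions minus the length difference.

    An exact copy of the query scores len(query); any other string scores
    strictly less, so the planted entry is the unique maximum.
    """
    matches = sum(x == y for x, y in zip(a, b))
    return matches - abs(len(a) - len(b))


def best_match(query, database, score=placeholder_score):
    """Exhaustive search: score the query against every database entry."""
    return max(database, key=lambda seq: score(query, seq))


def verify_with_planted_entry(query, database, score=placeholder_score):
    """Plant an entry with a known (maximal) score and check it is returned."""
    planted = query
    augmented = list(database) + [planted]
    return best_match(query, augmented, score) == planted


database = ["MKTAYIAA", "AAAAAAAA", "MKT"]
ok = verify_with_planted_entry("MKTAYIAK", database)
print("planted entry recovered:", ok)
```

If the planted entry does not come back, either the comparison or the data distribution has dropped or corrupted part of the search space.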

Lessons Learned
• Reading and writing single elements to an SQL database can be very expensive
• Even the best designs aren’t perfect, especially when the problem is not fully understood
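The first lesson can be illustrated by contrasting row-at-a-time updates with a single batched transaction. The sketch below uses Python's built-in `sqlite3` module as a stand-in, since the slides do not name the SQL engine; the table name `known` and its columns are assumptions. With an in-memory database the timing gap is small; it tends to be far more noticeable on a disk-backed or networked database, where each commit carries real overhead.

```python
import sqlite3


def mark_sent_one_by_one(conn, ids):
    """Row-at-a-time: one statement and one commit per element."""
    for seq_id in ids:
        conn.execute("UPDATE known SET sent = 1 WHERE id = ?", (seq_id,))
        conn.commit()


def mark_sent_batched(conn, ids):
    """Batched: all updates inside a single transaction."""
    with conn:  # commits once on success, rolls back on error
        conn.executemany("UPDATE known SET sent = 1 WHERE id = ?",
                         ((seq_id,) for seq_id in ids))


# Hypothetical schema, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE known ("
             "id INTEGER PRIMARY KEY, sequence TEXT, sent INTEGER DEFAULT 0)")
with conn:
    conn.executemany("INSERT INTO known (id, sequence) VALUES (?, ?)",
                     [(i, "ACDE") for i in range(1000)])

mark_sent_one_by_one(conn, range(0, 500))
mark_sent_batched(conn, range(500, 1000))
sent_count = conn.execute(
    "SELECT COUNT(*) FROM known WHERE sent = 1").fetchone()[0]
print("rows marked as sent:", sent_count)
```

Both functions leave the table in the same state; the difference is how many round trips and commits the database has to absorb.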

Notes
• Biggest problem was distribution of data
• Distribution was very costly, so we tried to reuse data that had already been distributed
• Program is pluggable, so any comparison algorithm can be used
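One common way to make a comparison step pluggable is a registry of scoring functions selected by name. The sketch below shows that pattern under stated assumptions: the slides do not describe the original plug-in mechanism, and the registry `COMPARATORS`, the decorator `register`, and both example scorers are hypothetical.

```python
from typing import Callable, Dict

Comparator = Callable[[str, str], float]

# Hypothetical registry mapping a name to a scoring function.
COMPARATORS: Dict[str, Comparator] = {}


def register(name: str):
    """Decorator that adds a scoring function to the registry."""
    def decorator(func: Comparator) -> Comparator:
        COMPARATORS[name] = func
        return func
    return decorator


@register("identity")
def identity(a: str, b: str) -> float:
    """Count positions where both strings carry the same character."""
    return float(sum(x == y for x, y in zip(a, b)))


@register("length_difference")
def length_difference(a: str, b: str) -> float:
    """Prefer candidates whose length is closest to the query's."""
    return -float(abs(len(a) - len(b)))


def run_comparison(name: str, query: str, candidates):
    """Return the best candidate under the comparator registered as `name`."""
    score = COMPARATORS[name]
    return max(candidates, key=lambda c: score(query, c))


print(run_comparison("identity", "ACD", ["AAA", "ACE"]))
print(run_comparison("length_difference", "ACD", ["A", "ACEF"]))
```

The search loop in `run_comparison` never needs to change; swapping algorithms only means registering another function with the same two-string signature.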

Questions/Comments
