Bioinformatics at USDA-ARS Livestock Issues Research Unit. Scot E. Dowd, Joaquin Zaragoza, Mel Oliver, and Paxton Payton.


1 Bioinformatics at USDA-ARS Livestock Issues Research Unit
Scot E. Dowd, Joaquin Zaragoza, Mel Oliver, and Paxton Payton

2 Projects
Future: Interactive neural-network-based models to describe and predict gene expression in livestock and pathogens
Present: Various projects, in various states, leading to the future
–Molecular modeling
–Gene finding
–Distributed BLAST
–Whole-genome comparison
–Functional genomics and pathways
–Pathway- or system-targeted microarray design

3 Functional Genomics
Functional genomics / Gene Ontology: a controlled vocabulary
Define, annotate, categorize, and describe large genetic datasets (e.g. EST, mRNA)
We have developed a custom curated database for functional-domain BLAST (regular BLAST and RPS-BLAST using KOG, COG, Pfam, HMMER, and SMART domains)
Ultimately this will become a comprehensive .NET suite of analyses for microarray design, from new sequence all the way to result visualization.

4 Ontology Annotation – propagation of error in definitions Ca

5 BLAST: need for speed (II)
We are working with roughly 5,000-100,000 queries against 1 GB databases
1 query takes a fairly fast PC 3 minutes to complete
–Dual 3.2 GHz Xeon
–6 GB RAM
–RAID 0 SCSI-320 HD
Other methods: MPI-BLAST, WU-BLAST, threaded BLAST, SGE-BLAST, commercial TURBO BLAST, DNAstar, etc.
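The numbers on this slide imply a large serial run time, which a short calculation makes concrete. This is only a back-of-envelope sketch: it assumes every query takes the quoted 3 minutes and that queries run one after another on a single machine. The function name `serial_days` is made up for illustration.

```python
MINUTES_PER_QUERY = 3  # figure quoted on the slide for one query on one PC


def serial_days(num_queries, minutes_per_query=MINUTES_PER_QUERY):
    """Days needed to run num_queries back to back on a single machine."""
    return num_queries * minutes_per_query / 60 / 24


if __name__ == "__main__":
    for n in (5_000, 100_000):
        print(f"{n:>7} queries: about {serial_days(n):.1f} days on one machine")
```

Under these assumptions the low end of the range is already more than a week of compute, which is the motivation for distributing the work.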

6 BLAST ALGORITHM
Cgtcgctcgctgtaagtac – query, e.g. a 1000-letter word
Altschul, S. F., Gish, W., Miller, W., Myers, E. W. and Lipman, D. J. (1990) A basic local alignment search tool. Journal of Molecular Biology 215, 403-410.
Which database sequence is most similar to my query?
Databases: one of ours is 60 GB worth of letters
BLAST generates statistics based upon similarity and substitution probabilities
In simplest form, a purine-to-purine substitution scores better than purine-to-pyrimidine
Slide along a 4 GB database, find a word match, and try to extend it
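The "find a word match and try to extend" idea can be sketched in a few lines of Python. This is a simplified illustration, not the NCBI implementation: it assumes exact word seeds, ungapped extension, a flat match/mismatch score (no purine/pyrimidine distinction), and an arbitrary drop-off threshold, and it computes no statistics. All function names and parameter values are hypothetical.

```python
from collections import defaultdict


def build_word_index(db_seq, word_size):
    """Map every word of length word_size in db_seq to its start positions."""
    index = defaultdict(list)
    for i in range(len(db_seq) - word_size + 1):
        index[db_seq[i:i + word_size]].append(i)
    return index


def extend_hit(query, db_seq, q_pos, d_pos, word_size,
               match=1, mismatch=-3, x_drop=5):
    """Ungapped extension of an exact word match in both directions.

    Stops in each direction once the running score falls more than x_drop
    below the best score seen. Returns (score, query_start, db_start, length).
    """
    best = word_size * match
    # Extend to the right of the seed.
    cur, q_end, d_end = best, q_pos + word_size, d_pos + word_size
    best_q_end = q_end
    while q_end < len(query) and d_end < len(db_seq):
        cur += match if query[q_end] == db_seq[d_end] else mismatch
        q_end += 1
        d_end += 1
        if cur > best:
            best, best_q_end = cur, q_end
        elif best - cur > x_drop:
            break
    # Extend to the left of the seed, starting from the best score so far.
    cur, q_start, d_start = best, q_pos, d_pos
    best_q_start = q_start
    while q_start > 0 and d_start > 0:
        q_start -= 1
        d_start -= 1
        cur += match if query[q_start] == db_seq[d_start] else mismatch
        if cur > best:
            best, best_q_start = cur, q_start
        elif best - cur > x_drop:
            break
    db_start = d_pos - (q_pos - best_q_start)
    return best, best_q_start, db_start, best_q_end - best_q_start


def best_hit(query, db_seq, word_size=4):
    """Seed on every query word, extend each seed, keep the top-scoring hit."""
    index = build_word_index(db_seq, word_size)
    best = None
    for q_pos in range(len(query) - word_size + 1):
        for d_pos in index.get(query[q_pos:q_pos + word_size], []):
            hit = extend_hit(query, db_seq, q_pos, d_pos, word_size)
            if best is None or hit[0] > best[0]:
                best = hit
    return best
```

For example, `best_hit("ACGTACGT", "TTTTACGTACGTGGGG")` seeds on `ACGT` and extends to the full 8-letter match starting at database position 4.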

7 BLASTX as example: translate the query into 6 reading frames, then search the database with these 6 sequences using a word size of 3.
Time to BLAST
–Up to a point, decreased time correlated with the number of slaves available
–Average test machines (2.4 GHz / 1 GB RAM / SATA150)
–e.g. 90 seq / 13 CPU / 3 min vs. 90 seq / 1 CPU / 38.5 min (roughly a 12.8-fold speedup), 350 MB db, GB-LAN
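The six-frame translation step that BLASTX performs on a nucleotide query can be illustrated as follows. This is a minimal sketch assuming the standard genetic code and an unambiguous A/C/G/T query; the function names are invented for the example and are not taken from any BLAST code base.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(dna):
    """Return the reverse complement of an uppercase DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frames(dna):
    """Return the three forward and three reverse-strand translations."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames
```

Each of the six protein strings would then be searched against the protein database, which is why a BLASTX query costs noticeably more than a plain nucleotide search.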

8

9 .NET Distributed BLAST
Take advantage of unused laboratory compute resources
Provide an easy, powerful tool for distributing BLAST
Target environment
–Windows LAN
Current open-source distributed BLAST applications
–Require a server-class master or a version of UNIX
–Difficult to set up, configure databases, compile, and submit jobs
–No fault tolerance for large jobs

10 W.ND BLAST: A bioinformatician promoting Windows? .NET C# First tests Condor, MPI, a ported remote shell Contractor Project Manager Database formatter Worker machines Job leasing Output processing HT backend apps

11 Gotta GUI

12 Database formatter

13

14

15

16 Functionality
Network bandwidth would eventually be limiting
Fault tolerant to worker failure
Resume upon reboot if Contractor fails
No statistical problems with search results
Complete BLAST database on each worker node if resources allow
Easy to install, a breeze to use

17 .NET Distributed BLAST
Queue at each node
–Contractor allows a maximum of two query sequences in each node’s queue
–Ensures the application waits a minimal amount of time between completion and the next job
Thread per node
–Makes use of .NET Asynchronous Delegate (AD); scalability ???
–Thread invokes BLAST on the remote node
–Upon completion, the remote node sends a “finished” message to the Contractor
–The Contractor collects results and performs a validity check
–Once results are verified, the remote worker starts BLAST on its queued sequence and the Contractor allocates a future job
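The per-node queueing described on this slide can be modeled with a small Python sketch. The original tool is written in C# with asynchronous delegates; the version below is only a single-threaded model of the bookkeeping. The class layout, the placeholder validity check, and the choice to re-queue invalid results are all assumptions made for illustration.

```python
from collections import deque

MAX_QUEUE_PER_NODE = 2  # limit stated on the slide


class Contractor:
    """Toy model: keeps every worker's queue topped up to two queries."""

    def __init__(self, queries, nodes):
        self.pending = deque(queries)
        self.node_queues = {node: deque() for node in nodes}
        self.results = {}

    def top_up(self):
        """Fill each node's queue up to the limit while work remains."""
        for queue in self.node_queues.values():
            while len(queue) < MAX_QUEUE_PER_NODE and self.pending:
                queue.append(self.pending.popleft())

    @staticmethod
    def is_valid(output):
        """Hypothetical validity check: any non-empty output passes."""
        return bool(output)

    def on_finished(self, node, query_id, output):
        """Handle a 'finished' message from a worker node."""
        queue = self.node_queues[node]
        if not queue or queue[0] != query_id:
            raise ValueError(f"{node} was not running {query_id}")
        queue.popleft()
        if self.is_valid(output):
            self.results[query_id] = output
        else:
            # Assumption: an invalid result sends the query back to the pool.
            self.pending.append(query_id)
        self.top_up()
```

Because a second query is already waiting on the node when the first finishes, the worker can start it immediately while the Contractor validates the previous result and allocates the next job.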

18 .NET Distributed BLAST
Fault tolerance, revisited
–Task migration handled through application-level checkpointing
–When a worker encounters a fault or crashes, the Contractor redirects the failed node’s sequence to another worker node
–Minimal loss of time
Integrating QoS functionality (currently in the works)
–Decrease priority when the workstation is in use
–Based upon a remote system call checking CPU%, memory, etc.
–GUI allows increasing or decreasing priority (rev gauges and throttles)
–Storage requirement limitations: redirect query to another database source (working with the 10-connection limitation in XP Pro)
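The redirection of a failed node's sequences can be pictured with the following Python sketch. The slide does not say how the replacement worker is chosen, so picking the least-loaded surviving node is an assumption here, as are the function name and the dictionary layout.

```python
def redirect_failed_node(assignments, failed_node):
    """Move a failed node's queries to the surviving nodes.

    assignments maps node name -> list of query ids currently assigned.
    Each orphaned query goes to whichever surviving node has the shortest
    list at that moment (an assumed policy, not taken from the slides).
    """
    orphaned = assignments.get(failed_node, [])
    survivors = [node for node in assignments if node != failed_node]
    if orphaned and not survivors:
        raise RuntimeError("no surviving worker nodes to take over the work")
    assignments.pop(failed_node, None)
    for query_id in orphaned:
        target = min(survivors, key=lambda node: len(assignments[node]))
        assignments[target].append(query_id)
    return assignments
```

Since only the queries that were on the failed node are rerun, the time lost is bounded by the per-node queue length rather than by the size of the whole job.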

19 Future Directions
Quality of Service
–Allow Contractor to set priority for application
Contractor fault tolerance
Large network optimization
–Sub-Contractors
Asynch Del. thread limit: ewww kewl WEB SERVICE!
Shadow (Sub) Contractors: network load balance

20 The End! Questions? Suggestions? Advice? Even Criticism?

