1
Advanced CPAS
Adam Rauch
LabKey Software
adam@labkey.com
2
Agenda
Demo of recent & advanced features
Pipeline architecture & configuration
Production installations
3
What Is CPAS? A proteomics analysis system that handles all data processing & management for high-throughput labs and core facilities
4
Demo
5
“Mini-Pipeline”
Included & configured in standard install
CPAS invokes executables (tandem, tpp) directly on web server
Simple approach works fine for low-throughput evaluation installs
6
CPAS Pipeline: FHCRC Installation
[Architecture diagram. Labeled components: Web Server (2 Proc, 2GB, Tomcat); Database Server (4 Proc, 4GB, MS SQL Server); File Server (Sun Hierarchical Storage); NetApp; Tape Robot; Cluster; mzXML Conversion Server; Mass Spec PC; Pipeline Mgr; 20+ TB]
7
Production Pipeline
Multi-server, clustered, high-throughput pipeline demands a more sophisticated approach
CPAS interface for configuring and submitting jobs is identical, but pipeline control & communication is handled differently
Each project is typically configured with a separate “pipeline root”
User initiates search by selecting raw file and specifying search parameters (protocol)
CPAS writes settings file to raw-file directory
Background process (cron job) running on pipeline server sees new job and kicks off pipeline processing
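The hand-off on this slide (CPAS drops a settings file, a scheduled background process notices it) can be sketched as a simple polling step. This is a minimal illustration in Python, not the Perl scripts used at FHCRC; the file names `search.settings` and `job.status` are hypothetical stand-ins for whatever the real pipeline writes.

```python
from pathlib import Path

SETTINGS_NAME = "search.settings"  # hypothetical name for the settings file CPAS writes
STATUS_NAME = "job.status"         # hypothetical marker written once a job is picked up


def find_new_jobs(pipeline_root):
    """Return directories under pipeline_root that hold a settings file but no status marker."""
    root = Path(pipeline_root)
    new_jobs = []
    for settings in sorted(root.rglob(SETTINGS_NAME)):
        job_dir = settings.parent
        if not (job_dir / STATUS_NAME).exists():
            new_jobs.append(job_dir)
    return new_jobs


def claim_job(job_dir):
    """Mark a job as picked up so the next scheduled run does not start it again."""
    (Path(job_dir) / STATUS_NAME).write_text("QUEUED\n")
```

A scheduled run would call `find_new_jobs` on each project's pipeline root, call `claim_job` on every directory returned, and then start processing.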
8
CPAS Pipeline
Automated pipeline moves MS2 data from instrument, through MS/MS search and post-processing, and into CPAS
[Flow diagram. Labels: instruments (LCQ, MALDI, LTQ FT), each with Sample Input and Raw File; Convert Server; mzXML File; MS/MS Search Cluster (PC #40); X! Tandem, SEQUEST, MASCOT; XPRESS, Peptide/Protein Prophet; mzXML, pepXML, protXML Files; CPAS]
9
Production Pipeline Workflow
Cron job state machine manages workflow
  – Initiates RAW-to-mzXML conversion
    – Conversion server (ConversionQueue)
    – Vendor-specific DLLs require Windows server
  – Submits MS/MS search to cluster scheduler
  – Submits post-processing jobs to cluster scheduler
  – Handles fractionation scenarios (individual, multi)
  – When processing is complete, instructs CPAS to load run
Job status is reported via log files, which CPAS reads to update web UI
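The workflow above is described as a state machine driven by a scheduled job. A minimal sketch of that idea follows, in Python. The stage order comes from the slide; the stage names, the log line format, and the rule that a failed stage is simply reported and left in place are assumptions made for illustration.

```python
from enum import Enum


class Stage(Enum):
    CONVERT = "convert RAW to mzXML"
    SEARCH = "MS/MS search"
    POSTPROCESS = "post-processing"
    LOAD = "load run into CPAS"
    COMPLETE = "complete"


# Stage order taken from the slide; a stage advances only once it has succeeded.
NEXT_STAGE = {
    Stage.CONVERT: Stage.SEARCH,
    Stage.SEARCH: Stage.POSTPROCESS,
    Stage.POSTPROCESS: Stage.LOAD,
    Stage.LOAD: Stage.COMPLETE,
}


def advance(stage, succeeded, log_lines):
    """Move a job one step forward and append a status line (hypothetical log format)."""
    if stage is Stage.COMPLETE:
        return stage
    if not succeeded:
        # Assumption: a failed stage stays put and is reported through the log.
        log_lines.append(f"ERROR: {stage.value} failed")
        return stage
    nxt = NEXT_STAGE[stage]
    log_lines.append(f"OK: {stage.value} finished; next: {nxt.value}")
    return nxt
```

Because status travels through the appended log lines, a reader of the log (CPAS, in the slide's description) needs no direct connection to the process that runs the stages.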
10
Search Engine Configuration
SEQUEST cluster uses “SequestQueue”
  – Custom Tomcat/Java web application
  – Installed on head node of cluster
  – Pipeline communicates with SequestQueue over HTTP
Pipeline drives Mascot cluster directly via HTTP
Pipeline drives X! Tandem via cluster scheduler
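The three routes on this slide amount to a small dispatch table. The sketch below, in Python, records only what the slide states (which transport each engine uses); the `Route` type, the key names, and the `route_for` helper are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    transport: str  # "http" or "scheduler"
    target: str     # what the pipeline talks to


# Routing as described on the slide; the dictionary keys are hypothetical identifiers.
ROUTES = {
    "sequest": Route("http", "SequestQueue web application on the cluster head node"),
    "mascot": Route("http", "Mascot cluster"),
    "xtandem": Route("scheduler", "cluster scheduler"),
}


def route_for(engine):
    """Look up how the pipeline reaches a search engine, ignoring case, spaces and '!'."""
    key = engine.lower().replace("!", "").replace(" ", "")
    if key not in ROUTES:
        raise ValueError(f"unknown search engine: {engine}")
    return ROUTES[key]
```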
11
Configuring A Production Pipeline
Install, customize Perl scripts that manage the workflow
  – Scripts used at Fred Hutchinson are available as an example
Configure conversion server
  – Converters & vendor-specific DLLs
Install TPP, MS/MS search engine(s) on cluster
Enable your search engine(s) within CPAS
Install CPAS FTP server (optional)
  – Useful to allow external collaborators to submit jobs to pipeline
Configure pipeline email notifications (optional)
  – Email notifications for completion and/or failures
12
Demo
13
Production Installation
14
Web & Database Servers
Server operating system(s)
  – CPAS runs on all popular operating system platforms
  – Solaris, Linux, Windows, OS X installations
  – Windows has somewhat easier install & upgrade process
    – Graphical installer
    – Pre-compiled binaries
  – Select OS that you & your IT staff are most comfortable with
Database server
  – PostgreSQL: runs on all popular hardware/OS platforms, free
  – Microsoft SQL Server: Windows only, commercial, well tested
Server hardware
  – Invest in database server: powerful server, ample storage, reliability
  – Web server much less demanding
15
IT Infrastructure
Shared file system (NFS)
  – CPAS and pipeline need access to a common NFS
  – Archive RAW, mzXML, pepXML, etc. files
Need plan for backing up NFS and database
16
Select Administrators
Database administrator
Server administrators
CPAS site administrators
CPAS project administrators
17
Production Installation Customization & Settings
Many settings for customizing CPAS to your needs
  – Fully documented on www.labkey.org
  – Review all settings carefully on a regular basis
CPAS settings are handled in several places
  – Most configuration is done via the “Admin Console”
  – /conf/server.xml
  – /conf/Catalina/localhost/labkey.xml
18
Database
JDBC parameters specified in labkey.xml
  – Driver class (PostgreSQL vs. SQL Server)
  – URL includes server name, port, database name
  – User name & password
Protect your data
  – CPAS database user needs read/write/delete/update perms
  – Use a strong password!
  – Provide no access to database server outside firewall
PGTest and jtdstest tools can help test config
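To make the "URL includes server name, port, database name" point concrete, here is a small Python sketch that assembles the driver class and JDBC URL for each database. It assumes the standard PostgreSQL JDBC driver and, for SQL Server, the jTDS driver that the jtdstest tool name suggests. The helper name and the example host and database names are hypothetical, and the attribute names used inside labkey.xml are not shown here.

```python
# Driver class and URL pattern per database.
# Assumption: SQL Server is reached through the jTDS driver.
DRIVERS = {
    "postgresql": ("org.postgresql.Driver",
                   "jdbc:postgresql://{host}:{port}/{database}"),
    "sqlserver": ("net.sourceforge.jtds.jdbc.Driver",
                  "jdbc:jtds:sqlserver://{host}:{port}/{database}"),
}


def jdbc_settings(kind, host, port, database):
    """Return (driver_class, url) for the given database kind."""
    if kind not in DRIVERS:
        raise ValueError(f"unsupported database kind: {kind}")
    driver_class, pattern = DRIVERS[kind]
    return driver_class, pattern.format(host=host, port=port, database=database)


if __name__ == "__main__":
    # Hypothetical server and database names; 5432 is PostgreSQL's default port.
    print(jdbc_settings("postgresql", "dbserver.example.org", 5432, "cpas"))
```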
19
Networking
Basic Networking
  – Specify port in server.xml
  – Open firewall port(s)
  – Procure server name and update DNS
SMTP settings
  – Server, port, credentials specified in labkey.xml
  – System email address specified in site settings
20
Security
Designed to keep sensitive, unpublished scientific data secure
Authentication: dual scheme approach
  – Can delegate to institution’s LDAP system
  – External users: invitation only
    – Users choose their own passwords
    – Hash of password is stored in database and used for authentication
Authorization: Users must be granted explicit permissions
  – All data stored in folder hierarchy managed by the database
  – Users are added to groups
  – Groups are granted permission to folder or hierarchy
  – Authorized only if user belongs to group with required permissions
Folders can be made “public” (no authentication required)
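The authorization rule on this slide (users belong to groups, groups hold permissions on folders, access requires membership in a group with the required permission) can be sketched as follows in Python. This is an illustration, not CPAS's implementation. Two points are assumptions: that a grant on a folder also covers its subfolders, and that "public" means read access without authentication. The permission names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Folder:
    name: str
    parent: Optional["Folder"] = None
    group_perms: dict = field(default_factory=dict)  # group name -> set of permissions
    public: bool = False


def is_authorized(user_groups, folder, permission):
    """Return True if any of the user's groups holds the permission on the folder."""
    # Assumption: a public folder grants read access to everyone, logged in or not.
    if folder.public and permission == "read":
        return True
    # Assumption: a grant on a folder applies to the hierarchy below it,
    # so walk up through the ancestors looking for a matching grant.
    node = folder
    while node is not None:
        for group in user_groups:
            if permission in node.group_perms.get(group, set()):
                return True
        node = node.parent
    return False
```

For example, a group granted "read" on a project folder would also be able to read a run folder beneath it, while a user in no listed group would be refused unless the folder is public.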
21
Security Settings
SSL
  – We strongly recommend requiring SSL connections
  – Enable SSL port in server.xml
  – Use “Require SSL connections” option & port setting
LDAP & SASL
  – Configure CPAS to authenticate users to your organization’s LDAP server(s)
  – Specify server name, domain, principal template, SASL
Email templates
  – Customize new user registration, password change, etc. emails
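A "principal template" turns the address a user logs in with into the principal name sent to the LDAP server. The Python sketch below shows the general idea using `string.Template`; the placeholder names `${email}` and `${uid}` and the example templates are assumptions for illustration, so check the LabKey documentation for the exact syntax CPAS accepts.

```python
from string import Template


def build_principal(template, email):
    """Fill a principal template from a login email address.

    Assumed placeholders: ${email} is the full address, ${uid} is the part before the '@'.
    """
    uid = email.split("@", 1)[0]
    return Template(template).substitute(email=email, uid=uid)


if __name__ == "__main__":
    # Hypothetical templates and address.
    print(build_principal("${uid}@corp.example.org", "jdoe@example.org"))
    print(build_principal("uid=${uid},ou=people,dc=example,dc=org", "jdoe@example.org"))
```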
22
Other Settings
Network drive
  – Allows CPAS running as Windows service to attach NFS as a drive
Site-wide option to enable caBIG™
Mascot & SEQUEST connection settings
Site description, color theme, font size, logo
23
Future Directions
Web services-based pipeline
Faster, easier loading of protein annotations
Multi-engine comparisons
Improved generalized query support
Phase 2 of caBIG support
24
LabKey Software, Inc.
Private consulting company created by FHCRC and team of software professionals
  – Formed to support, document, and extend the CPAS project to other functions and labs
  – Independent company to directly address other institutions’ needs and secure outside funding
Partnership:
  – Clients provide scientific leadership
  – LabKey focuses on software development
LabKey is available to customize, install, and support your pipeline, CPAS, and other LabKey applications
  – Business model ensures you get help & support when you need it
25
Next Steps
Visit our booth
Join our informal receptions here
  – 6:30 – 9:30PM Tonight & Tomorrow
Talk to LabKey about your plans
26
Resources
http://www.labkey.org
  – CPAS Distribution & Support Site
  – Ask questions, contribute feedback
  – Peruse all the CPAS documentation & tutorials
  – Download the latest version (LabKey 2.1)
    – Graphical installer for Windows installation
    – Well documented “manual” installation for Linux/Mac
http://www.labkey.com
  – LabKey Software Inc. company web site
CPAS Paper
  – Rauch A, Bellew M, Eng J, et al. Computational Proteomics Analysis System (CPAS): An Extensible, Open-source Analytic System for Evaluating and Publishing Proteomic Data and High throughput Biological Experiments. J Proteome Res 2006;5(1):112-121.
27
Acknowledgements
Fred Hutchinson Cancer Research Center
National Cancer Institute
Canary Foundation
Gates Foundation
Institute for Systems Biology
Ron Beavis & The GPM
Numerous developer contributors
28
Questions?
30
Advanced Analysis Features
Filter groups of runs and compare peptides, proteins, ProteinProphet, quantitation, etc.
Analyze groups of runs based on sample properties
Search all experiments for a specific protein or gene name
Link results to protein annotations
  – Load protein knowledgebases: TrEMBL, Swiss-Prot
  – Gene Ontology: produce GO charts analyzing molecular function, cellular location, metabolic process
  – Custom protein annotation lists
Flexible, custom query capability
  – Join results to protein, experiment, sample tables
  – Display exactly the data you care about
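As a rough picture of what "filter groups of runs and compare peptides" means, the Python sketch below filters two runs by a probability cutoff and reports which peptides are shared or unique. It is a stand-alone illustration, not the CPAS query interface; the data layout (pairs of peptide sequence and probability) and the 0.9 default cutoff are assumptions.

```python
def peptides_passing(run, min_probability):
    """Return the set of peptide sequences in a run at or above the cutoff.

    run: iterable of (peptide_sequence, probability) pairs (assumed layout).
    """
    return {sequence for sequence, probability in run if probability >= min_probability}


def compare_runs(run_a, run_b, min_probability=0.9):
    """Compare two filtered runs: peptides in both, only in A, and only in B."""
    a = peptides_passing(run_a, min_probability)
    b = peptides_passing(run_b, min_probability)
    return {"shared": a & b, "only_a": a - b, "only_b": b - a}
```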