1 Joint meeting of the Molecular Libraries Screening Centers Network (MLSCN) and the Exploratory Centers for Cheminformatics Research (ECCR): Talk I July.

Presentation transcript:

1 Joint meeting of the Molecular Libraries Screening Centers Network (MLSCN) and the Exploratory Centers for Cheminformatics Research (ECCR): Talk I
July
Geoffrey Fox
Computer Science, Informatics, Physics
Pervasive Technology Laboratories, Indiana University, Bloomington IN
With apologies for my credentials: I have written a few papers on Biology, Chemistry and Crystallography while at Cambridge, Caltech and Syracuse, mostly on applications of parallel computing.

2 Start-up and Organization
• Local teams, successful prototypes and international collaboration set up in 3 major focus areas:
  - “Tool and Data” Cyberinfrastructure
  - “Archival Database and Simulation” Cyberinfrastructure
  - Education
• Wiki chosen to support project as a shared editable web space
• Web site
• Building Collaboratory involving PubChem – Global Information System accessible anywhere and at any time – enhance PubChem with distributed tools (clustering, simulation, annotation etc.) and data
• Initial results discussed at conferences/workshops/papers: Gordon Conferences, ACS, SDSC tutorial
• First new Cheminformatics courses offered
• Advisory board set up and met
• Videoconferencing-based meetings with Peter Murray-Rust and group at Cambridge roughly every 2-3 weeks
• Good interactions with NIH DTP, Lilly and Michigan ECCR

3

4 CICC Senior Personnel
Geoffrey C. Fox, Mu-Hyun (Mookie) Baik, Dennis B. Gannon, Marlon Pierce, Beth A. Plale, Gary D. Wiggins, David J. Wild, Yuqing (Melanie) Wu, Peter T. Cherbas, Mehmet M. Dalkilic, Charles H. Davis, A. Keith Dunker, Kelsey M. Forsythe, Kevin E. Gilbert, John C. Huffman, Malika Mahoui, Daniel J. Mindiola, Santiago D. Schnell, William Scott, Craig A. Stewart, David R. Williams
From Biology, Chemistry, Computer Science, Informatics at IU Bloomington and IUPUI (Indianapolis)

5 CICC Advisory Board
• Alan D. Palkowitz (Eli Lilly)
• Andrew Martin (Kalypsys)
• David Spellmeyer (IBM)
• Dimitris K. Agrafiotis (Johnson & Johnson)
• Horst Hemmerle (Eli Lilly)
• James M. Caruthers (Purdue University)
• Jeremy G. Frey (University of Southampton)
• Joel Saltz (Ohio State University/University of Maryland/Johns Hopkins University)
• John M. Barnard (Digital Chemistry)
• John Reynders (Eli Lilly)
• Peter Murray-Rust (University of Cambridge)
• Peter Willett (University of Sheffield)
• Thompson Doman (Eli Lilly)
• Val Gillet (University of Sheffield)
Industry and Academia. Met October 2005; will meet this fall.

6 Publications Baik says he is especially productive due to Cyberinfrastructure

7 Our Meetings are on the Web

8 Varuna environment for molecular modeling (Baik, IU)
[Diagram labels: QM Database; Researcher; Simulation Service; FORTRAN Code, Scripts; Chemical Concepts; Experiments; QM/MM Database; PubChem, PDB, NCI, etc.; ChemBioGrid; Reaction DB; DB Service; Queries, Clustering, Curation, etc.; Papers etc.; Condor; TeraGrid; Supercomputers; “Flocks”]

9 Cyberinfrastructure and Grids
• These support eScience or distributed Computers, Databases, Instruments, Sensors and People
• Grids use large scale managed Web services, the current major technology building on modern Industry enterprise and Internet systems
• W3C, OASIS and OGF (Open Grid Forum; Fox is VP for eScience) develop standards ensuring that distributed resources interoperate
• Cheminformatics benefits from 2 styles of Grids:
  - TeraGrid typifies Grid support of large scale computation of parallel simulations
  - Bioinformatics (BIRN, caBIG, MyGrid …), Earth Science and Astronomy Grids illustrate integration of real-time and archival data(bases) and computation
• Well designed Grids run faster than older approaches

10 Cheminformatics Grids Need
• Broad System standards such as WSDL, SOAP, WSRM, JSDL, BPEL
• Domain specific data structures:
  - CML (Cheminformatics)
  - GML (Earth Science)
  - CellML, SBML (Biology)
  - VOQL (Astronomy)
• Use of specific Grid/Web service technologies such as:
  - Web services directly for tools
  - Web service proxies for large simulation codes – ANYTHING can be made a Web service efficiently if execution/network access time ≥ 20ms
  - Portals/Portlets for user interfaces
  - Workflow for composition
• Access to data and compute resources
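The 20 ms rule of thumb above can be read as an amortization argument: a fixed per-call cost matters less as the wrapped task gets longer. The following is a minimal sketch of that arithmetic only. The function name and the 2 ms overhead figure are assumptions chosen for illustration, not measurements from the talk.

```python
def wrapper_efficiency(exec_ms: float, overhead_ms: float) -> float:
    """Fraction of wall-clock time spent on useful work when a task that
    runs for exec_ms is wrapped in a service call costing overhead_ms."""
    if exec_ms < 0 or overhead_ms < 0:
        raise ValueError("times must be non-negative")
    total = exec_ms + overhead_ms
    return exec_ms / total if total else 0.0


# ASSUMPTION: 2.0 ms per-call overhead is an illustrative figure only.
ASSUMED_OVERHEAD_MS = 2.0

for exec_ms in (1.0, 20.0, 1000.0):
    share = wrapper_efficiency(exec_ms, ASSUMED_OVERHEAD_MS)
    print(f"{exec_ms:7.1f} ms task: {share:.1%} useful work")
```

Under this assumed overhead, a very short task loses most of its time to the wrapper, while a task at or above the 20 ms mark keeps roughly nine tenths or more of it.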

TeraGrid: Integrating NSF Cyberinfrastructure
TeraGrid is a facility that integrates computational, information, and analysis resources at the San Diego Supercomputer Center, the Texas Advanced Computing Center, the University of Chicago / Argonne National Laboratory, the National Center for Supercomputing Applications, Purdue University, Indiana University, Oak Ridge National Laboratory, the Pittsburgh Supercomputing Center, and the National Center for Atmospheric Research.
Sites shown: SDSC, TACC, UC/ANL, NCSA, ORNL, PU, IU, PSC, NCAR, Caltech, USC-ISI, Utah, Iowa, Cornell, Buffalo, UNC-RENCI, Wisc

12 Top500 Supercomputers in the world
Indiana University has Highest Performance U.S. Academic Computer System: 20 Teraflops peak

13 Products and Demonstrations
www.chembiogrid.org
Note mixture of In-house, Out of House, Commercial and Academic

CICC Prototype Web Services
Basic cheminformatics:
• Molecular weights
• Molecular formulae
• Tanimoto similarity
• 2D Structure diagrams
• Molecular descriptors
• 3D structures
• InChI generation/search
• CMLRSS
Application based services:
• Compare (NIH)
• Toxicity predictions (ToxTree)
• Literature extraction (OSCAR3)
• Clustering (BCI Toolkit)
• Docking, filtering, ... (OpenEye)
• Varuna simulation
Next steps?
• Define WSDL interfaces to enable global production of compatible Web services; refine CML
• Ready to try “Prototype Production”
• Develop more training material
• Refine/go into production with key services including both tools, workflows and TeraGrid style simulations in capacity and capability modes
• In-house algorithm work for new services in clustering, diversity analysis, QSAR methodologies
Key Ideas:
• Add value to PubChem with additional distributed services and databases
• Wrapping existing code in web services is not difficult
• Provide “core” (CDK) services and exemplars of typical tools
• Provide access to key databases via a web service interface
• Provide access to major Compute Grids
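As an illustration of one of the basic services listed above, Tanimoto similarity between two binary fingerprints is the number of shared on-bits divided by the number of bits set in either. The sketch below is not the CICC or CDK implementation; it assumes fingerprints are given as plain Python sets of on-bit positions, and the toy fingerprints are invented for the example.

```python
def tanimoto(fp_a: set, fp_b: set) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    shared = len(fp_a & fp_b)
    either = len(fp_a) + len(fp_b) - shared
    # ASSUMPTION: two empty fingerprints are scored 0.0 by convention here.
    return shared / either if either else 0.0


# Toy fingerprints (hypothetical bit positions, not derived from real molecules).
fp_1 = {1, 2, 3}
fp_2 = {2, 3, 4}
print(tanimoto(fp_1, fp_2))
```

A real service would compute the fingerprints with a cheminformatics toolkit before comparing them; only the comparison step is shown.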

Web Service Locations
• Indiana University: Clustering, VOTables, OSCAR3, Toxicity classification, Database services
• Penn State University: CDK based services, Fingerprints, Similarity calculations, 2D structure diagrams, Molecular descriptors
• Cambridge University: InChI generation / search, CMLRSS, OpenBabel
• InfoChem: SPRESI database
• SDSC: Typical TeraGrid Site
• NIH: PubChem ….. Compare …..

Usage of Open Source Projects
A number of open source projects are used in our infrastructure
• CDK provides the underlying cheminformatics toolkit
• R provides the back-end modeling capabilities
• OSCAR is used for literature mining
• ToxTree is used to provide toxicity classification
• Open data and standards as promoted by the Blue Obelisk project

Contributions to Open Source Projects
We also contribute functionality to these projects
• Molecular descriptor development to the CDK
• Modifications of various CDK functionality to make them suitable for web service usage
• Infrastructure for accessing R from the CDK
• Packages to use the CDK from within R
• Quality control, testing and documentation
Steinbeck, C. et al.; Curr. Pharm. Des., 2006, 12(17),
Guha, R.; CDK News, 2005, 2(1), 7-13

Workflows Using Chemical Literature
• OSCAR3 program: all of PubMed “just” takes about a day to run through OSCAR3 on 2048 node Big Red
• Example output (columns SMILES, NAME, PubMed ID): CCC, propane; CC, ethane
• Bulk download of PubMed abstracts
• Extract chemical structures (OSCAR3 Service)
• Find similar molecules
• Searchable (structure/similarity) Grid database; Local DTP database; PubChem; PDBBind
• Find similar documents
• Clustering of documents linked to clustering of chemicals
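The link between documents and chemicals in the workflow above can be sketched with an in-memory index. This is a simplified illustration, not the OSCAR3 or Grid database code: the record layout follows the SMILES / NAME / PubMed ID columns on the slide, the PubMed IDs are placeholders, and documents are compared by exact SMILES overlap rather than by true structure similarity.

```python
from collections import defaultdict

# Hypothetical extraction records: (SMILES, name, PubMed ID).
# The IDs are placeholders, not real PubMed identifiers.
RECORDS = [
    ("CCC", "propane", "PMID-A"),
    ("CC", "ethane", "PMID-A"),
    ("CC", "ethane", "PMID-B"),
    ("CCO", "ethanol", "PMID-C"),
]


def chemicals_by_document(records):
    """Map each document ID to the set of SMILES strings extracted from it."""
    index = defaultdict(set)
    for smiles, _name, doc_id in records:
        index[doc_id].add(smiles)
    return dict(index)


def similar_documents(index, doc_id):
    """Rank the other documents by Jaccard overlap of their extracted structures."""
    query = index[doc_id]
    scored = []
    for other, chems in index.items():
        if other == doc_id:
            continue
        union = query | chems
        score = len(query & chems) / len(union) if union else 0.0
        scored.append((other, score))
    # Highest score first; ties broken by document ID for a stable order.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


index = chemicals_by_document(RECORDS)
print(similar_documents(index, "PMID-A"))
```

Replacing the exact-match overlap with a fingerprint-based similarity would bring the sketch closer to the structure/similarity search the slide describes.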

19 Document-enhanced Cyberinfrastructure
[Diagram labels: Existing User Interface; Google Scholar, Manuscript Central, Science.gov, Windows Live Academic Search, Citeseer, CMT Conference Management etc.; Existing Document-based Research Tools; Web service Wrappers; New Document-enhanced Research Tools including Web2.0, Mashups, Annotation; Integration/Enhancement User Interface; Community Tools; Generic Document Tools; MyResearch Database; Bibliographic Database; Export: RSS, Bibtex, Endnote etc.; CiteULike, Connotea, Del.icio.us, Bibsonomy, Biolicious; PubChem; PubMed; Traditional Cyberinfrastructure]

20 Products and Demonstrations II

Indiana University School of David Wild – Research Overview July Page 21
Example HTS workflow: organization & flagging (Taverna Workflow)
• A biological screen is selected.
• The activity results for all the compounds are extracted from the database (currently using DTP Tumor Cell Line database).
• The compounds are clustered on chemical structure similarity, to group similar compounds together.
• The compounds along with property and cluster information are converted to VOTABLES format and displayed in VOPLOT.
• OpenEye FILTER is used to calculate biological and chemical properties of the compounds that are related to their potential effectiveness as drugs.
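The shape of this workflow (extract activities, cluster on structure similarity, attach property flags, emit rows for display) can be sketched in a few lines. Everything here is a stand-in: a simple leader clustering on toy fingerprints replaces the BCI Toolkit clustering, a molecular-weight cutoff replaces OpenEye FILTER, and the compound data, threshold and cutoff are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Compound:
    name: str
    fingerprint: frozenset  # on-bit positions (toy values)
    activity: float         # screen result (toy values)
    mol_weight: float


def tanimoto(a, b):
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def leader_cluster(compounds, threshold=0.5):
    """Assign each compound to the first cluster leader it resembles,
    or start a new cluster with it as leader."""
    leaders, labels = [], {}
    for c in compounds:
        for i, lead in enumerate(leaders):
            if tanimoto(c.fingerprint, lead) >= threshold:
                labels[c.name] = i
                break
        else:
            leaders.append(c.fingerprint)
            labels[c.name] = len(leaders) - 1
    return labels


def run_workflow(compounds, max_weight=500.0):
    """Return one row per compound with its cluster label and a property flag.
    ASSUMPTION: the weight cutoff is a placeholder for a real property filter."""
    labels = leader_cluster(compounds)
    return [
        {"name": c.name, "activity": c.activity,
         "cluster": labels[c.name], "flagged": c.mol_weight > max_weight}
        for c in compounds
    ]


SCREEN = [
    Compound("a", frozenset({1, 2, 3}), 0.91, 300.0),
    Compound("b", frozenset({1, 2, 4}), 0.85, 620.0),
    Compound("c", frozenset({7, 8, 9}), 0.10, 410.0),
]
for row in run_workflow(SCREEN):
    print(row)
```

The rows returned by `run_workflow` play the role of the table that the real workflow converts to VOTABLES for plotting.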

22 Load Workflow Run Workflow Current Process Result Output Result Output URL

23 Lilly very interested in our new educational programs

24 Total Grad Enrollment: Chem-, Lab, Bio-, Health Informatics, Fall 2005 (Red = Expected, Chem, Fall 2006)
MS (Chem, Lab, Bio, Health): IUB 3/30380; IUPUI 6/; TOTAL 9/
PhD (Chem, Lab, Bio, Health): IUB 1/3, 0, 3, 0; IUPUI 0/1, 0, 4, 3; TOTAL 1/4, 0, 7, 3

25 Formal Cheminformatics Courses
• I571 Chemical Information Technology (3 cr.)
  - Distance Ed section had 10 students in Fall 2005, from California to Connecticut
• I572 Computational Chemistry and Molecular Modeling (3 cr.)
• I573 Programming Techniques for Chemical and Life Science Informatics (3 cr.)
• I553 Independent Study in Chemical Informatics (3 cr.)
Above courses required for the new Graduate Certificate Program in Chemical Informatics
Also I533 (Cheminformatics seminar)

26 More detailed Slides not used

27 TeraGrid Hardware and Software
• TeraGrid is coordinated at the University of Chicago and includes 8 partner facilities: NCSA, SDSC, PSC, ORNL, IU, PU, TACC, UC/ANL
• TeraGrid hardware totals > 102 teraflops of computing power. Comprehensive information available from
• Systems are primarily Linux clusters.
• Grid software and services (Globus, MyProxy, etc.) provide a uniform means for accessing TeraGrid resources:
  - Scheduling, running and monitoring jobs
  - Monitoring resources
  - Moving and managing remote files
• Common service APIs simplify the process for building remote tools.

28 Prototype CICC Project: Controlling the TGFβ pathway
Collaboration between Baik & Zhang at IU
[Diagram labels: PDB 1IAS; Inactive TGFβ; VARUNA; Experiments in the Zhang Lab; Active TGFβ with inhibitor; PubChem; in-house Molecules in Varuna; Conceptual Understanding of TGFβ Inhibition; Simulations; AutoGeFF; Web Service to generate custom force fields]
Questions:
- What molecular feature controls inhibitor binding?
- How do mutations impact binding?

29 MLSCN Data - How services and workflows are used
• MLSCN submits HTS data to PubChem and/or sends directly to workflow for real-time feedback
• Data is stored in PubChem
• Workflows perform different kinds of analysis on the MLSCN data - the variety of workflows is limitless
• End-user applications and interfaces utilize the information streams from the workflows for human interaction with the data and analysis
• PubChem interfaces to workflows via SOAP
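To make the SOAP hand-off concrete, the sketch below builds a SOAP 1.1 request envelope with the Python standard library. Only the envelope namespace is standard; the workflow namespace, the `submitAssayData` operation, its element names and the identifiers are hypothetical and do not describe the actual PubChem or CICC interfaces.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
WF_NS = "urn:example:workflow"                         # HYPOTHETICAL service namespace


def build_request(assay_id, compound_ids):
    """Serialize a request asking a (hypothetical) workflow service to
    analyse the given compounds from one assay."""
    ET.register_namespace("soapenv", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    # HYPOTHETICAL operation and element names, chosen for illustration only.
    operation = ET.SubElement(body, f"{{{WF_NS}}}submitAssayData")
    ET.SubElement(operation, f"{{{WF_NS}}}assayId").text = assay_id
    for cid in compound_ids:
        ET.SubElement(operation, f"{{{WF_NS}}}compoundId").text = cid
    return ET.tostring(envelope, encoding="utf-8")


request = build_request("AID-0", ["CID-1", "CID-2"])
print(request.decode("utf-8"))
```

In practice the element names would come from the service's WSDL, and a SOAP client library would generate and send the envelope; building it by hand only shows what travels over the wire.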