Bioinformatics Community of CNGrid: A New Approach to Utilizing Grids
Yongwei Wu, Tsinghua University (wuyw@tsinghua.edu.cn)
Participants and Sponsorship: Beijing Institute of Genomics, CAS; Tsinghua University
Outline
- Background
- Our Approach
- Achievements
- Concluding Remarks
Exponential Growth of Bio Data
We need more storage and processing power to store and analyze these data!
Grids Draw Much Attention
Bioinformatics is an important application domain of grid computing around the world!
Problems with Existing Bioinformatics Grid Practice
- Professionalism and sharing are not well balanced
  - Environments built under the leadership of domain scientists have limited resources.
  - Environments built only by bioinformatics researchers are also limited in scale.
  - Environments built on general-purpose infrastructure are usually not specialized for the domain and are hard to use.
- No support for sharing GUI software
- Data's support for computation is not emphasized
  - Data synchronization, backup, and storage are beyond the ability of domain users, whereas IT developers know little about application requirements.
- Only part of the research activities are covered
  - No support for daily communication, results sharing, …
Key Points
- Domain scientists lead the development of the bioinformatics community
- Nova is developed to support GUI software sharing
  - Nova is a toolkit for customizing application environments
- Data support for computation is emphasized
  - Storage can be attached to the computing environment
- New functionalities are introduced
  - Knowledge repository, data/software sharing, Q&A system
Nova: A Virtual Computing Toolkit
Nova aims to give users facilities for utilizing physical infrastructure more easily and productively.
[Figure: Nova delivers customized hosts, customized clusters, and customized services]
Nova Architecture & Work Procedure
[Figure: master node, worker nodes, information service, configuration, worker selection, data storage, KVM/Xen hypervisor, VM monitor]
Work procedure: ① Request → ② Query → ③ Create VM → ④ OS Image → ⑤ Start VM → ⑥ App Image → ⑦ Notification → ⑧ VNC Remote Desktop
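The eight-step work procedure can be illustrated with a toy Python simulation. This is a minimal sketch under stated assumptions, not Nova's implementation: the classes `Worker`, `InformationService`, and `MasterNode`, the "most free memory wins" selection policy, the image paths, and the VNC port scheme (5900 plus a per-worker display index) are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Worker:
    name: str
    free_mem_mb: int
    running_vms: list = field(default_factory=list)


class InformationService:
    """Hypothetical registry of worker state, queried in step ②."""

    def __init__(self, workers):
        self.workers = workers

    def query(self):
        return list(self.workers)


class MasterNode:
    def __init__(self, info_service, os_image, app_images):
        self.info = info_service
        self.os_image = os_image      # hypothetical OS image path in data storage
        self.app_images = app_images  # hypothetical map: software name -> app image path

    def handle_request(self, user, software, mem_mb):
        # ① Request: a user asks for a VM running a given GUI tool
        app_image = self.app_images[software]  # fail early on unknown software
        # ② Query: fetch current worker state from the information service
        candidates = [w for w in self.info.query() if w.free_mem_mb >= mem_mb]
        if not candidates:
            raise RuntimeError("no worker has enough free memory")
        # Worker selection (assumed policy): the worker with most free memory wins
        worker = max(candidates, key=lambda w: w.free_mem_mb)
        # ③ Create VM on the chosen worker, ④ booting from the OS image
        vm = {"id": f"{user}-{software}-{len(worker.running_vms)}",
              "os_image": self.os_image, "mem_mb": mem_mb, "state": "created"}
        worker.free_mem_mb -= mem_mb
        # ⑤ Start VM
        vm["state"] = "running"
        worker.running_vms.append(vm["id"])
        # ⑥ App Image: attach the pre-virtualized software
        vm["app_image"] = app_image
        # ⑦ Notification: return the address the user needs for ⑧ VNC remote desktop
        display = len(worker.running_vms) - 1
        return vm, f"vnc://{worker.name}:{5900 + display}"


info = InformationService([Worker("node1", 4096), Worker("node2", 8192)])
master = MasterNode(info, "storage/os/linux.img",
                    {"ClustalX": "storage/apps/clustalx.img"})
vm, url = master.handle_request("alice", "ClustalX", 2048)
print(url)  # vnc://node2:5900
```

Because the user only receives a VNC address at the end, the client side needs nothing beyond a browser-based or standalone VNC viewer, which matches the install-free goal stated for Nova.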
Nova Features
- Install- and configuration-free client: users only need a Web browser to use the system
- High productivity: pre-virtualized software and one-click configuration
- Inherent integration with the storage cloud: after VMs are created, the user's personal space in the storage cloud can be automatically attached as an independent drive, which then acts as the source of input data and the destination of produced data
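Attaching personal storage as an independent drive could look like the following on a KVM host managed through libvirt. The slides do not say that Nova uses libvirt or how the personal space is exposed to the hypervisor; this sketch assumes the user's space is available on the host as a raw image file, and the function name `personal_storage_disk_xml` and the path are hypothetical.

```python
import xml.etree.ElementTree as ET


def personal_storage_disk_xml(image_path: str, target_dev: str = "vdb") -> str:
    """Return a libvirt <disk> element that exposes a user's personal storage
    (assumed here to be a raw image file on the host) as an extra virtio drive."""
    disk = ET.Element("disk", type="file", device="disk")
    # QEMU reads the image as raw bytes (assumed image format)
    ET.SubElement(disk, "driver", name="qemu", type="raw")
    # Hypothetical per-user image path in the storage cloud
    ET.SubElement(disk, "source", file=image_path)
    # Appears inside the guest as a separate drive, e.g. /dev/vdb on Linux
    ET.SubElement(disk, "target", dev=target_dev, bus="virtio")
    return ET.tostring(disk, encoding="unicode")


print(personal_storage_disk_xml("/storage/users/alice.img"))
```

The resulting XML could be saved to a file and handed to libvirt's device-attach operation (for example `virsh attach-device <domain> disk.xml --live`), so the drive shows up in the running VM alongside its system disk and can serve both as input source and output destination.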
GUI Software Sharing by Nova
[Figure: user request → Nova core services (worker selection, image loading, …)]
Providing Workflow Support
To further improve research efficiency, both a workflow definition tool and workflow services are supplied.
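A workflow service has to run each analysis step only after the steps it depends on have finished. A minimal sketch of that ordering using Python's standard `graphlib` module (Python 3.9+) follows; the step names and the tools mentioned in the comments are illustrative assumptions, and the `run_step` callback stands in for whatever the actual workflow services execute.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each step maps to the set of steps it depends on.
workflow = {
    "convert_format": set(),                 # e.g. a sequence format converter
    "blast_search": {"convert_format"},      # e.g. BLAST similarity search
    "multiple_alignment": {"blast_search"},  # e.g. ClustalX alignment
    "phylogeny": {"multiple_alignment"},     # e.g. MEGA tree building
    "report": {"blast_search", "phylogeny"},
}


def run_workflow(graph, run_step):
    """Run every step in an order that respects its dependencies."""
    order = []
    for step in TopologicalSorter(graph).static_order():
        run_step(step)  # placeholder for submitting the step to a workflow service
        order.append(step)
    return order


executed = run_workflow(workflow, lambda step: print("running", step))
```

`static_order()` also raises `graphlib.CycleError` if a user-defined workflow accidentally contains a cycle, which is a useful check for a workflow definition tool.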
Knowledge Repository
- Hot research topics
- Important journals
- Important conferences/workshops
- Famous scholars
- Important research institutes
- Influential surveys/papers
- Important organizations/associations
Other Useful Resources
- Q&A system: for users to help each other
- System announcements
  - Seminars
  - Research breakthroughs
  - Conference/workshop CFPs
  - Newly added functions
The Community Is in Service
234 tools are integrated!
- DNA Analysis (28): SequenceViewer, WinGene, etc.
- RNA Analysis (19): miRanda, RNAshapes, etc.
- Protein Analysis (17): InterProScan, InterViewer, etc.
- Protein Structure (38): Protein Explorer, RasTop, etc.
- Evolution Analysis (41): MEGA, GeneTree, etc.
- Sequence Assembly and Alignment (74): BLAST, ClustalX, BioEdit, etc.
- Sequence Format Conversion (17): SeqVerter, DataConvert, etc.
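As a quick consistency check, the per-category tool counts on this slide add up to the advertised total of 234:

```python
# Tool counts per category, as listed on the slide
tool_counts = {
    "DNA Analysis": 28,
    "RNA Analysis": 19,
    "Protein Analysis": 17,
    "Protein Structure": 38,
    "Evolution Analysis": 41,
    "Sequence Assembly and Alignment": 74,
    "Sequence Format Conversion": 17,
}
total = sum(tool_counts.values())
print(total)  # 234
```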
47 databases are provided, including a full mirror of the UCSC Genome databases!
Community Usage
- More than 100 users now
- More than 60 institutes involved
- Users' preferences among the provided resources (categories overlap, so the shares sum to more than 100%):
  - Software tools: 77.38%
  - Databases: 67.86%
  - Knowledge repository and others: 32.14%
Scenario for HnNn Analysis
Concluding Remarks
- User- and task-oriented design is important
  - It is the key to making cloud computing successful
  - It is why we chose a design led by domain scientists
- Value-added services are important
  - They are the key to attracting and retaining users
  - They are why we provide workflow support
- Challenges still ahead
  - How to survive the data deluge?
  - How to support new requirements?
Thanks!