1
David A. Lifka, Chief Technical Officer, Cornell Theory Center, lifka@tc.cornell.edu
Data Intensive Computing: Enabling Seamless High Performance Computing
David Lifka, Cornell Theory Center
Johannes Gehrke, Cornell Computer Science
5/26/2004
2
The Cornell Puzzle Pieces: Scientists, Miners, Plumbers, Tool Makers
3
Data: A New Frontier in HPC
Why is this important?
- Clusters are readily available
- Scientists are able to consume and produce more data than ever
- Lower, and if possible remove, the HPC learning curve
- Support for seamless computing
- Allow scientists to focus on their science, not the "plumbing"
- Standard open interfaces – Web Services
- Interoperability
- Foundation for building custom interfaces
Issues under investigation
- How do you construct pipelines from source to consumer?
- How do you install, configure, and manage petabytes of data?
- How do you ensure accessibility and reliability?
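One common way to build a pipeline from source to consumer is a chain of streaming stages, each consuming the previous stage's output record by record so the full data set never has to sit in memory at once. The sketch below uses Python generators; the stage names (source, parse, keep_above, consumer), the comma-separated record format, and the threshold filter are all hypothetical and are not taken from the CTC systems.

```python
# A minimal source -> transform -> consumer pipeline built from Python generators.
# Stage names and the "name,value" record format are hypothetical, for illustration only.
from typing import Iterable, Iterator


def source(lines: Iterable[str]) -> Iterator[str]:
    """Yield raw records one at a time (e.g. lines read from an instrument file)."""
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines
            yield line


def parse(records: Iterable[str]) -> Iterator[tuple[str, float]]:
    """Turn 'name,value' strings into (name, value) tuples."""
    for rec in records:
        name, value = rec.split(",", 1)
        yield name, float(value)


def keep_above(rows: Iterable[tuple[str, float]], threshold: float) -> Iterator[tuple[str, float]]:
    """Filter stage: pass through only rows whose value exceeds the threshold."""
    for name, value in rows:
        if value > threshold:
            yield name, value


def consumer(rows: Iterable[tuple[str, float]]) -> dict[str, float]:
    """Final stage: collect results (a stand-in for loading them into a database)."""
    return dict(rows)


raw = ["a,1.5", "", "b,0.2", "c,3.0"]
result = consumer(keep_above(parse(source(raw)), threshold=1.0))
print(result)  # {'a': 1.5, 'c': 3.0}
```

Because each stage only sees an iterator, a stage can be swapped (for example, reading from a file or a database cursor instead of a list) without touching the others.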
4
CTC Data Intensive Computing Resources

SQL Server resources: 16 SQL Servers, 5.046 TB storage

Compute resources:
Cluster          Nodes  Procs  Server model  Processor          Memory/node  Disk/node     Net
Velocity 1          64    256  Dell 6350     PIII Xeon 500 MHz  4 GB         50 GB RAID 0  Giganet + 100T
Velocity 1 Plus     64    128  Dell 2450     PIII 733 MHz       2 GB         50 GB RAID 0  Giganet + 100T
CMI                 64    128  Dell 1550     PIII 1 GHz         2 GB         50 GB RAID 0  Giganet + 100T
Development          8     16  Dell 1550     PIII 800 MHz       2 GB         50 GB RAID 0  Giganet + 100T
Serial              18     18  Dell 2450     PIII 800 MHz       1 GB         50 GB RAID 0  100T
Long                17     17  Dell 2450     PIII 600 MHz       1 GB         50 GB RAID 0  100T
CBSU               192    384  Dell 2650     Xeon 2.4 GHz       2 GB         50 GB RAID 0  1000T
CBWeb               64    128  Dell 1550     PIII 1 GHz         2 GB         50 GB RAID 0  Giganet + 100T
Velocity 2         128    256  Dell 2650     Xeon 2.4 GHz       2 GB         50 GB RAID 0  1000T
Manhattan           16     32  Dell 2650     Xeon 2.4 GHz       2 GB         50 GB RAID 0  1000T
Totals: 575 nodes, 1363 procs, 1.363 TB memory, 28.750 TB disk
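The totals line can be cross-checked against the per-cluster rows with a short tally. This sketch assumes the row values transcribed from the slide (nodes, processors, memory per node, 50 GB disk per node) are what the slide intended. Summed this way, processors (1363) and memory (1363 GB, i.e. 1.363 TB) agree with the totals line, while nodes come to 635 and disk to 31,750 GB, against the 575 nodes and 28.750 TB stated; the source does not say which figure is off.

```python
# Recompute the cluster totals from the per-row figures in the table above.
# Values are transcribed from the slide; memory and disk are per node.
from typing import NamedTuple


class Cluster(NamedTuple):
    name: str
    nodes: int
    procs: int
    mem_gb_per_node: int
    disk_gb_per_node: int


CLUSTERS = [
    Cluster("Velocity 1", 64, 256, 4, 50),
    Cluster("Velocity 1 Plus", 64, 128, 2, 50),
    Cluster("CMI", 64, 128, 2, 50),
    Cluster("Development", 8, 16, 2, 50),
    Cluster("Serial", 18, 18, 1, 50),
    Cluster("Long", 17, 17, 1, 50),
    Cluster("CBSU", 192, 384, 2, 50),
    Cluster("CBWeb", 64, 128, 2, 50),
    Cluster("Velocity 2", 128, 256, 2, 50),
    Cluster("Manhattan", 16, 32, 2, 50),
]

total_nodes = sum(c.nodes for c in CLUSTERS)
total_procs = sum(c.procs for c in CLUSTERS)
total_mem_gb = sum(c.nodes * c.mem_gb_per_node for c in CLUSTERS)
total_disk_gb = sum(c.nodes * c.disk_gb_per_node for c in CLUSTERS)

print(total_nodes, total_procs, total_mem_gb, total_disk_gb)  # → 635 1363 1363 31750
```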
5
How to Get Started
"Many of the biology laboratories have been using MySQL/CGI to write LIMS [laboratory information management systems]. Our experience with SQL Server/ASP.NET is that we can build a better interface in much less time." – Qi Sun
6
Computational Materials Institute
Tony Ingraffea, Keshav Pingali
Length scales: 10^-10 m, 10^-5 m, 10^2 m
http://www.tc.cornell.edu/Research/CMI/Multiscale/index.asp
7
Digital Materials
Paul Dawson, Matt Miller
http://anisotropy.mae.cornell.edu/downloads/dplab/means-posters.pdf
8
Arecibo – The Search for Pulsars
Jim Cordes, Johannes Gehrke, Jim Gray
http://arecibo.tc.cornell.edu/arecibo/index.aspx
http://www.cs.cornell.edu/johannes/
9
Physically Accurate Imagery
Steve Marschner
http://www.cs.cornell.edu/~srm/
10
Structure and Evolution of the Web
William Arms, Daniel Huttenlocher, Jon Kleinberg
http://www.cs.cornell.edu/wya/
http://www.cs.cornell.edu/~dph/
http://www.cs.cornell.edu/home/kleinber/
11
Lab Information Management Systems for Bioinformatics
Klaas van Wijk
Computational Biology Service Unit (http://cbsu.cornell.edu)
- Make genomic sequence data available to the research community
- The Plant Plastid Proteome Database is an example
- Developed together with Klaas van Wijk for the proteomics laboratory at Cornell
- Custom Web interface that allows users to input and search data
- Can grant different access levels to different users
A typical proteomics experimental procedure:
1. Biological samples
2. Separate in 2-D gel
3. Pick each spot and run mass spectrometry
4. Data entry and database searching to identify proteins
5. Data analysis and data publishing
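The slide names two LIMS features: users enter and search identification results, and different users get different access levels. A minimal sketch of that idea, assuming a relational back end (Python's built-in sqlite3 standing in for SQL Server) and a hypothetical two-level permission scheme ("lab" and "public"). The schema, the search function, user names, gel IDs, and protein labels are invented for illustration and are not the Plant Plastid Proteome Database's actual design.

```python
# Minimal LIMS sketch: store identified spots from 2-D gels and let users search
# them, with a simple per-user access level.
# sqlite3 stands in for SQL Server; all tables, columns, and levels are hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (name TEXT PRIMARY KEY, level TEXT NOT NULL);
    CREATE TABLE spots (
        spot_id   INTEGER PRIMARY KEY,
        gel_id    TEXT NOT NULL,      -- which 2-D gel the spot was picked from
        protein   TEXT,               -- identification from the MS database search
        published INTEGER NOT NULL    -- 1 = released to the research community
    );
    """
)
conn.executemany("INSERT INTO users VALUES (?, ?)", [("alice", "lab"), ("guest", "public")])
conn.executemany(
    "INSERT INTO spots VALUES (?, ?, ?, ?)",
    [(1, "gel-A", "ProtA", 1), (2, "gel-A", "ProtB", 0), (3, "gel-B", "ProtA", 0)],
)


def search(user: str, protein: str) -> list[tuple[int, str]]:
    """Return (spot_id, gel_id) for matching spots the user is allowed to see."""
    row = conn.execute("SELECT level FROM users WHERE name = ?", (user,)).fetchone()
    if row is None:
        raise PermissionError(f"unknown user {user!r}")
    # Lab members see everything; any other level sees only published spots.
    only_published = row[0] != "lab"
    return conn.execute(
        "SELECT spot_id, gel_id FROM spots "
        "WHERE protein = ? AND (published = 1 OR ? = 0) ORDER BY spot_id",
        (protein, int(only_published)),
    ).fetchall()


print(search("alice", "ProtA"))  # [(1, 'gel-A'), (3, 'gel-B')]
print(search("guest", "ProtA"))  # [(1, 'gel-A')]
```

Keeping the access check in the query (rather than filtering in the Web page) means every interface built on top of the database gets the same rule.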
12
13
14
Payoffs
- Seamless HPC
- Reduced or no learning curve
- Powerful integrated development tools
- Easier collaboration
- Standard open interfaces
- Interoperability
- Performance & efficiency
- Code reduction
- Scalability – applications scale even as your data grows
- Reliability
- Better program design
15
16
Discussion: Building a Community
- Experiences of the Scientists
  - Developing your "20 Questions"
  - What problem are you trying to solve? (Are you desperate for something better?)
- Experiences of Plumbers Working with Scientists
  - How can plumbers help scientists?
  - Schema design issues
  - Leveraging the latest tools
- Experiences of Miners of Large Data Sets
  - Extracting information from the data
  - Computing strategies
  - I/O strategies
- Developing Working Pipelines
  - Life cycles of data
  - Lifetimes of data
- Storage Issues
  - Hierarchical storage
  - Mean time between failures
  - Minimizing the impact of failures
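The mean-time-between-failures item matters more as clusters grow. If component failures are independent and each lifetime is exponentially distributed (a common simplifying assumption, not a claim from the slides), failure rates add, so a system of N identical components has MTBF_system = MTBF_component / N. A short sketch with hypothetical figures, a 500,000-hour disk MTBF and one disk on each of 575 nodes:

```python
# Expected time between failures anywhere in a system of N identical components,
# assuming independent, exponentially distributed lifetimes (constant failure rate).
# Under that assumption the per-component failure rates simply add.

def system_mtbf_hours(component_mtbf_hours: float, n_components: int) -> float:
    if n_components < 1:
        raise ValueError("need at least one component")
    rate_per_component = 1.0 / component_mtbf_hours  # failures per hour, one component
    system_rate = n_components * rate_per_component  # independent rates add
    return 1.0 / system_rate


# Hypothetical figures: one disk per node on 575 nodes, 500,000-hour disk MTBF.
hours = system_mtbf_hours(500_000, 575)
print(f"{hours:.0f} hours (~{hours / 24:.0f} days) between disk failures")
# → 870 hours (~36 days) between disk failures
```

Even with very reliable individual disks, a cluster of this size should expect a disk failure roughly every month under these assumptions, which is why minimizing the impact of each failure sits alongside MTBF on the slide.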