Published by Milton Lyons. Modified over 8 years ago.
1
Enabling Grids for E-sciencE
High Performance Distributed Computing
Sophie Lemaitre
Monterey, California, July 2007
2
Database Streams
3
First Keynote
One of the most interesting talks
–Database streams
  http://www.cs.berkeley.edu/~franklin/Talks/HPDC07.ppt
4
Upside Down Approach
Traditional Database Approach (diagram labels: Bulk Load Data, Data Warehouse, Queries, Results, Static Batch Reports)
–Batch ETL & load, query later
–Poor RT monitoring, no replay
–DB size affects query response
Data Stream Processing Approach (diagram labels: Live Data Streams, Data Stream Processor, Results; Continuous, Visibility, Alerts)
–Always-on data analysis & alerts
–RT Monitor & Replay to optimize
–Consistent sub-second response
5
The “Jellybean” Argument
Conventional Wisdom: “Can I afford real-time?” Do the benefits justify the cost?
Reality: With stream query processing, real-time is cheaper than batch.
–minimize copies & query start-up overhead
–takes load off expensive back-end systems
–rapid application dev & maintenance
6
Example 2 - Stream/Table Join
  SELECT T.symbol, AVG(T.price*T.volume)
  FROM Trades T [RANGE '5 sec' SLIDE '3 sec'], SANDP500 S
  WHERE T.symbol = S.symbol AND T.volume > 5000
  GROUP BY T.symbol
Every 3 seconds, compute the average transaction value of high-volume trades on S&P 500 stocks, over a 5-second “sliding window”.
Labels: Trades is the Stream, SANDP500 is the Table, and the bracketed RANGE/SLIDE expression is the Window clause.
Note: Output is also a Stream
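The stream/table join query can be mimicked on a finite list of trades to see what the window clause does. The following is only a rough sketch in plain Python, not the stream engine's implementation: the function name `sliding_window_join`, the tuple layout, and the choice that windows close at multiples of the slide and cover the half-open interval (end - range, end] are all assumptions made for illustration.

```python
from collections import defaultdict


def sliding_window_join(trades, sp500, range_s=5, slide_s=3, min_volume=5000):
    """Emulate the query on a finite, replayed list of trades.

    trades: iterable of (timestamp_seconds, symbol, price, volume).
    sp500:  set of symbols standing in for the SANDP500 table.
    Returns a list of (window_end, {symbol: avg(price * volume)}).
    Assumption: windows close at slide_s, 2*slide_s, ... and each one
    covers timestamps in (window_end - range_s, window_end].
    """
    trades = list(trades)
    results = []
    if not trades:
        return results
    last_ts = max(t[0] for t in trades)
    end = slide_s
    while True:
        start = end - range_s
        values = defaultdict(list)
        for ts, symbol, price, volume in trades:
            in_window = start < ts <= end
            # Join with the table and apply the volume filter (WHERE clause).
            if in_window and symbol in sp500 and volume > min_volume:
                values[symbol].append(price * volume)
        # GROUP BY symbol, AVG(price * volume)
        results.append((end, {s: sum(v) / len(v) for s, v in values.items()}))
        if end >= last_ts:
            break
        end += slide_s
    return results


# Hypothetical sample data, invented for this sketch.
sample_trades = [
    (1, "IBM", 100.0, 6000),
    (2, "IBM", 102.0, 1000),    # dropped: volume not above 5000
    (2, "XYZ", 10.0, 9000),     # dropped: symbol not in the table
    (4, "IBM", 110.0, 10000),
    (5, "MSFT", 30.0, 8000),
]
sample_sp500 = {"IBM", "MSFT"}

for window_end, averages in sliding_window_join(sample_trades, sample_sp500):
    print(window_end, averages)
```

Because the range (5 s) is longer than the slide (3 s), consecutive windows overlap, so a trade can contribute to more than one output row. A real stream processor would compute this incrementally instead of rescanning the input for every window.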
7
Stream Processing + Grid?
–On-the-fly stream processing required for high-volume data/event generators.
–Real-time event detection for coordination of distributed observations.
–Wide-area sensing in environmental macroscopes.
8
Industry session
9
Industry session
Most interesting session
–eBay
  Same talk as at CERN
  Huge number of transactions to deal with
  Have to be 100% available
  Had to build their own database interaction layer at some point to meet their needs
  Not interested in Grids, because they want to control the whole infrastructure
–Google
  Disk crashes not correlated with temperature
  High number of disk crashes when disks “burnt out” at the beginning of their life
  Tony Cass - post C5: “yes, but cooling is important for plugs and fuses”
10
Scheduling
11
Scheduling
Users currently have very limited ways to give priority to their jobs
–“low”, “medium” or “high”
Utility functions
–Economics applied to scheduling
–Ex: if you go for lunch between 12:00 and 13:00
  Same satisfaction if the job finishes at 12:01 or 12:55…
In the next talk
–Hypothesis = “jobs are submitted completely randomly”
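The lunch example can be written down as a utility function of the job's completion time. This is only an illustrative sketch, not the model from the talk: the names `deadline_utility`, `total_utility` and `best_order`, the linear decay after the deadline, and the two sample jobs are all assumptions.

```python
from itertools import permutations


def deadline_utility(finish_minute, deadline, decay_minutes):
    """User satisfaction as a function of when the job finishes.

    Full utility (1.0) for any finish time up to the deadline, then an
    assumed linear decay to 0.0 over decay_minutes.
    Times are minutes since midnight.
    """
    if finish_minute <= deadline:
        return 1.0
    late = finish_minute - deadline
    return max(0.0, 1.0 - late / decay_minutes)


def total_utility(order, start_minute):
    """Sum of utilities when jobs run back to back on a single machine."""
    clock, total = start_minute, 0.0
    for _name, runtime, utility in order:
        clock += runtime
        total += utility(clock)
    return total


def best_order(jobs, start_minute):
    """Brute-force the job order with the highest total utility."""
    return max(permutations(jobs), key=lambda o: total_utility(o, start_minute))


# The user is at lunch until 13:00, so 12:01 and 12:55 are equally good.
print(deadline_utility(12 * 60 + 1, deadline=13 * 60, decay_minutes=60))
print(deadline_utility(12 * 60 + 55, deadline=13 * 60, decay_minutes=60))

# Two hypothetical jobs: (name, runtime in minutes, utility function).
jobs = [
    ("lunch_job", 50, lambda t: deadline_utility(t, 13 * 60, 60)),
    ("urgent", 20, lambda t: deadline_utility(t, 12 * 60 + 20, 10)),
]
print([name for name, _, _ in best_order(jobs, start_minute=12 * 60)])
```

A three-level priority cannot express that the lunch job loses nothing by waiting until shortly before 13:00, whereas a utility function lets the scheduler run the urgent job first at only a small cost to the other one.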
12
GridNFS & Direct-pNFS
13
GridNFS & Direct-pNFS
GridNFS
–“Integrates NFSv4 into the ecology of Grid middleware”
  Globus GSI support
  name space construction and management
  fine-grained access control with foreign user support
  high performance secure file system access
–Andy Adamson was wondering how to integrate VOMS
  DPM and dCache are using virtual ids
  He is considering doing the same…
–Contact: Andy Adamson (andros@umich.edu)
Direct-pNFS
–Outperforms pNFS, PVFS
  In particular, very good performance for small I/O
–Contact: Dean Hildebrand (dhildebz@eecs.umich.edu)
14
DPM with NFSv4.1
NFSv4.1 and DPM have similar architectures
–Separate metadata server
–Direct access to physical files
–Easy NFSv4.1 integration
15
Environmental concerns
16
Climate change?
Concerns about climate change
–In several talks
  A “solar panel computer”
  A new plug to save energy lost in heat (Google)