Enabling Grids for E-sciencE High Performance Distributed Computing Sophie Lemaitre Monterey - California July 2007.


1 Enabling Grids for E-sciencE
High Performance Distributed Computing
Sophie Lemaitre
Monterey - California, July 2007

2 Database Streams

3 First Keynote
One of the most interesting talks
– Database streams
   • http://www.cs.berkeley.edu/~franklin/Talks/HPDC07.ppt

4 Upside Down Approach
Traditional database approach
– Flow: bulk load data into a data warehouse; queries → results; static batch reports
– Batch ETL & load, query later
– Poor RT monitoring, no replay
– DB size affects query response
Data stream processing approach
– Flow: live data streams → data stream processor → results (continuous visibility, alerts)
– Always-on data analysis & alerts
– RT monitor & replay to optimize
– Consistent sub-second response
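
To make the contrast concrete, here is a small Python sketch (not taken from the talk). It assumes a made-up feed of (timestamp, value) readings and a hypothetical alert threshold of 100. The first function follows the traditional approach (load everything, then query); the second treats the query as always on and reacts to each reading as it arrives.

```python
from typing import Iterable, Iterator, List, Tuple

# Hypothetical schema: (timestamp in seconds, measured value).
Reading = Tuple[int, float]

THRESHOLD = 100.0  # hypothetical alert threshold


def batch_alerts(readings: Iterable[Reading]) -> List[int]:
    """Traditional approach: bulk-load everything, then run the query once."""
    warehouse = list(readings)  # the "bulk load" step; nothing is answered until it ends
    return [ts for ts, value in warehouse if value > THRESHOLD]


def stream_alerts(readings: Iterable[Reading]) -> Iterator[int]:
    """Stream approach: the query is always on; every reading is checked on
    arrival and an alert is emitted at once, without storing the data first."""
    for ts, value in readings:
        if value > THRESHOLD:
            yield ts


if __name__ == "__main__":
    feed = [(0, 42.0), (1, 120.5), (2, 99.9), (3, 101.0)]
    print(batch_alerts(feed))  # [1, 3], but only after the whole feed was loaded
    for ts in stream_alerts(feed):
        print("alert at t =", ts)  # printed reading by reading
```

Because `stream_alerts` is a generator, it keeps working on an unbounded feed, where `batch_alerts` would never return.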

5 The “Jellybean” Argument
Conventional wisdom: “Can I afford real-time?”
– Do the benefits justify the cost?
Reality: with stream query processing, real-time is cheaper than batch
– Minimize copies & query start-up overhead
– Takes load off expensive back-end systems
– Rapid application dev & maintenance

6 Example 2 - Stream/Table Join
    SELECT T.symbol, AVG(T.price*T.volume)
    FROM Trades T [RANGE '5 sec' SLIDE '3 sec'], SANDP500 S
    WHERE T.symbol = S.symbol AND T.volume > 5000
    GROUP BY T.symbol
Every 3 seconds, compute the average transaction value of high-volume trades on S&P 500 stocks, over a 5-second “sliding window”
– Trades is a stream, SANDP500 is a table, [RANGE '5 sec' SLIDE '3 sec'] is the window clause
– Note: the output is also a stream
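
The semantics of this query can be sketched in plain Python. This is only an illustration under stated assumptions, not how a real stream processor evaluates it: each window is taken as the half-open interval (end - 5 s, end], trades are assumed to arrive in timestamp order, the SANDP500 table is modelled as a set of symbols, and the names `Trade` and `sliding_join` are hypothetical. As on the slide, the output is itself a stream: a generator yielding one result per 3-second slide.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, NamedTuple, Set, Tuple


class Trade(NamedTuple):
    ts: float      # arrival time in seconds
    symbol: str
    price: float
    volume: int


def _avg_value(window: List[Trade]) -> Dict[str, float]:
    """GROUP BY symbol, AVG(price * volume) over the trades in one window."""
    totals: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for t in window:
        totals[t.symbol] += t.price * t.volume
        counts[t.symbol] += 1
    return {sym: totals[sym] / counts[sym] for sym in totals}


def sliding_join(
    trades: Iterable[Trade],
    sp500: Set[str],
    range_s: float = 5.0,
    slide_s: float = 3.0,
    min_volume: int = 5000,
) -> Iterator[Tuple[float, Dict[str, float]]]:
    """Yield (window_end, {symbol: avg value}) every `slide_s` seconds.

    Each window is the half-open interval (window_end - range_s, window_end].
    Assumes trades arrive in non-decreasing timestamp order.
    """
    buffer: List[Trade] = []  # joined trades that can still fall in an open window
    window_end = slide_s
    for trade in trades:
        # Close every window that ends before this trade arrived.
        while trade.ts > window_end:
            yield window_end, _avg_value(buffer)
            window_end += slide_s
            # Evict trades that are too old for the new window.
            buffer = [t for t in buffer if t.ts > window_end - range_s]
        # Stream/table join with SANDP500 plus the volume filter.
        if trade.symbol in sp500 and trade.volume > min_volume:
            buffer.append(trade)
    if buffer:  # finite demo input: flush the last open window once
        yield window_end, _avg_value(buffer)


if __name__ == "__main__":
    sample = [Trade(1.0, "IBM", 100.0, 6000), Trade(4.0, "AAPL", 50.0, 10000),
              Trade(5.0, "IBM", 120.0, 7000), Trade(7.0, "AAPL", 60.0, 8000)]
    for end, result in sliding_join(sample, {"IBM", "AAPL"}):
        print(end, result)
```

Because windows overlap (5 s range, 3 s slide), a trade can contribute to two consecutive results; the IBM trade at t = 1.0 s falls in the window ending at 3 s but not in the one ending at 6 s, whose interval is (1, 6].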

7 Stream Processing + Grid?
On-the-fly stream processing required for high-volume data/event generators.
Real-time event detection for coordination of distributed observations.
Wide-area sensing in environmental macroscopes.

8 Industry session

9 Industry session
Most interesting session
– eBay
   • Same talk as the one given at CERN
   • Huge number of transactions to deal with
   • Have to be 100% available
   • At some point, had to write their own database interaction layer to meet their needs
   • Not interested in Grids, because they want to control the whole infrastructure
– Google
   • Disk crashes not correlated with temperature
   • High number of disk crashes at the beginning of their life, when disks “burnt out”
   • Tony Cass (post C5): “yes, but cooling is important for plugs and fuses”

10 Scheduling

11 Scheduling
Users' ability to give priority to their jobs is currently very limited
– “low”, “medium” or “high”
Utility functions
– Economics applied to scheduling
– Example: if you go for lunch between 12:00 and 13:00
   • Same satisfaction whether the job finishes at 12:01 or at 12:55…
In the next talk
– Hypothesis: “jobs are submitted completely randomly”
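
The lunch example can be turned into a toy scheduler. In the sketch below (hypothetical, not the scheme presented in the talk), each job carries a duration and a utility function of its completion time: the lunch-time user is equally satisfied with any completion up to 13:00, and another job has a hard 12:30 deadline. The names, the linear decay after the user returns and the brute-force search over job orders are all assumptions made for illustration.

```python
from itertools import permutations
from typing import Callable, Dict, List, Tuple

# A utility function maps a completion time (minutes after midnight) to satisfaction.
Utility = Callable[[int], float]


def lunch_utility(back_at: int, grace: int) -> Utility:
    """Hypothetical: full satisfaction if the job is done by the time the user
    is back from lunch, then a linear decay to zero over `grace` minutes."""
    def u(done: int) -> float:
        if done <= back_at:
            return 1.0
        return max(0.0, 1.0 - (done - back_at) / grace)
    return u


def deadline_utility(deadline: int) -> Utility:
    """Hypothetical hard deadline: the result is worthless if it is late."""
    return lambda done: 1.0 if done <= deadline else 0.0


def best_order(jobs: Dict[str, Tuple[int, Utility]], start: int) -> Tuple[List[str], float]:
    """Try every order of the jobs on a single machine and keep the one with the
    highest total utility (exponential in the number of jobs: a toy only)."""
    best_seq: List[str] = []
    best_total = float("-inf")
    for order in permutations(jobs):
        clock, total = start, 0.0
        for name in order:
            duration, utility = jobs[name]
            clock += duration
            total += utility(clock)
        if total > best_total:
            best_seq, best_total = list(order), total
    return best_seq, best_total


if __name__ == "__main__":
    noon = 12 * 60
    jobs = {
        "analysis": (30, lunch_utility(back_at=13 * 60, grace=60)),  # user at lunch 12:00-13:00
        "report": (30, deadline_utility(12 * 60 + 30)),              # needed by 12:30
    }
    print(best_order(jobs, start=noon))  # (['report', 'analysis'], 2.0)
```

A plain “high/medium/low” priority cannot express that the analysis job may safely wait until 13:00; with utility functions, the scheduler runs the deadline job first at no cost to the lunch-time user.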

12 GridNFS & Direct-pNFS

13 GridNFS & Direct-pNFS
GridNFS
– “Integrates NFSv4 into the ecology of Grid middleware”
   • Globus GSI support
   • Name space construction and management
   • Fine-grained access control with foreign user support
   • High-performance secure file system access
– Andy Adamson was wondering how to integrate VOMS
   • DPM and dCache are using virtual ids
   • He is considering doing the same…
– Contact: Andy Adamson (andros@umich.edu)
Direct-pNFS
– Outperforms pNFS and PVFS
   • In particular, very good performance for small I/O
– Contact: Dean Hildebrand (dhildebz@eecs.umich.edu)
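
The virtual-id idea can be sketched as follows. This is a hypothetical illustration, not the DPM or dCache code: each certificate DN gets a virtual uid and each VOMS FQAN a virtual gid the first time it is seen, so numeric ownership and permission checks work for foreign users who have no local account. The class name, the starting id of 100000 and the sample identities are assumptions.

```python
from typing import Dict, List, Tuple


class VirtualIdMap:
    """Hypothetical sketch: give each grid identity a stable numeric id the first
    time it is seen, so a file server can store and check numeric owners for
    users who have no local Unix account."""

    def __init__(self, first_uid: int = 100000, first_gid: int = 100000) -> None:
        self._uids: Dict[str, int] = {}  # certificate DN -> virtual uid
        self._gids: Dict[str, int] = {}  # VOMS FQAN -> virtual gid
        self._next = {"uid": first_uid, "gid": first_gid}

    def _lookup(self, table: Dict[str, int], kind: str, name: str) -> int:
        if name not in table:  # first use: allocate the next free id
            table[name] = self._next[kind]
            self._next[kind] += 1
        return table[name]

    def uid(self, dn: str) -> int:
        return self._lookup(self._uids, "uid", dn)

    def gid(self, fqan: str) -> int:
        return self._lookup(self._gids, "gid", fqan)

    def credentials(self, dn: str, fqans: List[str]) -> Tuple[int, Tuple[int, ...]]:
        """Map a DN and its VOMS FQANs to (uid, (primary gid, secondary gids...))."""
        return self.uid(dn), tuple(self.gid(f) for f in fqans)


if __name__ == "__main__":
    ids = VirtualIdMap()
    # Made-up identities, for illustration only.
    print(ids.credentials("/DC=ch/DC=cern/CN=Alice", ["/atlas", "/atlas/Role=production"]))
    print(ids.credentials("/DC=ch/DC=cern/CN=Bob", ["/atlas"]))
```

The mapping must be persistent in practice (otherwise file ownership would change after a restart); the in-memory dictionaries here only show the allocation logic.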

14 DPM with NFSv4.1
NFSv4.1 and DPM have similar architectures
– Separate metadata server
– Direct access to physical files
   • Easy NFSv4.1 integration
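
The shared architecture can be shown with a toy model. In this hypothetical sketch, the metadata server only answers “where is this file?”, and the client then reads the bytes directly from the data server, so bulk data never flows through the metadata server. The class names, paths and the `locate` method are made up; the sketch does not model the actual NFSv4.1 operations (such as LAYOUTGET) or the DPM protocols.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class DataServer:
    """Holds physical files (in DPM terms, a disk server)."""
    files: Dict[str, bytes] = field(default_factory=dict)
    bytes_served: int = 0

    def read(self, physical_path: str, offset: int, length: int) -> bytes:
        chunk = self.files[physical_path][offset:offset + length]
        self.bytes_served += len(chunk)
        return chunk


@dataclass
class MetadataServer:
    """Maps logical names to (data server, physical path); never handles file data."""
    namespace: Dict[str, Tuple[str, str]] = field(default_factory=dict)
    lookups: int = 0

    def locate(self, logical_name: str) -> Tuple[str, str]:
        self.lookups += 1
        return self.namespace[logical_name]


def client_read(mds: MetadataServer, servers: Dict[str, DataServer],
                logical_name: str, offset: int, length: int) -> bytes:
    """One metadata lookup, then the bytes come straight from the data server."""
    server_name, physical_path = mds.locate(logical_name)
    return servers[server_name].read(physical_path, offset, length)


if __name__ == "__main__":
    disk01 = DataServer(files={"/data01/atlas/file1": b"hello grid world"})
    mds = MetadataServer(namespace={
        "/dpm/example.org/home/atlas/file1": ("disk01", "/data01/atlas/file1"),
    })
    print(client_read(mds, {"disk01": disk01}, "/dpm/example.org/home/atlas/file1", 6, 4))  # b'grid'
    print(mds.lookups, disk01.bytes_served)  # 1 4
```

Keeping the metadata path and the data path separate is what lets either system add more disk servers without the metadata server becoming an I/O bottleneck.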

15 Environmental concerns

16 Climate change?
Concerns about climate change
– Raised in several talks
   • A “solar panel computer”
   • A new plug to save energy lost as heat (Google)

