Published by Jesse Stevens. Modified over 9 years ago.
1
Large dataset processing in the Cloud
Kevin Glenny and the GridwiseTech team
2
Simplified data-oriented system
– Internal or external data sources
– Applications working on data
3
IT systems are constantly growing
– Increased number of users
– Increased number of applications
– Increased amount of data
4
IT systems are constantly growing
– Infrastructure bottleneck
5
Example
– Electronics manufacturer
– 24/7 production
– Report computation takes too long for decision making
– 2.5 million transactions daily
– 4 TB of data to manage
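To put the transaction volume on this slide in perspective, the short calculation below converts it to an average rate. It is only a sketch: it assumes the 2.5 million transactions are spread evenly over the 24-hour production day, which the slide does not state, and the function name is illustrative.

```python
# Figures taken from the example slide.
TRANSACTIONS_PER_DAY = 2_500_000
SECONDS_PER_DAY = 24 * 60 * 60  # 24/7 production, so the full day counts


def average_rate_per_second(per_day: int) -> float:
    """Average transactions per second, assuming an even spread over the day."""
    return per_day / SECONDS_PER_DAY


rate = average_rate_per_second(TRANSACTIONS_PER_DAY)
print(f"{rate:.1f} transactions/second on average")  # prints "28.9 transactions/second on average"
```

Real load is rarely uniform, so peak rates are likely to be higher than this average.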
6
What is Cloud computing?
"Transparent access to capabilities using a pay-per-use business model"
Benefits:
– Dynamic scaling
– Pay-for-use
– Off-shored administration
7
What are the delivery models?
– SaaS (Software as a Service): SalesForce.com, 63,00 clients
– PaaS (Platform as a Service): Google App Engine (2008), Microsoft Azure (2008)
– IaaS (Infrastructure as a Service): Amazon Elastic Compute Cloud, 8.2 million instances launched since 2006
8
Application data processing
– Database sharding (MySQL, PostgreSQL etc.)
– NoSQL (Google's BigTable, Amazon's Dynamo etc.)
– Data-grid (GigaSpaces XAP, Oracle Coherence, Infinispan etc.)
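As an illustration of the sharding idea listed on this slide, the sketch below routes each key to one of several shards by hashing it. It is a minimal, assumed example, not the approach used in the presentation: in-memory dictionaries stand in for separate database servers, and the names `shard_for` and `ShardedStore` are hypothetical.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index using a stable hash (same key, same shard)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class ShardedStore:
    """Toy key-value store; each dict stands in for one database shard."""

    def __init__(self, num_shards: int) -> None:
        self.shards = [dict() for _ in range(num_shards)]

    def put(self, key: str, value: object) -> None:
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key: str) -> object:
        return self.shards[shard_for(key, len(self.shards))].get(key)


store = ShardedStore(num_shards=4)
store.put("order-1001", {"amount": 250})
print(store.get("order-1001"))  # prints {'amount': 250}
```

Simple modulo routing like this forces most keys to move when the number of shards changes, so systems that scale dynamically often use consistent hashing or a lookup directory instead.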
9
Data-grid and sharding in the Cloud
All data processing and persistence in the Cloud
Achievements:
– Near real-time
– Dynamic scaling (application and resources)
– Pay-per-use
– Reduced administration
– HA (high availability)
10
Remaining issues
– Getting large datasets in and out of the Cloud
  – Bandwidth is limited on the client side
  – Resort to mailing hard drives!
– Performance: 2 to 50% slowdown
– Data security/privacy: trust
– SLAs: plan for the worst
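The bandwidth issue above can be made concrete with a rough transfer-time estimate for the 4 TB dataset from the earlier example. This is a sketch under stated assumptions: the uplink speeds are illustrative and not taken from the presentation, 1 TB is taken as 10^12 bytes, and protocol overhead is ignored unless `efficiency` is lowered.

```python
SECONDS_PER_DAY = 86_400


def transfer_days(size_bytes: float, link_bits_per_second: float,
                  efficiency: float = 1.0) -> float:
    """Days needed to move size_bytes over a link at the given usable fraction."""
    seconds = size_bytes * 8 / (link_bits_per_second * efficiency)
    return seconds / SECONDS_PER_DAY


DATASET_BYTES = 4e12  # 4 TB, from the example slide (decimal terabytes assumed)

# Assumed client-side uplink speeds, in bits per second.
for label, bps in [("10 Mbit/s", 10e6), ("100 Mbit/s", 100e6), ("1 Gbit/s", 1e9)]:
    print(f"{label}: {transfer_days(DATASET_BYTES, bps):.1f} days")
```

At the assumed 100 Mbit/s the upload takes about 3.7 days even at full link utilization, which is why shipping physical drives can be competitive.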
11
Conclusions
– In data-oriented systems, datasets grow, causing bottlenecks
– Datasets in the Cloud can be processed using scalable technologies
– Challenges remain
  – Main challenge: how to get the data to the Cloud?