Large Scale Computing Systems


Large Scale Computing Systems
- A new era, BIG x 3: data, computations, infrastructures
- Revisit: algorithms, architectures, distributed systems, parallel computing, scalable DBs

Big Data
- A 'Moore's Law' for data: data doubles every 18 months
- 90% of today's data was created in the last 2 years
- Facebook: 20 TB/day (compressed)
- CERN/LHC: 40 TB/day (15 PB/year)
- NYSE: 1 TB/day
- Many more: web logs, financial transactions, medical records, etc.

Data Growth
- 1 EB (exabyte, 10^18 bytes) = 1000 PB (petabyte, 10^15 bytes): US mobile data traffic last year (2010)
- 0.8 ZB (zettabyte, 10^21 bytes) = 800 EB: the entire global mass of digital data in 2009, according to IDC
- 35 ZB: IDC's forecast for all digital data in 2020
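The unit relationships above can be checked with a little arithmetic. This is a sketch; the variable names are mine, while the figures are the ones quoted on the slide:

```python
# Decimal (SI) byte units, as used on the slide
PB = 10**15    # petabyte
EB = 10**18    # exabyte:   1 EB = 1000 PB
ZB = 10**21    # zettabyte: 1 ZB = 1000 EB

digital_universe_2009 = 0.8 * ZB   # IDC estimate cited on the slide
idc_forecast_2020 = 35 * ZB        # IDC forecast cited on the slide

assert EB == 1000 * PB
assert digital_universe_2009 == 800 * EB

growth_factor = idc_forecast_2020 / digital_universe_2009
# roughly a 44x growth in digital data between 2009 and 2020
```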

MapReduce
- A programming model
- A software framework for writing applications that rapidly process vast amounts of data in parallel on large clusters of compute nodes
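The model can be sketched with its canonical example, word count. The function names below are illustrative, not Hadoop's actual API; a real framework would distribute the map and reduce phases across cluster nodes:

```python
from collections import defaultdict

def map_phase(documents):
    """Map: emit an intermediate (word, 1) pair for every word."""
    for doc in documents:
        for word in doc.split():
            yield word, 1

def shuffle(pairs):
    """Shuffle: group all intermediate values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: sum the counts for each word."""
    return {word: sum(counts) for word, counts in groups.items()}

docs = ["big data big clusters", "big data"]
result = reduce_phase(shuffle(map_phase(docs)))
# result == {"big": 3, "data": 2, "clusters": 1}
```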

Cloud Computing
- Big Data pushes databases to their limits
- NoSQL databases: horizontally scalable, schema-free, multi-datacenter data stores that can handle PBs of data
  - Google's BigTable, Facebook's Cassandra, LinkedIn's Voldemort, Amazon's Dynamo, and many more
- Cloud computing: virtualized resources from distant data centers
  - Elastic, "pay as you go" resource provisioning
  - Easy resource manipulation through an API
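One mechanism behind the horizontal scalability of stores like Dynamo and Cassandra is consistent hashing: keys are placed on a hash ring so that adding a node moves only a small fraction of the data. A toy sketch (the class and node names are hypothetical, not any store's real API):

```python
import hashlib
from bisect import bisect

class ConsistentHashRing:
    """Toy consistent-hash ring: maps each key to one of a set of
    storage nodes, so capacity scales by adding machines."""

    def __init__(self, nodes, vnodes=100):
        # Each physical node gets many virtual points for balance.
        self._ring = sorted(
            (self._hash(f"{node}#{v}"), node)
            for node in nodes
            for v in range(vnodes)
        )
        self._points = [p for p, _ in self._ring]

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def node_for(self, key):
        """Walk clockwise from the key's hash to the next node point."""
        idx = bisect(self._points, self._hash(key)) % len(self._ring)
        return self._ring[idx][1]

ring = ConsistentHashRing(["node-a", "node-b", "node-c"])
owner = ring.node_for("user:42")   # deterministic placement
```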

Big Computations
Challenges for exascale computing:
- Scalability up to millions of cores
- Programmability (revisit traditional parallel programming models)
- Fault tolerance (with thousands or millions of nodes, several may fail every day)
- Low power consumption (maximize GFLOPS/Watt)
It's not High-Performance Computing (HPC) anymore; it's High-Efficiency Computing (HEC).
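The GFLOPS/Watt metric can be made concrete with back-of-envelope arithmetic. The Tianhe-2 figures below are approximate public numbers used only for illustration, and the 20 MW exascale power budget is a commonly cited assumption, not a fixed target:

```python
def gflops_per_watt(peak_pflops, power_mw):
    """Convert peak performance (PFLOPS) and power draw (MW)
    into the efficiency metric GFLOPS per Watt."""
    gflops = peak_pflops * 1e6   # 1 PFLOPS = 10^6 GFLOPS
    watts = power_mw * 1e6       # 1 MW = 10^6 W
    return gflops / watts

# Tianhe-2: roughly 54.9 PFLOPS peak at roughly 17.8 MW
tianhe2_eff = gflops_per_watt(54.9, 17.8)     # about 3.1 GFLOPS/W

# An exaflop (1000 PFLOPS) within a 20 MW budget would need:
exascale_eff = gflops_per_watt(1000, 20)      # 50 GFLOPS/W
```

The gap between the two numbers is why the slide frames exascale as an efficiency problem rather than a raw performance problem.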

Exascale Applications
- Computations on sparse matrices: the heart of scientific and engineering simulations
- (Huge) graph algorithms: shortest paths, PageRank, etc.
- Regular grids: solving PDEs with millions of unknowns

Big Infrastructures
- OS and architectures revisited
- Virtualization
- Cloud facilities and datacenters
- Distributed storage: hundreds of PBs using commodity disks
- HPC clusters: exascale computing using scalable 'ingredients'