Bleeding edge technology to transform Data into Knowledge HADOOP

Bleeding edge technology to transform Data into Knowledge HADOOP "In pioneer days they used oxen for heavy pulling, and when one ox couldn't budge a log, they didn't try to grow a larger ox." - Grace Hopper

In 2012, an estimated 2.8 zettabytes (2.8 trillion GB) of data were created worldwide. That is enough to fill roughly 80 billion 32 GB iPads. Data is coming from many sources: the NYSE generates ~1 TB of new trade data per day, and Facebook hosts approximately 10 billion photos, taking up about 1 petabyte of storage. Today's Situation
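A quick back-of-the-envelope check of the iPad figure (a rough calculation using decimal units, 1 ZB = 10^21 bytes and 1 GB = 10^9 bytes):

```python
# Back-of-the-envelope check: how many 32 GB iPads does 2.8 ZB fill?
ZB = 10**21  # bytes, decimal definition
GB = 10**9

total_data = 2.8 * ZB
ipad_capacity = 32 * GB

ipads = total_data / ipad_capacity
print(round(ipads / 1e9, 1))  # 87.5 billion, the same order as the slide's 80 billion
```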

It has been said that "more data usually beats better algorithms." Big Data is here, but we are struggling to store and analyze it. Organizations' IS departments can either succumb to information-overload paralysis, or attempt to harness new technologies to monetize this data. The way forward

Hadoop is a platform that enables businesses to: Process Big Data at reasonable cost Tolerate faults (continue operating in the event of failure) Perform massively parallel processing (MPP) Run on commodity infrastructure, so it is cheap to operate Use it freely, since it is available under the Apache License 2.0. Business value

The basic principle of Hadoop is to distribute storage across multiple nodes (commodity servers), orchestrated to process user requests in parallel. Hadoop is designed to run on a large number of machines that share no memory or disks. That means you can buy a whole bunch of commodity servers, put them in a rack, and run the Hadoop software on each one. How it works
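The distribute-and-replicate idea can be sketched in a few lines. This is a simplified simulation, not HDFS's actual implementation: the tiny block size, node names, and round-robin placement policy are illustrative assumptions (HDFS defaults to 128 MB blocks and uses a rack-aware placement policy).

```python
# Simplified sketch: split a "file" into fixed-size blocks and replicate
# each block on several nodes, HDFS-style. Sizes and policy are demo-only.

BLOCK_SIZE = 128   # bytes here for the demo; HDFS defaults to 128 MB
REPLICATION = 3    # HDFS's default replication factor

def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE):
    """Cut the file into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def place_replicas(num_blocks: int, nodes: list, replication: int = REPLICATION):
    """Assign each block to `replication` distinct nodes (round-robin demo policy)."""
    return {b: [nodes[(b + r) % len(nodes)] for r in range(replication)]
            for b in range(num_blocks)}

data = b"x" * 300                       # a 300-byte "file"
blocks = split_into_blocks(data)        # three blocks: 128 + 128 + 44 bytes
nodes = ["node1", "node2", "node3", "node4"]
placement = place_replicas(len(blocks), nodes)
print(len(blocks))    # 3
print(placement[0])   # ['node1', 'node2', 'node3']
```

Because every block lives on three different machines, losing any single server leaves at least two copies of each block available, which is the fault tolerance the previous slide refers to.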

Architecturally, the reason you're able to deal with lots of data is that Hadoop spreads it out, and the reason you're able to ask complicated computational questions is that you've got all of these processors working in parallel, harnessed together. Power of Hadoop
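The "processors harnessed together" idea is usually illustrated with the classic MapReduce word count. The sketch below is a single-process Python simulation of the map, shuffle, and reduce phases; in a real Hadoop job the map and reduce functions run on many nodes in parallel, and the function names here are illustrative, not Hadoop's API.

```python
from collections import defaultdict

# Single-process simulation of MapReduce word count. In Hadoop, map() runs
# in parallel on the nodes holding the data blocks, and the framework
# shuffles the intermediate (key, value) pairs to parallel reducers.

def map_phase(document: str):
    """Emit a (word, 1) pair for every word in the document."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle_phase(pairs):
    """Group intermediate pairs by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Sum the counts emitted for each word."""
    return {word: sum(counts) for word, counts in grouped.items()}

docs = ["big data beats big algorithms", "data is big"]
pairs = [pair for doc in docs for pair in map_phase(doc)]
counts = reduce_phase(shuffle_phase(pairs))
print(counts["big"])   # 3
print(counts["data"])  # 2
```

Because each document's map step is independent, Hadoop can run thousands of them at once on the nodes where the data already sits, which is why the question gets answered quickly even over terabytes.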

Exploratory Phase: - Recognizing the responsibilities associated with deploying Hadoop. - Determining key issues and requirements around privacy. - Integrating Hadoop into the existing IT infrastructure. Ongoing Governance

Implementation Phase: Determining business needs Organization structure Stewardship Data risk and quality management Information life-cycle management Security and privacy Data architecture Ongoing Governance (Cont'd)