Bleeding edge technology to transform Data into Knowledge: HADOOP

Presentation transcript:

Bleeding edge technology to transform Data into Knowledge
HADOOP
“In pioneer days they used oxen for heavy pulling, and when one ox couldn’t budge a log, they didn’t try to grow a larger ox.” - Grace Hopper

Current Situation
- In 2012, an estimated 2.8 zettabytes (2.8 trillion GB) of data was created worldwide. This is enough to fill 80 billion 32 GB iPads.
- Facebook hosts approximately 10 billion photos, taking up 1 petabyte of storage.
- NYSE generates ~1 TB of new trade data per day.
- Data is coming from many sources.
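A quick sanity check of the figures on this slide, assuming decimal units (1 ZB = 10^21 bytes, 1 PB = 10^15 bytes, 1 GB = 10^9 bytes). The variable names are illustrative only. Under these assumptions the iPad count comes out at roughly 87.5 billion, so the slide's "80 billion" reads as a rounded-down figure.

```python
# Decimal (SI) units are an assumption; the slide does not say which it uses.
GB = 10**9
PB = 10**15
ZB = 10**21

data_2012_bytes = 28 * ZB // 10             # 2.8 ZB created in 2012
data_2012_gb = data_2012_bytes // GB        # total expressed in GB
ipads_32gb = data_2012_bytes // (32 * GB)   # number of 32 GB iPads filled

# Average size per Facebook photo implied by 10 billion photos in 1 PB.
bytes_per_photo = PB // (10 * 10**9)

print(data_2012_gb)     # 2800000000000 (2.8 trillion GB)
print(ipads_32gb)       # 87500000000 (87.5 billion)
print(bytes_per_photo)  # 100000 (about 100 KB per photo)
```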

Data - Moving Forward
- Popular saying: “More data usually beats better algorithms.”
- Big Data is here, but organizations are unable to properly store and analyze it.
- Organizations’ IS departments can either:
  - Succumb to information-overload paralysis, or
  - Attempt to harness new technologies to monetize this data.

Early Adopters
Still bleeding edge technology, used mostly by tech companies.
- Yahoo
  - Used to store its internet index
  - 43,000 nodes
  - Servers racked with Velcro (MTBF 1000 days)
- Facebook
  - User behavioral analysis for targeted ads
  - Over 100 petabytes
  - Growing at ½ PB/day
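The Yahoo numbers hint at why fault tolerance matters at this scale. The sketch below is back-of-the-envelope arithmetic, not a reliability model: it assumes the 1000-day MTBF applies to each server, that failures are independent, and that the failure rate is constant. The Facebook estimate assumes linear growth. Function names are hypothetical.

```python
def expected_failures_per_day(nodes: int, mtbf_days: float) -> float:
    """Average server failures per day across the cluster.

    Assumes each server fails independently at a constant rate of
    1 / mtbf_days (an assumption, not something the slide states).
    """
    return nodes / mtbf_days


def days_to_double(current_pb: float, growth_pb_per_day: float) -> float:
    """Days until storage doubles, assuming constant (linear) growth."""
    return current_pb / growth_pb_per_day


yahoo_failures = expected_failures_per_day(nodes=43_000, mtbf_days=1_000)
facebook_doubling = days_to_double(current_pb=100, growth_pb_per_day=0.5)

print(yahoo_failures)     # 43.0 failures per day on average
print(facebook_doubling)  # 200.0 days
```

With dozens of servers expected to fail every day, the software has to treat hardware failure as routine.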

How it works
- Storage is distributed across multiple nodes.
- Nodes are composed of commodity servers.
- Nodes are orchestrated to process requests in parallel.
- Designed to run on a large number of independent machines.
- Buy a LOT of commodity hardware, add disks and run Hadoop.
- Hadoop can process all types of data from different systems (pictures, log files, audio files, emails, etc.), regardless of native format.
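The idea of splitting data across nodes and processing the pieces in parallel can be illustrated with a small word count in the map/reduce style. This is plain Python, not Hadoop's API: a thread pool stands in for cluster nodes, and every function name here is hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_blocks(lines, num_nodes):
    """Distribute lines round-robin across the stand-in nodes."""
    return [lines[i::num_nodes] for i in range(num_nodes)]


def map_block(block):
    """Map step: a node counts words in its own block only."""
    counts = Counter()
    for line in block:
        counts.update(line.lower().split())
    return counts


def reduce_counts(partials):
    """Reduce step: merge the per-node partial counts into one total."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def word_count(lines, num_nodes=3):
    blocks = split_into_blocks(lines, num_nodes)
    # Each block is processed independently, as it would be on separate nodes.
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        partials = list(pool.map(map_block, blocks))
    return reduce_counts(partials)


log_lines = [
    "error disk full",
    "warning disk slow",
    "error network down",
    "info disk ok",
]
totals = word_count(log_lines)
print(totals["disk"], totals["error"])  # 3 2
```

Because each block is counted without reference to the others, the result does not depend on how many nodes share the work. That independence is what makes it possible to scale by adding machines.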

Limitations
- The framework is not suited for transactional environments.
- Load times are long.
- Editing part of a file is difficult (almost impossible); only a full reload is supported.

Business value
Hadoop is a platform that enables businesses to process Big Data at reasonable cost. It:
- Provides fault tolerance (continues operating in the event of failure)
- Enables massively parallel processing (MPP)
- Runs on commodity infrastructure: cheap to run
- Is available under the open-source Apache License 2.0: free to use

Challenges of adoption
- Fear of the unknown
- Lack of skillset
- Lack of understanding of ROI
- Open source (potential security risk)

Ongoing Governance
Exploratory Phase:
- Recognizing responsibilities associated with deploying Hadoop.
- Determining key issues and requirements around privacy.
- Integrating Hadoop into existing IT infrastructure.

Ongoing Governance (Cont’d)
Implementation Phase:
- Determining business needs
- Organization Structure
- Stewardship
- Data Risk and Quality Management
- Information Life Cycle Management
- Security & Privacy
- Data Architecture