Big Data: A big step towards innovation, competition and productivity



Contents
- Big Data Definition
- Example of Big Data
- Big Data Vectors
- Cost Problem
- Importance of Big Data
- Big Data Growth
- Some Challenges in Big Data
- Big Data Implementation

Big Data Definition
"Big data" describes a volume of structured and unstructured data so massive that it is difficult to process using traditional database and software techniques. In most enterprise scenarios the data is too big, moves too fast, or exceeds current processing capacity. The term is believed to have originated with web search companies, which had to query very large, distributed aggregations of loosely structured data.

An Example of Big Data
An example of big data might be petabytes (1,024 terabytes) or exabytes (1,024 petabytes) of data consisting of billions to trillions of records about millions of people, all drawn from different sources (web, sales, customer contact centers, social media, mobile devices, and so on). The data is typically loosely structured, and often incomplete and inaccessible. When dealing with such datasets, organizations face difficulties in creating, manipulating, and managing big data. Big data is a particular problem in business analytics because standard tools and procedures are not designed to search and analyze massive datasets.

Big Data Vectors

Cost Problem
What does it cost to process 1 petabyte of data with 1,000 nodes?
- 1 PB = 10^15 bytes = 1 million gigabytes = 1 thousand terabytes
- At a rate of 15 MB/s, each node takes about 9 hours to process 500 GB: 15 MB/s × 3,600 s/h × 9 h = 486,000 MB ≈ 500 GB
- 1,000 nodes × 9 h × $0.34 per node-hour = $3,060 for a single run
- On a single node, 1 PB = 1,000,000 GB ÷ 500 GB = 2,000 runs; 2,000 × 9 h = 18,000 h ÷ 24 = 750 days
- The cost for 1,000 cloud nodes each processing 1 PB: 2,000 × $3,060 = $6,120,000
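The arithmetic above can be checked with a short script (a sketch only; the 15 MB/s rate and $0.34 node-hour price are the slide's own assumptions, not current cloud pricing):

```python
# Back-of-the-envelope check of the cost figures above.
RATE_MB_PER_S = 15          # assumed per-node processing rate
PRICE_PER_NODE_HOUR = 0.34  # assumed cloud price in dollars
NODES = 1000
HOURS_PER_RUN = 9

# How much one node processes in a 9-hour run: ~500 GB
mb_per_node_run = RATE_MB_PER_S * 3600 * HOURS_PER_RUN   # 486,000 MB

# Cost of a single 9-hour run across all 1,000 nodes
cost_single_run = NODES * HOURS_PER_RUN * PRICE_PER_NODE_HOUR  # ~$3,060

# 1 PB split into 500 GB chunks, processed sequentially on one node
runs_for_1_pb = 1_000_000 // 500                         # 2,000 runs
single_node_days = runs_for_1_pb * HOURS_PER_RUN / 24    # 750 days

# Cost for 1,000 nodes each working through 1 PB, per the slide
total_cost = runs_for_1_pb * cost_single_run             # ~$6,120,000

print(mb_per_node_run, cost_single_run, single_node_days, total_cost)
```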

Importance of Big Data
- Government: in 2012, the Obama administration announced the Big Data Research and Development Initiative: 84 different big data programs spread across six departments.
- Private sector: Wal-Mart handles more than 1 million customer transactions every hour, imported into databases estimated to contain more than 2.5 petabytes of data. Facebook handles 40 billion photos from its user base. The Falcon Credit Card Fraud Detection System protects 2.1 billion active accounts worldwide.
- Science: the Large Synoptic Survey Telescope will generate 140 terabytes of data every 5 days.

- The Large Hadron Collider: 13 petabytes of data produced.
- Medical computation, such as decoding the human genome.
- A social science revolution.
- A new way of doing science (the microscope example).

Technology Players in This Field
Google, Oracle, Microsoft, IBM, Hadapt, Nike, Yelp, Netflix, Dropbox, Zipdial

Big Data Growth

Some Challenges in Big Data
While big data can yield extremely useful information, it also presents new challenges:
- How much data should be stored?
- How much will this cost?
- Will the data be secure?
- How long must it be maintained?

Implementation of Big Data
Platforms for large-scale data analysis:
- The Apache Software Foundation's Java-based Hadoop programming framework, which can run applications on systems with thousands of nodes.
- The MapReduce software framework, which consists of a Map function that distributes work to different nodes and a Reduce function that gathers the results and resolves them into a single value.
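The Map/Reduce split described above can be illustrated with a minimal single-process sketch. This is plain Python rather than Hadoop's actual Java API, and the word-count task and function names are illustrative only:

```python
from collections import defaultdict

def map_phase(documents):
    # Map: each "node" emits (key, value) pairs for its share of the input.
    # Here the key is a word and the value is a count of 1.
    for doc in documents:
        for word in doc.split():
            yield (word, 1)

def reduce_phase(pairs):
    # Reduce: gather the emitted pairs, group them by key, and resolve
    # each group into a single value (here, the total count per word).
    counts = defaultdict(int)
    for word, count in pairs:
        counts[word] += count
    return dict(counts)

docs = ["big data is big", "data moves fast"]
print(reduce_phase(map_phase(docs)))
# → {'big': 2, 'data': 2, 'is': 1, 'moves': 1, 'fast': 1}
```

In real Hadoop, the pairs emitted by the mappers are shuffled across the network so that all values for a given key arrive at the same reducer; the toy version above collapses that step into a single in-memory dictionary.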

Thank You! By: Harshita Rachora, Trainee Software Consultant, Knoldus Software LLP