Big Data A big step towards innovation, competition and productivity.

Presentation on theme: "Big Data A big step towards innovation, competition and productivity."— Presentation transcript:

1 Big Data A big step towards innovation, competition and productivity

2 Contents: Big Data Definition; Example of Big Data; Big Data Vectors; Cost Problem; Importance of Big Data; Big Data Growth; Some Challenges in Big Data; Big Data Implementation

3 Big Data Definition Big data describes a massive volume of both structured and unstructured data that is so large it is difficult to process using traditional database and software techniques. In most enterprise scenarios the data is too big, moves too fast, or exceeds current processing capacity. The term big data is believed to have originated with Web search companies, which had to query very large distributed aggregations of loosely structured data.

4 An Example of Big Data An example of big data might be petabytes (1,024 terabytes) or exabytes (1,024 petabytes) of data consisting of billions to trillions of records about millions of people, all from different sources (e.g. Web, sales, customer contact center, social media, mobile data and so on). The data is typically loosely structured and often incomplete and inaccessible. When dealing with such large datasets, organizations face difficulties creating, manipulating, and managing big data. Big data is a particular problem in business analytics because standard tools and procedures are not designed to search and analyze massive datasets.

5 Big Data vectors

6 Cost Problem What does it cost to process 1 petabyte of data with 1,000 nodes? 1 PB = 10^15 B = 1 million gigabytes = 1 thousand terabytes. At 15 MB/s, a single node takes about 9 hours to process 500 GB (15 MB/s × 3,600 s × 9 h = 486,000 MB ≈ 500 GB). At $0.34 per node-hour, one 9-hour run on 1,000 nodes costs 1,000 × 9 × $0.34 = $3,060. A single node would need 1,000,000 / 500 = 2,000 such runs to cover 1 PB: 2,000 × 9 = 18,000 hours ≈ 750 days. For 1,000 cloud nodes each processing a full 1 PB, the cost is 2,000 × $3,060 = $6,120,000.
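The arithmetic on this slide can be sketched as a short back-of-envelope model. The throughput (15 MB/s) and price ($0.34 per node-hour) are the slide's own assumed figures, not measured values:

```python
# Back-of-envelope cost model using the slide's assumed rates:
# 15 MB/s per node and $0.34 per node-hour.
MB_PER_GB = 1000
GB_PER_PB = 1_000_000

rate_mb_s = 15
hours_per_chunk = 9  # time for one node to chew through one 500 GB chunk
chunk_gb = rate_mb_s * 3600 * hours_per_chunk / MB_PER_GB  # ~486 GB, rounded to 500

nodes = 1000
price_per_node_hour = 0.34
cost_per_run = nodes * hours_per_chunk * price_per_node_hour  # one 9-hour run, all nodes

chunks_per_pb = GB_PER_PB // 500          # 2,000 chunks of 500 GB in a petabyte
node_hours_per_pb = chunks_per_pb * hours_per_chunk  # 18,000 node-hours
days_single_node = node_hours_per_pb / 24            # ~750 days for one node alone

# If every one of the 1,000 nodes processes a full petabyte:
total_cost = chunks_per_pb * cost_per_run

print(f"per run: ${cost_per_run:,.0f}, single node: {days_single_node:.0f} days, "
      f"total: ${total_cost:,.0f}")
```

Running it reproduces the slide's figures: $3,060 per run, 750 days for a lone node, and $6,120,000 in total.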

7 Importance of Big Data Government: in 2012, the Obama administration announced the Big Data Research and Development Initiative: 84 different big data programs spread across six departments. Private sector: Wal-Mart handles more than 1 million customer transactions every hour, imported into databases estimated to contain more than 2.5 petabytes of data. Facebook handles 40 billion photos from its user base. The Falcon Credit Card Fraud Detection System protects 2.1 billion active accounts worldwide. Science: the Large Synoptic Survey Telescope will generate 140 terabytes of data every 5 days.

8 The Large Hadron Collider produced 13 petabytes of data in 2010. Medical computation, such as decoding the human genome. Social science revolution. A new way of doing science (the microscope example).

9 Technology Players in This Field: Google, Oracle, Microsoft, IBM, Hadapt, Nike, Yelp, Netflix, Dropbox, Zipdial

10 Big Data growth

11 Some Challenges in Big Data While big data can yield extremely useful information, it also presents new challenges with respect to: how much data to store, how much it will cost, whether the data will be secure, and how long it must be maintained.

12 Implementation of Big Data Platforms for large-scale data analysis: the Apache Software Foundation's Java-based Hadoop programming framework, which can run applications on systems with thousands of nodes; and the MapReduce software framework, which consists of a Map function that distributes work to different nodes and a Reduce function that gathers results and resolves them into a single value.
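The Map/Reduce pattern described on this slide can be illustrated with a tiny single-process word-count sketch. This is only the programming model, not Hadoop itself; a real MapReduce framework distributes the map and reduce steps across many nodes:

```python
# Minimal in-process sketch of the MapReduce pattern: a map step emits
# (key, value) pairs, a shuffle groups them by key, and a reduce step
# resolves each group into a single value. Illustrative only.
from collections import defaultdict

def map_phase(doc):
    # The classic word-count mapper: emit (word, 1) for every word.
    for word in doc.split():
        yield word.lower(), 1

def shuffle(pairs):
    # Group all emitted values by key, as the framework's shuffle would.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Resolve each key's values into a single value (here, a sum).
    return key, sum(values)

docs = ["Big data is big", "data moves fast"]
pairs = [p for doc in docs for p in map_phase(doc)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
print(counts)  # {'big': 2, 'data': 2, 'is': 1, 'moves': 1, 'fast': 1}
```

In Hadoop the same three roles exist, but the input is split across a distributed filesystem and the mappers and reducers run on separate nodes.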

13 Thank You! By: Harshita Rachora, Trainee Software Consultant, Knoldus Software LLP

