Published by Lenard Williams. Modified over 9 years ago.
1
Understanding Big Data Introduction
2
Information has always been a crucial resource for decision making. A lack of information on a subject can lead to mistakes and, sometimes, catastrophe. Research and marketing firms are designed to explore and create new information in areas shadowed in mystery or enlightened by countless, often contradictory, observations. Big Data is the new push into gathering, storing, analyzing, and creating value from data, much of which could not be explored until recently due to technological limitations. This toolkit is designed to introduce Big Data concepts and provide the tools to create a Big Data culture in your organization.
3
Defining Big Data
While Big Data is a relatively new term in the industry, the concepts and principles are not new. They are built on data warehousing and data analysis concepts.
Big Data as a Thing – Big Data refers to data sets, such as images or voice recordings, which are so large or complex that they cannot be processed or analyzed without the use of machines. (Wikipedia)
Big Data as a Paradigm – Big Data refers to the exponential growth, availability, and use of structured and unstructured data. (SAS)
Big Data as a Technology – Big Data is a new generation of technologies and architectures, designed to economically extract value from very large volumes of a wide variety of data, by enabling high-velocity capture, discovery, and/or analysis. (IDC)
4
Facts about Data
To understand the scope of Big Data, one must first understand a few facts about data in general. Most of these facts are from studies by the International Data Corporation.
Existing data in the world is estimated at 1.2 zettabytes; that’s 1.2 trillion gigabytes of data.
Source: IDC 2011 Study
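The scale of a zettabyte is easy to misjudge, so the conversion is worth working through. The following is a minimal sketch that assumes decimal (SI) units, where a gigabyte is 10^9 bytes and a zettabyte is 10^21 bytes; the constant and function names are illustrative only.

```python
# Decimal (SI) byte units. This is an assumption: the slide does not say
# whether SI or binary units are intended.
GIGABYTE = 10**9
ZETTABYTE = 10**21


def zettabytes_to_gigabytes(zb: float) -> float:
    """Convert a size in zettabytes to gigabytes using SI units."""
    return zb * ZETTABYTE / GIGABYTE


world_data_gb = zettabytes_to_gigabytes(1.2)
print(f"{world_data_gb:.3e} GB")  # prints "1.200e+12 GB"
```

Under these assumptions, 1.2 zettabytes corresponds to about 1.2 trillion (1.2 × 10^12) gigabytes.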
5
Facts about Data
2.5 quintillion bytes of data are being created every day.
Source: IBM
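To put the daily figure in more familiar terms, it can be restated as exabytes per day and terabytes per second. This is a rough sketch that assumes the short-scale meaning of quintillion (10^18), SI byte units, and an even spread of data creation across the day; the variable names are illustrative.

```python
# 2.5 quintillion bytes, taking "quintillion" as 10**18 (short scale).
BYTES_PER_DAY = 2.5 * 10**18
SECONDS_PER_DAY = 24 * 60 * 60

# SI units (an assumption; the slide does not specify).
EXABYTE = 10**18
TERABYTE = 10**12

exabytes_per_day = BYTES_PER_DAY / EXABYTE
# Assumes creation is spread evenly over the day, which is a simplification.
terabytes_per_second = BYTES_PER_DAY / SECONDS_PER_DAY / TERABYTE

print(f"{exabytes_per_day:.1f} EB per day")
print(f"{terabytes_per_second:.1f} TB per second")
```

Under these assumptions, the figure is 2.5 exabytes per day, or roughly 29 terabytes every second.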
6
Facts about Data
The average size of files containing data is decreasing.
The number of files containing data is increasing.
Source: 2012 IDC Study
7
Facts about Data
75% of data is generated by the consumer.
25% of data is generated by the enterprise.
Enterprises are LIABLE for 80% of all data generated.
Source: 2012 IDC Study
8
Facts about Data
The data about a person far exceeds the data created by a person.
Source: IDC 2011 Study
9
Facts about Data
25% of all data is unique or original.
75% of all data is a duplication of original data.
Source: 2012 IDC Study
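One common way to estimate how much of a data set is original is to fingerprint each item by its content and count the distinct fingerprints. The sketch below is an illustration only, not the method used in the IDC study; the function name and the sample data are hypothetical.

```python
import hashlib


def unique_fraction(blobs: list[bytes]) -> float:
    """Return the share of items whose content is distinct.

    Two items count as duplicates when their SHA-256 digests match.
    """
    if not blobs:
        return 0.0
    digests = {hashlib.sha256(blob).hexdigest() for blob in blobs}
    return len(digests) / len(blobs)


# Hypothetical sample: eight stored items, but only two distinct contents.
sample = [b"quarterly report"] * 5 + [b"invoice 17"] * 3
print(unique_fraction(sample))  # 2 distinct out of 8 items -> 0.25
```

Here only a quarter of the stored items are original; the rest are copies that still consume storage and must still be managed.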
10
Facts about Data
The most impressive companies today are leading in capturing and managing value from data.
11
Dimensions of Big Data
Several companies have attempted to summarize the major areas of concern, or dimensions, relevant to Big Data. Most build on those defined in the IDC definition:
– Volume
– Velocity
– Variety
– Variability
– Complexity
– Value
– Trust (Veracity)
12
Big Data Dimension - Volume
The most undeniable facet of Big Data is the volume of data that an enterprise must deal with: the data created by its internal operations, data from transactions with customers and suppliers, data from external consultants, partners, and regulators, and general industry data about competitors and marketplaces. The development of Big Data within the enterprise focuses on finding value at the right time and place from any number of data ‘haystacks’.
13
Big Data Dimension - Velocity
On the last slide, we alluded to the old saying about “finding a needle in a haystack”. A persistent challenge with data and information has been time: having the right information in the right place at the right time. Finding the right data in enormous mounds of data can be time-consuming. For example, mapping the human genome took years. The opportunity for Big Data is the ability to search through not just one, but multiple haystacks in a matter of minutes.
14
Big Data Dimension - Variety
Data exists in many forms: text, images, video, audio, documents, databases, etc. Many tools, past and present, have been designed to search through and analyze data in different formats, but relatively few are effective in supporting all formats, and these few tools may not support new formats in the future. Big Data architecture allows any combination of tools and technologies to be used to gather, store, analyze, and manage multiple data formats into and across the enterprise.
15
Big Data Dimension - Variability
SAS acknowledges that data generation, analysis, and use are not a constant stream of activities but have ebbs and flows based on seasons and demand. For Big Data, this change requires planning and monitoring across the enterprise to ensure proper levels of capacity for all activities. Variability also covers the unpredictability of demand on data. One day, a person may need data on one topic; the next day, on another; and on the third day, on a completely different topic.
16
Big Data Dimension - Complexity
To support the variability of data for a single person, one must often search through multiple sources (haystacks), some of which are external to the enterprise. Through this process, links, connections, and relationships are made. While these associations may be driven manually by the person, many Big Data solutions attempt to create these associations automatically. For example, the leader in Big Data, Google Search, attempts to list the data about a topic in declining order of relevance. The establishment of these relationships can be complex for one person; the addition of another person increases the complexity exponentially.
17
Big Data Dimension - Value
What information is important and what information is not? This is a crucial and often misunderstood question for enterprises because most managers will make the determination based on their requirements, interests, and timing. Unfortunately, what is important to you may not be important to the next guy; what is important to you today may not be important to you tomorrow. Big Data Analytics attempts to determine the value of information over its lifecycle and for different population groups or communities.
18
Big Data Dimension - Trust
IBM refers to this dimension as veracity and simply describes it as the trust decision-makers place in the data they have available to them. Surprisingly, 1 in 3 business leaders do not trust the available information. As the volume, variety, variability, and complexity of data continue to grow, so will the level of distrust if not managed appropriately. For enterprises reliant on their ability to use data effectively, one mishap may devastate their trust in available data or, worse, the trust of their customers in the enterprise.
19
Big Data Capabilities
20
Capabilities – Traditional
Big Data builds on traditional methods for moving, processing, and searching through data. As the demand on data increases, these traditional methods will often prove insufficient and slow.
21
Capabilities – Fast Data
One of the first steps away from traditional methods, often by adapting the traditional tools, is to increase the speed of data activities. Fast Data techniques focus on developing the enterprise’s ability to process the majority of data in a relatively short time so that it can respond quickly to the situation generating the data. For instance, an enterprise might respond to a security incident identified after processing 1,000 unauthorized hits a second over the time it takes you to read this slide.
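The security example can be sketched as a sliding-window rate check: keep the timestamps of recent events and raise a flag once more than a set number fall inside the window. This is a minimal illustration, not a description of any particular Fast Data product; the class name, the threshold, and the simulated timestamps are all assumptions, and events are assumed to arrive in time order.

```python
from collections import deque


class RateAlarm:
    """Flag when more than `threshold` events fall within `window` seconds."""

    def __init__(self, threshold: int = 1000, window: float = 1.0) -> None:
        self.threshold = threshold
        self.window = window
        self._times: deque[float] = deque()

    def record(self, timestamp: float) -> bool:
        """Record one event; return True if the rate exceeds the threshold.

        Assumes timestamps are supplied in non-decreasing order.
        """
        self._times.append(timestamp)
        # Drop events that have aged out of the window.
        while self._times and timestamp - self._times[0] >= self.window:
            self._times.popleft()
        return len(self._times) > self.threshold


alarm = RateAlarm(threshold=1000, window=1.0)
# Simulated data: 1,500 unauthorized hits spread evenly over one second.
triggered = any(alarm.record(i / 1500) for i in range(1500))
print(triggered)
```

Because only the events inside the window are kept in memory, the check can run continuously on a stream instead of waiting for a batch job to finish.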
22
Capabilities – Big Analytics
The focus of Big Analytics is to turn information into knowledge through a combination of older and newer approaches to create smart information management systems. The purpose is to enable a machine to identify hidden trends, patterns, and differences which previously could only be seen by humans.
23
Capabilities – Deep Insight
Deep Insight is the culmination of all Big Data efforts for an enterprise: providing useful and relevant information to achieve a specific result, purpose, or goal. The journey to a result requires effective handling of that which is known and that which is unknown.
24
Knowledge
"He that knows not, and knows not that he knows not is a fool. Shun him.
He that knows not, and knows that he knows not is a pupil. Teach him.
He that knows, and knows not that he knows is asleep. Wake him.
He that knows, and knows that he knows is a teacher. Follow him." (Arabic proverb)
Neighbour, R. (1992) The Inner Apprentice. London: Kluwer Academic Publishers. p. xvii
25
The Role of Big Data
To process, analyze, and draw conclusions from massive amounts of data about perceivable and hidden trends, patterns, and differences across defined categories, space, and time.
How does Big Data fulfill this role?
26
The Components of Big Data
– Establishing Policies
– Identifying Data Sources
– Implementing Storage Solutions
– Improving Data Transfer Capabilities
– Developing Analytic Capabilities
– Exploring Data Visualization Concepts
– Creating a Data-Driven Organization
27
The Toolkit
The Toolkit is designed to be holistic and reasonably comprehensive in its treatment of Big Data. However, the technologies are too broad and diverse to be covered fully in a single toolkit. In addition, many organizations will already have a substantial foundation in one technology contributing to Big Data while struggling with another. The goal of the Big Data Toolkit is to define the contributing factors, major components, and their relationships, while providing the basic tools to take action based on the organization’s needs.
28
Moving Forward
The presentations found within the Toolkit provide education about the different facets of Big Data. They can be used for self-edification or as the foundation for presenting a case to different levels of the organization. The process document, Developing Big Data Solutions, is intended to be a step-by-step guide to creating a Big Data foundation in your organization. Multiple templates have been created to support the process and to aid organizations in their efforts to improve their Big Data capabilities.