1
Big Data… Big Opportunities? …Big Hype?
(or just a Big Mess?) Data challenges and IBM views
Dr. Matthew Ganis, IBM Senior Technical Staff Member
CIO Social Media Analytics Chief Architect
Member, IBM Academy of Technology
@mattganis (Twitter)
2
The term “Big Data” is pervasive, but it still provokes a bit of confusion. So what is it? Big Data has been used to convey all sorts of concepts, including huge quantities of data, social media analytics, next-generation data management capabilities, real-time data and much, much more…
3
That means we create about 1.8 zettabytes of information every two years.
4
Extracting insight from an immense volume, variety and velocity of data, in context, beyond what was previously possible.
5
Information is at the Center of a New Wave of Opportunity… …And Organizations Need Deeper Insights
44x as much data and content over the coming decade (800,000 petabytes in 2009, 35 zettabytes by 2020)
80% of the world’s data is unstructured
1 in 3 business leaders frequently make decisions based on information they don’t trust, or don’t have
1 in 2 business leaders say they don’t have access to the information they need to do their jobs
83% of CIOs cited “Business intelligence and analytics” as part of their visionary plans to enhance competitiveness
60% of CEOs need to do a better job capturing and understanding information rapidly in order to make swift business decisions
Sources: The Guardian, May 2010; IDC Digital Universe, 2010; IBM Institute for Business Value, 2009; IBM CIO Study 2010; TDWI: Next Generation Data Warehouse Platforms, Q4 2009

Notes: More and more data, from more sources and of more types (structured and unstructured), arriving faster and faster. Leaders don’t always have access to the right information to make decisions; they need deeper insights, and they need them faster.
Summary: Data is exploding in volume, variety and velocity, and both structured and unstructured information will continue to grow at astronomical rates. This creates a tremendous opportunity for organizations to make timely decisions and achieve business goals. At the same time, organizations are struggling to gain deeper insights from this data: business leaders continue to make decisions without access to the trusted information they need, and CEOs understand that they need to do a better job of capturing and understanding information.
Units: Tera = 10^12 bytes; Peta = 10^15 bytes = 1,000 TB; Exa = 10^18 bytes; Zetta = 10^21 bytes = 1 billion TB.
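As a quick sanity check of the “44x” figure, the sketch below converts both data volumes with the decimal prefixes listed above. It is only arithmetic; the helper name `growth_factor` is hypothetical.

```python
# Decimal (SI) byte units as defined on the slide: each step is a factor of 1,000.
TB = 10**12  # terabyte
PB = 10**15  # petabyte = 1,000 TB
EB = 10**18  # exabyte
ZB = 10**21  # zettabyte = 1 billion TB

def growth_factor(start_bytes: int, end_bytes: int) -> float:
    """Return how many times larger end_bytes is than start_bytes."""
    return end_bytes / start_bytes

data_2009 = 800_000 * PB  # 800,000 petabytes in 2009
data_2020 = 35 * ZB       # 35 zettabytes projected for 2020

print(ZB // TB)                                    # 1000000000
print(data_2009 / ZB)                              # 0.8
print(round(growth_factor(data_2009, data_2020)))  # 44
```

800,000 PB is only 0.8 ZB, so the projected growth is 35 / 0.8 = 43.75, which the slide rounds to 44x.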
6
Structured vs Unstructured
Structured data refers to information with a high degree of organization, such that its inclusion in a relational database is seamless and it is readily searchable by simple, straightforward search engine algorithms or other search operations. Unstructured data (free text, images, audio, video) is essentially the opposite: it has no predefined data model, and this lack of structure makes compiling it a time- and energy-consuming task.
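To make the contrast concrete, here is a small illustrative sketch with made-up sample records (not an IBM tool): a structured table answers a question with one SQL query, while the same kind of fact in free text has to be extracted first with a heuristic. The table, the posts and the `mentioned_prices` helper are all hypothetical.

```python
import re
import sqlite3

# Structured: each fact lives in a typed column, so a plain SQL query answers the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [("alice", "laptop", 1200.0), ("bob", "phone", 650.0), ("alice", "phone", 640.0)],
)
phone_total = conn.execute(
    "SELECT SUM(amount) FROM orders WHERE product = ?", ("phone",)
).fetchone()[0]

# Unstructured: the same kind of fact is buried in free text and must be extracted first.
posts = [
    "Just bought a phone for $650, loving it so far",
    "My new laptop cost 1200 dollars but the battery is awful",
    "Thinking about a phone upgrade, any advice?",
]

def mentioned_prices(text: str, product: str) -> list[float]:
    """Naive heuristic: numbers found in posts that mention the product."""
    if product not in text.lower():
        return []
    return [float(n) for n in re.findall(r"\$?(\d+(?:\.\d+)?)", text)]

extracted = [p for post in posts for p in mentioned_prices(post, "phone")]
print(phone_total)  # 1290.0
print(extracted)    # [650.0]
```

The regular expression is deliberately naive; it would misread posts that mention several numbers, which is exactly why unstructured data takes more effort to compile.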
7
The Challenge: Bring Together a Large Volume and Variety of Data to Find New Insights
Multi-channel customer sentiment and experience analysis
Detect life-threatening conditions at hospitals in time to intervene
Predict weather patterns to plan optimal wind turbine usage, and optimize capital expenditure on asset placement
Make risk decisions based on real-time transactional data
Identify criminals and threats from disparate video, audio and data feeds

Notes: Big data is more than simply a matter of size; it is an opportunity to find insights in new and emerging types of data and content, to make your business more agile, and to answer questions that were previously considered beyond your reach. Imagine if you could analyze the 12 TB of tweets being created each day to figure out what people are saying about your products and who the key influencers are within your target demographics. Imagine being able to mine this data to identify new market opportunities. What if hospitals could take the thousands of sensor readings collected every hour per patient in ICUs to identify subtle indications that a patient is becoming unwell, days earlier than traditional techniques allow? Imagine if a green energy company could use petabytes of weather data along with massive volumes of operational data to optimize asset location and utilization, making these environmentally friendly energy sources more cost-competitive with traditional sources. Imagine if you could make risk decisions, such as whether or not someone qualifies for a mortgage, in minutes, by analyzing many sources of data, including real-time transactional data, while the client is still on the phone or in the office. Imagine if law enforcement agencies could analyze audio and video feeds in real time without human intervention to identify suspicious activity. As these new sources of data continue to grow in volume, variety and velocity, so too does the potential of this data to revolutionize decision-making processes in every industry.
8
Where we want to go
9
Merging the Traditional and Big Data Approaches
Traditional Approach: Structured & Repeatable Analysis
Business users determine what question to ask; IT structures the data to answer that question.
Examples: monthly sales reports, profitability analysis, customer surveys.
Big Data Approach: Iterative & Exploratory Analysis
IT delivers a platform to enable creative discovery; business users explore what questions could be asked.
Examples: brand sentiment, product strategy, maximum asset utilization.

Notes: The Big Data approach complements the traditional approach; the contrast is structured vs. exploratory analysis. In the traditional approach (on the left), business users determine what questions to ask and IT structures the data to answer them. This lets organizations answer questions that will be asked time and time again, and it is well suited to many common business processes, such as monitoring sales by geography, product or channel, extracting insight from customer surveys, and cost and profitability analyses. In the Big Data approach (on the right), IT delivers a platform that consolidates all sources of information and enables creative discovery. Business users then use the platform to explore the data, most of it raw, for ideas and questions to ask. Users explore their data in a more creative way: before finding the answer, they must first define the question. Are my customers starting to change their preferences? What is the best way to measure brand image?
10
Where is all this data coming from?
11
Where is all this data coming from?
12
The Internet of Things (IoT) is a scenario in which objects, animals or people are provided with unique identifiers and the ability to automatically transfer data over a network without requiring human-to-human or human-to-computer interaction.
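The two ingredients in that definition, a unique identifier and data the device can send on its own, can be sketched as a self-describing message. This is an illustration only: the field names and `make_reading` helper are hypothetical, and the network transport (for example MQTT or HTTP) is left out.

```python
import json
import uuid
from datetime import datetime, timezone

# Hypothetical device: the unique identifier is assigned once and travels with every reading.
DEVICE_ID = str(uuid.uuid4())

def make_reading(device_id: str, sensor: str, value: float) -> str:
    """Serialize one sensor reading as JSON, ready to send without human input."""
    message = {
        "device_id": device_id,
        "sensor": sensor,
        "value": value,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    return json.dumps(message)

payload = make_reading(DEVICE_ID, "temperature_c", 21.5)
decoded = json.loads(payload)
print(decoded["device_id"] == DEVICE_ID, decoded["value"])  # True 21.5
```

Because every message carries its own identifier and timestamp, readings from millions of devices can be merged downstream without anyone labeling them by hand.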
13
Where is all this data coming from?
15
Approximately 2.7 billion users on the Internet today
16
Social Media as Big Data
18
What are we running? Who is talking about us?
Male / Female / Student / Professional / Retired / Customers?
What do they “feel”? Positive/Negative sentiment / Angry / Annoyed?
Where are they talking?
Who are they influencing?
Who’s listening to them?
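The “what do they feel?” question is usually answered with sentiment analysis. The sketch below is a deliberately simple lexicon-based scorer, assuming a tiny hand-made word list (hypothetical, not the lexicon used in the analysis described here); production systems use much larger lexicons or trained models.

```python
import re

# Tiny hand-made lexicon (hypothetical); real systems use far larger lexicons or models.
POSITIVE = {"love", "great", "fast", "recommend"}
NEGATIVE = {"hate", "slow", "broken", "annoyed", "angry"}

def sentiment(text: str) -> str:
    """Label text positive, negative or neutral by counting lexicon hits."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

for post in ["I love the new release, setup was fast",
             "Support is slow and I'm annoyed",
             "Anyone tried the new release?"]:
    print(sentiment(post), "|", post)
```

Word counting ignores negation and sarcasm (“not great”), which is one reason finer labels such as angry or annoyed need more than a word list.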
23
When customers are talking about us or about our products, we want to know where those conversations are happening so we can:
Interact with interested customers
Get in front of any issues
27
Numerous studies show that consumers see word-of-mouth and personal recommendations as far more credible than newspaper and television advertisements. While such mass advertisements are still necessary because of their powerful reach, these findings show that companies need to increase their focus on more personalized approaches. Yet it is incredibly difficult, maybe even impossible, for most companies to deal directly with the countless number of potential consumers. This is where influencers come in…
28
What makes someone Influential ?
The number of tweets they make?
The number of times people mention them?
The number of followers they have?
How often they are retweeted?
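The slide lists candidate signals but does not say how to combine them. One common-sense option, sketched here purely as an assumption, is a weighted sum of log-scaled signals; the weights, the `Account` fields and the handles are hypothetical choices, not the scoring used in the analysis described in this talk.

```python
import math
from dataclasses import dataclass

@dataclass
class Account:
    handle: str
    tweets: int
    mentions: int
    followers: int
    retweets: int

# Hypothetical weights: engagement (mentions, retweets) counts more than raw activity.
WEIGHTS = {"tweets": 0.1, "mentions": 0.3, "followers": 0.2, "retweets": 0.4}

def influence(a: Account) -> float:
    """Weighted sum of log-scaled signals; log1p keeps huge counts from dominating."""
    return sum(w * math.log1p(getattr(a, name)) for name, w in WEIGHTS.items())

accounts = [
    Account("@prolific", tweets=50_000, mentions=40, followers=900, retweets=30),
    Account("@quiet_expert", tweets=800, mentions=2_500, followers=12_000, retweets=6_000),
]
ranked = sorted(accounts, key=influence, reverse=True)
print([a.handle for a in ranked])  # ['@quiet_expert', '@prolific']
```

With these weights, an account that tweets constantly but is rarely mentioned or retweeted ranks below one whose fewer posts are widely amplified, which is the distinction the questions above are probing.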
30
We were asked to look at why a particular product launch wasn’t performing as expected. We pulled all the “chatter” about it and found:
31
But there were people talking about it…..
32
Some things to think about…..
33
Where is all this data coming from?
It is true that vast amounts of data are, and will be, generated by everything from financial transactions, medical records, mobile phones and social media to the Internet of Things. But there are questions that need to be asked to understand the meaningful use of data:
How will data be managed?
How will data be shared?
Some thoughts about “data as a service”:
Establishment of standards, governance and guidelines (e.g., open architectures).
Creation of industry-specific data exchanges (e.g., healthcare data exchanges, environmental data exchanges).
Creation of cross-industry data exchanges (e.g., healthcare data exchanges seamlessly interacting with environmental data exchanges).
34
Enterprise Integration
Traditional sources feed the Data Warehouse; new sources feed the Big Data Platform.
Trusted Information & Governance: companies need to govern what comes in, and the insights that come out.
Data Management: insights from Big Data must be incorporated into the warehouse.

Notes: Integration is of great importance. IBM has a mature and broad software stack, and a key differentiator for IBM is the high degree of integration between its components. The Big Data Platform is no exception: it will integrate with the established components of the IBM Information Management (IM) software stack.
With integration come questions of trust, governance and privacy. How you use data for the enterprise matters: this isn’t just a technology for an internet company, it is about managing large volumes of potentially sensitive enterprise data. You must govern what comes in and govern what goes out. Even though “Big Data” means all of the data, it doesn’t necessarily mean you bring in all of the data and expose it to everyone without any sort of governance or quality control. For example, internet tweets or blog posts about an upcoming merger or acquisition (M&A) could be factored into brand sentiment analysis, but what if you are not supposed to factor that data into internal decision making?
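The M&A example amounts to a purpose-based release rule: the same record may be allowed in one analysis and excluded from another. A minimal sketch, assuming each record carries a set of approved uses (the `Record` class, `govern` function and tag names are hypothetical, not an IBM governance API):

```python
from dataclasses import dataclass, field

@dataclass
class Record:
    text: str
    source: str
    # Purposes this record has been cleared for; the tag names are hypothetical.
    allowed_uses: set[str] = field(default_factory=set)

def govern(records: list[Record], purpose: str) -> list[Record]:
    """Release only the records cleared for the requested purpose (default: deny)."""
    return [r for r in records if purpose in r.allowed_uses]

records = [
    Record("Great product, fast shipping", "twitter", {"brand_sentiment", "internal_decisions"}),
    Record("Rumor: the company is about to acquire a rival", "blog", {"brand_sentiment"}),
]

print(len(govern(records, "brand_sentiment")))     # 2
print(len(govern(records, "internal_decisions")))  # 1
```

Defaulting to deny means a record with no approved uses never reaches any analysis, which matches the idea of governing what comes in before it is exposed to everyone.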
35
Poor data quality
Dirty data
Missing values
Inadequate data size
Poor representation in data sampling
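Several of these problems can be detected mechanically before any analysis runs. The sketch below profiles a few hypothetical customer rows for missing values, invalid values, exact duplicates and unnormalized text; the field names and the 0 to 120 age range are assumptions for illustration. It does not address sample size or representation, which require knowing the population being studied.

```python
rows = [
    {"customer_id": "C1", "age": 34, "country": "US"},
    {"customer_id": "C2", "age": None, "country": "us "},
    {"customer_id": "C3", "age": -5, "country": "FR"},
    {"customer_id": "C1", "age": 34, "country": "US"},
]

def profile(rows: list[dict]) -> dict[str, int]:
    """Count common data quality problems instead of silently analyzing them."""
    issues = {"missing": 0, "invalid": 0, "duplicate": 0, "unnormalized": 0}
    seen = set()
    for row in rows:
        key = tuple(sorted(row.items()))  # exact-duplicate detection
        if key in seen:
            issues["duplicate"] += 1
        seen.add(key)
        age = row.get("age")
        if age is None:
            issues["missing"] += 1
        elif not 0 <= age <= 120:  # assumed plausible range
            issues["invalid"] += 1
        country = row.get("country") or ""
        if country != country.strip().upper():
            issues["unnormalized"] += 1
    return issues

print(profile(rows))  # {'missing': 1, 'invalid': 1, 'duplicate': 1, 'unnormalized': 1}
```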
36
How do we link them together?
Data variety: trying to accommodate data that comes from different sources and in a variety of different forms (images, geo data, text, social, numeric, etc.). How do we link them together? Is there a common taxonomy or way to organize it? Is there a “signal” in one source of data that points to another?
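In the simplest case, two sources share an identifier that just needs normalizing before a join. The sketch assumes such a shared key (an e-mail address) exists in two hypothetical sources; when it does not, linking becomes a record-linkage or entity-resolution problem, which this example does not attempt.

```python
# Two hypothetical sources describing the same customers in different shapes.
crm = [
    {"email": "Alice@Example.com", "name": "Alice Smith", "city": "Boston"},
    {"email": "bob@example.com", "name": "Bob Jones", "city": "Austin"},
]
social = [
    {"handle": "@alice_s", "email": " alice@example.com", "sentiment": "positive"},
    {"handle": "@carol", "email": "carol@example.com", "sentiment": "negative"},
]

def norm_email(value: str) -> str:
    """Normalize the join key so formatting differences don't hide a match."""
    return value.strip().lower()

def link(left: list[dict], right: list[dict], key_fn) -> list[dict]:
    """Inner join on the normalized key, merging fields from both sources."""
    index = {key_fn(r["email"]): r for r in right}
    linked = []
    for row in left:
        match = index.get(key_fn(row["email"]))
        if match is not None:
            linked.append({**row, **match, "email": key_fn(row["email"])})
    return linked

linked = link(crm, social, norm_email)
print([(r["name"], r["handle"], r["sentiment"]) for r in linked])
# [('Alice Smith', '@alice_s', 'positive')]
```

Without `norm_email`, the Alice records would not match at all, a small instance of the “variety” problem: the same entity looks different in each source.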
37
Dealing with huge datasets, or 'Big Data,' that require distributed approaches.
38
Who is influential? How do we define influence?
39
Thank you for your attention
40
Where is all this data coming from?
41
The Big Data Opportunity
Extracting insight from an immense volume, variety and velocity of data, in context, beyond what was previously possible.
Variety: manage the complexity of multiple relational and non-relational data types and schemas.
Velocity: streaming data and large-volume data movement.
Volume: scale from terabytes to zettabytes (1 billion TB).

Notes: Big data is THE opportunity to extract insight from an immense volume, variety and velocity of data, in context, beyond what was previously possible. Massive volume, variety and velocity are the defining characteristics of Big Data, and the most expedient way to describe a Big Data problem is to use these three V’s. Let’s take them in reverse order and start with the most obvious “V”: Volume.
It is obvious that a Big Data problem will have, well, big data volume. This volume can start in the terabytes and quickly move to the hundreds of petabytes, and new storage solutions are needed to handle it. It is not just that classic data warehouse platforms cannot store and access those volumes; the cost and labor associated with storing that data on those platforms is prohibitively expensive, and the platforms are difficult to deploy.
Velocity is obvious as well, because no organization wants its answers slower! We hear demands for new insights or analytics ranging from “we need it in 4 hours, not 4 weeks” to “the response must be real time, that is, sub-second.” More and more of the data being produced today has a very short half-life, and sometimes getting an edge over your competition can mean identifying a trend, problem or opportunity seconds, or even microseconds, before someone else. Organizations must be able to analyze this data in real time if they are to find insights in it.
The third “V”, Variety, is the least understood, and could be the most profound. Big Data is derived mostly from sources that were not analyzed or used before, because that data does not come from classical transaction systems, which lend themselves to structured models. Most, but not all, data in a Big Data platform is unstructured, or partly unstructured. To capitalize on this opportunity, enterprises must be able to analyze ALL types of data, relational and non-relational: text, sensor data, audio, video and transactions.
The marketplace is driving the need for new insights, and IBM has done extensive research on this. The answers that businesses are demanding have to be based on increasingly sophisticated analytics, and the response rate demanded can be an order of magnitude faster than before. Organizations that don’t know how to manage massive volumes of data are overwhelmed by it. But the opportunity is, with the right technology, to analyze ALL the data and gain a better understanding of your business, your customers and the marketplace. IBM offers a unique platform to ingest, store, manipulate, manage and, most importantly, analyze Big Data to discover fresh insight that drives new business opportunities.
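For data with a short half-life, a common velocity pattern is to keep only a recent time window of a stream and update results as each event arrives. The sketch below is an in-memory illustration of a sliding-window counter, not a streaming engine; the class name and the 60-second window are assumptions.

```python
from collections import deque

class SlidingWindowCounter:
    """Count events whose timestamps fall in (now - window, now]."""

    def __init__(self, window: float):
        self.window = window
        self.events = deque()  # timestamps, oldest first

    def add(self, ts: float) -> int:
        """Record an event at time ts (seconds, non-decreasing) and return the current count."""
        self.events.append(ts)
        # Evict events that have aged out of the window; each is removed at most once.
        while self.events and self.events[0] <= ts - self.window:
            self.events.popleft()
        return len(self.events)

counter = SlidingWindowCounter(window=60.0)
counts = [counter.add(t) for t in [0, 10, 30, 61, 65, 200]]
print(counts)  # [1, 2, 3, 3, 4, 1]
```

Each event is appended and evicted once, so the per-event cost stays constant on average however long the stream runs, which is what lets this kind of computation keep up with fast-arriving data.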
42
Big Data: Why Is It Possible Now?
Traditional approach: data to function
The application server and the database server are separate.
Data can be on multiple servers, and the analysis program can run on multiple application servers.
The network is still in the middle: data has to go through the network.
Figure: user request → application server → query → database server → data returned over the network → process data → send result.
Big Data approach: function to data
The analysis program runs where the data is: on the data nodes.
Only the analysis program has to go through the network.
The analysis program needs to be MapReduce-aware.
Highly scalable: 1000s of nodes, petabytes and more.
Figure: user request → master node → send function to the data nodes → query & process data on each node → consolidate result → send result.
43
What Big Data Is Not
It is not a replacement for your database strategy.
It is not a replacement for your warehouse strategy.
It is not a solution by itself; it needs jobs/applications to drive value.