Published by Susan Lee. Modified over 9 years ago.
1
+ Big Data IST210 Class Lecture
2
+ Big Data Summary: http://www.youtube.com/watch?v=eEpxN0htRKI by EMC Corporation (http://www.emc.com). More videos about data can be found here: http://www.youtube.com/playlist?list=PLD298CBF8D0908E4C
3
+ What is Big Data? In information technology, big data is a collection of data sets so large and complex that it becomes difficult to process using a traditional relational database management system (RDBMS).
4
+ Challenges: capturing, storing, searching, sharing, analyzing, and visualizing
5
+ Data types with size issues: scientific models and simulations (biology, astrophysics); genetic studies; traffic; internet searching; business information (from order management to stock data)
6
+ What’s making so much data? Ubiquitous computing (an area of study for those interested); more people carrying data-generating devices (mobile phones with Facebook, GPS, cameras, etc.)
7
+ Just how big are we talking? In 2012 we reached the capability of creating and storing 2.5 quintillion bytes of data per day (2.5 x 10^18 bytes, or 2.5 billion gigabytes). 90% of the world's data was created in the last two years. The human genome took 10 years to process when it was originally mapped; as of 2012 it can be done in a week. Walmart handles more than 1 million transactions per hour and needs to store them for analysis to determine which products sell where, among other things.
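The unit conversions on this slide can be checked with a few lines of arithmetic. This is only a sketch: it assumes decimal (SI) units, where one gigabyte is 10^9 bytes, and it assumes the Walmart rate holds around the clock. The variable names are illustrative and not from the lecture.

```python
# Figures quoted on the slide. Decimal (SI) gigabytes are an assumption.
BYTES_PER_DAY = 25 * 10**17      # 2.5 quintillion bytes, written as an exact integer
BYTES_PER_GIGABYTE = 10**9

gigabytes_per_day = BYTES_PER_DAY // BYTES_PER_GIGABYTE

# "1 million+ transactions per hour" treated as a lower bound,
# assumed constant over a 24-hour day.
walmart_per_hour = 1_000_000
walmart_per_day = walmart_per_hour * 24

print(f"{gigabytes_per_day:,} GB created per day")
print(f"at least {walmart_per_day:,} Walmart transactions per day")
```

Under these assumptions the daily figure works out to 2.5 billion gigabytes, matching the slide, and the Walmart rate implies at least 24 million transactions to store each day.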
8
+ Where is the problem? When trying to get useful information out of such a huge volume of data (like drinking from a fire hose), traditional RDBMS queries aren't sufficient. Why? Even if you could store all of the data for one example (all tweets in a week, for instance), searching it with traditional tools to find out whether a particular topic was trending would take so long that the result would be meaningless by the time it was computed. Big Data solutions therefore consider how to store this data in novel ways that make it more accessible, and how to perform analysis on it.
9
+ Where is the problem? Big Data solutions now commonly run massively parallel software on hundreds to thousands of servers, which may themselves be virtual machines in growing server farms. The overall idea of "big data" includes not only storage and analysis, but also deciding how to shape the data, what to store, how to store it, and how to search, share, and visualize it. Demand for understanding how to handle massive amounts of data and make them useful is so high that the industry is now worth more than $100 billion and is growing at about 10% per year, about twice as fast as other software technology.
10
+ Changing how we store data: To be practically useful, Big Data analytics requires rethinking data storage. Instead of older SAN storage farms or data warehouses, data is moving onto direct-attached storage (DAS), such as solid-state disks or large SATA disks attached directly to parallel processing nodes. This brings the huge amounts of data closer to large processing capabilities, allowing more timely analytics.
11
+ Activity/Discussion: http://www.mckinsey.com/insights/mgi/research/technology_and_innovation/big_data_the_next_frontier_for_innovation What do you take away from this reading?
12
+ Structured Storage: a Column (not the same as a column in a relational database), a Super Column, a Column Family
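The slide only names the three building blocks, so the following is a rough sketch of how they might nest. It assumes the column-family model used by stores such as Apache Cassandra: a column is a name/value/timestamp triple, a super column is a named group of columns, and a column family maps row keys to sets of columns. The class names, fields, and sample data are all hypothetical and are not taken from the lecture or from any real library's API.

```python
import time
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Column:
    """Assumed shape: a name/value pair plus a write timestamp."""
    name: str
    value: str
    timestamp: float = field(default_factory=time.time)


@dataclass
class SuperColumn:
    """Assumed shape: a named container grouping related columns."""
    name: str
    columns: Dict[str, Column] = field(default_factory=dict)

    def add(self, column: Column) -> None:
        self.columns[column.name] = column


@dataclass
class ColumnFamily:
    """Assumed shape: row key -> {column name -> Column}."""
    name: str
    rows: Dict[str, Dict[str, Column]] = field(default_factory=dict)

    def insert(self, row_key: str, column: Column) -> None:
        self.rows.setdefault(row_key, {})[column.name] = column

    def get(self, row_key: str, column_name: str) -> str:
        return self.rows[row_key][column_name].value


# Hypothetical sample data: rows need not share the same set of columns.
users = ColumnFamily("users")
users.insert("alice", Column("email", "alice@example.com"))
users.insert("alice", Column("city", "Springfield"))
users.insert("bob", Column("email", "bob@example.com"))

address = SuperColumn("address")
address.add(Column("street", "1 Main St"))
address.add(Column("zip", "00000"))
```

Unlike a relational table, each row here carries only the columns that were actually written, which is why "alice" has two columns and "bob" has one.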
13
+ Getting Information Out of Structured Storage: MapReduce. Map – distribute the task among multiple computers. Reduce – take the results from each computer and combine them.
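The two steps described on this slide can be sketched with a word count, a common teaching example. This is a simplified illustration, not a real MapReduce framework: the chunks stand in for data held on separate computers, they are processed one after another in a single process, and the function names and sample sentences are made up for the example.

```python
from collections import Counter
from functools import reduce


def map_chunk(lines):
    """Map step: count the words in one chunk, as one computer would."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def combine(left, right):
    """Reduce step: merge two partial results into one."""
    return left + right


# Each inner list stands in for the data stored on one computer (hypothetical data).
chunks = [
    ["big data is big", "data is everywhere"],
    ["big data needs parallel processing"],
]

partial_counts = [map_chunk(chunk) for chunk in chunks]  # one result per "computer"
totals = reduce(combine, partial_counts, Counter())      # combined answer

print(totals.most_common(3))
```

Because each chunk is counted independently, the map step could run on many machines at once; only the small partial counts need to travel over the network for the reduce step.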
14
+ IBM considers Big Data: Big data spans four dimensions: Volume, Velocity, Variety, and Veracity. Volume: Enterprises are awash with ever-growing data of all types, easily amassing terabytes, even petabytes, of information. Examples: turn 12 terabytes of Tweets created each day into improved product sentiment analysis; convert 350 billion annual meter readings to better predict power consumption.
15
+ IBM considers Big Data: Big data spans four dimensions: Volume, Velocity, Variety, and Veracity. Velocity: Sometimes 2 minutes is too late. For time-sensitive processes such as catching fraud, big data must be used as it streams into your enterprise in order to maximize its value. Examples: scrutinize 5 million trade events created each day to identify potential fraud; analyze 500 million daily call detail records in real time to predict customer churn faster.
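To get a feel for what these daily volumes mean for a system that must keep up in real time, the figures can be converted to average per-second rates. This is a back-of-the-envelope sketch that assumes events arrive evenly across a 24-hour day, which real traffic would not; the variable names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Daily volumes quoted on the slide.
trade_events_per_day = 5_000_000
call_records_per_day = 500_000_000

# Average rates, assuming arrivals are spread evenly over the day.
trade_events_per_second = trade_events_per_day / SECONDS_PER_DAY
call_records_per_second = call_records_per_day / SECONDS_PER_DAY

print(f"{trade_events_per_second:.0f} trade events per second on average")
print(f"{call_records_per_second:.0f} call records per second on average")
```

Under the even-arrival assumption this comes to roughly 58 trade events and roughly 5,800 call records every second, and peak rates would be higher.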
16
+ IBM considers Big Data: Big data spans four dimensions: Volume, Velocity, Variety, and Veracity. Variety: Big data is any type of data, structured and unstructured, such as text, sensor data, audio, video, click streams, log files and more. New insights are found when analyzing these data types together. Examples: monitor hundreds of live video feeds from surveillance cameras to target points of interest; exploit the 80% data growth in images, video and documents to improve customer satisfaction.
17
+ IBM considers Big Data: Big data spans four dimensions: Volume, Velocity, Variety, and Veracity. Veracity: 1 in 3 business leaders don’t trust the information they use to make decisions. How can you act upon information if you don’t trust it? Establishing trust in big data presents a huge challenge as the variety and number of sources grow.
18
+ Discussion: What do you think? What are your opinions on all of this?