Published by Todd Richardson. Modified over 8 years ago.
1
MIS 3500
Instructor: Bob Travica
Trendy Database Topics 2016
2
Big Data
The 3 big Vs:
- Volume: terabytes (12 zeroes), petabytes (15 zeroes)
- Variety: social media, communications, sensors everywhere, Internet of Things, video feeds, GPS… Implication: various formats
- Velocity: continuous wired and wireless feeds
3
Big Data Goals and Uses
Goals:
- Integrate data on the same object across sources (Customer, Citizen, Patient...; spatial mashups)
- Analysis: existing patterns (e.g., electrical energy consumption over time), predictive analysis (prediction of energy needs)
Application domains:
- Product & object monitoring in real time via sensors (Internet of Things, IoT)
- Marketing: sentiment analysis in social media, discovering investment opportunities (major US banks)
- Energy grid management (IoT)
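The difference between describing an existing pattern and predicting a future need can be sketched in a few lines. This is only a minimal illustration, assuming a straight-line trend; the readings are made-up numbers, and `fit_line` and `predict` are hypothetical helper names, not part of any Big Data product.

```python
# Hypothetical hourly energy readings in kWh (made-up numbers for illustration).
readings = [10.0, 12.0, 14.0, 16.0, 18.0]


def fit_line(ys):
    """Least-squares fit of y = slope * t + intercept for t = 0, 1, 2, ..."""
    n = len(ys)
    ts = range(n)
    mean_t = sum(ts) / n
    mean_y = sum(ys) / n
    cov = sum((t - mean_t) * (y - mean_y) for t, y in zip(ts, ys))
    var = sum((t - mean_t) ** 2 for t in ts)
    slope = cov / var
    intercept = mean_y - slope * mean_t
    return slope, intercept


def predict(ys, steps_ahead=1):
    """Extrapolate the fitted line steps_ahead time units past the last reading."""
    slope, intercept = fit_line(ys)
    return slope * (len(ys) - 1 + steps_ahead) + intercept


# Existing pattern: consumption rises by `slope` kWh per hour.
slope, intercept = fit_line(readings)
# Predictive analysis: estimated need for the next hour.
next_value = predict(readings)
```

Real predictive analytics would use far richer models and data; the point here is only that the same fitted pattern serves both description and prediction.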
4
Big Data Uses (cont’d)
- Transportation network management (queuing airplanes in air corridors in Brazil, optimizing the cargo railroad network in Germany)
- Operations/process optimization (UPS sensors in trucks, manufacturing)
- Strategy making (Google, Facebook, banks; emerging strategy, not planned)
- Health (integration of customer data, tracking/analyzing patient vital signs & cancer cell behavior)
- Science (human genome analysis, 2 TB of data per person + gene interactions)
- Public safety/security (profiling outlaws)
- Policy analysis (United Nations’ system for predicting social problems)
5
Big Data Tasks
Querying unstructured data
6
Big Data Benefits & Costs
Benefits:
- Comprehensive informing on business objects (customers, patients…)
- Pattern discovery, predictive analysis (fraud detection)
- More effective decision making (Citigroup)
- Savings (e.g., UPS operations)
- Strategizing for innovation (Google)
Costs:
- Direct technology costs
- Truthfulness (“veracity”) of sources & findings
- Sense-making challenges (big & “small” data)
- Legality, ethics
- Implementation & fit with organization
7
Machine-generated data (sensors); automatic creation and transfer
- Home appliances (security, energy consumption, heating, food, entertainment)
- Monitoring/control (cars, athletic equipment, machinery, appliances)
Example: smart power grid
Smart meter; Internet & Wi-Fi connectivity
8
Technologies
- Hadoop (framework for a distributed file system and processing of large datasets on server clusters)
- Machine learning: automated construction of models to fit data (instead of hypothesis testing, as with data warehouses (DW) and Analytics)
- Open source
- Notable developers: Google, Facebook, Yahoo!, Microsoft
Microsoft Azure-based Hadoop
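Hadoop's processing layer follows the MapReduce model: a map step emits key/value pairs, the pairs are grouped by key, and a reduce step aggregates each group. The sketch below imitates those three steps in a single Python process to show the idea; it does not use Hadoop's API, and the function names and sample documents are illustrative assumptions.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence over a list of documents."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


# Hypothetical input: on a cluster, each document could sit on a different server.
docs = ["big data big volume", "data velocity"]
counts = word_count(docs)
```

On a real cluster the map and reduce calls run in parallel on the servers that hold the data, which is what lets the same logic scale to very large datasets.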
9
DATA PROCESSING
10
A database for Big Data
Distributed, non-relational, scalable
Based on Google’s BigTable

Row Key (reversed URL)   Time Stamp   Column Key: “Anchor” (Family) + URL part (Qualifier)
"com.cnn.www"            t9           anchor:cnnsi.com = "CNN"
"com.cnn.www"            t8           anchor:my.look.ca = "CNN.com"
Data are the citations (anchor text such as "CNN" and "CNN.com") from sites that reference the CNN page.

Row Key                  Time Stamp   Column Key: “Contents” + keyword in tagged content
"com.cnn.www"            t6           contents:html = " … "
"com.cnn.www"            t5           contents:html = " … "
"com.cnn.www"            t3           contents:html = " … "
Data are compressed webpages. The number of Contents columns is unbounded.

All columns put together make a “BigTable”.
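The BigTable data model amounts to a map from (row key, column key, time stamp) to a value. The sketch below is a minimal in-memory illustration of that idea, not the API of BigTable or of any database built on it. The class `TinyBigTable`, its methods, and the helper `reverse_host` are hypothetical names; time stamps t3 to t9 are written as the integers 3 to 9, and the page contents are placeholder strings because the slide elides them.

```python
def reverse_host(host):
    """Build a row key by reversing the host name, e.g. www.cnn.com -> com.cnn.www."""
    return ".".join(reversed(host.split(".")))


class TinyBigTable:
    """Toy versioned table: (row key, column key) -> {time stamp: value}."""

    def __init__(self):
        self._cells = {}

    def put(self, row, column, timestamp, value):
        """Store one version of a cell; older versions are kept."""
        self._cells.setdefault((row, column), {})[timestamp] = value

    def get(self, row, column, timestamp=None):
        """Return the newest version, or the newest one at or before `timestamp`."""
        versions = self._cells.get((row, column), {})
        candidates = [t for t in versions if timestamp is None or t <= timestamp]
        if not candidates:
            return None
        return versions[max(candidates)]

    def columns(self, row, family):
        """List the column keys of one family ("family:qualifier") present in a row."""
        prefix = family + ":"
        return sorted(c for (r, c) in self._cells if r == row and c.startswith(prefix))


table = TinyBigTable()
row = reverse_host("www.cnn.com")

# Anchor family: one column per referencing site, value = its anchor text.
table.put(row, "anchor:cnnsi.com", 9, "CNN")
table.put(row, "anchor:my.look.ca", 8, "CNN.com")

# Contents family: three versions of the page (placeholder values).
for t in (3, 5, 6):
    table.put(row, "contents:html", t, f"page version t{t}")

latest = table.get(row, "contents:html")
as_of_t5 = table.get(row, "contents:html", timestamp=5)
```

Because columns are created on write rather than declared in a schema, each row can carry a different and open-ended set of columns within a family.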
11
NoSQL – Not Only SQL
12
Modern Database Systems
13
Modern Database Systems
14
Conclusion
Modern database systems (DBS) still rely predominantly on relational DBS, while trying to integrate them with Big Data systems for unstructured and multi-type data. Those Big Data systems are based on distributed storage and parallel processing. Ad hoc relationship discovery and predictive analytics are the major tasks and benefits.