© 2014 IBM Corporation Information Management
Smart Data Analysis for IoT (Internet of Things) Applications
Kun-Lung Wu, Ph.D.
Manager, Data-Intensive Systems & Analytics Group (IBM T. J. Watson Research Center)
InfoSphere Streams Language & Research (IBM SWG)
As IoT applications become more pervasive, there is a real-time big data explosion
– Internet of Things: almost anything (everything) can be equipped and connected to the Internet
– Real-Time Big Data Explosion: these things can generate, in real time, streams and streams of data
– Real-time data analysis is an integral part of many IoT applications
Examples of IoT Applications
– Smart cities: traffic control, emergency management, etc.
– Health care: aiding the elderly, ICU alert management, health monitoring via wearable devices, etc.
– Agriculture & food: precision farming, cold chain management, etc.
– Industrial applications: manufacturing process monitoring, engine monitoring, etc.
– Environmental monitoring: water, waste, air quality, etc.
– Retail applications
What is different in IoT data? There are many extremes
– Volume: there are greater amounts of data
– Velocity: process and act on data more quickly, in real time
– Variety: use more types of data
– Veracity: use uncertain data
Traditional versus IoT Big Data: leverage more of the data being captured
– Traditional approach: analyze small subsets of information (the analyzed information is only part of the available information)
– IoT big data approach: analyze ALL available information
Traditional versus IoT Big Data: reduce the effort required to leverage data
– Traditional approach: carefully cleanse information before any analysis; a small amount of carefully cleansed information is analyzed
– IoT big data approach: analyze information as is, cleanse as needed; a very large amount of messy information is analyzed
Traditional versus IoT Big Data: leverage data as it is captured
– Traditional approach: analyze data AFTER it has been processed and landed in a warehouse or mart
– IoT big data approach: analyze data IN MOTION as it is generated, in real time
Standard assumptions            | Re-think for IoT data analysis
Clean and correct data          | Take advantage of and tolerate uncertainty
Transactional guarantees        | Good enough
Normalized, structured data     | Store data in elemental form
Explicit relationships kept     | Relationships found at query
ACID properties                 | Relaxed constraints
Centrally managed storage       | Loosely distributed data
Store-and-process               | Process in motion
Reliable hardware               | Built with full expectation of failures
Query, insert, delete with SQL  | Query, operators, analytics at point of data
Reference/context data on disk  | Reference and context data in memory
From data at rest to data in motion
[Figure: data at rest versus data in motion]
IBM InfoSphere Streams Delivers Real-Time Analytics For Big Data In Motion
– Volume: terabytes per second, petabytes per day
– Variety: all kinds of data, all kinds of analytics
– Velocity: insights in microseconds
– Traditional / non-traditional data sources; example streaming data sources: video, audio, networks, social media
– Millions of events per second, microsecond latency
– Powerful analytics, real-time delivery
– Example applications: algorithmic trading, telco churn prediction, smart grid, cyber security, government / law enforcement, ICU monitoring, environment monitoring
Big Data in Real Time with Stream Processing
– Stream operations: modify, filter / sample, classify, fuse, annotate, score, windowed aggregates, analyze
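To make a few of the operations named on this slide concrete, here is a minimal sketch in plain Python generators. It is not SPL and does not use any InfoSphere Streams API. The `Reading` tuple, the function names, the valid range, the window size, and the threshold are all assumptions invented for illustration. It chains a filter, a tumbling (non-overlapping) windowed aggregate, and a simple classifier, so each tuple is processed as it arrives rather than after being stored.

```python
from collections import namedtuple

# Hypothetical tuple type for this sketch; not an InfoSphere Streams schema.
Reading = namedtuple("Reading", ["sensor", "value"])


def filter_valid(stream, low, high):
    """Filter: pass through only readings whose value lies in [low, high]."""
    for reading in stream:
        if low <= reading.value <= high:
            yield reading


def tumbling_mean(stream, size):
    """Windowed aggregate: emit the mean of each full, non-overlapping
    window of `size` readings. A trailing partial window is dropped."""
    window = []
    for reading in stream:
        window.append(reading.value)
        if len(window) == size:
            yield sum(window) / size
            window = []


def classify(means, threshold):
    """Classify: label each window mean relative to an assumed threshold."""
    for mean in means:
        yield ("high" if mean > threshold else "normal", mean)


# Made-up input; -999.0 stands in for a faulty sensor reading.
readings = [Reading("s1", v) for v in [10.0, 12.0, -999.0, 14.0, 30.0, 32.0, 34.0]]

# Operators are composed into a pipeline; nothing runs until results are pulled.
pipeline = classify(
    tumbling_mean(filter_valid(readings, low=0.0, high=100.0), size=3),
    threshold=20.0,
)
results = list(pipeline)
print(results)  # [('normal', 12.0), ('high', 32.0)]
```

Because every stage is a generator, the same pipeline works unchanged on an unbounded source: memory use is bounded by the window size, not by the length of the stream.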
InfoSphere Streams: For superior real time analytic processing
– Easy to extend: built-in adaptors; users add capability with familiar C++ and Java
– Compile groups of operators into single processes: efficient use of cores; distributed execution; very fast data exchange; can be automatic or tuned; scaled with the push of a button
– Streams Processing Language (SPL) built for streaming applications: reusable operators; rapid application development; continuous "pipeline" processing
– Flexible and high performance transport: very low latency; high data rates
– Use the data that gives you a competitive advantage: can handle virtually any data type; use data that is too expensive and time sensitive for traditional approaches
– Easy to manage: automatic placement; extend applications incrementally without downtime; multi-user / multiple applications
– Dynamic analysis: programmatically change topology at runtime; create new subscriptions; create new port properties
What Are People Doing With Streams?
– Stock market: impact of weather on securities prices; analyze market data at ultra-low latencies
– Fraud prevention: detecting multi-party fraud; real-time fraud prevention
– e-Science: space weather prediction; detection of transient events; synchrotron atomic research
– Transportation: intelligent traffic management
– Smart Grid & Energy: transactive control; Phasor Monitoring Unit
– Natural Systems: wildfire management; water management
– Telephony: CDR processing; social analysis; churn prediction; geomapping
– Law Enforcement, Defense & Cyber-Security: real-time multimodal surveillance; situational awareness; cyber security detection
– Health & Life Sciences: neonatal ICU monitoring; epidemic early warning system; remote healthcare monitoring
– Other: manufacturing; text analysis; Who's Talking to Whom?; ERP for commodities; FPGA acceleration
Asian telco reduces billing costs and improves customer satisfaction
– Problem: call volume increased to the point that batch processing in a warehouse no longer worked: 1) too expensive, 2) too slow, and 3) no capacity left for BI
– Solution: real-time mediation and analysis of 8B CDRs per day
– Data processing time reduced from 12 hrs to 1 sec
– Hardware cost reduced to 1/8th
– Further enabled: proactively addressing issues impacting customer satisfaction; real-time offers based on usage
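As a rough back-of-the-envelope check on the figures quoted on this slide, the short Python sketch below converts "8B CDRs per day" into an average per-second rate and expresses the "12 hrs to 1 sec" change as a ratio. It assumes that CDRs (call detail records) arrive uniformly over the day, which real call traffic does not, so peak rates would be higher than the average shown.

```python
# Figures taken from the slide; the uniform-arrival assumption is ours.
CDRS_PER_DAY = 8_000_000_000          # "8B CDRs per day"
SECONDS_PER_DAY = 24 * 60 * 60        # 86,400 seconds

avg_cdrs_per_second = CDRS_PER_DAY / SECONDS_PER_DAY

BATCH_SECONDS = 12 * 60 * 60          # "12 hrs" of batch processing time
STREAM_SECONDS = 1                    # "1 sec" with stream processing
time_reduction = BATCH_SECONDS / STREAM_SECONDS

print(f"Average rate: about {avg_cdrs_per_second:,.0f} CDRs per second")
print(f"Processing time reduction: {time_reduction:,.0f}x")
# Average rate: about 92,593 CDRs per second
# Processing time reduction: 43,200x
```

An average on the order of 90,000 records per second is consistent with the slide's point that once-a-day batch loading into a warehouse had stopped being practical at this volume.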
Harnessing the Largest Predictive Focus Group in the World
– Purpose
  – Understand public sentiment towards an event: movie trailers
  – Deeply understand the potential customer profile: gender, occupation, intent to watch
  – Alter marketing launch plans based on insight
– Background
  – 1.1 billion tweets analyzed
  – 5.7 million blog/forum posts
  – 3.5 million messages
  – Also: Facebook, Google+, Tumblr, Flickr
University of Ontario Institute of Technology (UOIT) Detects Neonatal Patient Symptoms Sooner
– Performs real-time analytics on physiological data from neonatal patients
– Continuously correlates data from medical monitors to detect subtle changes and alert hospital staff sooner
– Early warning gives caregivers the ability to deal with complications proactively
– "Helps detect life threatening conditions up to 24 hours sooner"
Challenges and opportunities
– Approach overload
  – Is there a convergence of approaches?
  – Is there a "write once, use any technology" approach across tool types?
– Skills to apply techniques
  – Reduce the skill required?
  – More people who can be data scientists, developers, and business/domain savvy?
– Uncertain data
  – Confidence levels need to follow data and decisions
– New analytic algorithms
  – Real-time learning and adaptation?
  – More automation
– Availability
  – What does it mean for in-memory systems?
  – How should disaster recovery work?
– Cloud
  – Security of data
  – Data movement
– Data governance, security, and privacy
– What new problems can we solve?
To learn more
– Resources
  – Streams: streamsDev
  – IBM Big Data: ibm.com/bigdata
  – IBMBigDataHub.com
  – BigDataUniversity.com
  – Books / analyst papers
Try Stream Processing
– 2 download options!