Making Hadoop Ready for the Enterprise Hadoop Summit, June 27, 2013


1 Making Hadoop Ready for the Enterprise Hadoop Summit, June 27, 2013
Anjul Bhambhri Vice-President, IBM Big Data Development © 2013 IBM Corporation

2 Big Data is the next Natural Resource
“We have for the first time an economy based on a key resource (Information) that is not only renewable, but self-generating. Running out of it is not a problem, but drowning in it is.” (John Naisbitt)
40 ZB
Harvesting any resource requires Mining, Refining and Delivering

3 Imagine the Possibilities…
What if you could detect neonatal infections sooner?
Solution: 120 children monitored; 120K messages per second, billions of messages per day. Result: infections detected 24 hours earlier. University of Ontario Institute of Technology (UOIT). Case study: ftp://public.dhe.ibm.com/common/ssi/ecm/en/odc03157usen/ODC03157USEN.PDF
To better detect subtle warning signs of complications, clinicians need greater insight into the moment-by-moment condition of infants in a neonatal ICU. Fifteen million babies, one in 10 births, are born prematurely every year, according to a global project led by the WHO. Of those, over 1 million die, often in the first 30 days of life. Many of these babies are in NICUs, connected to monitors that measure key statistics such as heart rate, skin temperature and respiration. These measurements add up to 90M/patient/day, yet until recently most of this data was only sampled periodically and aggregated into 2-3 readings that were written into the patient record to indicate the health of the baby, not used for its predictive value.
IBM and UOIT developed a first-of-its-kind analytics solution that uses stream computing to capture and analyze real-time data from medical monitors. It processes 1000 pieces of information/sec, identifies patterns, correlates them with doctors' notes and family history, and applies predictive analytics, alerting hospital staff to potential health problems before patients manifest clinical signs of infection or other issues. Early warning lets caregivers deal proactively with potential complications, such as detecting infections in premature infants up to 24 hours before they exhibit symptoms. Same data, but saved lives. Trials are expanding beyond Canada to include hospitals in the US, China and Australia.
Big Data enabled doctors from the University of Ontario to apply neonatal infant monitoring to predict infection in the ICU 24 hours in advance.
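
The throughput figures quoted on this slide can be cross-checked with a little arithmetic. The sketch below assumes messages are spread evenly across the 120 monitored children (the slide does not say so) and that "90M/patient/day" counts individual readings. Under those assumptions, 120K messages/sec works out to 1,000 per child per second, which matches the "1000 pieces of information/sec" figure and gives a daily per-patient count in the same range as the quoted 90M.

```python
# Figures quoted on the slide.
PATIENTS = 120
MESSAGES_PER_SECOND = 120_000       # across all monitored children
SECONDS_PER_DAY = 24 * 60 * 60


def per_patient_rate(total_per_second: int, patients: int) -> float:
    """Average messages per second for one patient (assumes an even spread)."""
    return total_per_second / patients


def messages_per_day(per_second: float) -> float:
    """Scale a per-second rate to a full day."""
    return per_second * SECONDS_PER_DAY


if __name__ == "__main__":
    rate = per_patient_rate(MESSAGES_PER_SECOND, PATIENTS)
    print(f"per patient: {rate:,.0f} msg/s, {messages_per_day(rate):,.0f} msg/day")
    print(f"all patients: {messages_per_day(MESSAGES_PER_SECOND):,.0f} msg/day")
```

The all-patient total comes out at roughly ten billion messages per day, in line with the "billions of messages per day" on the slide.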

4 Constant Contact Transforming Marketing Campaign Effectiveness with IBM Big Data
Analyze 35 billion annual emails to guide customers on the best dates and times to send emails for maximum response. Benefits: 40 times improvement in analysis performance; 15-25% performance increase in customer campaigns; analysis time reduced from hours to seconds.

5 Automobile and Manufacturing Quality Control and Customer Satisfaction
Inflexibility and scalability limitations of existing IT solutions have been an inhibitor to competitive advantage. A new solution is needed to improve quality and operational efficiency. Inventory control of parts; manufacturing equipment and assembly line data; warranty and services data from dealers; telemetry data from vehicles. Next generation of Enterprise Data Warehouse.

6 Transactional & Application Data
New Opportunities with Big Data & Analytics: Transactional & Application Data; Machine Data; Enterprise Content; Social Data. Big Data and Technology Platform.

7 New Opportunities with Big Data & Analytics
Roles and Analytics: Data Scientist; Business Analyst; User. Big Data and Technology Platform.

8 Big Data and Technology Platform
New Opportunities with Big Data & Analytics. New Outcomes: enrich info base; improve customer interaction; reduce risk; gain efficiency and scale; optimize and monetize. Roles and Analytics. Big Data and Technology Platform.

9 Emerging Pattern of Big Data Implementation
[Architecture diagram]
Zones: Ingestion and Real-time Analytic Zone; Analytics and Reporting Zone; Warehousing Zone; Landing and Analytics Sandbox Zone; Metadata and Governance Zone
Diagram labels: Ingest; Filter, Transform; Correlate, Classify; Extract, Annotate; Connectors; Data Sinks; Enterprise Warehouse; Data Marts; Query Engines; Cubes; Descriptive, Predictive Models; Hive/HBase Col Stores; MapReduce; Documents in Variety of Formats; Indexes, facets; Search Analytics; Discovery, Visualizer; Widgets; Models; Repository, Workbench

10 The 5 Key Use Cases
Big Data Exploration: find, visualize and understand all big data to improve decision making
Enhanced 360° View of the Customer: extend existing customer views (MDM, CRM, etc.) by incorporating additional internal and external information sources
Security/Intelligence Extension: lower risk, detect fraud and monitor cyber security in real time
Operations Analysis: analyze a variety of machine data for improved business results
Data Warehouse Augmentation: integrate big data and data warehouse capabilities to increase operational efficiency

11 Big Data Platform and Application Framework
Analytic Applications: BI / Reporting; Exploration / Visualization; Functional App; Industry App; Predictive Analytics; Content Analytics
IBM Big Data Platform: Visualization & Discovery; Applications & Development; Systems Management; Accelerators; Hadoop System; Stream Computing; Data Warehouse; Contextual Discovery; Information Integration & Governance
Cloud | Mobile | Security
Callouts:
Speed time to value with analytic and application accelerators
Gather, extract and explore data using best-of-breed visualization
Analyze streaming data and large data bursts for real-time insights
Cost-effectively analyze petabytes of structured and unstructured information
Index and federated discovery for contextual collaborative insights
Deliver deep insight with advanced in-database analytics and operational analytics
Govern data quality and manage information lifecycle

12 Enterprise Capabilities on Hadoop
Key platform requirements: built-in analytics; enterprise-grade capabilities; integrated with enterprise software; ease of installation and management; reference hardware configurations; world-class support; full open source compatibility
Business benefits: quicker time-to-value; reduced operational risk; enhanced business knowledge with a flexible analytical platform; leverages and complements existing software investments
Diagram labels: Visualization & Exploration; Development Tools; Advanced Engines; Connectors; Workload Optimization; Administration & Security; Open source components; IBM-certified Apache Hadoop

13 Big Data needs SQL Hadoop
Most existing applications in the enterprise use SQL
SQL bridges the chasm between existing apps and Big Data
SQL access to all data stored in Hadoop, via JDBC/ODBC, using rich standard SQL
Intelligently leverages Map/Reduce parallelism, or direct access to achieve low latency
Diagram labels: Application; SQL Language; JDBC / ODBC Driver; JDBC / ODBC Server; Big SQL Engine; Data Sources (Hive tables, HBase tables, CSV files); Hadoop

14 Text Analytics: Getting measurable insights
Most of the world's data is in unstructured or semi-structured text. Over 80% of stored information is unstructured*
Social media is rife with discussions about products and services
Company internal information is locked in blobs and description fields, and is sometimes even discarded
How do you get a metrics-based understanding of facts from unstructured text? Structural analysis; mining and visualization
Healthcare Analytics: e-medical records, hospital reports
Public Sector: case files, police records, emergency calls…
Automotive Quality Insight: tech notes, call logs, online media
Insurance Fraud: insurance claims
Social Media for Marketing: Twitter, Facebook, blogs, forums

15 How Text Analytics Works
Input text: Football World Cup 2010, one team distinguished themselves well, losing to the eventual champions 1-0 in the Final. Early in the second half, Netherlands’ striker, Arjen Robben, had a breakaway, but the keeper for Spain, Iker Casilas made the save. Winger Andres Iniesta scored for Spain for the win.
Extracted: World Cup 2010 Highlights
Arjen Robben: Striker, Netherlands
Iker Casilas: Keeper, Spain
Andres Iniesta: Winger, Spain
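
The step from free text to structured records on this slide can be illustrated with a small rule-based extractor. This is only a sketch under stated assumptions: the role and team dictionaries and the three phrase patterns are hypothetical and hand-written for this one passage, and it is not the AQL-based engine the deck describes.

```python
import re

# Hypothetical dictionaries; a real extractor would load much larger ones.
ROLES = ["striker", "keeper", "winger"]
TEAMS = ["Netherlands", "Spain"]

# Roles match case-insensitively; names and teams stay case-sensitive.
ROLE = "(?P<role>(?i:" + "|".join(ROLES) + "))"
TEAM = "(?P<team>" + "|".join(TEAMS) + ")"
NAME = r"(?P<name>[A-Z][a-z]+ [A-Z][a-z]+)"

PATTERNS = [
    re.compile(TEAM + r"['’] " + ROLE + r", " + NAME),     # "Netherlands’ striker, Arjen Robben"
    re.compile(r"the " + ROLE + r" for " + TEAM + r", " + NAME),  # "the keeper for Spain, Iker Casilas"
    re.compile(ROLE + r" " + NAME + r" scored for " + TEAM),      # "Winger Andres Iniesta scored for Spain"
]

TEXT = (
    "Early in the second half, Netherlands’ striker, Arjen Robben, had a "
    "breakaway, but the keeper for Spain, Iker Casilas made the save. "
    "Winger Andres Iniesta scored for Spain for the win."
)


def extract_players(text):
    """Return (name, role, team) tuples found by any of the patterns."""
    found = []
    for pattern in PATTERNS:
        for match in pattern.finditer(text):
            found.append(
                (match.group("name"), match.group("role").capitalize(), match.group("team"))
            )
    return found


if __name__ == "__main__":
    for name, role, team in extract_players(TEXT):
        print(f"{name}: {role}, {team}")
```

Hand-written patterns like these are brittle; the point is only to show the shape of the output, one record per mention with a name, a role and a team.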

16 Text Analytics Language and Runtime
Declarative SQL-like language (AQL); discovery tools for AQL development; cost-based optimization; high-throughput; small memory footprint
Example AQL view:
    create view Employment as
    select R.jobType as jobType, C.name as companyName
    from Company C, Role R
    where Follows(R.jobType, C.name, 0, 20)
      and ContainsDict('EmpAssociation.dict', RightContext(R.jobType,10));
Diagram labels: Development Environment; AQL; Extractor; Cost-based optimization; operator plans built from Dict, Select and Join over Role and Company; Text Analytics Runtime; Input Documents; Extracted Objects; Offline Runtime; General-Purpose Linguistic Parsers; Dictionaries; Dominant Cost is CPU
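
To make the predicates in the AQL view above more concrete, here is a rough Python model of what they appear to check. It rests on assumptions: `Follows(a, b, min, max)` is read as "b starts between min and max characters after a ends", `RightContext(a, n)` as "the n characters right after a", and `ContainsDict` as "some dictionary entry occurs in the span". The contents of 'EmpAssociation.dict' are not shown in the deck, so the entries below are hypothetical, and spans are located by plain string search instead of real extractors.

```python
import re
from typing import NamedTuple


class Span(NamedTuple):
    begin: int
    end: int  # exclusive


def find_span(text: str, phrase: str) -> Span:
    """Locate the first occurrence of phrase (stand-in for a real extractor)."""
    start = text.index(phrase)
    return Span(start, start + len(phrase))


def follows(first: Span, second: Span, min_chars: int, max_chars: int) -> bool:
    """True if second starts min_chars..max_chars characters after first ends."""
    return min_chars <= second.begin - first.end <= max_chars


def right_context(text: str, span: Span, length: int) -> Span:
    """Span covering up to `length` characters immediately after `span`."""
    return Span(span.end, min(len(text), span.end + length))


def contains_dict(entries, text: str, span: Span) -> bool:
    """True if any single-word dictionary entry is a token inside the span."""
    tokens = re.findall(r"\w+", text[span.begin:span.end].lower())
    return any(entry.lower() in tokens for entry in entries)


def employment(text, roles, companies, assoc_entries):
    """Join role and company spans the way the Employment view does."""
    rows = []
    for role in roles:
        for company in companies:
            if follows(role, company, 0, 20) and contains_dict(
                assoc_entries, text, right_context(text, role, 10)
            ):
                rows.append((text[role.begin:role.end], text[company.begin:company.end]))
    return rows


EMP_ASSOCIATION = ["at", "with", "for"]  # hypothetical dictionary entries

if __name__ == "__main__":
    doc = "Jane Doe, engineer at Acme Corp, spoke first."
    print(employment(doc, [find_span(doc, "engineer")],
                     [find_span(doc, "Acme Corp")], EMP_ASSOCIATION))
```

The nested loop is the naive join; the "cost-based optimization" on the slide suggests the real runtime chooses a cheaper evaluation order.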

17 Enterprise Data Tools
Business User; Data Scientist; Business Analyst; Developer; Administrator

18 Security and compliance in Big Data environments
Who is running specific big data requests?
What map-reduce jobs are they running?
Are these jobs part of an authorized program list accessing the data?
Is there an exceptional number of file permission exceptions?
Diagram labels: Big Data Platform (Structured, Unstructured, Streaming); Clients; Hadoop Cluster
Taps for Hadoop: collect and stream audit data to the Collector; provide visibility for HDFS, MapReduce, RPC, Oozie, HBase, etc.
Collector: securely stores audit data collected by the taps; provides analytics, reporting and compliance workflow automation
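
Two of the audit questions on this slide reduce to simple checks over collected audit records. The sketch below is illustrative only: the record fields (`user`, `job`, `status`), the `PERMISSION_DENIED` value and the authorized-job list are hypothetical and do not reflect the actual format produced by the taps.

```python
from collections import Counter

# Hypothetical audit records; field names are illustrative, not a real tap format.
EVENTS = [
    {"user": "alice", "job": "daily_sales_rollup", "status": "OK"},
    {"user": "bob", "job": "adhoc_export", "status": "PERMISSION_DENIED"},
    {"user": "bob", "job": "adhoc_export", "status": "PERMISSION_DENIED"},
    {"user": "bob", "job": "adhoc_export", "status": "PERMISSION_DENIED"},
    {"user": "carol", "job": "daily_sales_rollup", "status": "PERMISSION_DENIED"},
]
AUTHORIZED_JOBS = {"daily_sales_rollup"}


def unauthorized_jobs(events, authorized):
    """Return sorted (user, job) pairs for jobs not on the authorized list."""
    return sorted({(e["user"], e["job"]) for e in events if e["job"] not in authorized})


def excessive_denials(events, threshold):
    """Return users whose permission-denied count exceeds the threshold."""
    counts = Counter(e["user"] for e in events if e["status"] == "PERMISSION_DENIED")
    return {user: n for user, n in counts.items() if n > threshold}


if __name__ == "__main__":
    print(unauthorized_jobs(EVENTS, AUTHORIZED_JOBS))
    print(excessive_denials(EVENTS, threshold=2))
```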

19 Data Archiving and Masking on Hadoop
Data Masking: mask confidential data to avoid data breaches and meet privacy compliance; protect confidential data while preserving analytics; support compliance with privacy regulations
Masking example: JASON MICHAELS (before masking) becomes ROBERT SMITH (after masking)
Data Archiving: cost-effective query-able archiving; manage and apply retention policies for compliance; enable business users to query hot, warm and cold data
Diagram labels: Database; Hadoop; Mask in-database; Extract; Mask in Hadoop; Archive & Purge; Compress; Archive files; Load; Query-able; Auditable; Restorable Data; Complete Business Objects; Data Integrity; Schema, Metadata; Retention Policies
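
One common way to mask names while "preserving analytics" is deterministic substitution: the same input always maps to the same replacement, so joins and group-bys still line up. The sketch below shows that idea with a keyed hash. It is an assumption about the general technique, not a description of the product's masking method, and the replacement name lists and key are hypothetical.

```python
import hashlib
import hmac

# Hypothetical replacement pools; real pools would be far larger to limit collisions.
FIRST_NAMES = ["ROBERT", "MARIA", "DAVID", "LINDA"]
LAST_NAMES = ["SMITH", "GARCIA", "CHEN", "PATEL"]


def _pick(key: bytes, value: str, choices):
    """Choose a replacement deterministically from a keyed hash of the value."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).digest()
    return choices[int.from_bytes(digest[:4], "big") % len(choices)]


def mask_name(full_name: str, key: bytes) -> str:
    """Replace first and last name with stable stand-ins from the pools."""
    first, _, last = full_name.partition(" ")
    return f"{_pick(key, 'first:' + first, FIRST_NAMES)} {_pick(key, 'last:' + last, LAST_NAMES)}"


if __name__ == "__main__":
    secret = b"demo-key-not-for-production"  # hypothetical key
    print(mask_name("JASON MICHAELS", secret))
    print(mask_name("JASON MICHAELS", secret))  # same masked value both times
```

Because the mapping depends on a secret key, the same name masks consistently within one environment while remaining hard to reverse without the key. With pools this small, different people will often share a masked name.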

20 Introducing PureData for Hadoop: BigInsights Appliance
Simplified Experience: designed for easy and quick deployment; built-in tools designed for users to derive value quickly; easy connectivity to common data warehouse systems
Built-in Expertise: enables 'what-if analysis' and advanced analytics; supports structured, semi-structured, and unstructured data; built-in text processing engine and library of annotators to analyze large volumes of text-based information; data can be used in its native format, eliminating the need to pre-define and map structures
Integration by Design: InfoSphere BigInsights software, cluster management, and IBM System x® servers; automatic parallelization and resource optimization to scale economically; enterprise-class security and platform management

21 Breadth of capabilities
From Getting Started to Enterprise Deployment: InfoSphere BigInsights Brings Hadoop to the Enterprise
PureData for Hadoop: appliance simplicity for the enterprise (* pre-announced)
Enterprise Edition: sold by # of terabytes managed; Quick Start features PLUS accelerators, enterprise integration, production support
Quick Start Edition: free download, non-production; production-ready features; Big Sheets, Text Analytics, Big SQL, workload optimization/query support, dev tools, connectors, mgmt tools
Basic Edition: free download; IBM Hadoop Core; web-based mgmt console, Jaql, integrated install, Apache Hadoop
Axes: breadth of capabilities; enterprise class

22 Streams - Real Time Analytics

23 InfoSphere Data Explorer – delivering insights at the point of impact
Providing unified, real-time access and fusion of big data unlocks greater insight and ROI
Discovery & navigation; clustering & categorization; contextual intelligence; easy-to-deploy applications; all at the scale required for today's big data challenges
Data access & integration; index structured & unstructured data, in place; support existing security; federate to external sources; leverage MDM, governance, and taxonomies
Create a unified view of ALL information for real-time monitoring
Improve customer service & reduce call times
Increase productivity & leverage past work, increasing speed to market
Analyze customer data to unlock true customer value
Identify areas of information risk & ensure data compliance

24 Organizations are Building Big Data Applications on Data Explorer
Data Explorer App Builder
Warehouse: structured enterprise data
Streams: data in motion
BigInsights: data at rest
Data Explorer: semi- & unstructured enterprise data

25 Get Started on Your Big Data Journey Today
Get Educated: IBM Big Data: ibm.com/bigdata; IBMBigDataHub.com; BigDataUniversity.com
Get Your Hands on Big Data: Download Quick Start: ibm.co\QuickStart

26 THINK

