1 Preparing your organization to derive insight from the Internet of Things
Steve Sarsfield (@SteveSarsfield)
Vertica - Hewlett Packard Enterprise
March 2017

2 Driving customer demand for a smarter and more personalized product experience
- Predictive maintenance
- Fraud detection
- Electronic health records
- Customer support
- Product recommendations

3 Challenges
- Handling more data
- Time to deliver analytics
- Costs of license
- Tuning costs
- Skills to leverage new tools

4 The future belongs to those who analyze without limits
With analytics free from closed infrastructure and narrow deployment options:
- Traditional data warehouse lock-in
- Cloud analytics deployment lock-in
- Hadoop and open source

5 The HPE Vertica Portfolio
All built on the same trusted and proven HPE Vertica core SQL engine.
- HPE Vertica in the Cloud: get up and running quickly with flexible, enterprise-class cloud deployment options
- HPE Vertica Enterprise: columnar storage and advanced compression for maximum performance and scalability
- HPE Vertica for SQL on Hadoop: native support for ORC and Parquet; support for industry-leading distributions; no helper node or single point of failure
- Core HPE Vertica SQL engine: advanced analytics; open ANSI SQL standards plus R, Python, Java, Spark, and Scala; in-database machine learning (a hedged sketch follows)
Speaker notes: Regardless of how our customers want to consume and deploy Vertica, we have them covered. Most importantly, the entire Vertica portfolio is based on the same trusted, field-proven Vertica SQL engine and rich analytical functionality. So whether customers need Big Data analytics in the cloud (as SaaS or on select Amazon hardware), on-premise, or co-located with Hadoop, no one provides the breadth of functionality and consumption models that HPE Vertica does.
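A minimal sketch of the in-database machine learning mentioned above, using the LINEAR_REG and PREDICT_LINEAR_REG functions from Vertica's ML library; the device_telemetry table and its columns are hypothetical, and the exact signatures should be checked against your Vertica version.

    -- Train a linear regression model inside the database; no data export needed.
    SELECT LINEAR_REG('power_model', 'device_telemetry',
                      'power_draw',          -- response column
                      'temperature, rpm');   -- predictor columns

    -- Score rows with the stored model.
    SELECT device_id,
           PREDICT_LINEAR_REG(temperature, rpm
                              USING PARAMETERS model_name='power_model') AS predicted_power
    FROM device_telemetry;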

6 The appeal of Vertica
- Requirement: extreme optimization. Proof: columnar design for high-performance analytics; aggressive compression; scalable to petabytes.
- Requirement: total cost of ownership. Proof: simple and predictable pricing; no penalty for additional hardware or connected users.
- Requirement: ready for your enterprise. Proof: SQL-compliant on 100% of the TPC-DS benchmark queries; secure and ACID-compliant; no single point of failure.
- Requirement: open and compatible. Proof: open platform with standards-compliant SQL, Python, and Java; working with the open source community on Spark, Hadoop, Kafka, etc.

7 Bridging the gap between high-cost legacy EDWs and Hadoop data lakes
Legacy enterprise data warehouses:
- Declining performance at scale
- Built on aging technology
- Expensive, with proprietary hardware
- Limited deployment options
Data lakes:
- Low-cost storage of Big Data
- Some analytics capabilities
- Holding area for certain data

8 Complexity - Example: analytics ready for the Internet of Things
Goal: deliver analysis of critical data at the source of the data and provide faster time to insight.
- R, Python, and custom analytics: access rich custom and predictive analytics in your favorite languages and tools, including R, Python, and custom functions
- Live aggregate projections: speed up queries that rely on resource-intensive aggregate functions like SUM, MIN/MAX, COUNT, and Top-K
- Pattern matching: find matching subsequences of events; compare the frequency of event patterns
- Event windows: break a sequence into subsequences based on certain events or changes
- Event series JOINs: correlate events across streams when the times do not line up
- SQL-99: full ANSI SQL compliance
(SQL sketches of three of these features follow.)

9 How to fill analysis gaps
Analyses that often expose gaps: customer segmentation, channel and location analysis, net profit, revenue.
Vertica capabilities that fill them: geospatial data types, rich data types, in-place JOINs, time series gap analysis, event window functions, sessionization, statistical functions.
Speaker notes:
- Geospatial: there are no native geospatial functions in any SQL-on-Hadoop solution. You can bring in something like SpatialHadoop, a MapReduce extension to Apache Hadoop designed specifically to work with spatial data, but that is very time-consuming.
- Data types: Vertica supports date, time, and many more data types than SQL-on-Hadoop solutions.
- In-place JOINs: Vertica allows you to JOIN data sitting in your Vertica data warehouse with data sitting in an ORC or Parquet file in Hadoop. In other solutions, you must move the data.
- Time series gaps: since both time and the state of data within a time series are continuous, it can be challenging to evaluate SQL queries over time. Input records often occur at non-uniform intervals, which can create gaps. To solve this, Vertica provides (1) gap-filling functionality, which fills in missing data points, and (2) an interpolation scheme, which constructs new data points within the range of a discrete set of known data points. This is not available in Hadoop solutions.
- Event windows: event-based windows let you break time series data into windows that border on significant events within the data. This is especially relevant in financial data, where analysis often focuses on specific events as triggers for other activity.
- Sessionization: a special case of event-based windows, often used to analyze click streams, such as identifying web browsing sessions from recorded web clicks.
With some solutions, you may be required to fill the gaps by spinning together two or more open source projects, moving and copying big data, using generic data types, data munging, or custom code. (A sketch of the gap-filling and sessionization features follows.)

10 Perhaps the ultimate architecture is all-inclusive: Apache Spark, Hadoop, Kafka, and HPE Vertica
HPE Vertica optimal use case: deep analysis; massive scale; many concurrent users.
Spark
- Optimal use case: small, fast-running queries; ETL and complex event processing; operational analytics
- Challenges: after transformation is done in Spark, a faster load of the data into Vertica is needed for SQL analytics; Vertica must also supply data to Spark machine learning analytics
- Solution: Vertica's open-source connector to Apache Spark
- Benefits: fast, scalable data transfer that exploits Vertica's parallelism, HDFS connectivity, and fluency with open source data formats; optimized queries, with processing pushed down to Vertica; Spark users can benefit from Vertica's advanced SQL analytics
- Features: Vertica performs optimized data loads from Spark; Spark runs queries on Vertica data
Hadoop
- Optimal use case: data lake; warm and cold storage; data discovery; ETL
- Features: analyze in place, without data movement, via native ORC and Parquet readers; works with any Hadoop distribution; runs on the Hadoop cluster or on the Vertica cluster
Kafka
- Features: share data between applications that support Kafka; stream data into Vertica
(SQL sketches of the Hadoop and Kafka integrations follow.)

11 Average annual benefit: $3,014,583
"The choice was simple: the change to Vertica was much more cost effective than scaling their current Oracle system, while offering a much improved performance to execute very complex analytics use cases."
ROI: 351%. Payback: 4 months. Average annual benefit: $3,014,583.

12 Suunto – Internet of Things (IoT)
Suunto enriches the extreme-sport experience with IoT wearable analytics on Vertica.
Challenges
- User data needed to be gathered instantly and compared not only to the user's own historical data, but also to data from specific sub-segments of Suunto users who share similar performance and demographic characteristics
- Management and analysis of 20 to 30 million individual training sessions per month in near real time
Results
- Ability to organize over 1 billion data measurements and cluster sub-segments of this data in a manner that allows meaningful performance evaluation and improvement based on real user data
- Combines a variety of data measurements from devices and users to instantly provide summary performance data, peer comparisons, and training regimens
- Vertica has enabled Suunto to provide a new level of value that feeds the competitive nature of Suunto's customers
"Vertica helps provide analytics so that athletes can train better and achieve more."

13 Checklist for preparing for IoT
- Open up your systems (not just open source)
- Reconsider expensive legacy
- Solutions that scale
- Consider differing analytical workloads
- Skills to leverage new tools

14 Think outside the box - New York Genome Center
- Develop algorithms to find the molecular causes of diseases
- Deal with errors in DNA sequencing
- Share results with a community of scientists
- Compare tumor DNA to the patient's blood to find variants
Precision medicine: suggest drugs to interfere with the mutation; specific cancer drugs. Target conditions include arthritis, Alzheimer's, Parkinson's, asthma, diabetes, autism, and cancer.
Speaker notes: The center is 2-3 years old, a collaboration among 17 or so hospitals (Cornell, NYU, Stony Brook Medicine, the NY Stem Cell Foundation) so that they wouldn't all have to start their own genome centers.
Gene sequencing data: 3 billion letters (G, C, A, T) per genome; 150 GB of stored raw data per person, 450 GB with analytics; a cancer genome is just under a TB. Sequences are compared to a reference genome.

15 Thank you
Steve.Sarsfield@hpe.com
Community Edition: my.vertica.com

