1
Vertica Core Technical Overview
2
Analytics architectures – Right tool, right job
On-premises RDBMS:
  Analytics are more stable and more performant
  Security
  Daily analytics
Cloud-based solutions:
  Easy to spin up and manage
  Temporary workloads
  Special workloads
Data lake & discovery (Hadoop):
  Storage and licensing are cheap
  Not all data needs fast, advanced analytics
3
The appeal of Vertica
Requirement: Extreme Optimization
  Proof: Columnar design for high-performance analytics; aggressive compression; scalable to petabyte scale
Requirement: Ready for your Enterprise
  Proof: SQL compliant to 100% of the TPC-DS benchmark queries; secure and ACID compliant; no single point of failure
Requirement: Open and Compatible
  Proof: Open platform; standards-compliant SQL, Python, Java; working with the open source community
Requirement: Total Cost of Ownership
  Proof: Simple and predictable pricing; no penalty for additional hardware or connected users
4
The HPE Vertica portfolio
All built on the same trusted and proven HPE Vertica Core SQL Engine.
HPE Vertica in the Cloud:
  Get up and running quickly in the cloud
  Flexible, enterprise-class cloud deployment options
HPE Vertica Enterprise:
  Columnar storage and advanced compression
  Maximum performance and scalability
  Flex Tables for schema-on-read
Core HPE Vertica SQL Engine:
  Advanced analytics
  Open ANSI SQL standards ++
  R, Python, Java, Spark, Scala
HPE Vertica for SQL on Apache® Hadoop:
  Native support for ORC and more
  Support for industry-leading distributions
  No helper node or single point of failure
Notes: The HPE Vertica Portfolio. Regardless of how our customers want to consume and deploy Vertica, we have them covered. Most importantly, the entire Vertica Portfolio is based on the same trusted, field-proven Vertica SQL engine and rich analytical functionality. So, whether customers need to access Big Data analytics via the cloud (either as SaaS or running on select Amazon hardware), on-premises, or co-located with Hadoop, no one provides the same breadth of functionality and consumption models as HPE Vertica.
5
HPE Vertica Core SQL engine
Core capabilities
6
Core capabilities – Built for speed
We boost performance.
What 1,000x means:
  Used to take: 1 hour. Now takes: 3.6 seconds.
  Used to take: 8 hours (overnight). Now takes: under 30 seconds.
“When we did the first queries, they were done so fast, we thought they were broken.” - Michael Relich, Guess?
The slides that follow describe how we achieve such huge performance increases.
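As a quick check of the before/after pairs on this slide, the arithmetic can be sketched in a few lines of Python. The helper name `speedup` is hypothetical and only illustrates the calculation; the runtimes are the ones quoted above.

```python
# Hypothetical helper for checking the slide's before/after runtimes.
def speedup(before_seconds: float, after_seconds: float) -> float:
    """Return how many times faster the new runtime is than the old one."""
    return before_seconds / after_seconds

ONE_HOUR = 60 * 60         # 3,600 seconds
OVERNIGHT = 8 * 60 * 60    # 28,800 seconds

# 1 hour down to 3.6 seconds is a factor of about 1,000.
hour_factor = speedup(ONE_HOUR, 3.6)

# Applying the same factor to an 8-hour job gives 28.8 seconds,
# which is consistent with the "under 30 seconds" figure.
overnight_after = OVERNIGHT / 1000

print(f"{hour_factor:.0f}x faster; overnight job in {overnight_after:.1f} s")
```

Both pairs work out to roughly a thousandfold reduction in runtime.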
7
Secrets to achieving performance increases
Columnar Storage: Speeds query time by reading only necessary data
Compression: Lowers costly I/O to boost overall performance
MPP Scale-Out: Provides high scalability on clusters with no name node or other single point of failure
Distributed Query: Any node can initiate the queries and use other nodes for work. No single point of failure
Projections: Combine high availability with special optimizations for query performance
[Diagram labels: Memory, CPU, Disk; columns A, B, D, C, E]
8
Row vs Columnar Storage
Row Storage:
  Traditional database storage method
  Requires all data be read on query
  Limited compression possible
Columnar Storage:
  HP Vertica database storage method
  Speeds query time by reading only necessary data
  Ready for compression
Oracle didn’t build this feature from the ground up. We did.
9
Example
SELECT SUM(volume) FROM trades WHERE symbol = 'HPQ' AND date = '5/12/2014'
[Figure: the trades table laid out column by column (Symbol, Date, Time, Price, Volume, Type), with sample symbols AAPL, GOOG, HPQ, and IBM on 5/12/2014. The query touches only the Symbol, Date, and Volume columns.]
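To make the row-versus-column difference concrete, here is a minimal Python sketch of the example query. It is an illustration only, not how Vertica stores data: the sample rows, the function names, and the "fields read" counter are all assumptions introduced for this sketch.

```python
# Toy trades data; the values are illustrative, not the slide's actual table.
ROWS = [
    {"symbol": "AAPL", "date": "5/12/2014", "time": "13:01:07", "price": 530.25, "volume": 20, "type": "Market"},
    {"symbol": "HPQ", "date": "5/12/2014", "time": "13:01:09", "price": 32.09, "volume": 150, "type": "Limit"},
    {"symbol": "IBM", "date": "5/12/2014", "time": "13:01:10", "price": 187.84, "volume": 5, "type": "Market"},
    {"symbol": "HPQ", "date": "5/12/2014", "time": "13:01:11", "price": 32.09, "volume": 50, "type": "Buy Stop"},
]

def to_columns(rows):
    """Pivot a list of row dicts into one list per column."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns

def sum_volume_row_store(rows, symbol, date):
    """Row layout: every field of every row is read to answer the query."""
    total, fields_read = 0, 0
    for row in rows:
        fields_read += len(row)
        if row["symbol"] == symbol and row["date"] == date:
            total += row["volume"]
    return total, fields_read

def sum_volume_column_store(columns, symbol, date):
    """Column layout: only the three columns named in the query are read."""
    needed = ("symbol", "date", "volume")
    fields_read = sum(len(columns[name]) for name in needed)
    total = sum(
        v
        for s, d, v in zip(columns["symbol"], columns["date"], columns["volume"])
        if s == symbol and d == date
    )
    return total, fields_read

row_result = sum_volume_row_store(ROWS, "HPQ", "5/12/2014")
col_result = sum_volume_column_store(to_columns(ROWS), "HPQ", "5/12/2014")
print(row_result, col_result)
```

Both layouts return the same sum, but the columnar version touches half as many fields in this toy table; the gap widens as a table gains columns the query never mentions.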
10
Compression
Compression lowers costly I/O to boost overall performance.
Vertica uses lossless compression:
  Integer packing for unencoded integers
  Lempel–Ziv–Oberhumer for compressible data
  Fast and distributable algorithms, splittable across MPP
Parquet and ORC files wish they had the compression we do.
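The idea behind integer packing can be sketched as follows. This is a simplified illustration, not Vertica's actual on-disk encoding: the functions `pack_integers` and `unpack_integers` are hypothetical, they handle only non-negative integers, and they pack to whole bytes rather than individual bits.

```python
def pack_integers(values):
    """Pack non-negative ints using the fewest whole bytes that fit the largest value."""
    if not values:
        return 1, b""
    width = max(1, (max(values).bit_length() + 7) // 8)
    packed = b"".join(v.to_bytes(width, "little") for v in values)
    return width, packed

def unpack_integers(width, packed):
    """Reverse pack_integers; the round trip is lossless."""
    return [
        int.from_bytes(packed[i:i + width], "little")
        for i in range(0, len(packed), width)
    ]

# Volume values in the style of the trades example.
volumes = [20, 5, 150, 1021, 50, 1230, 122]
width, packed = pack_integers(volumes)

print(f"{len(volumes) * 8} bytes as 64-bit integers, {len(packed)} bytes packed")
```

Because the largest value here fits in two bytes, each value shrinks from eight bytes to two, and every value is still recoverable exactly.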
11
Clustering / MPP / Scale-Out
Parallel design:
  Enables distributed storage and workload with active redundancy
  Automatic replication, failover and recovery
Shared-nothing database architecture:
  Provides high scalability on clusters
  No name node or other single point of failure
  Add nodes to achieve optimal capacity and performance
  Lower data center costs, higher density, scale-out
[Diagram: nodes, each with its own Memory, CPU, and Disk]
Hadoop also has a scale-out architecture.
12
Distributed query execution
Query flow:
  1. Client connects to a node and issues a query
  2. Node the client is connected to becomes the initiator node
  3. Other nodes in the cluster become executor nodes
  4. Initiator node parses the query and picks an execution plan
  5. Initiator node distributes query plan to executor nodes
  6. Initiator node aggregates results from all nodes
  7. Initiator node returns final result to the user
Nodes are peers:
  Any node can be the initiator
  No name node or single point of failure
  Query/load to any node
  Continuous/real-time load and query
[Diagram: one INITIATOR node and two EXECUTOR nodes, each with its own Memory, CPU, and Disk]
Competitors have helper nodes and name nodes. We don’t.
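The initiator/executor flow above can be sketched in Python with threads standing in for cluster nodes. This is a toy model under stated assumptions: the segment data, the function names, and the use of `ThreadPoolExecutor` are all illustrative choices and say nothing about how Vertica is implemented.

```python
from concurrent.futures import ThreadPoolExecutor

# Each inner list stands in for the data segment held by one node.
# Rows are (symbol, volume) pairs; values are illustrative.
SEGMENTS = [
    [("HPQ", 150), ("AAPL", 20)],
    [("IBM", 5), ("HPQ", 50)],
    [("HPQ", 122), ("GOOG", 1021)],
]

def executor_partial_sum(segment, symbol):
    """Executor step: each node aggregates only its own local segment."""
    return sum(volume for sym, volume in segment if sym == symbol)

def initiator_sum(segments, symbol):
    """Initiator step: fan the plan out, then combine the partial results."""
    with ThreadPoolExecutor(max_workers=len(segments)) as pool:
        partials = list(
            pool.map(lambda seg: executor_partial_sum(seg, symbol), segments)
        )
    return sum(partials)

total = initiator_sum(SEGMENTS, "HPQ")
print(total)
```

Any "node" could play the initiator role here, since the function only needs the list of segments; that mirrors the peer design described on the slide.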
13
Live Aggregate Projections
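[Diagram labels: table columns A, B, C, D, E, …; Projections (columns A, B, D; SORT ORDER); Live Aggregate Projections (columns A, B, D; MOST RECENT)]
Projection query: SELECT a, b, d FROM projection_name ORDER BY a;
Live aggregate projection query: SELECT SUM(A), MIN(B), MAX(D) FROM projection_name;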
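One way to picture a live aggregate is as a running summary that is updated whenever a row is loaded, so the aggregate query reads a precomputed answer instead of scanning the detail rows. The class below is a minimal Python sketch of that idea; `LiveAggregate` and its methods are hypothetical names and not part of any Vertica API.

```python
class LiveAggregate:
    """Running SUM(a), MIN(b), MAX(d), updated as each row is loaded."""

    def __init__(self):
        self.sum_a = 0
        self.min_b = None
        self.max_d = None

    def load(self, a, b, d):
        """Fold one newly loaded row into the stored aggregates."""
        self.sum_a += a
        self.min_b = b if self.min_b is None else min(self.min_b, b)
        self.max_d = d if self.max_d is None else max(self.max_d, d)

    def query(self):
        """Answer SUM(a), MIN(b), MAX(d) without rescanning any rows."""
        return self.sum_a, self.min_b, self.max_d

agg = LiveAggregate()
for row in [(10, 7, 3), (5, 2, 9), (1, 4, 6)]:
    agg.load(*row)

print(agg.query())
```

The cost of the aggregate moves from query time to load time, which is the trade-off a precomputed aggregate makes.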
14
Query optimization comparison
Vertica Projections:
  Are primary storage; no base tables are required
  Can be segmented, partitioned, sorted, compressed and encoded to suit your needs
  Have a simple physical design
  Are efficient to load and maintain
  Are versatile: they can support any data model
  Allow you to work with the detailed data
  Provide near-real-time, low data latency
  Combine high availability with special optimizations for query performance
Traditional Materialized Views:
  Are secondary storage
  Are rigid: practically limited to columns and query needs; more columns = more I/O
  Are mostly batch updated
  Provide high data latency
Traditional Indexes:
  Are secondary storage pointing to base table data
  Support one clustered index at most; tough to scale out
  Require complex design choices
  Are expensive to update
  Provide high data latency
15
Solutions and applications
Core Security & Manageability Flexible Deployment Data Sources & Formats Solutions and Applications Advanced Analytics
16
HP Vertica Pulse: Perform sentiment analysis
Challenge: Performing sentiment analysis is time-consuming and tedious.
Solution: HP Vertica Pulse
  Scalable, in-database entity extraction and sentiment analysis
  Aggregate and drill-down views
  Easy to get started
[Diagram labels: data sources (Social, Hadoop, Machine); Social Media Connector; HP Vertica Pulse (entity extraction, sentiment); HP Vertica Place; Vertica Analytics Platform]
17
HP Vertica Distributed R
R-based analytics
Challenge: Customers want to use R for analytics. However, R scalability is always a question.
Solution: HP Distributed R
  Analyze data sets too large for standard R
  Perform complex analyses much more quickly (20x faster than Hadoop)
  Use familiar R environment to explore data, develop, and execute algorithms
  Operate on full data set (no down sampling)
Algorithms and use cases:
  Linear Regression (GLM): Risk analysis, trend analysis, etc.
  Logistic Regression (GLM): Customer response modeling, healthcare analytics (disease analysis)
  Random Forest: Customer churn, market campaign analysis
  K-Means Clustering: Customer segmentation, fraud detection, anomaly detection
  Page Rank: Identify influencers
[Diagram: R running on nodes, each with its own Memory, CPU, and Disk]
18
Vertica Place: Geospatial Analysis
Challenge: Analysis of data with understanding of geometry and/or geography.
Solution: HP Vertica Place
  Optimized spatial joins: memory-resident geospatial indexing replaces expensive scans with easy look-ups
  Easy-to-use, OGC-standard-based implementation with spatial functions to compute distance and intersections
  Simple integration with third-party applications
SELECT STV_Intersect(gid, geom USING PARAMETERS index='/dat/states.idx') OVER() AS (call_gid, state_gid) FROM calls;
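To show what a spatial join of points against regions computes, here is a small Python sketch using the standard ray-casting point-in-polygon test. It is a brute-force illustration, not the indexed approach the slide describes, and every name in it (`point_in_polygon`, `intersect`, the sample calls and regions) is an assumption made for the example.

```python
def point_in_polygon(x, y, polygon):
    """Ray casting: count how many polygon edges a ray going right from (x, y) crosses."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def intersect(points, regions):
    """Return (point_id, region_id) pairs for every point that falls inside a region."""
    return [
        (pid, rid)
        for pid, (x, y) in points.items()
        for rid, polygon in regions.items()
        if point_in_polygon(x, y, polygon)
    ]

# Hypothetical call locations and two square regions.
CALLS = {1: (1, 1), 2: (5, 1), 3: (9, 9)}
REGIONS = {
    "west": [(0, 0), (4, 0), (4, 4), (0, 4)],
    "east": [(4, 0), (8, 0), (8, 4), (4, 4)],
}

pairs = intersect(CALLS, REGIONS)
print(pairs)
```

This version tests every point against every polygon; a spatial index exists precisely to avoid that full cross-check.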
19
Advanced analytics
20
Business impact of advanced analytics
Advanced analytical functions speed development:
  Event-based windows
  Monte Carlo
  Time series analysis
  Regression testing
  Geospatial/Place
  Social media/Pulse
  Much more
What analytical functions mean:
  Used to take: hundreds of lines of code. Now takes: fewer lines of code.
Sample code shown on the slide:
  SELECT CustomerName, City FROM Customers;
  INSERT INTO Customers (CustomerName, ContactName, Address, City, PostalCode, Country) VALUES ('Cardinal', 'Tom B. Erichsen', 'Skagenzzzzz 21', 'Cambridge', '02140', 'USA');
  INSERT INTO Customers (CustomerName, State) VALUES ('Cardinal', 'Massachusetts');
  SELECT CustomerName, State FROM Customers;
Other companies don’t have all the analysis we have. It goes beyond the TPC-DS benchmarks.
21
From hindsight to insight to foresight
Descriptive: What happened?
  What is the attrition rate for the last 6 months? Which customers have I lost?
Diagnostic: Why did it happen?
  Why has the attrition rate increased?
Predictive: What will happen?
  Which customers are the most likely to attrite if I don’t contact them? Who will if I contact them?
Prescriptive: What should I do?
  Which customers should I target to maintain? What if…?
Pre-emptive: What more can I do?
  What can I offer before the customer realizes the need? Value add?
Progression: INFORMATION, INSIGHTS, DECISION, ACTION
22
Shortcomings of Big Data analytical solutions
Columns: Item; Description; Example use case
WHERE clause subqueries
  Description: Fast calculations on data that you retrieve using SQL
Windowing functions
  Example use cases: Group clickstream data into sessions to analyze a web visitor’s browsing behavior. Calculate a moving average for stock ticker data.
Data Types
  Description: Defining attributes with data types, such as money, time, and JSON
  Example use cases: Validating date and time values. Calculating time differences. Understanding which monetary units were used for a transaction. Understanding partial monetary units, like cents, from whole monetary units. For example, 3.15 specifies 3 dollars and 15 cents.
MERGE-JOIN
  Description: Joining data from separate tables
  Example use cases: Looking up vendor information from a vendor table (ERP and CRM mash-up). Mashing up CRM data with sentiment analysis. Leveraging reference data. Many, many uses.
Geospatial Functions
  Description: Analysis of data with understanding of geometry and/or geography
  Example use cases: Understanding if an object’s location is inside or outside a zone/circle/area. Latitude and longitude applications.
Sentiment Scoring Functions
  Description: Parsing social media data to understand its relationship to negative/positive words and phrases
  Example use cases: Acquiring data from Twitter and performing sentiment analysis
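The two windowing use cases named above, sessionizing clickstream data and computing a moving average, can be sketched in plain Python to show what such functions calculate. The function names, the 300-second timeout, and the sample data are assumptions for illustration; in a database these would be expressed as window functions over ordered rows.

```python
def sessionize(timestamps, timeout):
    """Assign a session id to each event; a gap longer than `timeout` starts a new session.

    `timestamps` must already be sorted in ascending order.
    """
    session_ids = []
    current = 0
    previous = None
    for ts in timestamps:
        if previous is not None and ts - previous > timeout:
            current += 1
        session_ids.append(current)
        previous = ts
    return session_ids

def moving_average(values, window):
    """Trailing moving average; early rows average over the rows seen so far."""
    averages = []
    for i in range(len(values)):
        frame = values[max(0, i - window + 1):i + 1]
        averages.append(sum(frame) / len(frame))
    return averages

# Click times in seconds for one visitor, with a hypothetical 300-second timeout.
clicks = [0, 10, 25, 400, 410, 1200]
sessions = sessionize(clicks, timeout=300)

# Stock prices with a two-row trailing window.
prices = [10, 20, 30, 40]
smoothed = moving_average(prices, window=2)

print(sessions, smoothed)
```

Both calculations depend on the order of the rows and on neighbouring rows, which is what distinguishes window functions from ordinary per-row expressions and plain GROUP BY aggregates.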
23
Analytical features of Vertica
Vertica SQL Standard SQL-99 Conventions Vertica Extended-SQL Advanced Analytics with SQL Vertica Innovations Advanced Analytics using Custom Logic Vertica User Defined Extensions Aggregate Sessionization Regression Testing Analytics C++ Java R Connection ODBC/JDBC HIVE Hadoop Flex Zone Analytical Time Series Time slice Interpolation (Constant & Linear) Gap Filling Statistical Modeling Window Functions Event-based Windows Conditional Change Event Conditional True Event Classification Algorithms Graph Event Series Joins Page Rank Monte Carlo Social Media/Pulse Text Mining Patterns/Trends Text-mining Geospatial Pattern Matching Match, Define, Pattern Keywords Funnel Analysis Geospatial (Place) Statistical
24
HP Vertica Enterprise – Proven
Capabilities: Advanced ANSI SQL Analytics; Massively Parallel Processing; Column Orientation; Application Integration; Highly Available; Automatic DB Designer; Advanced Compression; Management Console
[Diagram: four identical nodes, each labeled Commodity Hardware (x86), ANSI SQL ENGINE, Columnar Formats (ROS, Flex), File System (EXT 4)]
Notes:
  Same core engine as HP Vertica with Hadoop as the data storage layer
  Perform analytics regardless of the format of data or Hadoop distribution used
  Robust, enterprise-ready solution with world-class enterprise support and services
  Open APIs and developer tools with a vibrant ecosystem of partners to support your big data project
  Ease management of big data: the solution is part of a greater HP Enterprise Software Platform, Haven
25
Data ingestion and storage
26
On-premise data access
Streaming: Kafka, Trickle (Insert/Update)
Schema on read: Flex Zone (JSON, CSV, TEXT, Social Media)
Batch: ODBC/JDBC, Bulk COPY, LCOP, ETL (Pentaho, Attunity, Informatica, Talend, et al.)
Unstructured: IDOL (Video, Audio, Voice Recognition, Facial Recognition)
Hadoop: ORC Reader, MapR NFS, HIVE Serializer (HDFS, Parquet, AVRO)
All sources feed the Vertica Cluster.
27
HPE Vertica for SQL on Apache® Hadoop features and benefits
Features and benefits:
  Query data, no matter where it is located
  Install HP Vertica directly on your Hadoop infrastructure
  ORC, Parquet, Avro, Vertica ROS and JSON supported
  Full-functionality ANSI SQL
  100% of TPC-DS queries
  No helper node or single point of failure
  Competitive price point
[Diagram: Analytical Applications (R, Java, Python, SQL) on top of the HPE Vertica Core Engine. Store: ROS. Ingest: AVRO, JSON, etc. Query: ORC & Parquet]
28
HP Vertica for SQL on Apache® Hadoop
Same Vertica MPP columnar architecture
Base ANSI SQL
Co-located with Hadoop data
Query across Parquet, ORC, JSON, and many other formats
Hadoop agnostic
[Diagram: four identical nodes, each labeled Commodity Hardware (x86), ANSI SQL ENGINE, Open Formats + ROS/Flex, HDFS]
29
New ORCFile reader
Open source project jointly developed by Vertica and Hortonworks engineers
Query close to where data resides
Column pruning, predicate pushdown
Before (VSQLH SP1): Hcat Connector, using SerDes and WebHCat, reading HDFS ORCFiles on the Hadoop data nodes
After (VSQLH SP2): ORC Reader, using External Table and WebHDFS, reading HDFS ORCFiles on the Hadoop data nodes
Notes: The ORCReader project, as I mentioned before, is open source on GitHub. Interestingly, we have had contributors from Microsoft, but the majority of the code is from HWX and Vertica. The team in Pittsburgh is happy to announce that this project is being pushed upstream into the main Apache Hadoop project. The ORCReader is a new way to directly access ORCFile-formatted data and replaces the initial access method, which went through the Hcat Connector. Instead of using SerDes, the ORCReader uses the External Table mechanism to access ORC data.
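Column pruning and predicate pushdown can be illustrated with a small Python sketch. It assumes a simplified file layout in which rows are grouped into stripes and each stripe records the minimum and maximum of one column; ORC files do keep column statistics of this kind, but the data structures and function names below are hypothetical and much simpler than the real format.

```python
def build_stripes(rows, stripe_size, column):
    """Group rows into stripes and record min/max statistics for one column."""
    stripes = []
    for start in range(0, len(rows), stripe_size):
        chunk = rows[start:start + stripe_size]
        values = [row[column] for row in chunk]
        stripes.append({"min": min(values), "max": max(values), "rows": chunk})
    return stripes

def scan_with_pushdown(stripes, column, threshold, wanted_columns):
    """Return `wanted_columns` for rows where row[column] >= threshold.

    Predicate pushdown: a stripe whose max is below the threshold is skipped unread.
    Column pruning: only the requested columns are kept from each matching row.
    """
    results = []
    stripes_read = 0
    for stripe in stripes:
        if stripe["max"] < threshold:
            continue
        stripes_read += 1
        for row in stripe["rows"]:
            if row[column] >= threshold:
                results.append({name: row[name] for name in wanted_columns})
    return results, stripes_read

# Illustrative rows, sorted by volume so that stripe statistics are selective.
TRADES = [
    {"symbol": s, "volume": v, "type": "Market"}
    for s, v in [("A", 5), ("B", 20), ("C", 50), ("D", 122), ("E", 150), ("F", 1021), ("G", 1230)]
]

stripes = build_stripes(TRADES, stripe_size=3, column="volume")
matches, stripes_read = scan_with_pushdown(stripes, "volume", 1000, ["symbol", "volume"])
print(matches, stripes_read)
```

Here the first stripe is never read because its maximum volume is below the threshold, and the `type` column is dropped from the output because the query did not ask for it.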
30
HPE Vertica for SQL on Apache® Hadoop users
Analysts no longer need to care where the data is located or how it is stored. They can use their favorite BI/data visualization tools.
DBAs now have a single way to access data, whether it is in Hadoop or Vertica, and can implement complete Information Life Cycle Management.
Data engineers can now use SQL in addition to MapReduce, Hive, etc. to explore and analyze the data.
[Diagram labels: HPE Vertica ANSI SQL; Fastest; HP Vertica Optimized Storage; Fast; Hadoop Storage]
Notes: HPE Vertica for SQL on Hadoop users
Analysts are most interested in using their preferred BI/data visualization tools to provide analytical insight to the organization in the form of dashboards and reports. They are less concerned with where the data is stored, provided that they can actually query the data and visualize the results.
DBAs are under duress in handling the various workloads and requests from the organization. Using Hadoop for less-demanding, less-intensive analytical requests to tier off colder data, and tapping into the rich, high-performance analytics on data stored in the ROS format, helps them address the full and varying needs of internal and external users.
Data engineers can ingest all emerging forms of data, including AVRO and JSON, from sensor data and log files for increasingly popular analytical use cases. For top performance, these data engineers can store data in the highly optimized Vertica ROS format. If their organization is looking to derive value from the existing data stored in Hadoop data lakes, such as ORC and Parquet, HPE Vertica for SQL on Hadoop can fit that need, too.
31
Data storage options and performance
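Storage options, ordered by performance from fastest (analytics, structured data) to slowest (discovery, semi-structured data):
  1. Query engine: Vertica ANSI SQL-99. Format: Vertica ROS. File system: EXT4.
HPE Vertica SQL on Hadoop:
  2. Query engine: Vertica ANSI SQL-99. Format: Vertica ROS. File system: HDFS.
  3. Query engine: Vertica ANSI SQL-99. Format: Hadoop Format. File system: HDFS.
  4. Query engine: Vertica ANSI SQL-99. Format: Flex Tables. File system: HDFS.
  5. Query engine: Vertica ANSI SQL-99. Format: Flat Files. File system: HDFS.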
32
Hadoop agnostic Powerful SQL on Hadoop for any Hadoop distribution
Works with:
  Cloudera Distribution Including Apache Hadoop (CDH) 4.6, 5.0, and 5.0.1
  Hortonworks Data Platform (HDP) 2.1
  MapR and 3.0.3
Also: MapR NFS may be used as storage.
Versions update frequently. Check online documentation for a detailed description and updates to compatibility information.
33
Security & manageability
34
Security
  SSL security protocol to authenticate, encrypt and verify access
  Use SQL to GRANT/REVOKE access to database, schema, table, views, insert/update/delete, etc.
  Manage users in the Vertica Management Console or use LDAP/Kerberos for external management
Security covers: Authentication, Authorization, Encryption
35
Database designer
Leverage:
  Typical queries
  Sample data/schemas
  Historical statistics and logs
Optimize:
  Query performance
  Data loading
  Storage footprint
Benefit:
  Faster queries
  Lower hardware costs
  Shortened design time
  Lower costs to maintain and optimize
36
Workload management
Problem: Rogue queries can take over
  Affect loads or tactical queries
  Steal resources from daily analytics
  Disrupt business processes
Solution: Workload management
  Reserve resources for high-priority queries
  Apply run-time prioritization to manage CPU and I/O
  Take control of rogue queries
Sample query shown on the slide:
  SELECT "Physical Exam".ParticipantId, "Physical Exam".SequenceNum, "Physical Exam".Date, "Physical Exam".Day, "Physical Exam".Weight_kg, "Physical Exam".Temp_C, "Physical Exam".SystolicBloodPressure, "Physical Exam".DiastolicBloodPressure, "Physical Exam".Pulse, "Physical Exam".Respirations, "Physical Exam".Signature, "Physical Exam".Pregnancy, "Physical Exam".Language, AverageTempPerParticipant.AverageTemp
  FROM "Physical Exam"
  INNER JOIN AverageTempPerParticipant ON "Physical Exam".ParticipantID = AverageTempPerParticipant.ParticipantID
37
Deployment & services
38
Vertica customer experience
Intelligent approach to implementation, support, and expansion
Professional Services: Architecture and implementation best practices to maximize value
  Training
  Implementation assistance
  Health check audits
Technical Support: Deep product expertise to provide world-class support
  24x7 critical issue support
  Deep technical expertise
  Case management
  Product updates
HPE Big Data Community: Tools to engage with the Vertica community
  MyVertica user forum
  Web-based training
  Scripts and code sharing
Customer / Technical Account Management: Proactive relationship management to ensure your success with Vertica
  Account review
  Issues escalation
  Executive engagement
  Product management
Solution Architects / Information Development / Partner Engineering:
  Best practices, complete solutions
  Clear, concise technical documentation
  Partner relationships and tools
39
Vertica professional services
Targeted Vertica implementation expertise and guidance
Focus areas: Training, Implementation, Assessment
What we provide:
  Expert Vertica assistance
  On-site project team mentoring
  Vertica platform implementation and best practices advice
Training:
  Public, private, and free resources
  System administrators, DBAs, application developers
  Online and in person
Implementation: Vertica proactive project assistance
  SW installation
  Data loading
  Query performance tuning
  Go-live support
  Continuity planning
  Cluster maintenance
Assessment: Vertica Health Check
  On-site expert system review and data collection
  Detailed audit report of findings
  Implementation of recommendations (optional)
40
Vertica makes data matter
Purpose built for Big Data from the first line of code
Real-time analytics: Gain insight into your data 50x-1,000x faster than legacy products
Massive scalability: Infinitely scale your solution by adding an unlimited number of low-cost nodes
Open architecture: Built-in support for Hadoop, R, and a range of ETL and BI tools
Optimized data storage: Store 10x-30x more data per server than row databases with patented columnar compression
Deploys to: On-premises, Private Cloud, Public Cloud, Hadoop
Notes: Vertica Analytics Platform Overview
The HP Vertica Analytics Platform solves real-world Big Data challenges. It is purpose-built for organizations of all sizes to monetize data at the hyper-speed and massive scale needed to differentiate in today’s competitive economic climate. The HP Vertica Analytics Platform delivers:
  Blazing Fast Analytics: Gain insights into your data in near-real time by running queries 50x-1,000x faster than legacy products
  Massive Scalability: Infinitely scale your solution by adding an unlimited number of industry-standard servers
  Open Architecture: Protect and embrace your investment in hardware and software, with built-in support for Hadoop, R, and a range of ETL and BI tools
  Optimized Data Storage: Store 10x-30x more data per server than row databases with patented columnar compression
Deployment is up to you.
What can the HP Vertica Analytics Platform do for you? The HP Vertica Analytics Platform is truly built for analytics, with technology born of the modern age. It is not a back-end legacy database, nor does it merely store your data. The HP Vertica Analytics Platform enables you to have a conversation with your data so that you can ultimately find the answers you need to monetize Big Data.
41
Thank you.