Scott Donaldson, Senior Director, FINRA

Slides:



Advertisements
Similar presentations
1/17/20141 Leveraging Cloudbursting To Drive Down IT Costs Eric Burgener Senior Vice President, Product Marketing March 9, 2010.
Advertisements

System Center 2012 R2 Overview
The Business Value of CA Solutions Ovidiu VALEANU Senior Consultant DNA Software – CA Regional Representative.
Cloud Attributes Business Challenges Influence Your IT Solutions Business to IT Conversation Microsoft is Changing too Supporting System Center In House.
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. | OFSAAAI: Modeling Platform Enterprise R Modeling Platform Gagan Deep Singh Director.
Delivering an Architecture for the Social Enterprise Alpesh Doshi, Fintricity Information Age Social&Mobile Business Conference Tuesday 31st January 2012.
Cloud Interoperability & Standards. Scalability and Fault Tolerance Fault tolerance is the property that enables a system to continue operating properly.
Data-Centric Security and User Access Controls for Hadoop on Microsoft Azure MICROSOFT AZURE APP BUILDER PROFILE: BLUETALON BlueTalon provides data-centric.
Cisco Consulting Services for Application-Centric Cloud Your Company Needs Fast IT Cisco Application-Centric Cloud Can Help.
IDC Says, "Don't Move To The Cloud" Richard Whitehead Director, Intelligent Workload Management August, 2010 Ben Goodman Principal.
Commvault and Nutanix October Changing IT landscape Today’s Challenges Datacenter Complexity Building for Scale Managing disparate solutions.
Data Analytics Challenges Some faults cannot be avoided Decrease the availability for running physics Preventive maintenance is not enough Does not take.
Protecting a Tsunami of Data in Hadoop
Service Assurance in the Age of Virtualization
Microsoft Dynamics 365 for Operations Roadmap Deployment Scenarios
Hybrid Management and Security
Univa Grid Engine Makes Work Management Automatic and Efficient, Accelerates Deployment of Cloud Services with Power of Microsoft Azure MICROSOFT AZURE.
4/18/2018 6:56 AM © Microsoft Corporation. All rights reserved. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN.
Avenues International Inc.
Data Platform and Analytics Foundational Training
WorkDiff Mobile, Scenario-Based Collaboration Solution WorkDiff Allows Users to Work Differently While Using Familiar Functions of Microsoft Office 365.
Organizations Are Embracing New Opportunities
Data Platform and Analytics Foundational Training
100% Exam Passing Guarantee & Money Back Assurance
Automate Does Not Always Mean Optimize
Hybrid Management and Security
Microsoft Operations Management Suite Insight and Analytics
Cisco Data Virtualization
A UNIFIED ECOSYSTEM FOR MARKET DATA VISUALIZATION
Letsignit, an Automated Signature Solution for Microsoft Office 365 and Microsoft Exchange, Provides Efficiency in Branding and Customization OFFICE.
Soft1 Open Enterprise Edition Allows Customers to Easily Synchronize Files Using Microsoft Office 365 and Seamlessly Store Any Information in SharePoint.
IWRITER 365 Offers Seamless, Easy-to-Use Solution for Using, Designing, Managing, and Sharing All Your Company Templates in Microsoft Office 365 OFFICE.
How to prepare for the End of License of Windows Server 2012/R2
Smart Org Charts in Microsoft Office 365: Securely Create, Collaborate, Edit, and Share Org Charts in PowerPoint and Online with OrgWeaver Software OFFICE.
Chapter 21: Cloud Computing and Related Security Issues
Chapter 22: Cloud Computing Technology and Security
Cisco’s Intelligent Automation for Cloud
Exploring Azure Event Grid
Enterprise security for big data solutions on Azure HDInsight
SocialBoards Self-Service, Multichannel Support Ticket Notifications in Microsoft Office 365 Groups Help Customer Care Teams to Provide Better Care OFFICE.
Welcome! Power BI User Group (PUG)
Skyhigh Enables Enterprises to Use Productivity Tools of Microsoft Office 365 While Meeting Their Security, Compliance & Governance Requirements Partner.
MetaShare, Powered by Azure, Gives SharePoint a User-Friendly, Intuitive User Interface and Added App Features with No Added Administrative Tasks OFFICE.
With IvSign, Office 365 Users Can Digitally Sign Word Documents in the Cloud from Any Device Without Having to Install Any Digital Certificates OFFICE.
2016 Primeur ©.
Yellowfin: An Azure-Compatible Business Intelligence Platform That Connects People with Their Data for Better Decision Making MICROSOFT AZURE APP BUILDER.
Protect | Transform | Innovate
Logsign All-In-One Security Information and Event Management (SIEM) Solution Built on Azure Improves Security & Business Continuity MICROSOFT AZURE APP.
File Manager for Microsoft Office 365, SharePoint, and OneDrive: Extensible Via Custom Connectors in Enterprise Deployments, Ideal for End Users OFFICE.
Extending Your Integration Strategy
DeFacto Planning on the Powerful Microsoft Azure Platform Puts the Power of Intelligent and Timely Planning at Any Business Manager’s Fingertips Partner.
Welcome! Power BI User Group (PUG)
Is your deployment in pants-down mode?
Unitrends Enterprise Backup Solution Offers Backup and Recovery of Data in the Microsoft Azure Cloud for Better Protection of Virtual and Physical Systems.
The Jamespot for Office 365 Application Attaches Business Processes to Docs and Syncs Them to OneDrive to Simplify Collaboration and Sharing OFFICE 365.
Agolo Summarization Platform Integrates with Microsoft OneDrive to Relate Enterprise Cloud Documents with Real-Time News Summaries OFFICE 365 APP BUILDER.
BluVault Provides Secure and Cost-Effective Cloud Endpoint Backup and Recovery Using Power of Microsoft OneDrive Business and Microsoft Azure OFFICE 365.
Collaborative Business Solutions
AdQ is Azure-Powered Pre-Roll Ad Management Software That Improves Pre-Roll Ad Performance, Increases Profits, and Optimizes User Experience MICROSOFT.
Cloud Consulting Services and Solutions
Windows 10 Enterprise subscriptions in CSP – Messaging Summary
Letsignit, an Automated Signature Solution for Microsoft Office 365 and Microsoft Exchange, Provides Efficiency in Branding and Customization OFFICE.
Last.Backend is a Continuous Delivery Platform for Developers and Dev Teams, Allowing Them to Manage and Deploy Applications Easier and Faster MICROSOFT.
Yooba File Sync: A Microsoft Office 365 Add-In That Syncs Sales Content in SharePoint Online to Yooba’s Sales Performance Management Solution OFFICE 365.
Reportin Integrates with Microsoft Office 365 to Provide an End-to-End Platform for Financial Teams That Simplifies Report Creation and Management OFFICE.
Computer Science and Engineering
DBOS DecisionBrain Optimization Server
Protect data in core business applications
Introduction to Azure Data Lake
Presentation transcript:

Self-service, interactive analytics at petabyte scale in capital markets regulation in the cloud Scott Donaldson, Senior Director, FINRA Matt Cardillo, Senior Director, FINRA

WHO IS FINRA? Regulate 99% Equities and 70% of Options activity in the U.S. Up to 75B market events/day Reconstruct the market containing several trillion events Detect & Investigate Identify market manipulations, insider trading, and fraud Ensure rule compliance Enforce & Discipline Fine and bar broker dealers Refer matters to the SEC and other authorities

WHAT WE DO Front Running Insider Trading Best Execution Washed Sales Short Sales Layering Spoofing Order Protection Fraud Compliance Compliance

OUR VIEW ON SELF SERVICE ANALYTICS User Classification General business user Power Users Data Scientists Data Centric Tools in the Line of Business Specialized Templated Exploratory Embedded Emphasize & Increase Data Literacy Encourage innovation & sharing Crowd sourced learning IT hones the analytic to operate at scale

CREATE A DIALOGUE WITH THE DATA Citizen analytics Facilitate data exploration Focus on ease of use Easily share and collaborate Minimize intermediaries between users and the data lake Enable users without having to understand underlying technologies and infrastructure Self Service with guard rails Safeguard information with high degree of security and least privileges access Users need to understand the semantics of the data

SECURITY IS OUR #1 PRIORITY Virtual Private Cloud (VPC) Encryption Required to have encryption of all data at-rest and in-transit Transparent Server Side Encryption for non-PCI/PII KMS encryption for PCI/PII Local disk encryption Encrypt ephemeral storage on all nodes LUKS with random, memory only key via custom bootstrap action It is OK to lose the server and lose the data – S3 is our source of truth

SECURITY IS MORE THAN JUST ENCRYPTION Security must be designed in from the beginning Separation of duties Dev, QC, Prod and DR accounts all separated Use least privileges & no catch all rules DevOps: Automate everything Infrastructure as code Requires a culture shift and retraining of development teams Access rules Data classification controls strictness of rules Centralized monitoring for total transparency

OUR CLOUD ARCHITECTURE

DATA BY DESIGN Focus on use cases Multiple copies often necessary Organize your data for defined usage scenario Experiment to determine optimal file size Avoid small files Separate storage from compute Shutdown your cluster when not in use Share data among multiple clusters Fault tolerance and disaster recovery Hybrid approach may be necessary Columnar with the full row embedded using schema on read

EXAMPLE PARTITION STRATEGY

CENTRALIZE YOUR DATA MANAGEMENT Herd FINRA’s open source data manager http://finraos.github.io/herd/ Platform Services Unified catalog Schemas Versions Encryption Type Storage policies Workflow and Cluster Management Manage EMR clusters Task Orchestration with Activiti, EMR Steps and Oozie Lineage and Usage Track publishers, consumers, and job lineage

SCALED REUSE WITH A SHARED METASTORE Consider creating a shared hive metastore service Offload metastore hydration of tables and partitions from cluster creation Tables with millions of partitions can take hours to initialize Transient clusters initialize faster Avoid duplicative effort by dev teams Register new tables and partitions as the data arrives via notifications Separate metastore instances are needed for Hive 0.13, Hive 1.0 and Presto Benefits: Bring your own compute Separation of concerns for different workloads Optimize cluster configuration for the workload Avoid compromises as you would in a shared database/appliance Fault tolerance & DR with Multi-AZ RDS

RIGHT SIZE YOUR CLUSTER Transient use cases: Batch and Progressive analytics Size cluster based on run times and pricing structure Use Spot and Fixed Duration pricing when you have flexible SLA Always-On use case: Interactive analytics Size Core nodes based on HDFS needs (statistics, logging, etc) Reserve Master and Core nodes Resize # of Task nodes as needed Keep Core to Task ration of 1:5 to avoid bottlenecks Use Spot and Fixed Duration pricing on Task nodes to manage costs

Node distribution for October 22nd ELASTICITY METRICS Node distribution for October 22nd

AD HOC QUERY EXAMPLE Instantaneous access to the data lake Easy data discovery Shared metastore updated automatically from Data Management of all new table & partitions Right tools for the right queries Presto for interactive data slices Hive (M/R and Tez) for complex queries and transforms Share common DDL and ORC files between Hive and Presto Separate concerns through separate clusters Data is always available No load/unload operations

EXPLORATORY ANALYTICS EXAMPLE Audit Trail Information Filtering, Summarization, Pivoting Going beyond desktop tools

SPECIALIZED ANALYTICS EXAMPLE Market Reconstruction Pre-computed Order Lifecycles 3+ trillion nodes & 600 billion edges Hbase schema on-read solution

EMBEDDED ANALYTICS EXAMPLE Embed scalable analytics into the workflow application Provide richer Ad Hoc tools for analysis Integrated with Surveillance Catalog

THE IMPACT Removed obstacles “Before data analysis of this magnitude required intervention from the technology team.” Lowered the cost of curiosity “Analysts are able to quickly obtain a full picture of what happens to an order over time, helping to inform decision making as to whether a rule violation has occurred.” Optimize batch and interactive workloads without compromise Elasticity allows us to process years of data in days rather than months while saving money Increased teams’ delivery velocity

TAKEAWAYS Provide the right tools to the right class of user Empower exploration and sharing Stay central and avoid silos Emphasize and increase data literacy Don’t forget about the UX Collect and analyze metadata, then adapt IT needs to collaborate as a partner

ABOUT THE PRESENTERS Scott Donaldson Matt Cardillo Senior Director, FINRA Data Analytics and Surveillance Systems Scott.Donaldson@finra.org https://www.linkedin.com/in/scottdonaldson Matt Cardillo Senior Director, FINRA Data Analytics Matt.Cardillo@finra.org https://www.linkedin.com/in/matt-cardillo-3977552

Learn more at http://technology.finra.org/ QUESTIONS? Learn more at http://technology.finra.org/ FIRNA Technology is hiring http://technology.finra.org/careers.html