Databricks
- What is Databricks?
- Cloud services used
- Functionality
- Languages
- Spark Usage
- 3rd Party Apps
- Architecture
- Books

Presentation transcript:


Databricks – What is it?
- A cloud-based Apache Spark cluster service
- Offers scalable Spark clusters based on AWS
- Developed by the same people who created Spark
- Multiple cluster management
- Job scheduling and library import
- Offers access to all Spark modules

Databricks – Cloud Services
- Currently uses Amazon AWS
- Uses EC2 and has access to S3 buckets
- Uses a minimum of 2 EC2 instances
- Attempts to optimise EC2 usage
- Plans to extend to other cloud providers

Databricks – Functionality
- Architecture based on Notebooks and folders
- Has a cluster manager for
  - Defined (min 54 GB) clusters
  - Spot clusters
  - On Demand clusters
- Has a job manager and scheduler
- Has user management
- Has full Spark functionality
- Has strong data visualisation capability
- Can export reports and dashboards

Databricks – Languages
- Can have Notebooks in
  - Scala
  - Python
  - SQL
- SQL can be executed in non-SQL Notebooks
- Markdown comments can be placed in Notebooks
- Notebooks can be shared by multiple sessions
- Libraries can be imported and called in Notebooks
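
The point about running SQL from a non-SQL Notebook can be illustrated with a small sketch. This is not Databricks code: Python's built-in sqlite3 module is used here as a stand-in for Spark SQL so that the example runs without a cluster, and the table name `events`, its columns, and the helper `run_sql` are all hypothetical.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the tables
# that a notebook would normally reach through Spark SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (name TEXT, clicks INTEGER)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [("ann", 3), ("bob", 5), ("ann", 4)],
)


def run_sql(query):
    """Run a SQL string from Python and return the rows as a list of tuples."""
    return conn.execute(query).fetchall()


# The SQL lives inside ordinary Python code, and the result comes back
# as a Python value that the rest of the notebook can keep working with.
totals = run_sql(
    "SELECT name, SUM(clicks) FROM events GROUP BY name ORDER BY name"
)
print(totals)
```

In a Databricks Python Notebook the query string would instead go to the Spark SQL entry point (`sqlContext.sql(...)` in Spark 1.x, `spark.sql(...)` in later versions), which returns a DataFrame rather than a list of tuples. A single cell can also be switched to SQL with the `%sql` magic command.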

Databricks – Spark Usage
- Latest Spark version available
  - i.e. the version DB uses as of June 2015
- All Spark modules available
  - SQL, GraphX, MLlib, Streaming
- Strong integration between modules and visualisation
- Extensive use of tables to import data
- Tables available via SQL

Databricks – 3rd Party Apps
- Currently available, with more to come
  - Pentaho
  - Qlik
  - Tableau
  - TIBCO Jaspersoft
  - PanTera
  - Zoomdata

Databricks – Architecture

Available Books
- See our Hadoop book from Apress / Springer
  - “Big Data Made Easy”
- Look out for our Apache Spark based book
  - from Packt in 2015

Contact Us
- Feel free to contact us at
- We offer IT project consultancy
- We are happy to hear about your problems
- You can just pay for those hours that you need to solve your problems