Databricks
  What is Databricks?
  Cloud services used
  Functionality
  Languages
  Spark Usage
  3rd Party Apps
  Architecture
  Books
Databricks – What is it?
  A cloud-based Apache Spark cluster service
  Offers scalable Spark clusters based on AWS
  Developed by the same people who created Spark
  Multiple cluster management
  Job scheduling and library import
  Offers access to all Spark modules
Databricks – Cloud Services
  Currently uses Amazon AWS
  Uses EC2 and has access to S3 buckets
  Uses a minimum of 2 EC2 instances
  Attempts to optimise EC2 usage
  Plans to extend to other cloud providers
Databricks – Functionality
  Architecture based on Notebooks and folders
  Has a cluster manager for
    Defined (min 54 GB) clusters
    Spot clusters
    On Demand clusters
  Has a job manager and scheduler
  Has user management
  Has full Spark functionality
  Has strong data visualisation capability
  Can export reports and dashboards
Databricks – Languages
  Can have Notebooks in
    Scala
    Python
    SQL
  SQL can be executed in non-SQL Notebooks
  Markdown comments can be placed in Notebooks
  Notebooks can be shared by multiple sessions
  Libraries can be imported and called in Notebooks
Databricks – Spark Usage
  Latest Spark version available, i.e. DB uses Spark at June 2015
  All Spark modules available
    SQL, GraphX, MLlib, Streaming
  Strong integration between modules and visualisation
  Extensive use of tables to import data
  Tables available via SQL
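The "tables available via SQL" idea, together with running SQL from a non-SQL Notebook, can be sketched roughly as below. This is only an illustration, not Databricks code: Python's built-in sqlite3 module is swapped in for Spark SQL so the snippet runs without a cluster, and the table name `visits`, its columns and the sample rows are all hypothetical.

```python
import sqlite3

# Stand-in for a notebook session: an in-memory SQLite database plays the
# role that Spark SQL tables play on Databricks (assumption for illustration).
conn = sqlite3.connect(":memory:")

# "Import" some data into a table; table, columns and rows are hypothetical.
rows = [
    ("2015-06-01", "web", 120),
    ("2015-06-01", "mobile", 80),
    ("2015-06-02", "web", 150),
]
conn.execute("CREATE TABLE visits (day TEXT, channel TEXT, hits INTEGER)")
conn.executemany("INSERT INTO visits VALUES (?, ?, ?)", rows)

# Query the table with SQL from Python code, as one would from a
# non-SQL Notebook.
query = (
    "SELECT channel, SUM(hits) AS total "
    "FROM visits GROUP BY channel ORDER BY channel"
)
totals = conn.execute(query).fetchall()
print(totals)
```

On a real cluster the same pattern would load the data into a Spark table and pass the SQL string to Spark SQL; only the shape of the workflow (load into a table, then query it with SQL) is meant to carry over.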
Databricks – 3rd Party Apps
  Currently available, with more to come
    Pentaho
    Qlik
    Tableau
    TIBCO Jaspersoft
    PanTera
    Zoomdata
Databricks – Architecture
Available Books
  See our Hadoop book from Apress / Springer: “Big Data Made Easy”
  Look out for our Apache Spark based book from Packt in 2015
Contact Us
  Feel free to contact us at
  We offer IT project consultancy
  We are happy to hear about your problems
  You can just pay for those hours that you need to solve your problems