H2O is used by more than 14,000 companies


H2O is used by more than 14,000 companies (and nearly half of the Fortune 500) in mission-critical use cases for Finance, Insurance, Healthcare, Retail, Telco, Sales, and Marketing.
H2O.ai (formerly 0xdata, pronounced "hexadata") is a private company from Mountain View, CA. Founded in 2011, ~100 employees.
Main product: "H2O", an open source ML framework for big data (on a Hadoop cluster):
- written in Java (with some R & Python)
- runs on Hadoop/Spark (or standalone)
- in-memory, distributed, fast, and scalable
- runs on AWS, Google Cloud, Microsoft Azure, but can also run on your laptop
AutoML and "Driverless AI": use thousands of models to select the best model (or ensemble) and hyperparameters.

SriSatish Ambati, CEO and co-founder of 0xdata. Past: Platfora, Cassandra, DataStax, Azul Systems, UC Berkeley.
Cliff Click, CTO and co-founder of 0xdata. Past: Azul Systems, Sun Microsystems. Developed the Java HotSpot Server Compiler. PhD in CS from Rice University.
Ambati: “H2O brings fast, scalable machine learning to build the smarter applications that will power the internet of the future. H2O rewrote fundamental analytical algorithms from math, statistics and machine learning to be fast and on big datasets… H2O was built from the ground up to work as a mathematical engine that can be distributed over a large number of servers. It does this by breaking apart the data that is stored into multiple smaller and compressed pieces that are too large to fit into the memory of any one machine. Data scientists interact with H2O to conduct statistics and machine learning with pre-built models, standard tools for conducting end-to-end analysis and presenting the results in a simple to use UI or standard tools like R, Tableau, Excel, and our web-based application.”
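
The chunking idea in the quote can be illustrated with a toy sketch in plain Python. This is not H2O's implementation: the function names, the chunk size, and the use of `zlib` and `pickle` are assumptions chosen only to show how a statistic such as a mean can be computed from small per-chunk results, so that no single machine needs the whole dataset in memory.

```python
import pickle
import zlib
from typing import Iterable, List, Tuple


def split_into_chunks(values: List[float], chunk_size: int) -> List[bytes]:
    """Split a column into fixed-size pieces and compress each piece."""
    return [
        zlib.compress(pickle.dumps(values[i:i + chunk_size]))
        for i in range(0, len(values), chunk_size)
    ]


def partial_sum_count(chunk: bytes) -> Tuple[float, int]:
    """'Map' step: a worker decompresses only its own chunk."""
    piece = pickle.loads(zlib.decompress(chunk))
    return sum(piece), len(piece)


def distributed_mean(chunks: Iterable[bytes]) -> float:
    """'Reduce' step: combine the small per-chunk results into one mean."""
    total, count = 0.0, 0
    # On a real cluster each chunk would be processed on a different node;
    # here the loop stands in for that parallel work.
    for chunk in chunks:
        chunk_sum, chunk_count = partial_sum_count(chunk)
        total += chunk_sum
        count += chunk_count
    return total / count


column = [float(x) for x in range(1, 101)]          # toy data: 1.0 .. 100.0
chunks = split_into_chunks(column, chunk_size=16)   # hypothetical chunk size
mean = distributed_mean(chunks)
```

Only a `(sum, count)` pair travels from each chunk to the reducer, which is why this kind of computation scales to data that does not fit on one machine.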

H2O supports the following distributed algorithms:
- GLM
- Naive Bayes
- Distributed Random Forest
- Gradient Boosting Machine (GBM)
- Deep Neural Networks
- Deep learning
- K-means
- PCA
- Generalized Low Rank Models
- Anomaly Detection
- Autoencoders
You can use H2O from Python (from a Jupyter notebook, from SageMaker, etc.):
    import h2o
    from h2o.estimators.naive_bayes import H2ONaiveBayesEstimator

    h2o.init()  # connect to (or start) a local H2O server
    data = h2o.import_file("myfile.csv")
    predictors = ['col1', 'col2', 'col3', 'col4']
    result = 'col_result'
    data[result] = data[result].asfactor()  # Naive Bayes needs a categorical response
    train, valid = data.split_frame([0.8])  # 80% training, 20% validation
    model = H2ONaiveBayesEstimator()
    model.train(x=predictors, y=result, training_frame=train, validation_frame=valid)
    model.model_performance()  # training metrics; pass valid=True for validation metrics

H2O FLOW – browser interface for H2O. Code, text, math, plot, capture, rerun, annotate, present, and share your workflow.

AWS Marketplace: search for "H2O".
You can create an H2O cluster of VMs, or use H2O with EMR (Amazon Elastic MapReduce).
H2O Sparkling Water is software to run H2O on Spark. It can be used with Zeppelin notebooks on EMR.

AutoML & Driverless AI
AutoML:
- uses combinations of different models to find the best model(s) and the best ML ensemble for your data
- uses knowledge based on testing models on 1000+ datasets from Kaggle and other sources
- automatically fine-tunes hyperparameters
Driverless AI: enterprise version of AutoML. Cost ~$10/hr.
- same as AutoML, but also works on preparing your data
- uses PCA and other methods to extract/build features for models
- uses GPUs for speed
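
As a rough sketch of what an AutoML run looks like with the open source `h2o` Python package (not Driverless AI), the function below trains a limited number of models and returns the leaderboard. The function name `run_automl`, its default limits, and the assumption that the task is classification are illustrative choices, not part of the original slides.

```python
def run_automl(csv_path, response, max_models=10, max_runtime_secs=300, seed=1):
    """Train several models with H2O AutoML; return leaderboard, best model, test metrics."""
    # Imported inside the function so it can be defined on a machine
    # without a local H2O/Java installation.
    import h2o
    from h2o.automl import H2OAutoML

    h2o.init()  # connect to (or start) a local H2O server
    data = h2o.import_file(csv_path)
    data[response] = data[response].asfactor()  # assumption: classification task
    train, test = data.split_frame(ratios=[0.8], seed=seed)

    predictors = [col for col in data.columns if col != response]
    aml = H2OAutoML(max_models=max_models,
                    max_runtime_secs=max_runtime_secs,
                    seed=seed)
    aml.train(x=predictors, y=response, training_frame=train)

    # aml.leaderboard ranks all trained models; aml.leader is the top one.
    return aml.leaderboard, aml.leader, aml.leader.model_performance(test)
```

A call such as `run_automl("myfile.csv", "col_result")` would need a running Java installation and a CSV file with that response column. The leaderboard typically includes stacked ensembles alongside the individual models.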

H2O vs SageMaker
H2O has higher ratings than SageMaker in Gartner user reviews:
- https://www.gartner.com/reviews/market/data-science-machine-learning-platforms/compare/H2O-ai-vs-amazon-web-services
You can invoke models on an H2O cluster from Python code in SageMaker. In your configuration you can specify where your H2O Java server is running. It may be local or remote.
    import h2o
    h2o.init()  # connects to (or starts a local) H2O server
    ...
Here is an example of how to do training and inference in SageMaker using H2O's automatic machine learning (AutoML):
- https://github.com/h2oai/h2o3-sagemaker
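
One way to make the server location configurable is sketched below. The environment variable names `H2O_IP` and `H2O_PORT` and both helper functions are hypothetical. Only `h2o.init()` and `h2o.connect()` are calls from the `h2o` package itself.

```python
import os


def h2o_connection_kwargs(env=None):
    """Build connection arguments from (hypothetical) H2O_IP / H2O_PORT settings."""
    env = os.environ if env is None else env
    kwargs = {}
    if env.get("H2O_IP"):
        kwargs["ip"] = env["H2O_IP"]
    if env.get("H2O_PORT"):
        kwargs["port"] = int(env["H2O_PORT"])
    return kwargs


def connect_to_h2o(env=None):
    """Attach to a remote H2O server if one is configured, else use a local one."""
    import h2o  # imported lazily so the helper above works without h2o installed

    kwargs = h2o_connection_kwargs(env)
    if kwargs:
        h2o.connect(**kwargs)  # attach to an existing server; never starts one
    else:
        h2o.init()             # connect to a local server, starting it if needed
    return h2o


# Example settings for a remote cluster (the address is made up).
remote = h2o_connection_kwargs({"H2O_IP": "10.0.0.5", "H2O_PORT": "54321"})
```

Using `h2o.connect()` for the remote case avoids accidentally starting a local server inside the notebook instance when the remote cluster is unreachable.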