1
H2O is used by more than 14,000 companies
(and nearly half of the Fortune 500 companies) in mission-critical use cases for Finance, Insurance, Healthcare, Retail, Telco, Sales, and Marketing.
H2O.ai (formerly 0xdata, pronounced "hexadata") is a private company based in Mountain View, CA, founded in 2011, with ~100 employees.
Main product: "H2O", an open-source ML framework for big data (e.g., on a Hadoop cluster):
- written in Java (with some R & Python)
- runs on Hadoop/Spark (or standalone)
- in-memory, distributed, fast, and scalable
- runs on AWS, Google Cloud, and Microsoft Azure, but can also run on your laptop
AutoML and "Driverless AI": use thousands of models to select the best model (or ensemble) and hyperparameters.
2
SriSatish Ambati, CEO and co-founder of 0xdata. Past: Platfora, Cassandra, DataStax, Azul Systems, UC Berkeley.
Cliff Click, CTO and co-founder of 0xdata. Past: Azul Systems, Sun Microsystems; developed the Java HotSpot Server Compiler; PhD in CS from Rice University.
Ambati: "H2O brings fast, scalable machine learning to build the smarter applications that will power the internet of the future. H2O rewrote fundamental analytical algorithms from math, statistics and machine learning to be fast and [to work] on big datasets… H2O was built from the ground up to work as a mathematical engine that can be distributed over a large number of servers. It does this by breaking apart stored data that is too large to fit into the memory of any one machine into multiple smaller, compressed pieces. Data scientists interact with H2O to conduct statistics and machine learning with pre-built models, standard tools for conducting end-to-end analysis and presenting the results in a simple-to-use UI or standard tools like R, Tableau, Excel, and our web-based application."
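The quote describes splitting data into smaller compressed pieces that are spread across many servers. The sketch below illustrates that idea with the standard library only: it cuts a numeric column into fixed-size row ranges, compresses each with zlib, and places them round-robin on hypothetical nodes. It is an assumption-laden illustration of the concept, not H2O's actual storage format, compression scheme, or placement policy; `make_chunks`, `assign_to_nodes`, and `reassemble` are hypothetical names.

```python
import zlib
from array import array

def make_chunks(values, chunk_rows):
    """Split a numeric column into fixed-size row ranges and zlib-compress each range."""
    chunks = []
    for start in range(0, len(values), chunk_rows):
        raw = array("d", values[start:start + chunk_rows]).tobytes()
        chunks.append(zlib.compress(raw))
    return chunks

def assign_to_nodes(chunks, n_nodes):
    """Round-robin placement: chunk i is stored on node i % n_nodes."""
    placement = {node: [] for node in range(n_nodes)}
    for i, chunk in enumerate(chunks):
        placement[i % n_nodes].append((i, chunk))
    return placement

def reassemble(placement):
    """Collect chunks from every node, restore row order, and decompress."""
    pairs = [pair for node_chunks in placement.values() for pair in node_chunks]
    out = array("d")
    for _, blob in sorted(pairs, key=lambda p: p[0]):
        out.frombytes(zlib.decompress(blob))
    return out.tolist()

# Example: 10,000 rows, 1,000 rows per chunk, spread over 3 hypothetical nodes.
column = [float(i % 7) for i in range(10_000)]
chunks = make_chunks(column, chunk_rows=1_000)
placement = assign_to_nodes(chunks, n_nodes=3)
assert reassemble(placement) == column
print({node: [i for i, _ in items] for node, items in placement.items()})
# prints {0: [0, 3, 6, 9], 1: [1, 4, 7], 2: [2, 5, 8]}
```

Each node only ever holds its own compressed chunks, so the total dataset can exceed any single machine's memory while every row is still recoverable.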
3
H2O supports the following distributed algorithms:
- GLM
- Naive Bayes
- Distributed Random Forest
- Gradient Boosting Machine (GBM)
- Deep Learning (deep neural networks)
- K-means
- PCA
- Generalized Low Rank Models
- Anomaly Detection
- Autoencoders
You can use H2O from Python (e.g., from a Jupyter notebook or from SageMaker):

    import h2o
    from h2o.estimators.naive_bayes import H2ONaiveBayesEstimator

    h2o.init()  # connect to (or start) an H2O server

    data = h2o.import_file("myfile.csv")
    predictors = ['col1', 'col2', 'col3', 'col4']
    result = 'col_result'
    data[result] = data[result].asfactor()  # Naive Bayes needs a categorical response

    train, valid = data.split_frame(ratios=[0.8])  # ~80% train, ~20% validation

    model = H2ONaiveBayesEstimator()
    model.train(x=predictors, y=result, training_frame=train, validation_frame=valid)
    model.model_performance(valid=True)  # metrics on the validation frame
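To show what a Naive Bayes model like `H2ONaiveBayesEstimator` computes, here is a minimal single-machine sketch for categorical features, assuming Laplace (add-one) smoothing. It picks the class maximizing log P(class) + sum of log P(feature value | class). The smoothing argument is in the spirit of the estimator's `laplace` option, but this is not H2O's implementation, and `train_nb`/`predict_nb` and the toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict

def train_nb(rows, labels, laplace=1.0):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(labels)
    counts = defaultdict(Counter)        # (feature index, class) -> Counter of values
    values_per_feature = defaultdict(set)
    for row, y in zip(rows, labels):
        for j, v in enumerate(row):
            counts[(j, y)][v] += 1
            values_per_feature[j].add(v)
    return class_counts, counts, values_per_feature, laplace

def predict_nb(model, row):
    """Return the class with the highest smoothed log-posterior (laplace must be > 0)."""
    class_counts, counts, values_per_feature, laplace = model
    n = sum(class_counts.values())
    best_class, best_score = None, -math.inf
    for c, n_c in class_counts.items():
        score = math.log(n_c / n)        # log prior
        for j, v in enumerate(row):
            k = len(values_per_feature[j])  # number of distinct values of feature j
            score += math.log((counts[(j, c)][v] + laplace) / (n_c + laplace * k))
        if score > best_score:
            best_class, best_score = c, score
    return best_class

rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"),
        ("rainy", "cool"), ("sunny", "cool"), ("rainy", "hot")]
labels = ["no", "no", "yes", "yes", "yes", "no"]
model = train_nb(rows, labels)
print(predict_nb(model, ("rainy", "cool")))  # yes
print(predict_nb(model, ("sunny", "hot")))   # no
```

Working in log space avoids underflow when many feature probabilities are multiplied, and smoothing keeps an unseen value from zeroing out a class.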
4
H2O FLOW – browser interface for H2O.
Code, text, math, plot, capture, rerun, annotate, present, and share your workflow.
5
AWS Marketplace - search for H2O
You can create an H2O cluster of VMs, or use H2O with EMR (Amazon Elastic MapReduce). H2O Sparkling Water is software for running H2O on Spark; it can be used with Zeppelin notebooks on EMR.
6
AutoML & Driverless AI
AutoML: tries combinations of different models to find the best model(s) and the best ensemble for your data. It draws on knowledge gained from testing models on datasets from Kaggle and other sources, and automatically fine-tunes hyperparameters.
Driverless AI: enterprise version of AutoML (cost ~$10/hr). Same as AutoML, but it also works on preparing your data: it uses PCA and other methods to extract/build features for models, and uses GPUs for speed.
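The AutoML loop described above (train many candidates, tune hyperparameters, rank them, ensemble the best) can be sketched on a toy problem. This is a simplified stand-in, not H2O AutoML: it uses only a mean baseline and k-nearest-neighbour regressors, random search over `k`, a leaderboard sorted by validation MSE, and a plain average of the top models in place of H2O's Stacked Ensembles (which train a metalearner). All function names and the synthetic data are hypothetical.

```python
import random
import statistics

def make_data(n, seed):
    """Toy regression data: y = 2x + Gaussian noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 10) for _ in range(n)]
    return [(x, 2.0 * x + rng.gauss(0, 1.0)) for x in xs]

def mean_model(train):
    """Baseline: always predict the training mean."""
    avg = statistics.fmean(y for _, y in train)
    return lambda x: avg

def knn_model(train, k):
    """k-nearest-neighbour regressor; k is the hyperparameter being searched."""
    def predict(x):
        nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
        return statistics.fmean(y for _, y in nearest)
    return predict

def mse(model, data):
    return statistics.fmean((model(x) - y) ** 2 for x, y in data)

def automl_sketch(train, valid, n_trials=10, top_k=3, seed=0):
    """Random-search k, rank every candidate on validation MSE, average the top_k."""
    rng = random.Random(seed)
    candidates = [("baseline_mean", mean_model(train))]
    for k in rng.sample(range(1, 31), n_trials):
        candidates.append((f"knn_k{k}", knn_model(train, k)))
    # Leaderboard rows: (validation MSE, name, model), lowest error first.
    leaderboard = sorted(
        ((mse(model, valid), name, model) for name, model in candidates),
        key=lambda row: row[0],
    )
    top_models = [model for _, _, model in leaderboard[:top_k]]

    def ensemble(x):
        return statistics.fmean(m(x) for m in top_models)

    return leaderboard, ensemble

train, valid = make_data(200, seed=1), make_data(50, seed=2)
leaderboard, ensemble = automl_sketch(train, valid)
for score, name, _ in leaderboard[:3]:
    print(f"{name}: validation MSE {score:.3f}")
print(f"ensemble of top 3: validation MSE {mse(ensemble, valid):.3f}")
```

Because the ensemble is scored on the same validation data used to pick its members, its reported error is optimistic; a real setup would hold out a separate test set or use cross-validation.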
7
H2O vs SageMaker
H2O has higher ratings (in comparison with SageMaker).
You can invoke models on an H2O cluster from Python code in SageMaker. You specify in your configuration where your H2O Java server is running; it may be local or remote.

    import h2o
    h2o.init()  # connects to (or starts a local) H2O server
    ...

Here is an example of how to do training and inference in SageMaker using H2O's automatic machine learning (AutoML):
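One way to make "local or remote" a configuration choice is to read the server address from the environment. The sketch below assumes a hypothetical `H2O_URL` environment variable (not an H2O setting) and falls back to `http://localhost:54321`, H2O's default port. For the local default it calls `h2o.init()`, which connects to or starts a local server; for any other URL it calls `h2o.connect(url=...)`, which only attaches to an already running cluster. The helper names are hypothetical.

```python
import os

# 54321 is H2O's default REST API port.
DEFAULT_H2O_URL = "http://localhost:54321"

def resolve_h2o_url(env=None):
    """Return the server URL from the hypothetical H2O_URL setting, else the local default."""
    env = os.environ if env is None else env
    return env.get("H2O_URL", DEFAULT_H2O_URL).rstrip("/")

def connect_h2o(env=None):
    """Attach to a remote H2O server, or connect to/start a local one."""
    import h2o  # lazy import: resolve_h2o_url() works without h2o installed
    url = resolve_h2o_url(env)
    if url == DEFAULT_H2O_URL:
        h2o.init()  # connects to a local server, starting one if none is running
    else:
        h2o.connect(url=url)  # attach only; does not start a server
    return url
```

In a SageMaker notebook you would set `H2O_URL` to the cluster's address before calling `connect_h2o()`; leaving it unset keeps everything on the notebook instance.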