MLflow: A Platform for the Complete Machine Learning Lifecycle
Corey Zumar, March 27th, 2019
Outline
- Overview of ML development challenges
- How MLflow tackles these challenges
- MLflow components
- Demo
- How to get started
A quick overview of MLflow. Some of the concepts were covered in Matei's keynote talk on Machine Learning and AI.
Machine Learning Development is Complex
ML Lifecycle
[Diagram: Raw Data -> Data Prep -> Training -> Deploy, with tuning, scale, model/data exchange, and governance concerns at every stage]
The ML development process runs from raw data through data prep (ETL and feature computation) to training and deployment, and several aspects of this cycle make it complex. Teams use many different tools, picking the best tool for each problem, in contrast to traditional software development. Each step involves tuning and optimization: experimenting with longer windows for a feature, remembering the right learning rate, and so on. Scale matters, since using all the data improves results. Larger teams need isolation, which requires model and data formats for exchange; a change in the data storage format trickles into ETL changes and beyond, creating friction in large teams. Finally, there is governance of models and data: many models are in testing at once, and auditing an older model means observing its data and possibly retuning it.
Custom ML Platforms
Facebook FBLearner, Uber Michelangelo, Google TFX
- Standardize the data prep / training / deploy loop: if you work with the platform, you get these benefits
- Limited to a few algorithms or frameworks
- Tied to one company's infrastructure
Can we provide similar benefits in an open manner?
How are the other big players in machine learning scaling out? They have invested in platform teams. The positive is simplification by standardization: one set of tools for data prep, training, and deployment. The negatives are a limited set of models, inflexibility in algorithms, and being tied to the company's infrastructure, which makes it hard to open these systems to other users or open-source them.
Introducing MLflow
- Open machine learning platform
- Works with any ML library & language
- Runs the same way anywhere (e.g. any cloud)
- Designed to be useful for 1 or 1000+ person orgs
MLflow's innovation is being an open ML platform: it works with any ML library, algorithm, and language of choice, so you are not boxed into a limited set of tools. The design philosophy is API-first and modular. You submit runs the same way anywhere, from a local machine to any cloud. This reduces complexity and the learning curve: it is easy for one person to use, yet scales to 100K users in an org.
MLflow Components
- Tracking: record and query experiments: code, configs, results, etc.
- Projects: packaging format for reproducible runs on any platform
- Models: general model format that supports diverse deployment tools
Key Concepts in Tracking
- Parameters: key-value inputs to your code
- Metrics: numeric values (can update over time)
- Artifacts: arbitrary files, including data and models
- Source: training code that ran
- Version: version of the training code
- Tags and Notes: any additional information
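A minimal sketch of how these concepts map onto MLflow's fluent logging API (the tag, parameter values, and artifact file name below are placeholders); source and version are captured automatically from the executing code:

    import mlflow

    with mlflow.start_run():
        mlflow.set_tag("team", "digit-classification")   # tags/notes: free-form annotations
        mlflow.log_param("alpha", 0.01)                   # parameters: key-value inputs
        for mse in [0.9, 0.5, 0.3]:
            mlflow.log_metric("mse", mse)                 # metrics: numeric, can be updated over time
        mlflow.log_artifact("confusion_matrix.png")       # artifacts: any output file (must exist locally)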
MLflow Tracking
[Diagram: Tracking APIs (REST, Python, Java, R) send run data to the Tracking Server, which exposes a UI and an API]
MLflow Tracking
Record and query experiments: code, configs, results, etc.

    import mlflow

    with mlflow.start_run():
        mlflow.log_param("layers", layers)
        mlflow.log_param("alpha", alpha)
        # train model
        mlflow.log_metric("mse", model.mse())
        mlflow.log_artifact("plot", model.plot(test_df))
        mlflow.tensorflow.log_model(model)
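The same tracking data can be queried back; a minimal sketch using the Python client (the experiment ID and filter string are illustrative, and the exact search arguments vary across MLflow releases):

    from mlflow.tracking import MlflowClient

    client = MlflowClient()  # talks to the configured tracking URI
    runs = client.search_runs(experiment_ids=["0"], filter_string="metrics.mse < 1.0")
    for run in runs:
        print(run.info.run_id, run.data.params, run.data.metrics)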
MLflow Backend Stores
Entity Store
- FileStore (local filesystem)
- SQLStore (via SQLAlchemy)
- REST Store
Artifact Repository
- S3-backed store
- Azure Blob storage
- Google Cloud storage
- DBFS artifact repo
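For example, a tracking server can pair a SQL entity store with an S3 artifact repository; a hedged sketch (the bucket name and port are placeholders, and flag availability depends on the MLflow version):

    $ mlflow server \
        --backend-store-uri sqlite:///mlflow.db \
        --default-artifact-root s3://my-bucket/mlflow-artifacts \
        --host 0.0.0.0 --port 5000

Clients then point at the server, e.g. with mlflow.set_tracking_uri("http://localhost:5000") or the MLFLOW_TRACKING_URI environment variable.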
Demo
Goal: classify hand-drawn digits
- Instrument Keras training code with MLflow tracking APIs
- Run training code as an MLflow Project
- Deploy an MLflow Model for real-time serving
MLflow Projects: Motivation
- Diverse set of training tools
- Diverse set of environments
- Result: ML code is difficult to productionize
MLflow Projects
[Diagram: a Project Spec bundles code, config, dependencies, and data, and can be executed locally or remotely, e.g. on Kubernetes, EC2, or Databricks]
A project specifies its code & dependencies.
MLflow Projects
Packaging format for reproducible ML runs
- Any code folder or GitHub repository
- Optional MLproject file with project configuration
Defines dependencies for reproducibility
- Conda (+ R, Docker, …) dependencies can be specified in MLproject
- Reproducible in (almost) any environment
Execution API for running projects
- CLI / Python / R / Java
- Supports local and remote execution
Example MLflow Project

    my_project/
    ├── MLproject
    ├── conda.yaml
    ├── main.py
    └── model.py

MLproject file:

    conda_env: conda.yaml
    entry_points:
      main:
        parameters:
          training_data: path
          lambda: {type: float, default: 0.1}
        command: python main.py {training_data} {lambda}

Run it with:

    $ mlflow run git://<my_project>
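Projects can also be launched from the Python execution API; a minimal sketch against the example project above (the data path and lambda value are placeholders):

    import mlflow

    submitted = mlflow.projects.run(
        uri="my_project",                     # local folder or Git URL
        entry_point="main",
        parameters={"training_data": "data.csv", "lambda": 0.2},
    )
    print(submitted.run_id)                   # the tracking run created for this execution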
Demo
Goal: classify hand-drawn digits
- Instrument Keras training code with MLflow tracking APIs
- Run training code as an MLflow Project
- Deploy an MLflow Model for real-time serving
MLflow Models: Motivation
[Diagram: many ML frameworks, each needing custom inference code for every serving tool and batch & stream scoring target]
MLflow Models
[Diagram: a standard model format with multiple flavors sits between ML frameworks and the downstream consumers: inference code, batch & stream scoring, and serving tools]
MLflow Models
Packaging format for ML models
- Any directory with an MLmodel file
Defines dependencies for reproducibility
- Conda environment can be specified in MLmodel configuration
Model creation utilities
- Save models from any framework in MLflow format
Deployment APIs
- CLI / Python / R / Java
Example MLflow Model

    my_model/
    ├── MLmodel
    └── estimator/
        ├── saved_model.pb
        └── variables/

MLmodel file:

    run_id: 769915006efd4c4bbd662461
    time_created: 2018-06-28T12:34
    flavors:
      tensorflow:
        saved_model_dir: estimator
        signature_def_key: predict
      python_function:
        loader_module: mlflow.tensorflow

Created with mlflow.tensorflow.log_model(...). The tensorflow flavor makes the model usable with TensorFlow tools / APIs; the python_function flavor makes it usable with any Python tool.
Model Flavors Example
Train a PyTorch model, then log it:

    mlflow.pytorch.log_model(…)

Flavor 1: pyfunc

    predict = mlflow.pyfunc.load_pyfunc(…)
    predict(input_dataframe)

Flavor 2: PyTorch

    model = mlflow.pytorch.load_model(…)
    with torch.no_grad():
        model(input_tensor)
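A more complete sketch of the two-flavor round trip, assuming a recent MLflow version (the tiny linear model and inputs are placeholders, and exact log_model/load_model signatures differ slightly across releases):

    import mlflow
    import mlflow.pytorch
    import mlflow.pyfunc
    import pandas as pd
    import torch

    model = torch.nn.Linear(4, 1)             # stand-in for a trained PyTorch model

    with mlflow.start_run() as run:
        mlflow.pytorch.log_model(model, "model")

    model_uri = "runs:/{}/model".format(run.info.run_id)

    # Flavor 1: pyfunc -- generic interface, DataFrame in, predictions out
    pyfunc_model = mlflow.pyfunc.load_model(model_uri)
    preds = pyfunc_model.predict(pd.DataFrame([[0.1, 0.2, 0.3, 0.4]], dtype="float32"))

    # Flavor 2: native PyTorch -- returns the torch.nn.Module itself
    torch_model = mlflow.pytorch.load_model(model_uri)
    with torch.no_grad():
        out = torch_model(torch.zeros(1, 4))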
Model Flavors Example

    predict = mlflow.pyfunc.load_pyfunc(…)
    predict(input_dataframe)
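The pyfunc flavor is also what MLflow's deployment tools build on; for instance, in recent MLflow versions the models CLI can serve a logged model over REST (the command name and flags have changed across releases, and the run ID below is a placeholder):

    $ mlflow models serve -m runs:/<run_id>/model -p 5000
    # scoring requests are then POSTed to http://localhost:5000/invocations
    # (the expected JSON payload format depends on the MLflow version)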
Demo
Goal: classify hand-drawn digits
- Instrument Keras training code with MLflow tracking APIs
- Run training code as an MLflow Project
- Deploy an MLflow Model for real-time serving
Get Started with MLflow
- pip install mlflow to get started
- Find docs & examples at mlflow.org
- Join the community Slack: tinyurl.com/mlflow-slack
0.9.0 Release
MLflow 0.9.0 was released this week! Major features:
- Tracking server supports SQL backends via SQLAlchemy
- Pluggable tracking server backends
- Docker environments for Projects
- Custom Python models
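"Custom Python models" wrap arbitrary inference logic in the pyfunc interface; a minimal sketch (the AddN class and artifact path are illustrative, and keyword names vary slightly across releases):

    import mlflow
    import mlflow.pyfunc

    class AddN(mlflow.pyfunc.PythonModel):
        """A trivial custom model that adds a constant to every input value."""
        def __init__(self, n):
            self.n = n

        def predict(self, context, model_input):
            # model_input is typically a pandas DataFrame
            return model_input.apply(lambda column: column + self.n)

    with mlflow.start_run():
        mlflow.pyfunc.log_model(artifact_path="add_n_model", python_model=AddN(5))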
Ongoing MLflow Roadmap
- UI scalability improvements (1.0)
- X-coordinate logging for metrics & batched logging (1.0)
- Fluent API for Java and Scala (1.0)
- Packaging projects with build steps (1.0+)
- Better environment isolation when loading models (1.0)
- Improved model schemas (1.0)
Thank you!