The Life of a MongoDB GitHub Commit

Presentation transcript:

The Life of a MongoDB GitHub Commit: Evergreen. This is a talk about how we do continuous integration at MongoDB. We’ll be talking about Evergreen, our homegrown CI system, which we developed because of the unique testing needs that MongoDB has. Here’s some context: MongoDB is a general-purpose, open-source, distributed database. Whether or not you have any view of the database will not affect your view of this talk.

This Talk Will: provide an overview of CI at MongoDB; journey through the Evergreen internals; share some useful lessons we’ve learned along the way. Not a talk about our tests.

Continuous Integration: Travis, Jenkins, TeamCity, Buildbot

Testing MongoDB is not a typical CI use case

MongoDB Does a lot of things: queries, storage, backups, monitoring, authentication, aggregation, replication, sharding. MongoDB has a variety of features that interact in many ways.

MongoDB Has a lot of tests: 20+ hours in serial. MongoDB has a variety of features that interact in many ways.

MongoDB Runs in a lot of places: Linux, Windows, Solaris, OSX. We support MongoDB on many different platforms, from Linux to Windows, and on architectures like x86 and PowerPC.

Where can we run those tests?

We needed something with... Dynamic Host Allocation AND Static Hardware, Multiplatform Support, Powerful Navigation, Open Source Licensing

Evergreen: Our in-house continuous integration system

High Level Elizabeth Programmer commits a change.

High Level

How do the pieces fit together?

Server Web User Interface Logs & Results Commits Tasks Agents Task Runner Agents Repo Tracker Scheduler Host Initializer

The life of a commit to the MongoDB codebase We need to see how we get from (commit) to (screenshot of a finished version)

A commit is born An engineer pushes a commit to mongodb/mongo master on GitHub. (TODO make this s

Picked up by Repotracker a312b7e d5a2826 Repotracker sits, ever diligent, checking for new commits every once in a while. When it finds a commit it picks it up.

Project Configuration (evergreen.yml), Repo Tracker, commit d5a2826

tasks:
- name: jsCore
  commands:
  - func: "do setup"
  - func: "run tests"
    vars:
      resmoke_args: --suites=core --storageEngine=mmapv1
      run_multiple_jobs: true

buildvariants:
- name: linux-64
  display_name: Linux
  run_on:
  - rhel55
  - rhel55-test

Create tasks for the commit Repo Tracker a312b7e d5a2826 24ac5dd

Tasks Sharding Tests on Solaris Auth Tests on RHEL 6.2 Enterprise Compile on Amazon Linux!
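The step from one commit to many tasks can be sketched in a few lines of Python. This is only an illustration, not Evergreen's code: the `Task` class, the `create_tasks` function, and the `solaris` variant name are hypothetical, and the sketch assumes that every task runs on every build variant, which the slides do not state.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Task:
    """Hypothetical record for one unit of work created from a commit."""
    revision: str       # commit hash the task belongs to
    name: str           # task name from the project configuration
    build_variant: str  # platform the task should run on


def create_tasks(revision, task_names, build_variants):
    """Return one Task per (task name, build variant) pair for a commit.

    Assumption: every task runs on every variant.
    """
    return [Task(revision, n, v) for n, v in product(task_names, build_variants)]


# Two task names on two variants give four tasks for commit d5a2826.
tasks = create_tasks("d5a2826", ["compile", "jsCore"], ["linux-64", "solaris"])
for t in tasks:
    print(t)
```

Under that assumption, a single push fans out quickly: the number of tasks is the number of task names times the number of platforms.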

Schedule the tasks Scheduler

Windows 10 Solaris RHEL 6.2 OSX 10.10 Amazon Linux

Host Initializer

Agent Task Runner TODO: Add a slide on Go.

Run the Task Agent: Amazon Linux #1234 Task: sharding

Sharding

- name: sharding
  depends_on:
  - name: compile
  commands:
  - func: "do stuff"
  - func: "run tests"
- name: replication
  commands:
  - func: "different"
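A `depends_on` entry means a task cannot start until the task it names has finished. As a rough sketch of how such an ordering could be derived (an assumption for illustration, not how Evergreen does it), Python's standard `graphlib` module can sort the tasks so that dependencies come first. The `run_order` helper is hypothetical, and `replication` is assumed to have no dependencies because the slide lists none.

```python
from graphlib import TopologicalSorter

# Map each task to the set of tasks it depends on.
# Only the sharding -> compile edge comes from the slide.
depends_on = {
    "compile": set(),
    "sharding": {"compile"},
    "replication": set(),  # assumption: no dependency is shown on the slide
}


def run_order(graph):
    """Return the task names in an order where dependencies come first."""
    return list(TopologicalSorter(graph).static_order())


order = run_order(depends_on)
print(order)
```

`TopologicalSorter` raises `graphlib.CycleError` if two tasks depend on each other, which is a convenient way to reject an invalid configuration early.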

Logs and Results Sent Back Notifications sharding_test1 results.json Aggregate Statistics

results.json sharding_test1

Server Web User Interface Logs & Results Commits Tasks Agents Task Runner Agents Repo Tracker Scheduler Host Initializer

Scheduler. Transition to the scheduler internals. The scheduler is responsible for creating the task queue and ensuring that there are enough hosts to run all the tasks in a reasonable amount of time.

Server Web User Interface Logs & Results Commits Tasks Agents Task Runner Agents Repo Tracker Scheduler Host Initializer

Schedule the tasks. Scheduler: After the repo tracker creates the tasks, the scheduler finds all unscheduled tasks and adds them to platform-specific priority queues.
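One way to picture platform-specific priority queues is a dictionary of heaps, one per platform. The sketch below is an illustration only: `build_queues`, `pop_next`, the numeric priorities, and the task names are hypothetical, and the slides do not say how Evergreen represents its queues.

```python
import heapq
from collections import defaultdict
from itertools import count


def build_queues(unscheduled):
    """Group (platform, priority, task_name) tuples into one heap per platform.

    Assumption: a larger priority number means the task should run sooner.
    """
    queues = defaultdict(list)
    tie_breaker = count()  # keeps insertion order among equal priorities
    for platform, priority, name in unscheduled:
        # heapq is a min-heap, so negate the priority to pop the highest first.
        heapq.heappush(queues[platform], (-priority, next(tie_breaker), name))
    return queues


def pop_next(queues, platform):
    """Remove and return the name of the next task for a platform."""
    return heapq.heappop(queues[platform])[2]


queues = build_queues([
    ("Amazon Linux", 1, "compile"),
    ("Solaris", 1, "sharding"),
    ("Amazon Linux", 5, "auth"),
])
print(pop_next(queues, "Amazon Linux"))
```

Keeping a separate queue per platform means a long backlog on one platform never delays tasks waiting for another.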

Allocate New Hosts Host Initializer Scheduler After the task queues are created, the scheduler determines how many additional hosts need to be spun up based on how many hosts exist and how many tasks there are.

Scaling Load! Hosts when we need them; not when we don’t. [Graph of EC2 usage: Amazon EC2 spending ($) over the days in the month]

Windows 10 Solaris RHEL 6.2 OSX 10.10 Amazon Linux. The tasks that the repotracker creates, for different commits and maybe even for different projects, are multiplexed into task queues that are based solely on the build variants.

Windows 10 Solaris RHEL 6.2 OSX 10.10 Amazon Linux. Priorities, Dependencies

Windows 10 Solaris RHEL 6.2 OSX 10.10 Amazon Linux. Now that we know the reasons for leveraging elastic computing to run our tests, let’s take a look at how the scheduler figures out how many tests to run. I am going to talk about Amazon EC2 Linux.

Task Queue 5 15 15 30 Amazon Linux Task Queue MENTION: HOW WE CALCULATE ESTIMATED TASK DURATION

EC2 Hosts 60 Amazon Linux EC2 Host MENTION: We are using the Amazon billing scheme

Goals: Minimize Time in Task Queue (TQT). Minimize Idle Host Time 🤑. Go into depth on what Kyle said about why we created our own. We need to be conservative. We want to save developer time, meaning we want tasks to be scheduled and run as fast as possible, so we don’t want the task queue to be too long. We want to save money, meaning we want to spin up the minimum number of hosts possible. Note that developer time itself varies in many ways. These two goals pull against each other: more hosts shorten the queue but cost more.

Example 30 5 15 Amazon Linux Tasks We have a set of tasks and a set of hosts and we need to figure out the optimal ordering of these tasks on hosts.

One Task Per Host 30 5 15 Amazon Linux EC2 Hosts (4) 🤑🤑🤑🤑 🙂🙂 45 30 60 15 5 0 TQT 0 TQT 0 TQT 0 TQT

With n tasks... Amazon Linux EC2 Hosts (n) 🤑 × n. Each host shows a task duration and the remainder of its 60: 5 + 55, 15 + 45, 30 + 30, and so on 🙂 TODO: Make these proportional

Fit Tasks On As Few Hosts as Possible 30 5 15 Amazon Linux EC2 Hosts (1) 🤑🤑 45 30 5 15 🙂🙂 😕 ☹️ 0 TQT 30 TQT 45 TQT 😭 😭 60 TQT For example...

Job Shop Scheduling Minimize Makespan For example...

Makespan = Total Length Of Schedule For example...

One Task Per Host 30 5 15 Amazon Linux EC2 Hosts (4) 🤑🤑🤑🤑 🙂🙂 45 30 60 15 5 0 TQT 0 TQT 0 TQT 0 TQT

Fit Tasks On As Few Hosts as Possible 30 5 15 Amazon Linux EC2 Hosts (1) 🤑🤑 45 30 5 15 🙂🙂 😕 ☹️ 0 TQT 30 TQT 45 TQT 😭 😭 60 TQT For example...
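The two extremes can be compared with a little arithmetic. The sketch below is a hedged illustration: the function names are hypothetical, and the durations 30, 15, 15 and 5 are assumed from the task-queue slide (the units are not stated there). It computes the time each task waits in the queue (TQT) and the makespan for both strategies.

```python
from itertools import accumulate


def one_host(durations):
    """Run every task back to back on a single host.

    Returns (queue wait per task, makespan).
    """
    # Each task waits for the total duration of the tasks ahead of it.
    waits = [0] + list(accumulate(durations))[:-1]
    return waits, sum(durations)


def one_task_per_host(durations):
    """Give every task its own host, so nothing waits.

    Returns (queue wait per task, makespan).
    """
    return [0] * len(durations), max(durations)


tasks = [30, 15, 15, 5]  # assumed durations, longest first
print(one_host(tasks))           # waits 0, 30, 45, 60 and makespan 65
print(one_task_per_host(tasks))  # waits all 0 and makespan 30
```

The single host is cheapest but the last task waits 60 before it even starts, while four hosts remove all waiting at four times the host count. The scheduler is looking for a point between the two.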

Job Shop Scheduling: Minimize Makespan. NP HARD! NP-hard means that every problem in NP can be reduced to it in polynomial time. Job Shop Scheduling is a known NP-hard problem. We use the Flexible Flow Job Shop problem, which is the same problem, except that there is a strict order of operations (depen

NP Hard = Approximations

Duration Based Estimation 30 5 15 Estimated Task Duration Maximum Time in Task Queue

Duration Based Estimation. Amazon Linux Tasks: 5 15 15 30. Amazon Linux EC2 Hosts (2): 60 60. We have a set of tasks and a set of hosts and we need to figure out the optimal ordering of these tasks on hosts. Bring back the point from Kyle’s talk about how we want to not be waiting a week and not have control. Developer Wait time (TODO CREATE A NAME). Total Task Time = 5 + 15 + 15 + 30 = 65, which exceeds the 60 available on one host, so 2 Hosts.

Duration Based Estimation Amazon Linux Tasks 5 15 Amazon Linux EC2 Hosts (2) 30 30 15 45

Duration Based Estimation Amazon Linux Tasks 5 Amazon Linux EC2 Hosts (2) 30 30 15 15 30

Duration Based Estimation Amazon Linux Tasks Amazon Linux EC2 Hosts (2) 30 5 25 15 15 30

Duration Based Estimation Amazon Linux Tasks 5 Amazon Linux EC2 Hosts (2) 30 5 5 20 15 15 10 20

Duration Based Estimation Amazon Linux Tasks Amazon Linux EC2 Hosts (2) 30 5 5 5 15 15 15 10 20
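The slide sequence above can be read as a two-step approximation: size the host pool from the total estimated duration, then place tasks greedily. The Python below is a minimal sketch under that reading and is not taken from the Evergreen source. The names `hosts_needed` and `assign` are hypothetical, and the longest-task-first, least-loaded-host rule is an assumption that happens to reproduce the first placements shown on the slides.

```python
import math


def hosts_needed(durations, capacity_per_host):
    """Estimate the host count: total estimated duration over per-host capacity."""
    return math.ceil(sum(durations) / capacity_per_host)


def assign(durations, n_hosts):
    """Greedy placement: longest task first, always onto the least-loaded host.

    Assumption: a standard heuristic, not necessarily the scheduler's rule.
    """
    hosts = [[] for _ in range(n_hosts)]
    for d in sorted(durations, reverse=True):
        target = min(hosts, key=sum)  # ties go to the earlier host
        target.append(d)
    return hosts


durations = [5, 15, 15, 30]        # estimated task durations from the slide
n = hosts_needed(durations, 60)    # 65 / 60 rounds up to 2 hosts
plan = assign(durations, n)
print(n, plan)                     # 2 [[30, 5], [15, 15]]
```

With these inputs the first host ends up with 30 and 5 (25 left of its 60) and the second with 15 and 15 (30 left), which matches the slide that shows "30 5 25" and "15 15 30". The spare capacity is then available for tasks that arrive later.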

Success! With that we are able to create a good approximation for a very difficult problem.

Future Work: Collecting more scheduler statistics so that we can optimize the scheduler. It would be cool to maybe use stochastic optimization to make the scheduler even better. As it is, the simple solution works well.

Thank You! github.com/evergreen-ci/evergreen evergreen.mongodb.com erf@mongodb.com / @kyleerf shraya@mongodb.com / @shrayolacrayon