The Life of a MongoDB GitHub Commit: Evergreen. This is a talk about how we do continuous integration at MongoDB. We'll be talking about Evergreen, our homegrown CI system, which we developed because of MongoDB's unique testing needs. Here's some context: MongoDB is a general-purpose, open-source, distributed database. Whether or not you have an opinion about the database will not affect your view of this talk.
This Talk Will: provide an overview of CI at MongoDB; journey through the Evergreen internals; share some useful lessons we've learned along the way. This is not a talk about our tests.
Continuous Integration: Travis, Jenkins, TeamCity, Buildbot
Testing MongoDB is not a typical CI use case
MongoDB does a lot of things: queries, storage, backups, monitoring, authentication, aggregation, replication, sharding. MongoDB has a variety of features that interact in many ways.
MongoDB has a lot of tests: 20+ hours when run in serial.
MongoDB runs in a lot of places: Linux, Windows, Solaris, OSX. We support MongoDB on many different platforms, from Linux to Windows, and on architectures like x86 and PowerPC.
Where can we run those tests?
We needed something with: Dynamic Host Allocation AND Static Hardware; Multiplatform Support; Powerful Navigation; Open Source Licensing
Evergreen: our in-house continuous integration system
High Level Elizabeth Programmer commits a change.
High Level
How do the pieces fit together?
[Architecture diagram. Components: Server, Web User Interface, Logs & Results, Commits, Tasks, Agents, Task Runner, Repo Tracker, Scheduler, Host Initializer]
The life of a commit to the MongoDB codebase. We need to see how we get from (commit) to (screenshot of a finished version).
A commit is born. An engineer pushes a commit to mongodb/mongo master on GitHub.
Picked up by the Repotracker (commits a312b7e, d5a2826). The Repotracker sits, ever diligent, checking periodically for new commits. When it finds a new commit, it picks it up.
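The polling step can be illustrated with a minimal sketch. This is not Evergreen's implementation: the function name `find_new_commits`, the idea of a remembered last-seen revision, and the ordering of the commit hashes are all assumptions made for the example.

```python
from typing import List


def find_new_commits(history: List[str], last_seen: str) -> List[str]:
    """Return the commits that come after last_seen, oldest first.

    history is ordered oldest to newest, as a hypothetical fetch from the
    repository might return it. If last_seen is not in the history, every
    commit is treated as new.
    """
    if last_seen in history:
        return history[history.index(last_seen) + 1:]
    return list(history)


# One polling pass: the tracker is assumed to have already processed a312b7e.
history = ["a312b7e", "d5a2826", "24ac5dd"]
new_commits = find_new_commits(history, last_seen="a312b7e")
print(new_commits)
```

A real tracker would run a pass like this on a timer and persist the last-seen revision between passes.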
Project Configuration (evergreen.yml; Repo Tracker, commit d5a2826)

    tasks:
    - name: jsCore
      commands:
      - func: "do setup"
      - func: "run tests"
        vars:
          resmoke_args: --suites=core --storageEngine=mmapv1
          run_multiple_jobs: true

    buildvariants:
    - name: linux-64
      display_name: Linux
      run_on:
      - rhel55
      - rhel55-test
Create tasks for the commit. Repo Tracker: a312b7e, d5a2826, 24ac5dd
Tasks: Sharding Tests on Solaris; Auth Tests on RHEL 6.2; Enterprise Compile on Amazon Linux!
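One way to picture task creation is as an expansion of the project configuration: each task definition is paired with each build variant for the commit. The sketch below assumes a full cross product and uses a trimmed-down, hypothetical config dictionary (the `solaris` variant name is invented); a real configuration may limit which tasks run on which variant.

```python
from typing import Dict, List, Tuple

# Hypothetical, trimmed-down view of a parsed project configuration.
config: Dict[str, List[str]] = {
    "tasks": ["compile", "jsCore", "sharding"],
    "buildvariants": ["linux-64", "solaris"],
}


def create_tasks(commit: str, config: Dict[str, List[str]]) -> List[Tuple[str, str, str]]:
    """Pair every task definition with every build variant for one commit."""
    return [
        (commit, variant, task)
        for variant in config["buildvariants"]
        for task in config["tasks"]
    ]


for created in create_tasks("d5a2826", config):
    print(created)
```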
Schedule the tasks Scheduler
Windows 10, Solaris, RHEL 6.2, OSX 10.10, Amazon Linux
Host Initializer
Agent and Task Runner
Run the Task. Agent: Amazon Linux #1234. Task: sharding.
Sharding

    - name: sharding
      depends_on:
      - name: compile
      commands:
      - func: "do stuff"
      - func: "run tests"
    - name: replication
      commands:
      - func: "different"
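The `depends_on` field implies an ordering: sharding cannot start until compile has finished. A minimal sketch of that check follows; the function name `runnable_tasks` and the dictionary layout are assumptions for illustration, not Evergreen's data model.

```python
from typing import Dict, List, Set


def runnable_tasks(depends_on: Dict[str, List[str]], finished: Set[str]) -> List[str]:
    """Return unfinished tasks whose dependencies have all finished."""
    return sorted(
        name
        for name, deps in depends_on.items()
        if name not in finished and all(dep in finished for dep in deps)
    )


# Dependencies taken from the slide: sharding depends on compile.
depends_on = {"compile": [], "sharding": ["compile"]}

print(runnable_tasks(depends_on, finished=set()))        # before compile runs
print(runnable_tasks(depends_on, finished={"compile"}))  # after compile finishes
```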
Logs and Results Sent Back: sharding_test1, results.json, Notifications, Aggregate Statistics
results.json sharding_test1
Scheduler. Transition to the scheduler internals. The scheduler is responsible for creating the task queue and ensuring that there are enough hosts to run all the tasks in a reasonable amount of time.
Schedule the tasks (Scheduler). After the repo tracker creates the tasks, the scheduler finds all unscheduled tasks and adds them to platform-specific priority queues.
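Platform-specific priority queues can be sketched with one heap per platform. This is an illustration under assumptions: the tuple layout, the numeric priorities (lower runs sooner), and the function names are invented for the example and are not taken from Evergreen.

```python
import heapq
from collections import defaultdict
from typing import Dict, List, Tuple

# (priority, task name, platform); a lower number means "run sooner" here.
Task = Tuple[int, str, str]


def build_queues(unscheduled: List[Task]) -> Dict[str, List[Tuple[int, str]]]:
    """Group unscheduled tasks into one priority queue (heap) per platform."""
    queues: Dict[str, List[Tuple[int, str]]] = defaultdict(list)
    for priority, name, platform in unscheduled:
        heapq.heappush(queues[platform], (priority, name))
    return dict(queues)


def drain(queue: List[Tuple[int, str]]) -> List[str]:
    """Pop every task from a queue in priority order."""
    return [heapq.heappop(queue)[1] for _ in range(len(queue))]


queues = build_queues([
    (2, "sharding", "Solaris"),
    (1, "compile", "Amazon Linux"),
    (3, "auth", "Amazon Linux"),
    (1, "compile", "Solaris"),
])
print(drain(queues["Solaris"]))
```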
Allocate New Hosts (Host Initializer, Scheduler). After the task queues are created, the scheduler determines how many additional hosts need to be spun up, based on how many hosts exist and how many tasks there are.
Scaling Load! Hosts when we need them; not when we don't. [Graph of Amazon EC2 usage: spending ($) against days in the month]
Windows 10, Solaris, RHEL 6.2, OSX 10.10, Amazon Linux. The scheduler multiplexes all of the tasks created by the repotracker, for different commits and possibly for different projects, into task queues that are keyed solely on build variant.
Windows 10, Solaris, RHEL 6.2, OSX 10.10, Amazon Linux. Priorities. Dependencies.
Windows 10, Solaris, RHEL 6.2, OSX 10.10, Amazon Linux. Now that we know the reasons for leveraging elastic computing to run our tests, let's take a look at how the scheduler figures out how many hosts to run. I am going to talk about Amazon EC2 Linux.
Amazon Linux Task Queue: 5, 15, 15, 30. Mention: how we calculate estimated task duration.
Amazon Linux EC2 Host: 60. Mention: we are using the Amazon billing scheme.
Goals: Minimize Time in Task Queue (TQT); Minimize Idle Host Time 🤑. Go into depth on what Kyle said about why we created our own system. We need to be conservative. We want to save developer time, meaning we want tasks to be scheduled and run as fast as possible; we don't want the task queue to be too long. We also want to save money, meaning we want to spin up the minimum number of hosts possible. Note that developer time itself varies in many ways. These two goals conflict with each other.
Example. Amazon Linux Tasks: 30, 5, 15. We have a set of tasks and a set of hosts, and we need to figure out the optimal ordering of these tasks on hosts.
One Task Per Host. [Figure: Amazon Linux tasks (30, 5, 15) each placed on its own EC2 host, 4 hosts 🤑🤑🤑🤑; every task has 0 TQT 🙂]
With n tasks... [Figure: n Amazon Linux EC2 hosts, one task per host 🤑; tasks of 5, 15, and 30 leave 55, 45, and 30 idle on their hosts 🙂]
Fit Tasks On As Few Hosts as Possible. [Figure: all Amazon Linux tasks (30, 5, 15) on 1 EC2 host 🤑🤑; tasks wait 0 TQT 🙂, 30 TQT 😕, 45 TQT ☹️, 60 TQT 😭]
Job Shop Scheduling: Minimize Makespan
Makespan = Total Length of Schedule
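The two extremes from the preceding slides can be compared numerically. The sketch below is an illustration, not Evergreen code: it assumes task durations of 5, 15, 15, and 30, a paid host block of 60 in the same units, and hypothetical names (`evaluate`, `HOST_BLOCK`). For each schedule it reports the makespan, the longest time any task spent in the queue (TQT), and the paid-but-idle host time.

```python
import math
from typing import Dict, List

HOST_BLOCK = 60  # assumed size of one paid host block, same units as durations


def evaluate(schedule: List[List[int]]) -> Dict[str, int]:
    """Score a schedule; schedule[i] is the ordered task durations on host i."""
    waits = []
    finish_times = []
    for host_tasks in schedule:
        clock = 0
        for duration in host_tasks:
            waits.append(clock)  # time this task waited before starting (TQT)
            clock += duration
        finish_times.append(clock)
    # A host is paid for in whole blocks, so round its busy time up to a block.
    paid = sum(math.ceil(t / HOST_BLOCK) * HOST_BLOCK for t in finish_times)
    return {
        "makespan": max(finish_times),
        "max_tqt": max(waits),
        "idle": paid - sum(finish_times),
    }


one_per_host = [[30], [15], [15], [5]]
single_host = [[30, 15, 15, 5]]

print(evaluate(one_per_host))
print(evaluate(single_host))
```

One task per host gives zero queue time but the most idle host time; a single host minimizes idle time but makes the last task wait 60.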
Job Shop Scheduling: Minimize Makespan. NP-HARD! A problem is NP-hard when every problem in NP can be reduced to it in polynomial time. Job shop scheduling is a known NP-hard problem. We use the Flexible Flow Job Shop problem, which is the same problem, except that there is a strict order of operations (depen
NP-Hard = Approximations
Duration Based Estimation. Tasks: 30, 5, 15. Inputs: Estimated Task Duration; Maximum Time in Task Queue.
Duration Based Estimation. Amazon Linux Tasks: 5, 15, 15, 30. Amazon Linux EC2 Hosts (2): 60, 60. Total Task Time = 65, which needs 2 hosts. We have a set of tasks and a set of hosts, and we need to figure out the optimal ordering of these tasks on hosts. Bring back the point from Kyle's talk about how we do not want to wait a week and have no control. Developer wait time.
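The host count on this slide follows from dividing the total estimated task time by the maximum time allowed in the task queue and rounding up. A minimal sketch, assuming a threshold of 60 and a hypothetical function name `hosts_needed`; the subtraction of existing hosts reflects the earlier note that the scheduler considers how many hosts already exist.

```python
import math
from typing import List


def hosts_needed(durations: List[int], max_queue_time: int, existing_hosts: int = 0) -> int:
    """Additional hosts to start so the queued work fits within max_queue_time per host."""
    total = sum(durations)                         # 5 + 15 + 15 + 30 = 65
    required = math.ceil(total / max_queue_time)   # ceil(65 / 60) = 2
    return max(0, required - existing_hosts)


print(hosts_needed([5, 15, 15, 30], max_queue_time=60))
print(hosts_needed([5, 15, 15, 30], max_queue_time=60, existing_hosts=1))
```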
Duration Based Estimation. [Animation: Amazon Linux tasks placed on Amazon Linux EC2 Hosts (2), 60 each. Each step lists the tasks still queued and, per host, the tasks placed followed by the free time left.]
Step 1. Queue: 5, 15. Host 1: 30 (30 free). Host 2: 15 (45 free).
Step 2. Queue: 5. Host 1: 30 (30 free). Host 2: 15, 15 (30 free).
Step 3. Queue: empty. Host 1: 30, 5 (25 free). Host 2: 15, 15 (30 free).
Step 4. Queue: 5. Host 1: 30, 5, 5 (20 free). Host 2: 15, 15, 10 (20 free).
Step 5. Queue: empty. Host 1: 30, 5, 5, 5 (15 free). Host 2: 15, 15, 10 (20 free).
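The placements in the animation are consistent with a simple greedy rule: take the longest task first and put it on the host with the least work so far. The sketch below uses that rule as an assumption; the slides do not state that this is how Evergreen places tasks, and the function name `assign` is hypothetical.

```python
from typing import List


def assign(durations: List[int], loads: List[int]) -> List[List[int]]:
    """Place each task, longest first, on the currently least-loaded host.

    loads holds the work already on each host and is updated in place.
    Returns the tasks placed on each host by this call, in placement order.
    """
    placed: List[List[int]] = [[] for _ in loads]
    for duration in sorted(durations, reverse=True):
        host = loads.index(min(loads))  # ties go to the lowest-numbered host
        placed[host].append(duration)
        loads[host] += duration
    return placed


loads = [0, 0]                           # two empty hosts
first = assign([5, 15, 15, 30], loads)   # the initial queue
second = assign([5, 10, 5], loads)       # tasks assumed to arrive later
print(first, second, loads)
```

With a host block of 60, the final loads leave 15 and 20 free, which matches the last frame of the animation.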
Success! With that we are able to create a good approximation for a very difficult problem.
Future Work. Collect more scheduler statistics so that we can optimize the scheduler. It would be cool to try stochastic optimization to make the scheduler even better. As it is, the simple solution works well.
Thank You! github.com/evergreen-ci/evergreen evergreen.mongodb.com erf@mongodb.com / @kyleerf shraya@mongodb.com / @shrayolacrayon