Version Control at Google: A Case Study
Source: "Why Google Stores Billions of Lines of Code in a Single Repository" by Rachel Potvin and Josh Levenberg, CACM Vol. 59, No. 7, July 2016.
Copyright © 2017 Curt Hill
Introduction
- Google is a major software developer.
- Their software is a critical asset to their growth and profitability.
- Protecting this code is a priority.
- They have developed their own version control system, which is used by the majority (95%) of their developers.
- It is distributed across multiple data centers.
- It is massive.
History
- For the first 10 years or so, their source repository existed on a single machine.
- They examined many version control systems but found all of them unsuitable.
- After outgrowing the single-machine approach, they switched to an internally developed system called Piper.
Piper
- A distributed and redundant version control system.
- It stores data at 10 Google data centers worldwide.
- Google’s developers can see essentially the entire code base, regardless of where they work.
Statistics
- Taken from January 2015.
- Number of files: 1 billion
- Source files: 9 million
- Lines of code: 2 billion
- Number of commits: 35 million
- Size on disk: 86 TB
- Commits per workday: 40,000
- Does this constitute a serious repository?
Workflow
- As in most version control systems, the developer creates a local copy of the needed files.
- The developer writes, modifies, and tests as usual.
- The commit process is somewhat more extensive.
Commit
- Code that is ready for commit must first go through a code review.
- Other developers inspect and evaluate it.
- A variety of tools can verify the quality of the commit before it is added to the code base.
Access
- Piper may also be accessed through Clients in the Cloud (CitC).
- CitC may be used by any developer who has access to the Google cloud.
- The workspace now lives in the cloud instead of on the local machine.
- A developer may work on any machine with cloud access as if at their personal workstation.
- Files are also easy to share.
CitC
- The CitC workspace looks like a piece of the entire Piper code base.
- Any file in Piper may be browsed.
- Only files that are modified get their own entries in the cloud workspace.
- A build/make thus transparently combines modified files from the workspace with unmodified files read directly from the repository.
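The overlay behavior described on this slide can be sketched in a few lines. This is only an illustration under stated assumptions, not how CitC is implemented: the class `OverlayWorkspace`, its method names, and the sample paths are all hypothetical, and the repository is modeled as a plain dictionary.

```python
from typing import Dict, List


class OverlayWorkspace:
    """Hypothetical model of a CitC-style workspace (names are illustrative)."""

    def __init__(self, repository: Dict[str, str]) -> None:
        # Shared code base, treated as read-only by the workspace.
        self._repository = repository
        # Only files this developer has changed are stored here.
        self._modified: Dict[str, str] = {}

    def read(self, path: str) -> str:
        # A modified file shadows the repository version of the same path.
        if path in self._modified:
            return self._modified[path]
        return self._repository[path]

    def write(self, path: str, content: str) -> None:
        # Edits never touch the repository until a commit (not modeled here).
        self._modified[path] = content

    def modified_paths(self) -> List[str]:
        return sorted(self._modified)


# Made-up repository contents for the example.
repo = {"lib/m.py": "VERSION = 1", "app/a.py": "import m"}
workspace = OverlayWorkspace(repo)
workspace.write("lib/m.py", "VERSION = 2")
```

A build that reads through `workspace.read` sees the edited `lib/m.py` and the untouched `app/a.py` side by side, while the workspace itself stores only the one changed file.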
Trunks
- Like other systems, Piper supports trunk-based development.
- Most developers work on the head version of a file, which is the most recent commit.
- In general, separate branches are short-lived.
- This avoids the problems of merging long-lived branches back together.
Owners and Reviewers
- The entire code base is visible to every developer.
- The exceptions are a few pieces of confidential and secret code.
- The code base is organized as a tree of directories.
- Each directory has a set of owners.
- The owners are primarily responsible for the code in the directory.
- Anyone may do a code review, but an owner must approve.
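The owner-approval rule can be illustrated with a small sketch. The directory names, user names, and the functions `owners_for` and `may_commit` are hypothetical. The sketch also assumes that owners of an enclosing directory count as owners of everything beneath it, which the slide does not state.

```python
from pathlib import PurePosixPath
from typing import Dict, Iterable, Set

# Hypothetical ownership table: directory -> owners of that directory.
OWNERS: Dict[str, Set[str]] = {
    "search": {"alice"},
    "search/index": {"bob"},
}


def owners_for(path: str, owners: Dict[str, Set[str]] = OWNERS) -> Set[str]:
    """Collect owners of the file's directory and of every enclosing directory.

    Inheriting owners from parent directories is an assumption of this sketch.
    """
    result: Set[str] = set()
    for directory in PurePosixPath(path).parents:
        result |= owners.get(str(directory), set())
    return result


def may_commit(path: str, approvers: Iterable[str]) -> bool:
    """Anyone may review, but at least one approver must be an owner."""
    return bool(owners_for(path) & set(approvers))
```

For example, `may_commit("search/index/build.py", {"carol"})` is rejected because carol owns neither directory, while adding bob to the approvers allows the commit.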
Code Reviewers
- A reviewer may comment on many aspects of the code.
- There are language-specific guides.
- Reviewers use a code review tool named Critique.
- Critique receives the reviewer's comments and attaches them to the file in question.
- Comments may be attached to specific line numbers.
- When the code is approved, the review is available to the owner.
The Commit
- A number of custom tools help ensure that a commit is of high quality.
- These follow the code review process.
- There is an automated testing infrastructure.
- Each commit triggers a series of tests on code that depends on the committed file.
- If these dependents break, the commit is rolled back.
- A pre-submit check also triggers this testing.
Breakage Example
- Suppose module M is used in programs A, B, and C.
- This is good code reuse.
- When an update to M is committed, A, B, and C are all rebuilt.
- If the builds do not break (no compile issues), the commit may be accepted.
- Another form of breakage occurs when the automated unit tests for the modules of A, B, and C fail.
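The example above amounts to a reverse-dependency lookup followed by a build-and-test pass. The sketch below is a simplified illustration, not Google's tooling: the table `DEPENDS_ON`, the functions `dependents_of` and `accept_commit`, and the `build_and_test` callback are hypothetical, and only direct dependents are considered.

```python
from typing import Callable, Dict, List, Set

# Hypothetical dependency table: target -> modules it uses directly.
DEPENDS_ON: Dict[str, Set[str]] = {
    "A": {"M"},
    "B": {"M"},
    "C": {"M"},
    "D": set(),  # D does not use M, so a change to M does not affect it
}


def dependents_of(module: str,
                  depends_on: Dict[str, Set[str]] = DEPENDS_ON) -> List[str]:
    """Return the targets that use `module` directly (transitive users ignored)."""
    return sorted(t for t, deps in depends_on.items() if module in deps)


def accept_commit(module: str, build_and_test: Callable[[str], bool]) -> bool:
    """Accept a change to `module` only if every dependent builds and passes.

    `build_and_test` stands in for the real build and unit-test run; it
    returns True when the given target builds cleanly and its tests pass.
    """
    return all(build_and_test(target) for target in dependents_of(module))
```

With this table, `dependents_of("M")` yields A, B, and C. A commit to M is rejected as soon as any one of them fails to build or fails its tests.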
Automated Analysis
- The Tricorder system is part of the pre-submit and commit facility.
- It performs static analysis on the code in question.
- It may suggest one-line fixes.
- Other tools provide profiling, test data, and code coverage.
Pros
- One repository.
- No question where the most recent version is.
- Simplified dependency information.
- Atomic changes.
- Easy to collaborate with different teams.
- All the code is visible to every developer.
Cons
- There is no commercial off-the-shelf (COTS) software.
- It is all developed and maintained by Google.
- The massive size of the code base makes existing code hard to discover.
- There are startup costs for new developers who are used to simpler version control systems.
Git
- Git has been considered and is in use.
- Git is used for Android and Chrome, which are not in Piper.
- These are open source.
- They need to be usable by developers who are not Google employees.
- Converting the code base to Git would require thousands of repositories.
- It would be a massive shock to Google's culture.
Conclusion
- This approach would not work for everyone.
- The collaborative culture of Google makes this a good choice.
- It has posed several serious technical obstacles to implementation.
- Who better than Google to solve them?