Let’s start with some questions: What is a repository? What is a commit?
To the terminal!
What is a repository?

.
├── .git
│   ├── branches
│   ├── COMMIT_EDITMSG
│   ├── config
│   ├── description
│   ├── FETCH_HEAD
│   ├── HEAD
│   ├── hooks
│   ├── index
│   ├── info
│   ├── logs
│   ├── objects
│   ├── ORIG_HEAD
│   ├── packed-refs
│   └── refs
├── some files
├── more files
└── even more files
What is a commit?

    $ git cat-file -p 2dfc9fe
    tree fcfde60a5d645769d536f9b7c0726560ed225a14
    parent e4fdd0e4e49842d50f75e4d80f555fa783c0709f
    author Alan Du <alanhdu@gmail.com> 1496355755 +0100
    committer Alan Du <alanhdu@gmail.com> 1496355755 +0100

    Store fractional seconds as a u32 instead of a f64

    Drops precision down to the nanoseconds
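A minimal sketch, in Python, of where an object id like the ones above comes from: with the default object format, Git runs SHA-1 over a short header (`<type> <size>\0`) followed by the object's bytes. The helper name `git_object_id` is made up for illustration, and the commit body below only approximates the commit shown (the exact bytes are not known here, so the real id is not reproduced).

```python
import hashlib


def git_object_id(kind: str, body: bytes) -> str:
    """SHA-1 over '<kind> <size>\\0' + body (assumes the default SHA-1 object format)."""
    header = f"{kind} {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()


# A blob is just file contents; the empty file has a well-known id.
empty_blob_id = git_object_id("blob", b"")

# A commit is plain text: a tree, its parent(s), author, committer, message.
# Approximate reconstruction of the commit above, for illustration only.
commit_body = (
    b"tree fcfde60a5d645769d536f9b7c0726560ed225a14\n"
    b"parent e4fdd0e4e49842d50f75e4d80f555fa783c0709f\n"
    b"author Alan Du <alanhdu@gmail.com> 1496355755 +0100\n"
    b"committer Alan Du <alanhdu@gmail.com> 1496355755 +0100\n"
    b"\n"
    b"Store fractional seconds as a u32 instead of a f64\n"
)
commit_id = git_object_id("commit", commit_body)

print(empty_blob_id)
print(commit_id)
```

The point of the sketch: a commit's id is a function of its whole content, including the tree it points to and its parent.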
What are branches?

Refs are “nicknames” for commits
Branches are “auto-updating” refs (via HEAD)
What does this mean practically?

    git checkout COMMIT
    git checkout BRANCH
    git checkout REF
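A toy model of refs and HEAD, as a hedged sketch rather than how Git stores them: `ToyRepo` and all its attribute names are hypothetical. It assumes the usual behaviour that checking out a branch attaches HEAD to it, while checking out any other ref or a raw commit id leaves HEAD detached.

```python
class ToyRepo:
    """Hypothetical model: refs are names for commit ids, HEAD picks what moves."""

    def __init__(self, first_commit):
        self.branches = {"master": first_commit}  # "auto-updating" refs
        self.other_refs = {}                      # tags, remote-tracking refs
        self.head = ("branch", "master")          # HEAD names a branch...

    def resolve_head(self):
        kind, value = self.head
        return self.branches[value] if kind == "branch" else value

    def checkout(self, name):
        if name in self.branches:
            self.head = ("branch", name)          # attached to the branch
        else:
            # ...or holds a bare commit id ("detached HEAD").
            self.head = ("detached", self.other_refs.get(name, name))

    def commit(self, new_id):
        kind, value = self.head
        if kind == "branch":
            self.branches[value] = new_id         # the branch follows along
        else:
            self.head = ("detached", new_id)      # only HEAD moves


repo = ToyRepo("c1")
repo.other_refs["origin/master"] = "c1"

repo.commit("c2")                 # on master: the branch auto-updates
repo.checkout("origin/master")    # a non-branch ref: HEAD detaches
repo.commit("c3")                 # no ref moves, only HEAD

print(repo.branches, repo.other_refs, repo.resolve_head())
```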
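What does this mean practically?

Is it:
    git checkout origin/master
    git pull origin/master
Or is it:
    git checkout origin master
    git pull origin master

(One of each: git checkout takes a ref, so origin/master; git pull takes a remote and a branch, so origin master.)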
What does this mean practically?

    git bisect start
    git bisect bad COMMIT
    git bisect good COMMIT

Binary search to find which commit introduced the bug
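The binary search behind bisect can be sketched as follows. This is a simplification under stated assumptions: history is a straight line, the first commit is known good, the last is known bad, and once the bug appears it stays. The function name `first_bad_commit` and the commit labels are hypothetical.

```python
def first_bad_commit(commits, is_bad):
    """commits: oldest to newest, commits[0] known good, commits[-1] known bad."""
    good, bad = 0, len(commits) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(commits[mid]):
            bad = mid    # bug is already present: look earlier
        else:
            good = mid   # bug not yet present: look later
    return commits[bad]


history = [f"c{i}" for i in range(10)]   # c0 .. c9
tested = []


def is_bad(commit):
    tested.append(commit)
    return int(commit[1:]) >= 6          # pretend the bug landed in c6


culprit = first_bad_commit(history, is_bad)
print(culprit, tested)
```

Each test halves the remaining range, so the number of checkouts grows with the logarithm of the history length.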
What does this mean practically?

Merge, baby, merge? (git merge)
Or rebase all the things? (git rebase)
So clearly we’re done, right?
Git is a “snapshot”-based system

Commits store the current tree
There’s no first-class notion of diff or patch!!!
Git’s ability to deal with diffs is ad hoc and fundamentally flawed
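To make "no first-class diff" concrete, here is a sketch in which the only stored data are two snapshots and the diff is recomputed on demand. It uses Python's `difflib` purely as a stand-in; Git has its own diff machinery, and the function name `diff_snapshots` is made up.

```python
import difflib


def diff_snapshots(old_lines, new_lines):
    """Derive a unified diff from two stored snapshots; nothing diff-like is stored."""
    return list(
        difflib.unified_diff(
            old_lines, new_lines, fromfile="parent", tofile="commit", lineterm=""
        )
    )


parent_snapshot = ["a", "b", "c"]
commit_snapshot = ["a", "B", "c"]

patch = diff_snapshots(parent_snapshot, commit_snapshot)
print("\n".join(patch))
```

The diff is a view computed from the snapshots, so a different algorithm could produce a different diff for the same pair of commits.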
What’s wrong with snapshots?

git cherry-pick

Let’s say you’re the CPython project and have these branches:
    master
    3.6
    3.5
    3.4
    2.7
What’s wrong with snapshots?

git cherry-pick

Cherry-picking changes the identity of the commit!
So if you cherry-pick two related commits in the wrong order...
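A hedged sketch of why the identity changes: a commit id is a hash of the commit's content, and that content names the parent. The function `toy_commit_id` is hypothetical and leaves out the author and committer lines; the tree and parent ids are placeholders, not real objects.

```python
import hashlib


def toy_commit_id(tree: str, parent: str, message: str) -> str:
    """Simplified commit hashing: real commits also carry author/committer lines."""
    body = f"tree {tree}\nparent {parent}\n\n{message}\n".encode()
    header = f"commit {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()


tree = "0" * 40              # placeholder tree id
parent_on_master = "1" * 40  # placeholder: tip of master
parent_on_3_6 = "2" * 40     # placeholder: tip of 3.6

original = toy_commit_id(tree, parent_on_master, "Fix the bug")
picked = toy_commit_id(tree, parent_on_3_6, "Fix the bug")

print(original)
print(picked)
```

Same message, same change, different parent: Git sees two unrelated commits, and nothing records that one is a copy of the other.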
It gets worse

    A
   / \
  B   B     New feature!
  |\ /|
  A X A     Revert the change!
  |/ \|
  B   B     Merge each other’s work
   \ /
    B       WTF?
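The surprise above falls out of the three-way merge rule. A sketch at whole-file granularity (real merges work per hunk, and `three_way_merge` is a made-up name): if one side equals the merge base, the other side wins.

```python
def three_way_merge(base, ours, theirs):
    """Whole-file three-way merge: a side that matches the base 'did nothing'."""
    if ours == theirs:
        return ours
    if ours == base:
        return theirs   # only they changed something
    if theirs == base:
        return ours     # only we changed something
    raise ValueError("conflict: both sides changed differently")


base = "A"     # common ancestor at the top of the diagram
ours = "A"     # we added the feature (B), then reverted it (A)
theirs = "B"   # their commit that added the feature

# Our revert is invisible: our tip looks identical to the base.
after_cross_merge = three_way_merge(base, ours, theirs)

# Both sides end up at B, so the final merge is B as well.
final = three_way_merge("A", after_cross_merge, after_cross_merge)
print(after_cross_merge, final)
```

Only the endpoints and the base are compared, so the intent recorded by the revert never enters the computation.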
It gets worse

Git’s merge algorithm is fundamentally flawed

In general:

    B1 – B2
   /  \    \
  A    \    \
   \    \    \
    A1 – A2 – A3

        ≠

    B1 – B2
   /       \
  A         \
   \         \
    A1 –––––– A3
The Fundamental Problem is that diffs in git are second-class citizens

So… why don’t we just make diffs first-class citizens?

Can only add or delete lines (no editing!)
Each patch records the “ancestry” of each line (or deletion)
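One way to picture such patches, as a rough sketch and not the actual data model of Darcs or Pijul: every added line gets an identity and records the lines it was inserted between, and a deletion just marks a line as gone. All names here (`Add`, `Delete`, `render`, `START`, `END`) are hypothetical. The sketch ignores patch dependencies and assumes the lines end up totally ordered.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter
from itertools import permutations


@dataclass(frozen=True)
class Add:
    line_id: str  # identity of the new line
    after: str    # ancestry: id of the line it was inserted below
    before: str   # ancestry: id of the line it was inserted above
    text: str


@dataclass(frozen=True)
class Delete:
    line_id: str  # the line is kept as a tombstone, never edited in place


def render(patches):
    """Rebuild the file from a collection of patches, in whatever order they arrive."""
    adds = [p for p in patches if isinstance(p, Add)]
    deleted = {p.line_id for p in patches if isinstance(p, Delete)}
    # Map each node to the set of nodes that must appear above it.
    above = {"START": set(), "END": {"START"}}
    for a in adds:
        above.setdefault(a.line_id, set()).add(a.after)
        above.setdefault(a.before, set()).add(a.line_id)
    text = {a.line_id: a.text for a in adds}
    order = TopologicalSorter(above).static_order()
    return [text[i] for i in order if i in text and i not in deleted]


patches = [
    Add("l1", "START", "END", "alpha"),
    Add("l2", "l1", "END", "beta"),
    Add("l0", "START", "l1", "intro"),  # inserted above "alpha"
    Delete("l2"),                       # removes "beta"
]

# Every application order yields the same file.
results = {tuple(render(p)) for p in permutations(patches)}
print(results)
```

An "edit" in this model is a deletion plus an addition, which is why the slide says there is no editing.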
Patches Compose
Commutative Diagram
The Merge Commutative Diagram
Not a merge
But that isn’t unique!

We want the “smallest” merge
(in the language of category theory, we want the pushout).
One Technical Detail

The pushout isn’t always a file!
In category theory terms, we need the free cocompletion
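A small illustration of a merge that is not a file, under a simplifying assumption: a file is modelled as a set of "this line sits above that line" edges, and merging is taken to be the union of the edges. That union is a stand-in for the pushout, not a construction taken from the talk, and `as_file` is a made-up helper.

```python
from graphlib import TopologicalSorter


def as_file(edges):
    """Return the lines in order if the edges force a single order, else None."""
    above = {}
    for upper, lower in edges:
        above.setdefault(upper, set())
        above.setdefault(lower, set()).add(upper)
    order = list(TopologicalSorter(above).static_order())
    # The order is unique exactly when each neighbouring pair is directly linked.
    if all((x, y) in edges for x, y in zip(order, order[1:])):
        return order
    return None


base = {("first", "last")}
alice = base | {("first", "alice's line"), ("alice's line", "last")}
bob = base | {("first", "bob's line"), ("bob's line", "last")}

merged = alice | bob  # both new lines sit between "first" and "last"

print(as_file(alice))
print(as_file(merged))
```

Nothing says whether Alice's line goes above Bob's or the other way round, so the merged object is a graph of lines rather than a file. Keeping such graphs around as legitimate states is the role the free cocompletion plays.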
How does this help us? How to revert shoes?
Quick Summary

What are:
    Git repositories?
    Git commits?
    Git branches?
What’s wrong with snapshot-based systems?
Baby patch theory (inspired by Darcs and Pijul)
References

https://git-scm.com/book/en/v2
https://jneem.github.io/merging/
https://tahoe-lafs.org/~zooko/badmerge/concrete-good-semantics.html
http://r6.ca/blog/20110416T204742Z.html
https://codewords.recurse.com/issues/two/git-from-the-inside-out
Questions?