Database & Information Systems Group University of Basel DBIS Group Talk Michael Springmann 2009-02-11 Distributed Source Code Management.

1 Database & Information Systems Group University of Basel DBIS Group Talk Michael Springmann 2009-02-11 Distributed Source Code Management

2 Outline
I. Terminology
II. A little theory of Merging
III. Concepts of Distributed Source Code Management
IV. Selected Implementations and Details
V. Conclusion & Outlook

3 What is VCS / SCM?
I. VCS = Version Control System
   - RCS = Revision Control System is (also) the name of a particular implementation; the term is avoided here to reduce confusion
II. SCM = Source Code Management
III. SCM = Software Configuration Management?
   - We don’t care, as long as we know what we want

4 Basic Separation in SCM
I. Working Directory
   - every developer has his own working directory
   - contains not only the essential source code and build scripts but commonly also:
      - generated source code
      - generated documentation
      - compiled results, linked libraries / executables, generated packages
      - log files of test runs
      - (temporary) files generated by the IDE in use
II. Repository
   - provides revision control
   - provides the possibility to exchange code with other developers

5 Centralized SCM
I. One central repository
   - Each developer checks out code from this repository
   - Each developer commits changes to this repository
   - SCM / Repository supports concurrent work by:
      - Preventing conflicts
         - Locking of items (pessimistic approach)
            - Lock, Modify, Write
         - No (other) conflict resolution required
      - Detecting conflicts
         - Identifying conflicting check-ins (optimistic approach)
            - Copy, Modify, Merge
         - Conflict resolution is required
            - Automated vs. Manual
            - Single line of development vs. Branch/Fork
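
The two approaches on this slide can be contrasted in a small sketch. This is a toy model, not how any particular SCM is implemented: the class `CentralRepo`, its method names, and the single-file repository are all assumptions made for illustration. Locking refuses a second writer up front, while the optimistic path accepts a commit only if it was based on the latest revision.

```python
class CentralRepo:
    """Toy central repository: one file plus a revision counter (hypothetical)."""

    def __init__(self, content):
        self.content = content
        self.revision = 0
        self.lock_owner = None  # only used by the pessimistic path

    # Pessimistic approach: lock, modify, write
    def lock(self, user):
        if self.lock_owner not in (None, user):
            return False  # someone else already holds the lock
        self.lock_owner = user
        return True

    def write_locked(self, user, new_content):
        if self.lock_owner != user:
            raise PermissionError(f"{user} does not hold the lock")
        self.content = new_content
        self.revision += 1
        self.lock_owner = None  # release the lock after writing

    # Optimistic approach: copy, modify, merge
    def checkout(self):
        return self.revision, self.content

    def commit(self, base_revision, new_content):
        if base_revision != self.revision:
            return False  # conflicting check-in detected: merge first
        self.content = new_content
        self.revision += 1
        return True


repo = CentralRepo("A")
harry_base, _ = repo.checkout()
other_base, _ = repo.checkout()
other_ok = repo.commit(other_base, "A''")  # first commit is accepted
harry_ok = repo.commit(harry_base, "A'")   # stale base revision: rejected
```

In the optimistic case the rejected developer has to merge with the new head before committing again, which is exactly where the cost of merging enters.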

6 Problems with Conflict Handling in Centralized SCM
I. Lock-based
   - locks may not be released
      - because forgotten
      - because needed, but deadlock (refactoring!)
      - client failure
   - little concurrent modification of code is possible
      - requires much stricter organizational rules to work at all
   - Cannot prevent incompatibility between several files
II. Conflict detection
   - Every local change is a “branch” until successful commit
      - local changes cannot be version controlled until they are
         - successfully merged with the most recent version in the repository, or
         - officially branched, thus delaying the problem of integration

7-14 Copy-modify-merge solution (sequence of figure slides; no text in the transcript)

15 Merge solution?
I. The SCM cannot know about the semantics of file contents
   - Harry has to do the merging based on his knowledge of A’ and A’’
   - That’s a traditional 2-Way Merge
II. Merges are considered very expensive

17 Can we do better?
I. Problem in 2-Way Merge: any of the differences could be the one we want to preserve.
   - If we know diff(A, A’) and diff(A, A’’), we can apply both (somehow automatically) to A to generate A+, which
      - preserves both changes
      - will be identical to A* iff
         - the file format is correctly captured by diff (e.g. line breaks)
         - no conflicts exist (e.g. lines modified in both files)
II. The solution using A, A’, and A’’ is called a 3-Way Merge
   - Requires knowledge of the common ancestor A
   - May introduce errors and bugs
   - Has to be performed before any uncommitted change can be committed
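
The decision rule behind a 3-Way Merge can be sketched in a few lines. This is a deliberately simplified illustration, not the algorithm of any specific tool: it assumes that all three versions have the same number of lines (only in-place edits), and the function name `three_way_merge` is made up for this example. For each line, a change made on only one side is taken automatically; a line changed differently on both sides is reported as a conflict.

```python
def three_way_merge(base, mine, theirs):
    """Merge two edited copies of `base` line by line.

    Simplifying assumption: all three versions have the same number of
    lines. Returns (merged_lines, indexes_of_conflicting_lines).
    """
    if not (len(base) == len(mine) == len(theirs)):
        raise ValueError("this sketch only handles in-place line edits")
    merged, conflicts = [], []
    for i, (a, a1, a2) in enumerate(zip(base, mine, theirs)):
        if a1 == a2:      # both sides agree (unchanged or identical change)
            merged.append(a1)
        elif a1 == a:     # only "theirs" changed this line
            merged.append(a2)
        elif a2 == a:     # only "mine" changed this line
            merged.append(a1)
        else:             # both changed it differently: needs manual work
            merged.append(a)  # keep the ancestor line as a placeholder
            conflicts.append(i)
    return merged, conflicts


A = ["int x = 1;", "int y = 2;", "return x + y;"]
A1 = ["int x = 10;", "int y = 2;", "return x + y;"]  # first developer's edit
A2 = ["int x = 1;", "int y = 2;", "return x * y;"]   # second developer's edit
merged, conflicts = three_way_merge(A, A1, A2)
```

Without the ancestor `A`, a 2-way comparison of `A1` and `A2` would only show that lines 1 and 3 differ, not which side changed them. Note that a merge with no textual conflict can still be semantically wrong, which is the "may introduce errors and bugs" caveat.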

19 Subversion: File Structure
I. Local
   - .svn in every directory
      - not friendly for tools like find / grep
   - same size or bigger than current working directory
      - .svn/text-base/.svn-base is the base revision of file
         - used for reverting and diff
         - used for delta compression in transfer
II. Repository
   - Structure commonly /trunk, /tags, /branches
   - If missed in the beginning, branching becomes a pain
   - Tags are supposed to be immutable, but are actually branches

20 Distributed SCM
I. Not a single repository, but each local clone is a branch
II. No central revision counter
   - revisions are identified by (secure) hashes on changesets
      - can also detect data corruption
   - explicit ancestry is stored based on these identifiers
      - can also detect malicious modifications
III. See branching as the most natural operation to perform
   - Therefore merging is the important operation to support
   - Because ancestry is preserved, use sophisticated 3-Way Merge
      - Allow filetype (or programming language) based merge settings
      - Allow configuring external tools for conflicts
   - Because each local clone is a branch, you can always commit before merging and switch back and forth
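
How hash-based identifiers tie a revision to its content and ancestry can be sketched as follows. This is an illustration under stated assumptions, not the exact format of git, Mercurial, or Bazaar: the helper names, the dictionary-based `store`, and the `NULL_ID` stand-in for "no parent" are invented for the example. The identifier is a SHA-1 over the parent identifiers plus the changeset content, so re-hashing the stored data reveals corruption.

```python
import hashlib

NULL_ID = "0" * 40  # stand-in parent ID for a root changeset (assumption)


def changeset_id(parents, content):
    """SHA-1 over the sorted parent IDs followed by the changeset content."""
    h = hashlib.sha1()
    for parent in sorted(parents):
        h.update(parent.encode("ascii"))
    h.update(content)
    return h.hexdigest()


def commit(store, parents, content):
    """Store a changeset under its hash and return that hash."""
    cid = changeset_id(parents, content)
    store[cid] = {"parents": list(parents), "content": content}
    return cid


def verify(store):
    """Return the IDs whose stored data no longer matches their hash."""
    return [cid for cid, cs in store.items()
            if changeset_id(cs["parents"], cs["content"]) != cid]


store = {}
root = commit(store, [NULL_ID], b"initial import")
child = commit(store, [root], b"fix typo")
clean = verify(store)                 # nothing is damaged yet
store[root]["content"] = b"tampered"  # simulate corruption or tampering
damaged = verify(store)               # the root no longer matches its ID
```

Because a child's identifier covers its parents' identifiers, replacing an ancestor with different content would require a different identifier, which every descendant would then fail to reference.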

21 Distributed Repositories
I. Two basic kinds of communication
   - between repositories (Repo 1, Repo 2): propagate changes between repos
   - within one clone: synchronizes local working copy with local repository

22 Some basic, common commands
Local Repository: init, log, verify, heads, tip, tag, tags, branches
Other Repo: clone, pull, push, serve, incoming, outgoing, export, import, bundle
Working Dir: status, add, remove, commit, update, revert, branch, merge
Shortcuts for frequent tasks, like fetch:
   - pull
   - merge if needed
   - commit if the automatic merge is successful, otherwise update for manual intervention

23 The forking non-problem
I. Every single change is potentially a fork point
   - extends the idea of open source
      - removes the social distinction that centralized tools impose between insiders (people with commit access) and outsiders (people without)
      - the selection of which version to use / trust is not based on who has the repository, but on which repository the user pulls from
      - makes it easier to reconcile after a social fork, since this is no technical issue, just a social one

24 Collaboration Models (1)
I. DSCMs normally allow a variety of ways to interact
   - the users choose the one that best fits their needs
II. One centralized repository can be emulated
   - each commit is followed by a push
   - organizational rules may require tests by the developer between commit and push
   - central repository may queue changesets
      - wait for automated build and runtime tests to finish
      - code review by quality control group
III. Informal anarchy
   - every participant of a “sprint” serves a repository
   - a pull strategy is established through in-person communication

25 Collaboration Models (2)
IV. Branches
   - either explicit in a single repository
   - or implicit as clones
V. Feature Branches
VI. Pull-only versus shared-push collaboration
   - Linux kernel development is pull-only
      - You have to convince kernel subsystem maintainers to pull your code
      - Linus will only pull from people he trusts

26 Sample Scenarios
I. Course Homework Assignments
   - Each Course, Assignment, Exam may be a clone of lehre.sty
   - Fetch from lehre.sty will merge in style changes
   - Exchange of changesets does not require a “public” repository
      - E-mailing bundles is sufficient
         - which is what we do now in many cases, losing version control
II. Firewall issues at conferences, during travel
   - commits are always possible
   - E-mail / USB stick for exchange with others
III. Backup
   - Every clone is a complete backup of the repository

27 Some important Distributed SCM
I. Code Co-op (1997, Commercial)
II. BitKeeper (~1998, Commercial)
III. GNU arch (C, 2001, GPL)
IV. darcs (Haskell, 2002, GPL)
V. Monotone (C++ / SQLite, 2003, GPL)
VI. SVK (Perl on top of SVN, 2003, GPL)
VII. + current main 3 in more detail (git, Mercurial, Bazaar)

28 Commonalities in git, Mercurial, Bazaar
I. Distributed SCM based on Merge as Concurrency model
II. GPL’ed, running on Linux/Unix, Windows, Mac OS
   - appeared in March/April 2005
   - frequently used now in Open Source projects
III. Technology
   - fully functional local clones, atomic commits, merge tracking
   - support at least: local files, http(s), ssh, E-mail patches & bundles
   - no per-file commits, no partial checkout
      - goes against the philosophy, but may get supported some time
IV. Integration
   - Migration supported, in particular from Subversion
   - Bug/Issue Tracker integration, E-mail notifications
   - IDE plugins available, Maven SCM implementation exists

29 git
I. C / Bourne Shell / Perl, GPL 2/3
II. Used by:
   - Linux Kernel Dev, git, Perl, Ruby on Rails, Android, WINE, Fedora, X.org, VLC, ...
III. Strongly influenced by Linus Torvalds and other Linux kernel developers, as they needed a replacement for BitKeeper
IV. Fast, but complex
   - about 170 individual commands
   - makes heavy use of the underlying filesystem
      - less portable to Windows
   - may require repacks of the repository to shrink its size
   - comes with wrappers to appear as CVS and SVN server

30 git - In our Group?
I. Java support
   - JGit = Java implementation of git
   - EGit = Eclipse plug-in using JGit
   - Both are in alpha state
II. Windows GUI support
   - no good native Windows GUI outside the IDE
III. ... but if you have 1h of time and want to see what Linus Torvalds thinks of Subversion, have a look at: http://youtube.com/watch?v=4XpnKHJAok8

34 git Model Simplicity (figure slide; no text in the transcript)

35 Compare to: DILIGENT Storage Model
Lower Layer: Storage Layer
   - Info Object has
      - ID
      - Properties
      - Raw Content
      - Relationships

36 Compare to: DILIGENT Storage Model
I. Higher Layer: Content Layer
   - For Digital Libraries, two entities
      - Document
      - Collection
Lower Layer: Storage Layer
   - Info Object has
      - ID
      - Properties
      - Raw Content
      - Relationships

37 git Model Simplicity vs. Storage Model Flexibility
(Diagram; recoverable labels: Info Object with Properties, Raw Content, Relationships; Document and Collection, each with Relationships and Properties; annotations “new type in content layer” and “new type in content layer (if not one backend per repo)”)

38 Mercurial (Hg)
I. Python / very little C, GPL 2
   - The abbreviation refers to Hg, the chemical symbol for mercury
II. Used by:
   - OpenJDK, Netbeans, OpenSolaris, Mozilla, Xen, GNU Octave, ...
III. Not as fast as git for many tasks
   - but still considered significantly faster than Subversion
   - rather intuitive command set for former SVN users
   - user-friendly IDs
      - any unique prefix of the 40 hex digit SHA1 can be used
      - “short form” uses only 12 digits
   - portable
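
The idea of user-friendly IDs can be sketched as a lookup that accepts any abbreviation as long as it matches exactly one known revision. This is a minimal illustration, not Mercurial's implementation: the functions `resolve` and `short` and the sample revisions are hypothetical, and an abbreviated ID is assumed to be a leading prefix of the full hash.

```python
import hashlib


def resolve(prefix, known_ids):
    """Return the single full ID starting with `prefix`, else raise."""
    matches = [i for i in known_ids if i.startswith(prefix.lower())]
    if not matches:
        raise LookupError(f"unknown revision {prefix!r}")
    if len(matches) > 1:
        raise LookupError(f"ambiguous identifier {prefix!r}")
    return matches[0]


def short(full_id):
    """12-digit short form of a 40-digit hexadecimal ID."""
    return full_id[:12]


# Three made-up revisions, identified by the SHA-1 of some sample content.
ids = [hashlib.sha1(text).hexdigest()
       for text in (b"rev 0", b"rev 1", b"rev 2")]
full = resolve(short(ids[1]), ids)  # the short form finds the full ID again
```

An abbreviation that matches several revisions is rejected rather than guessed, so a short ID that is unique today may need more digits as the repository grows.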

39 Mercurial - In our Group?
I. Java support
   - Sun is using it for OpenJDK - you may guess :-)
   - Netbeans: native support
   - MercurialEclipse: stable and beta branches
II. Windows GUI support
   - TortoiseHg

44 Revision history, branching, merging (figure slide; no text in the transcript)

45 Storage Efficiency by Hashes
I. Not only changesets are hashed, but also files
   - Identical files do not consume additional space
   - Name, location, properties etc. are stored separately in the manifest
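
The space saving described on this slide follows from addressing file contents by their hash. The sketch below is a toy model of that idea only, not Mercurial's on-disk format: the class `Store`, its attributes, and the example paths are assumptions. Content is stored once per distinct hash, while a separate manifest maps each path to the hash of its content.

```python
import hashlib


class Store:
    """Toy content-addressed store with a separate manifest (hypothetical)."""

    def __init__(self):
        self.blobs = {}     # content hash -> bytes, each content stored once
        self.manifest = {}  # path -> content hash

    def add(self, path, data):
        digest = hashlib.sha1(data).hexdigest()
        self.blobs.setdefault(digest, data)  # no second copy if already known
        self.manifest[path] = digest
        return digest

    def read(self, path):
        return self.blobs[self.manifest[path]]


store = Store()
store.add("src/LICENSE", b"GPL text")
store.add("docs/LICENSE", b"GPL text")  # identical content, different path
store.add("README", b"hello")
```

Three paths end up referring to only two stored blobs; moving or renaming a file changes the manifest entry but not the stored content.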

46 Relationship between Working Directory and Repository
I. For large files or lots of history
   - filelog stored in a separate data (“.d” suffix)
   - and index (“.i” suffix) file
II. For small files without much history
   - just one file (with the “.i” suffix)

47 Fast Retrieval
I. Deltas + Snapshots (like P- and I-frames in video compression)
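
The combination of deltas and snapshots can be sketched with Python's standard `difflib`. This is an illustration of the principle, not Mercurial's revlog format: the class `RevLog`, the delta representation, and the fixed interval `SNAPSHOT_EVERY` are assumptions (a real tool may decide when to snapshot by other criteria, such as accumulated delta size). Reading a revision means starting at the nearest earlier snapshot and replaying the deltas after it, so the work is bounded by the snapshot interval rather than by the length of the history.

```python
import difflib

SNAPSHOT_EVERY = 4  # hypothetical interval between full snapshots


def make_delta(old, new):
    """Describe `new` as edits to `old`: (start, end, replacement_lines)."""
    sm = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    return [(i1, i2, new[j1:j2])
            for tag, i1, i2, j1, j2 in sm.get_opcodes() if tag != "equal"]


def apply_delta(old, delta):
    """Rebuild the newer version from `old` and the edits in `delta`."""
    result, pos = [], 0
    for start, end, replacement in delta:
        result.extend(old[pos:start])  # copy unchanged lines
        result.extend(replacement)     # add the changed lines
        pos = end                      # skip replaced or deleted lines
    result.extend(old[pos:])
    return result


class RevLog:
    def __init__(self):
        self.entries = []  # ("snapshot", lines) or ("delta", edits)

    def append(self, lines):
        if len(self.entries) % SNAPSHOT_EVERY == 0:
            self.entries.append(("snapshot", list(lines)))
        else:
            previous = self.read(len(self.entries) - 1)
            self.entries.append(("delta", make_delta(previous, lines)))

    def read(self, rev):
        start = rev
        while self.entries[start][0] != "snapshot":
            start -= 1  # walk back to the nearest snapshot
        lines = list(self.entries[start][1])
        for _kind, delta in self.entries[start + 1:rev + 1]:
            lines = apply_delta(lines, delta)
        return lines


versions = [
    ["a", "b", "c"],
    ["a", "B", "c"],
    ["a", "B", "c", "d"],
    ["B", "c", "d"],
    ["B", "c", "d", "e"],
    ["x", "B", "c", "e"],
]
log = RevLog()
for version in versions:
    log.append(version)
```

With this interval, revisions 0 and 4 are stored in full and the others as deltas, much like I-frames interleaved with P-frames.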

48 Safe operation
I. Mercurial only appends data to the end of the revlog file
   - in atomic operations
   - exception: rollback deletes the last (and only the last) transaction
   - readers never block writers, writers never block readers
      - when reading starts, dirstate is examined for the latest committed transaction
      - only one “concurrent” writer per repository
II. Each revlog entry has a cryptographic hash
   - more than a check against corruption; used as the identifiers for revisions
      - correct hashes are enforced for pulls from another repository
   - periodic snapshots are more robust against partial data corruption
      - on hardware failure, the revlog might get partially reconstructed (as opposed to delta-only compression)

49 Bazaar (Bzr)
I. Python / very little C, GPL 2
   - Canonical: “Bazaar is Version Control for Human Beings.”
II. Used by:
   - some Ubuntu, Launchpad, Upstart, MySQL, APT, ...
III. Considered to be the slowest of the three, but
   - uses simple revision numbers
   - able to track empty directories
   - accompanied by Patch Queue Manager (PQM)
   - portable
   - (s)ftp is sufficient to host a repository

50 Bazaar (Bzr)
I. Java Support
   - BzrEclipse, based on MercurialEclipse
II. Windows GUI support
   - TortoiseBzr

52 Conclusion & Outlook
I. DSCM offers very nice features over centralized SCM
   - without adding too much complexity
   - is mature enough for use
   - tooling may vary
      - Mercurial seems to be the best choice
      - It is installed on dbisweb
      - feature-complete replacement for svndbis
         - even with LDAP integration for accounts
         - Mercurial hands-on in 14 days

