Programming with data, Lab 9
Tuesday, 11 December 2018
Stelios Sotiriadis, Prof. Alessandro Provetti
What will we learn today?
- Version control
- Using Bitbucket
- Transactional systems
- Database design activity
Version control
After PoP 1
First-hand experience with Git and GitHub.
Git is a protocol with myriad implementations and hosting services: github.com, bitbucket.org.
Git is also the basis of contemporary software-management platforms: GitLab, Allura.
Basic ideas
Preparing a long piece of code (or a long document) invariably takes several sessions.
Changes can be incremental (i.e., more code/text) or destructive (deleting or rewriting what was there).
Sometimes we regret the changes and want to go back to the previous version(s)... but which changes do we regret, exactly?
Home-made version control
Some people keep several copies of essentially the same file, called versions: report-1.doc, report-2.doc, …
Cooperation is done by explicit agreement and negotiated synchronization:
report-AP.doc, report-AP2.doc vs. report-AP-2.doc?
report-AP.doc, report-AP-PW.doc, report-AP-PW-AP2.doc
This is the baseline.
Motivations for version control in software
- Coding is now a collective endeavour.
- Coding is inherently error-prone.
- Local changes can have global effects.
- For large organizations, authorship and certification can also be important.
Version control systems
General protocols that can be implemented by several parties.
- CVS and SVN are centralized and server-based.
- Git was born decentralized and is now the de facto standard.
- Centralized Git hosting is offered by, among others, GitHub and Bitbucket.
You may try, for instance, the Bitbucket Git tutorial.
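Version control systems
Git extends the file system with rollback and merge features.
- Delta compression: each version is kept as a description of the changes (lines added + or removed -) from the previous one.
- Rollback from version 6 to version 5: take v0 and apply the first 5 changes.
- The central repository has its own, independent change history.
(This is the conceptual model; internally, Git records each commit as a snapshot of the files and applies delta compression when it packs objects.)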
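A minimal Python sketch of the delta idea, assuming each version is a list of text lines. DeltaStore, make_delta and apply_delta are hypothetical names for this illustration; the sketch is not how Git itself stores objects.

```python
from difflib import SequenceMatcher

def make_delta(old, new):
    """Describe `new` as instructions relative to `old` (hypothetical format)."""
    delta = []
    matcher = SequenceMatcher(a=old, b=new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            delta.append(("copy", i1, i2))      # keep old[i1:i2] unchanged
        elif tag in ("insert", "replace"):
            delta.append(("add", new[j1:j2]))   # lines added (+); replaced old lines are dropped (-)
        # "delete": the old lines are simply not copied (-)
    return delta

def apply_delta(old, delta):
    """Rebuild the next version from `old` and one delta."""
    result = []
    for op in delta:
        if op[0] == "copy":
            result.extend(old[op[1]:op[2]])
        else:
            result.extend(op[1])
    return result

class DeltaStore:
    """Keeps v0 in full and every later version only as a delta."""
    def __init__(self, v0):
        self.v0 = list(v0)
        self.deltas = []
        self._head = list(v0)               # latest version, cached for diffing

    def commit(self, new_version):
        self.deltas.append(make_delta(self._head, list(new_version)))
        self._head = list(new_version)

    def checkout(self, k):
        """Version k = v0 with the first k deltas applied in order."""
        version = list(self.v0)
        for delta in self.deltas[:k]:
            version = apply_delta(version, delta)
        return version
```

After committing versions v1 to v6, store.checkout(5) rebuilds v5 exactly as on the slide: start from v0 and replay five deltas.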
Version control systems
How to work: pull
- Medium frequency.
- Copies a repository to a position in the local file system (the first copy is made with git clone; later pulls bring in the new remote changes).
- Each copy is a self-contained folder: several copies can coexist.
- Ideally, at the start of a coding session.
- Designed to pull selectively into local versions.
How to work: commit
- From version i to version i+1.
- A commit should submit an atomic change of the code to version control.
- Ideally, the change should have been tested before the commit.
- In development, the more commits, the finer the control over the code.
How to work: push
- Medium frequency.
- Transfers your head version to the centralized repository.
- Ideally, at the end of extensive testing.
A standard session: pull, commit, commit, …, push.
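The standard session can be illustrated with a toy Python model in which a repository is only an ordered list of commit messages. Repo, clone, commit, pull and push are hypothetical names borrowed from the Git commands; real Git keeps a graph of commits and can merge diverged histories, which this sketch simply refuses to do.

```python
class Repo:
    """Toy repository: its whole state is an ordered list of commit messages."""
    def __init__(self, history=()):
        self.history = list(history)

def is_prefix(short, long):
    return long[:len(short)] == short

def clone(remote):
    return Repo(remote.history)             # self-contained local copy

def commit(local, message):
    local.history.append(message)           # version i -> version i+1

def pull(local, remote):
    if is_prefix(local.history, remote.history):
        local.history = list(remote.history)    # fast-forward to the remote
    elif not is_prefix(remote.history, local.history):
        raise RuntimeError("histories diverged: a merge is needed")
    # otherwise the remote has nothing new for us

def push(local, remote):
    if not is_prefix(remote.history, local.history):
        raise RuntimeError("push rejected: pull (and merge) first")
    remote.history = list(local.history)    # remote now matches our head

server = Repo(["init"])
alice, bob = clone(server), clone(server)
commit(alice, "a1")
commit(alice, "a2")
push(alice, server)                          # server: init, a1, a2
pull(bob, server)                            # bob starts his session up to date
commit(bob, "b1")
push(bob, server)                            # server: init, a1, a2, b1
```

If Alice now commits and pushes without pulling first, push raises an error, much as Git rejects a non-fast-forward push until you pull and merge.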
Using a private repository (Bitbucket)
To run this example you need a Bitbucket account.
Example using Bitbucket
Clone a repository:
    git clone https://dkargatzis@bitbucket.org/dkargatzis/pwd.git
Configure the contributor's email and name (run inside the cloned folder; without --global these settings apply to this repository only):
    git config user.email "you@example.com"
    git config user.name "Your name"
Edit files or add new ones, then stage them:
    git add filename
  or, for all edited or new files:
    git add .
Example
Commit all changes staged in the previous step:
    git commit -m "A new file added"
Push your changes to the remote repository:
    git push origin master
Show the commit log:
    git log
Example
Make a copy of your master branch (run this while on master) and make safe edits on it:
    git branch newbranch
List the branches:
    git branch
Switch to newbranch and start coding (edit or add files):
    git checkout newbranch
Show the differences between two branches:
    git diff master newbranch
Example
Add, commit and push newbranch:
    git add .
    git commit -m "New features added in newbranch"
    git push origin newbranch
or merge newbranch into the master branch and push the new master version:
    git checkout master
    git merge -m "Bug solved - merge" newbranch
    git push origin master
(If the merge reports conflicts, fix the files, then git add and git commit them before pushing.)
Before you start coding, pull all branch changes (i.e., changes made by other users):
    git pull origin branchname
Transactional systems
Transactional systems
- DB design must be done with a long-term activity in view, not in response to specific/transient queries.
- Tables shall represent either long-term entities or recorded interactions.
- Foreign-key columns allow joins, i.e., navigation and selection across tables.
- Practical tip: equip tables with two date/time columns: creation and last_updated.
- SQL lets you express constraints on the values assigned to rows: updates that would violate the integrity constraints are rejected.
- SQL would rather generate errors than let you spoil the data: a rollback mechanism brings the DB back to its previous, consistent state.
- Often we need to package updates into atomic transactions.
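A minimal sketch of these design rules with Python's built-in sqlite3 module and an in-memory database: one long-term entity table (users), one table of recorded interactions (rentals) linked to it by a foreign key, the two timestamp columns from the practical tip, and a CHECK constraint. The table and column names are hypothetical. Note that SQLite enforces foreign keys only after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite checks foreign keys only when enabled
conn.executescript("""
CREATE TABLE users (                       -- long-term entity
    uid          TEXT PRIMARY KEY,
    name         TEXT NOT NULL,
    created      TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    last_updated TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE rentals (                     -- recorded interaction
    rid          INTEGER PRIMARY KEY,
    uid          TEXT NOT NULL REFERENCES users(uid),   -- foreign key: enables joins
    amount       INTEGER NOT NULL CHECK (amount > 0),   -- integrity constraint
    created      TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    last_updated TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
""")
conn.execute("INSERT INTO users (uid, name) VALUES ('u1024', 'Ann')")
conn.commit()

def add_rental(uid, amount):
    """Insert a rental; return True if stored, False if a constraint rejected it."""
    try:
        with conn:                          # commit on success, roll back on error
            conn.execute("INSERT INTO rentals (uid, amount) VALUES (?, ?)", (uid, amount))
        return True
    except sqlite3.IntegrityError:
        return False

print(add_rental("u1024", 1000))   # True
print(add_rental("u9999", 1000))   # False: no such user (foreign key)
print(add_rental("u1024", -5))     # False: violates CHECK (amount > 0)

# Navigating along the foreign key with a join
for row in conn.execute("SELECT u.name, r.amount FROM rentals r JOIN users u ON u.uid = r.uid"):
    print(row)                     # ('Ann', 1000)
```

The database refuses the bad rows instead of storing them, which is the "errors rather than spoiled data" behaviour described above.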
Atomic transactions
A transaction makes these operations atomic (here rental = 1000):
    BEGIN TRANSACTION;
    UPDATE accounts SET bal = bal - 1000 WHERE uid = 'u1024';
    UPDATE accounts SET bal = bal + 1000 WHERE uid = 'u512';
    COMMIT;
If anything goes wrong before COMMIT, a ROLLBACK undoes both updates.
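The transfer above can be sketched with Python's sqlite3 module, whose connection context manager commits when the block succeeds and rolls back when it raises. The accounts table, its CHECK (bal >= 0) constraint, the transfer function and the starting balances are assumptions made for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (uid TEXT PRIMARY KEY, "
             "bal INTEGER NOT NULL CHECK (bal >= 0))")   # no overdrafts (assumption)
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [("u1024", 5000), ("u512", 0)])
conn.commit()

def transfer(src, dst, rental):
    """Debit src and credit dst inside one transaction: both updates or neither."""
    with conn:   # BEGIN ... COMMIT, or ROLLBACK if anything below raises
        debit = conn.execute("UPDATE accounts SET bal = bal - ? WHERE uid = ?", (rental, src))
        credit = conn.execute("UPDATE accounts SET bal = bal + ? WHERE uid = ?", (rental, dst))
        if debit.rowcount != 1 or credit.rowcount != 1:
            raise ValueError("unknown account")      # undoes the debit as well

def balances():
    return dict(conn.execute("SELECT uid, bal FROM accounts ORDER BY uid"))

transfer("u1024", "u512", 1000)
print(balances())                       # {'u1024': 4000, 'u512': 1000}

try:
    transfer("u1024", "u999", 1000)     # the credit fails, so the debit is rolled back
except ValueError:
    pass
print(balances())                       # {'u1024': 4000, 'u512': 1000}

try:
    transfer("u1024", "u512", 10_000)   # the debit violates CHECK (bal >= 0)
except sqlite3.IntegrityError:
    pass
print(balances())                       # {'u1024': 4000, 'u512': 1000}
```

The second call shows why the two updates must share one transaction: without the rollback, 1000 would leave u1024 and arrive nowhere.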
ACID
ACID (Atomicity, Consistency, Isolation, Durability) is a set of properties of database transactions intended to guarantee validity even in the event of errors or power failures.
Atomic transactions (ACID)
A: Atomicity (all-or-nothing behaviour)
Atomicity guarantees that each transaction is treated as a single "unit", which either succeeds completely or fails completely: if any of the statements constituting a transaction fails to complete, the entire transaction fails and the database is left unchanged.
C: Consistency (of the data)
Consistency ensures that a transaction can only bring the database from one valid state to another, maintaining database invariants: any data written to the database must be valid according to all defined rules, including constraints, cascades, triggers, and any combination thereof.
Atomic transactions (ACID)
I: Isolation (from concurrent transactions)
Transactions are often executed concurrently (e.g., several transactions reading and writing the same table at the same time). Isolation ensures that concurrent execution of transactions leaves the database in the same state that would have been obtained if the transactions were executed sequentially.
D: Durability (committed changes persist)
Durability guarantees that once a transaction has been committed, it will remain committed even in the case of a system failure (e.g., power outage or crash). This usually means that completed transactions (or their effects) are recorded in non-volatile memory.
In short, an atomic transaction is an indivisible and irreducible series of database operations such that either all occur, or nothing occurs.
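Isolation and durability can be observed with two sqlite3 connections to the same database file. This is a sketch under assumptions: a temporary file named bank.db (hypothetical), SQLite's default rollback-journal mode and Python's default transaction handling; other database systems offer other isolation levels, and reopening a file only stands in for a real crash test.

```python
import os
import sqlite3
import tempfile

def isolation_and_durability_demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "bank.db")          # hypothetical file name
        writer = sqlite3.connect(path)
        reader = sqlite3.connect(path)
        writer.execute("CREATE TABLE accounts (uid TEXT PRIMARY KEY, bal INTEGER)")
        writer.execute("INSERT INTO accounts VALUES ('u1024', 5000)")
        writer.commit()

        query = "SELECT bal FROM accounts WHERE uid = 'u1024'"
        # The UPDATE opens a transaction that is not committed yet ...
        writer.execute("UPDATE accounts SET bal = bal - 1000 WHERE uid = 'u1024'")
        # ... so the other connection still sees the last committed value (isolation).
        seen_before_commit = reader.execute(query).fetchall()[0][0]
        writer.commit()
        seen_after_commit = reader.execute(query).fetchall()[0][0]
        writer.close()
        reader.close()

        # Durability: a brand-new connection still finds the committed value.
        fresh = sqlite3.connect(path)
        seen_after_reopen = fresh.execute(query).fetchall()[0][0]
        fresh.close()
        return seen_before_commit, seen_after_commit, seen_after_reopen

print(isolation_and_durability_demo())   # (5000, 4000, 4000)
```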
Some solutions
SQLite
- No network operations: the database is a local file.
- Allows multiple concurrent readers of the file, but only one writing transaction at a time: other writers wait their turn (a queue system).
Berkeley DB
- Same interface as SQLite, but allows concurrent writes.
- Based on key/value pairs, not relations.
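SQLite's one-writer-at-a-time rule can be seen with two connections to the same file: while the first holds an open write transaction, the second is refused with "database is locked". The second connection uses timeout=0 so that it fails at once instead of waiting (by default Python's sqlite3 waits up to 5 seconds for the lock, which is how writers end up queuing). File and table names are hypothetical.

```python
import os
import sqlite3
import tempfile

def second_writer_demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "shared.db")           # hypothetical file name
        w1 = sqlite3.connect(path)
        w2 = sqlite3.connect(path, timeout=0)           # give up at once instead of waiting
        w1.execute("CREATE TABLE t (x INTEGER)")
        w1.execute("INSERT INTO t VALUES (1)")          # w1 now holds an open write transaction
        try:
            w2.execute("INSERT INTO t VALUES (2)")      # a second writer at the same time
            blocked = False
        except sqlite3.OperationalError:                # "database is locked"
            w2.rollback()                               # release whatever w2 acquired
            blocked = True
        w1.commit()                                     # the first writer finishes ...
        w2.execute("INSERT INTO t VALUES (2)")          # ... so the second one can write
        w2.commit()
        rows = w1.execute("SELECT x FROM t ORDER BY x").fetchall()
        w1.close()
        w2.close()
        return blocked, rows

print(second_writer_demo())   # (True, [(1,), (2,)])
```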
SQL Exercise
Work on the exercise in Class9-database-exercice.pdf.