Data science laboratory (DSLAB)

Presentation transcript:

Data science laboratory (DSLAB) Julien Eberle Swiss Data Science Center EPFL & ETH Zurich Data Science Lab – Spring 2018

Start your engines!
- Head over to the course webpage: https://dslab2018.github.io/
- Configure and run Oracle VirtualBox
- Download, configure and start the DSLab VM: Ubuntu Linux, Anaconda (Python 3), git, Docker, Spark / pySpark, PyCharm {username: student, password: student}
- Open a terminal
Today's needs:
- git
- docker

Solutions to Week #2: https://github.com/dslab2018/dslab2018.github.io#week-3---07032018---module-1---python-for-data-scientists-34 Any questions?

Today’s lab, week #3: Git, Docker, Kaggle. (Timeline figure: Weeks 1 & 2, Week 3, Week 4.)

Tools for collaboration
- Share code. Versioning: git (Mercurial, SVN, ...). Service in the cloud: GitLab (GitHub, Bitbucket, ...).
- Share data. As input dataset: depends on the domain (https://www.nature.com/sdata/policies/repositories). As working dataset: Git LFS (git-annex, shared drive, ...). As output dataset: Zenodo, or another repository that gives you a DOI.
- Share tasks and progress. Data science as software development: issue tracking, milestones, Kanban board, to-do lists.
- Share results. Make the report part of your code (or use Jupyter notebooks). Productize models and algorithms: microservices with Docker, libraries (PyPI).

Git
- Linus Torvalds, 2005
- Basic commands: clone, status, add, rm, commit, pull, push
- Useful commands: merge, stash, checkout, branch, diff, tag
- Advanced commands: cherry-pick, rebase, log, remote, init, revert, reset
- Commits represent snapshots of the state of the repository, usually displayed as a diff from the previous one; together they form a tree (a graph, once merges are included).
- Gitflow: good practice for branching. Gitflow workflow: https://www.atlassian.com/git/tutorials/comparing-workflows/gitflow-workflow
- .gitignore (https://www.gitignore.io/)
- Git aliases ???
- Interactive tutorials for beginners: https://try.github.io
- Complete Git book: https://git-scm.com/book/en/v2
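
The idea that commits are snapshots linked to their parents can be sketched in a few lines of Python. This is a toy model only, not how git stores objects internally; the names `Commit`, `diff` and `history` and the example files are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(eq=False)  # eq=False keeps identity-based hashing, so commits can go in a set
class Commit:
    message: str
    snapshot: Dict[str, str]             # path -> full file content at this commit
    parents: Tuple["Commit", ...] = ()   # no parent = root, two parents = merge


def diff(old: Dict[str, str], new: Dict[str, str]) -> Dict[str, str]:
    """Derive a diff by comparing two complete snapshots."""
    changes = {}
    for path in sorted(set(old) | set(new)):
        if path not in old:
            changes[path] = "added"
        elif path not in new:
            changes[path] = "removed"
        elif old[path] != new[path]:
            changes[path] = "modified"
    return changes


def history(tip: Commit) -> List[Commit]:
    """Every commit reachable from tip, each listed once, tip first."""
    seen, order, stack = set(), [], [tip]
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.add(commit)
        order.append(commit)
        stack.extend(commit.parents)
    return order


# Two branches start from the same root and are merged again.
root = Commit("initial commit", {"README": "hello"})
feature = Commit("add model", {"README": "hello", "model.py": "x = 1"}, (root,))
main = Commit("edit readme", {"README": "hello world"}, (root,))
merge = Commit("merge feature", {"README": "hello world", "model.py": "x = 1"}, (main, feature))

print(diff(root.snapshot, merge.snapshot))
print([c.message for c in history(merge)])
```

The diff is computed from two snapshots on demand, which is why any two commits can be compared, not only neighbours.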

Git staging area (figure). Source: https://github.com/fer/dotfiles/wiki/git
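
A minimal sketch of the three areas involved (working tree, staging area, committed history), assuming a toy in-memory model. `ToyRepo` and its methods are hypothetical names that only mimic what `git add` and `git commit` do conceptually.

```python
class ToyRepo:
    """Toy model: working tree, staging area (index) and committed history."""

    def __init__(self):
        self.working_tree = {}  # path -> content currently being edited
        self.index = {}         # path -> content staged for the next commit
        self.commits = []       # list of (message, snapshot) tuples

    def add(self, path):
        """Like `git add <path>`: copy the current content into the index."""
        self.index[path] = self.working_tree[path]

    def commit(self, message):
        """Like `git commit -m <message>`: snapshot the index, not the working tree."""
        self.commits.append((message, dict(self.index)))

    def status(self):
        """Paths whose working-tree content differs from what is staged."""
        return sorted(
            path for path, content in self.working_tree.items()
            if self.index.get(path) != content
        )


repo = ToyRepo()
repo.working_tree["analysis.py"] = "print('v1')"
repo.add("analysis.py")
repo.working_tree["analysis.py"] = "print('v2')"  # edited again, but not re-added
repo.commit("first version")

print(repo.commits[-1])  # the commit contains the staged v1 content
print(repo.status())     # the later edit is still unstaged
```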

Git LFS
- Extension for Large File Storage (alternative: git-annex)
- Quick setup: install the package, then run: git lfs install, followed by: git lfs track "*.zip"
- All files matching the patterns listed in .gitattributes are replaced by small pointer files in the repository
- Tracked files are then uploaded separately to an object store during push
- Pull retrieves the objects
- https://git-lfs.github.com/
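
To make the pointer-file idea concrete, here is a rough Python sketch. It assumes the pointer layout described in the Git LFS specification (a version line, a SHA-256 object id and the size in bytes). `TRACKED_PATTERNS`, `is_tracked` and `make_pointer` are hypothetical helpers, and `fnmatch` only approximates the pattern matching git applies to .gitattributes.

```python
import hashlib
from fnmatch import fnmatch

# Hypothetical stand-in for the patterns written by `git lfs track "*.zip"`.
TRACKED_PATTERNS = ["*.zip"]


def is_tracked(path: str) -> bool:
    """Rough approximation of matching a path against the tracked patterns."""
    return any(fnmatch(path, pattern) for pattern in TRACKED_PATTERNS)


def make_pointer(content: bytes) -> str:
    """Build the small text file that is committed in place of the large file."""
    oid = hashlib.sha256(content).hexdigest()
    return (
        "version https://git-lfs.github.com/spec/v1\n"
        f"oid sha256:{oid}\n"
        f"size {len(content)}\n"
    )


large_file = b"\x00" * 1_000_000  # pretend this is a 1 MB archive
if is_tracked("dataset.zip"):
    print(make_pointer(large_file))
```

The pointer stays a few lines long whatever the size of the real file, which keeps the repository itself small.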

Containerization: why?
- resource isolation
- lightweight
- portable
- extendable
- scalable
- enabling reproducibility (self-contained)

Virtualization
- Layer abstracting the hardware resources
- Can sit on top of an OS (VirtualBox, VMware, QEMU) or on a thin hypervisor layer (vSphere, Xen, Hyper-V)
- Virtual machines run their own OS and kernel
Source: https://jaxenter.com/containerization-vs-virtualization-docker-introduction-120562.html

Containerization
- Containers share the host OS resources and kernel
- Examples: chroot, BSD Jails, LXC, Docker
- Isolates: processes, network, file system, memory/CPU availability
- Communication with the outside world: mounting volumes, network services, stdin/stdout

Docker architecture (figure). Source: https://docs.docker.com/engine/docker-overview/#docker-architecture

Union file system
- Examples: aufs, overlayfs, overlay2, btrfs
- Layered filesystems that are merged into a single view
- Some layers can be read-only (e.g. a CD-ROM)
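
The merging of layers can be imitated with Python's `collections.ChainMap`, which looks a key up in several dictionaries in order. This is only an analogy for how a union filesystem resolves a path; the layer names and file contents are made up for illustration.

```python
from collections import ChainMap

# Lowest layer first, as successive build steps would stack them.
base_layer = {"/bin/sh": "shell from base image", "/etc/motd": "welcome"}
apache_layer = {"/usr/sbin/apache2ctl": "apache control script"}
config_layer = {"/etc/motd": "custom banner"}  # shadows the file in base_layer

# ChainMap searches its maps from left to right, so the top layer goes first.
merged_view = ChainMap(config_layer, apache_layer, base_layer)

print(merged_view["/etc/motd"])  # resolved by the top-most layer that has the path
print(merged_view["/bin/sh"])    # falls through to the base layer
print(sorted(merged_view))       # one merged listing of all paths
```

The lower layers are never modified: the base copy of `/etc/motd` is still there, just hidden behind the upper one.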

Dockerfile
FROM debian:stable
  Base image; have a look at https://hub.docker.com/explore/
USER root
  Sets the user for the following commands.
RUN apt-get update && apt-get install -y --force-yes apache2
  Runs arbitrary shell commands; each RUN line adds a layer!
EXPOSE 80 443
  Defines the network ports that will be exposed.
COPY website.conf /etc/apache2/sites-enabled/
  Puts a copy of a local file into the Docker image; it also adds a new layer.
VOLUME ["/var/www", "/var/log/apache2"]
  Specifies mount points for external volumes.
ENTRYPOINT ["/usr/sbin/apache2ctl", "-D", "FOREGROUND"]
  The command that is run when the container starts.
Best practices: https://docs.docker.com/develop/develop-images/dockerfile_best-practices/

Docker image vs. Docker container
Docker image
- a stack of read-only layers
- packaged and tagged (downloadable from a registry)
- docker pull / docker build
- docker images
Docker container
- instance of a Docker image with a writable layer on top
- has an id and a name
- docker run / docker create
- docker ps -a
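
The image/container distinction can be sketched as follows, assuming a toy model in which each layer is a dictionary. The `Image` and `Container` classes are hypothetical and only mirror the idea of read-only layers plus one writable layer per container.

```python
from collections import ChainMap
from types import MappingProxyType


class Image:
    """A tagged stack of read-only layers (lowest layer first)."""

    def __init__(self, tag, layers):
        self.tag = tag
        # MappingProxyType gives a read-only view of each layer.
        self.layers = tuple(MappingProxyType(dict(layer)) for layer in layers)


class Container:
    """An instance of an image: the shared layers plus its own writable layer."""

    def __init__(self, name, image):
        self.name = name
        self.writable = {}
        # Writes to a ChainMap always land in its first mapping.
        self.fs = ChainMap(self.writable, *reversed(image.layers))


image = Image("web:1.0", [{"/etc/motd": "welcome"}, {"/var/www/index.html": "<h1>v1</h1>"}])
web1 = Container("web1", image)
web2 = Container("web2", image)

web1.fs["/var/www/index.html"] = "<h1>patched</h1>"  # stored in web1's writable layer only

print(web1.fs["/var/www/index.html"])
print(web2.fs["/var/www/index.html"])  # unaffected: still the content from the image
```

Both containers share the same image layers, so starting a second container costs only one more writable layer.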

Managing containers
- docker run / docker attach / docker exec -ti: attach stdin/stdout to a new or running container
- docker start / docker stop: start or stop existing containers
- docker kill / docker rm: force-quit a container, and remove it (cleaning up its writable layer)
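
The commands above move a container between a few states. The table below is a simplified sketch of that lifecycle, assuming only the states created, running and exited, plus a made-up "removed" label. It leaves out pausing and restarts, and `docker attach` / `docker exec` are omitted because they do not change the state. `TRANSITIONS`, `apply_commands` and `run` are hypothetical names.

```python
# (current state, docker subcommand) -> next state
TRANSITIONS = {
    ("created", "start"): "running",
    ("exited", "start"): "running",
    ("running", "stop"): "exited",
    ("running", "kill"): "exited",
    ("created", "rm"): "removed",
    ("exited", "rm"): "removed",
}


def apply_commands(state, commands):
    """Follow a sequence of subcommands, rejecting moves the table does not allow."""
    for command in commands:
        key = (state, command)
        if key not in TRANSITIONS:
            raise ValueError(f"cannot '{command}' a container that is {state}")
        state = TRANSITIONS[key]
    return state


def run():
    """`docker run` behaves like `docker create` followed by `docker start`."""
    return apply_commands("created", ["start"])


print(run())
print(apply_commands("created", ["start", "stop", "start", "kill", "rm"]))
```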

Kaggle https://www.kaggle.com/

This week’s exercises
- Get the DSLab Week 3 exercise instructions
- Register on https://git-dslab.epfl.ch with your EPFL e-mail