Software Engineering for Data Scientists

Presentation transcript:

Software Engineering for Data Scientists. UW DIRECT: https://uwdirect.github.io. David A. C. Beck (dacb), Chemical Engineering & eScience Institute. Knowledge and solutions for a changing world. Advancing data-intensive discovery in all fields. Be boundless.

Agenda: Documentation. Communication around code: standups, technology reviews, code review. Project stuff.

PEP8

PEP8 Consistency

Documentation

Documentation: Two types. Code comments are for code readers: what the code is doing and why. README.md is for users: how to use your code.

Documentation: .md. .md files are Markdown. Markdown is a lightweight text formatting language for producing mildly styled text. It is ubiquitous (github.io, README.md, etc.). To try it, search for a browser-based Markdown editor, e.g. http://dillinger.io

Documentation: What kind of stuff goes in a repository's README.md? Example: https://github.com/kallisons/NOAH_LSM_Mussel_v2.0

Documentation: Comments. In both shell scripts and Python, a comment starts with #.

Documentation: Good comments. Make the comments easy to read. Write the comments in English. Discuss the function parameters and results.
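
A minimal sketch of a comment that discusses the parameter and the result. The function fahrenheit_to_celsius is a hypothetical example, not taken from the slides.

```python
# Convert a temperature from degrees Fahrenheit to degrees Celsius.
# Parameter: temp_f, a temperature in degrees Fahrenheit (int or float).
# Result: the same temperature in degrees Celsius, as a float.
def fahrenheit_to_celsius(temp_f):
    return (temp_f - 32.0) * 5.0 / 9.0
```

A reader can call the function correctly from the comment alone, without reading the arithmetic.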

Documentation: Good comments. Don’t comment bad code, rewrite it! Then comment it.

Documentation: Good comments. Some languages have special function headers.

Documentation: Good comments. Some languages have special function headers. This example is fantastic! It describes the calling synopsis (example usage), the input parameters, and the output variables. It is aimed at coders and users.

Documentation: Good comments. Some languages have special function headers. These comments should also describe side effects: any global variables that might be altered, plots that are generated, output that is puked.

Documentation / PEP8: Good comments. Inline comments are comments inline with the code. They are generally unnecessary (as above) and inhibit readability.
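
A small illustration, adapted from the kind of example PEP 8 gives for inline comments. The variable border_width and its value are assumptions made for this sketch.

```python
border_width = 1  # Hypothetical value, for illustration only.
x = 10

# Unnecessary: the comment only restates what the code already says.
x = x + 1  # Increment x.

# Occasionally useful: the comment explains why, not what.
x = x + border_width  # Compensate for the border.
```

The first inline comment adds noise; the second carries information that the code itself cannot.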

Documentation: Good comments. Wrong comments are? When updating code, don’t forget to update?

Documentation: Good comments. Don’t insult the reader. If they are reading your code… they aren’t that dumb. Corollary: don’t comment every line!

Documentation: Good comments. Don’t comment every line!

Documentation: Good comments. Note how the block is commented. The code itself reads clearly enough. We used an obviously marked constant whose value is displayed if an error is encountered.
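
The code shown on the original slide is not part of this transcript. The following is only a sketch of the same pattern, with a hypothetical constant MAX_ROWS and a hypothetical function check_row_count.

```python
# Largest number of rows we are willing to load into memory at once.
MAX_ROWS = 1000


def check_row_count(row_count):
    # Refuse oversized inputs early and tell the user what the limit is.
    if row_count > MAX_ROWS:
        raise ValueError(
            "Too many rows: got {}, limit is MAX_ROWS = {}.".format(
                row_count, MAX_ROWS
            )
        )
    return row_count
```

One comment covers the whole block, the constant is named in capitals so it stands out, and its value appears in the error message.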

Documentation / PEP8: Good comments. Comments should be sentences. They should end with a period. There should be a space between the # and the first word of a comment. You should use two spaces after a sentence-ending period. (Easy for those of a certain age.)

Documentation / PEP8: Good comments. Comments should be written in English, and follow Strunk and White.

Documentation / PEP 0257: Docstrings. A docstring is a string literal that appears as the first statement in a module, function, or class. https://www.python.org/dev/peps/pep-0257/
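
A minimal sketch of docstrings in all three places. The names inches_to_cm and Ruler are hypothetical. Python stores each docstring in the object's __doc__ attribute, which is what help() displays.

```python
"""Hypothetical example module for unit conversions."""


def inches_to_cm(inches):
    """Convert a length in inches to centimeters."""
    return inches * 2.54


class Ruler:
    """A ruler with a fixed length in inches."""

    def __init__(self, length_in):
        self.length_in = length_in


# The docstrings are available at run time.
print(inches_to_cm.__doc__)
print(Ruler.__doc__)
```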

Documentation / PEP 0257: Docstrings. They are triple quoted strings. What kind of quotes to use? They can be processed by the docutils package into HTML, LaTeX, etc. for high quality code documentation (that makes you look smart). They should be phrases that end in a period.

Documentation / PEP 0257: Docstrings. One-line docstrings are OK for simple stuff. This example (taken from PEP 0257) is crap.
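
The slide's example is not in this transcript, so here is a separate sketch with hypothetical functions. PEP 257 asks that a one-line docstring read as a command ("Return ...") and not simply restate the signature.

```python
def add_signature_style(a, b):
    """add_signature_style(a, b) -> number"""
    # The docstring above only repeats the signature; avoid this style.
    return a + b


def add(a, b):
    """Return the sum of a and b."""
    return a + b
```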

Documentation / PEP 0257: Docstrings. Multiline docstrings are more of the norm.
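
A sketch of the multiline layout PEP 257 describes: a one-line summary, a blank line, then a longer description, with the closing quotes on their own line. The function normalize is a hypothetical example.

```python
def normalize(values):
    """Scale a list of numbers so that they sum to 1.

    Each value is divided by the sum of all values. The input list
    is not modified; a new list is returned. Raises ValueError if the
    values sum to zero.
    """
    total = sum(values)
    if total == 0:
        raise ValueError("values must not sum to zero")
    return [value / total for value in values]
```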

Documentation / PEP 0257: Docstrings. For scripts intended to be called from the command line, the docstring at the top of the file should be a usage message for the script.
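
One way this could look, assuming a hypothetical script named count_lines.py. The module docstring doubles as the usage message, printed when the arguments are wrong.

```python
"""Usage: python count_lines.py FILE

Print the number of lines in FILE.
"""


def count_lines(path):
    """Return the number of lines in the file at path."""
    with open(path) as handle:
        return sum(1 for _ in handle)


def main(argv):
    """Run the script; print the usage message if the arguments are wrong."""
    if len(argv) != 2:
        # __doc__ holds the module docstring at the top of this file.
        print(__doc__)
        return 2
    print(count_lines(argv[1]))
    return 0


# In a real script, finish with: sys.exit(main(sys.argv))
```

Keeping the usage text in the docstring means the help shown to users and the documentation read by coders cannot drift apart.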

Documentation / PEP 0257: Docstrings. For modules and packages, list the classes, exceptions and functions (and any other objects) that are exported by the module, with a one-line summary of each. Looking at scikit-learn and seaborn (as examples), this didn’t seem to be the norm. However, NumPy does it: https://github.com/numpy/numpy/blob/master/numpy/__init__.py

Documentation / PEP 0257: Docstrings. Most importantly… For functions and methods, the docstring should summarize the behavior and document the arguments, return value(s), side effects, and exceptions raised. Example from scikit-learn: https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/cluster/dbscan_.py
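
A sketch of such a docstring, written with the sectioned layout (Parameters, Returns, Raises) that NumPy-style documentation uses. The function mean_of_positive is hypothetical and is not taken from scikit-learn.

```python
def mean_of_positive(values):
    """Compute the arithmetic mean of the positive entries in values.

    Parameters
    ----------
    values : list of float
        Numbers to average; entries less than or equal to zero are ignored.

    Returns
    -------
    float
        The mean of the positive entries.

    Raises
    ------
    ValueError
        If values contains no positive entries.

    Notes
    -----
    No side effects: the input list is not modified and nothing is printed.
    """
    positives = [value for value in values if value > 0]
    if not positives:
        raise ValueError("values contains no positive entries")
    return sum(positives) / len(positives)
```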

Communication. Agenda: Documentation. Communication around code: standups, technology reviews, code review. Project stuff.

Software Development Phases

Software Development: Waterfall Process Model. Why does this work poorly?

Software Development: Rapid Prototyping. Cannot specify all requirements in advance. Revise the specification.

Team Activities. Reqs gathering (functional spec.). Design: technology assessments, write specifications, review specification, revise the specification. Implementation: code, code review, bug prioritization and resolution. Standups (status update).

Code Review Template. Why code review? Improve code quality and find bugs. Background: describe what the application does; describe the role of the code being reviewed. Comment on: choice of variable and function names; readability of the code; how to improve reuse and efficiency; how to use existing Python packages.
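
A sketch of what a review along these lines might change. Both functions are hypothetical; the "after" version uses statistics.mean from the Python standard library.

```python
import statistics


# Before review: terse names and a hand-rolled average.
def f(l):
    t = 0
    for x in l:
        t = t + x
    return t / len(l)


# After review: descriptive names and an existing package.
def average_temperature(temperatures_c):
    """Return the mean of a non-empty list of temperatures in Celsius."""
    return statistics.mean(temperatures_c)
```

The review comments here would be about naming (f, l, t, x say nothing), readability, and reuse of an existing package in place of a custom loop.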

In class exercise. Split into teams of two (~5 minutes). Partner A reviews B’s code (~10 minutes). Partner B reviews A’s code (~10 minutes). Report back on what you learned: about your code, about the process. ASK QUESTIONS!

In class exercise. This is a safe space. We are here to learn from and work with each other. Compliment sandwiches taste great. Follow the template and make notes.

Technology Review Template. Why technology reviews? Evaluate a package for deployment in a project. Background: requirements that indicate a need for the proposed package. Discuss: how the package works; appeal of using the package; drawbacks of using the package.

Technology Review: NEXT WEEK. Next Wed. every project will present. Max 15 minutes; I will cut you off. Everyone in the team will speak. Cover: background, how it works, appeal, drawbacks. Things to think about, as a starting point: availability of relevant examples; look at open issues on GitHub. Questions?

Standup Template. Why standups? Communicate status and actions within and between teams. Should be presented in 1-2 minutes. Progress this period. How it compares with the plan. If behind plan, how to compensate to make the planned end date. Deliverables for next period. Challenges to making next deliverables, such as: technology uncertainties and blockers; team issues.

Standups: the week after next. Each class will have some time for standups. Everyone in class will give at least one standup. These are 1 to 2 minutes; don’t prepare too much.

Remainder of today… Take some time in your project team… What open questions do you have about the project process? About your project specifically? We’ll resume as a class and you can ask Jim and me for clarifications.