Research Code
Andrew Rosenberg
with RA Manual: Notes on Writing Code by Matthew Gentzkow and Jesse Shapiro Chicago Booth and

Research Code has a Bad Reputation
Research code is rarely written to be robust, reusable, or long-lived in development and versioning repositories. Its consumer is usually its own author, or in some cases a few others in the lab.
make-research-software-accountable/

Mistakes (Research) Programmers Make
I just need to do this specific thing one time.

Mistakes (Research) Programmers Make
I’ll remember what I did, if I need to do it again.

Mistakes (Research) Programmers Make
No one is interested in this code.

Mistakes (Research) Programmers Make
No one will ever see this code.

What research code looks like
This is not application development. Often research code involves:
– a series of small scripts,
– linking together existing open source toolkits,
– reformatting input and output,
– generating plots and graphs.
Where is the “software”?

What research code looks like
The contribution of the paper may be
– an extension of an existing codebase,
– a set of small scripts and reformatting one-liners,
– implemented in multiple languages.

A new way of doing business
These are bad excuses. There is a movement to encourage and incentivize the distribution of source code with publications, and there are facilities that support it.

Source Code dissemination
Host it yourself.
(many, many more)

What is good enough?
Right now:
– ANYTHING.
Ideally:
– “production level”: code that can be run or compiled on a standard configuration.
– Thorough documentation.

Intellectual Property and Licensing
GPL
– copyleft
Apache
many, many more
You have copyright over your code. A license allows someone else to use it. Disclosures can limit your ability to patent.

Version Control
Version control allows multiple users to edit the same content, and it allows for coding in the open. Examples: Subversion, Git, and many more.

Coding for the User
Code for your future self. You are your most important user.

Don’t try to be clever
Write simple, understandable code. Efficiency in number of lines is not important. Efficiency in number of operations or memory also might not be important.

There are many ways to skin a cat

    print "Just another Perl hacker,";

    $_='987;s/^(\d+)/$1-1/e;$1?eval:print"Just another Perl hacker,"';eval;

    $_ = "wftedskaebjgdpjgidbsmnjgc"; tr/a-z/oh, turtleneck Phrase Jar!/; print;

Establish a coding style.
ClassName
nameMethodsUsingVerbs
underscored_lowercase_variable_names
CONSTANTS
Spacing
– x_mean=x_total/n
More than anything, be consistent.

Testing
Unit tests.
– Small pieces of code that test “atomic” functionality of a program.

    // Checks a single behaviour of add().
    void testAddWorksCorrectly() {
        assertEquals(4, add(2, 2));
    }

    // Checks that a new Person starts with the default name.
    void testConstructorInitializesNameFieldToDefault() {
        Person p = new Person();
        assertEquals("John Smith", p.getName());
    }
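For readers working in Python, the same two unit tests might look roughly like the sketch below. The `add` function and the `Person` class are hypothetical stand-ins written only for this illustration (the slides do not show their implementations), and `run_tests` is a helper name assumed here so the tests can be run from a script.

```python
import unittest


def add(a, b):
    """Hypothetical function under test."""
    return a + b


class Person:
    """Hypothetical class under test; the default name is an assumption."""

    def __init__(self, name="John Smith"):
        self.name = name


class TestBasics(unittest.TestCase):
    def test_add_works_correctly(self):
        self.assertEqual(4, add(2, 2))

    def test_constructor_initializes_name_to_default(self):
        self.assertEqual("John Smith", Person().name)


def run_tests():
    """Run both tests and return True only if every test passed."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestBasics)
    result = unittest.TextTestRunner(verbosity=0).run(suite)
    return result.wasSuccessful()
```

Each test checks one small behaviour, so a failure points directly at the piece of code that broke.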

Why write tests?
Identify problems.
Easier changes.
Simple integration.
Documentation.

Test Driven Development
Write a test
Run the tests to see that it fails
Write as little code as possible
Make the tests pass (go green)
Refactor code
Repeat
[wikipedia]

Bug fixes and Testing
When you find a bug in your code, write a test that “catches the bug”.
– It fails.
The bug is fixed when the test passes, and the test will catch the bug if it ever comes back.
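A minimal sketch of this workflow in Python, assuming an invented bug: a mean function that used floor division and silently dropped the fractional part. All names here (`mean_buggy`, `mean_fixed`, `check_mean_keeps_fraction`, `passes`) are hypothetical and exist only for this illustration.

```python
def mean_buggy(values):
    # Bug: floor division truncates the result (mean of [1, 2] becomes 1).
    return sum(values) // len(values)


def mean_fixed(values):
    # Fix: true division keeps the fractional part.
    return sum(values) / len(values)


def check_mean_keeps_fraction(mean):
    """Regression test that "catches the bug" in any mean implementation."""
    assert mean([1, 2]) == 1.5


def passes(test, *args):
    """Return True if the test runs without raising AssertionError."""
    try:
        test(*args)
    except AssertionError:
        return False
    return True
```

Running `passes(check_mean_keeps_fraction, mean_buggy)` reports a failure, while the same test against `mean_fixed` passes. Keeping the test in the suite is what guards against the bug being reintroduced later.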

Refactoring
Just because code works, it doesn’t mean it’s done.
Consolidate code to increase modularity
– Eliminate code duplication.
Some examples
– Extract Classes
– Extract Method
– Move/Rename Method
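As one illustration of Extract Method, the sketch below shows a function with a duplicated computation and a refactored version that moves it into a single helper. The function names and the height/weight data are assumptions made up for this example.

```python
def report_before(heights, weights):
    # Duplicated logic: the same mean computation is written out twice.
    mean_height = sum(heights) / len(heights)
    mean_weight = sum(weights) / len(weights)
    return mean_height, mean_weight


def mean(values):
    """Extracted helper: the single place where the mean is computed."""
    return sum(values) / len(values)


def report_after(heights, weights):
    # Same behaviour as report_before, with the duplication removed.
    return mean(heights), mean(weights)
```

Because the behaviour is unchanged, existing tests for `report_before` should pass unmodified against `report_after`, which is what makes the refactoring safe to do.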

Code Review
Give your code to another person for feedback. Companies do this to ensure consistent style and correctness. Research labs rarely do.

Some specific advice.
Take an enormous amount of notes.
– What did you do?
– What did you learn?
– What bugs did you fix?
– What new issues did you find?
– What questions did you come up with?

Specifics
Copy and Paste is your enemy.
– If you are copying and pasting in code, you have probably made a mistake.

Specifics
Use CONSTANTS
– Never encode constants inline in your code.

    # Avoid: a magic number buried in the expression.
    mean_height = total_height / 15

    # Prefer: a named value defined in one place.
    num_people = 13
    mean_height = total_height / num_people

Specifics
Use CONSTANTS
– Never encode constants inline in your code.

    # Avoid: bare index numbers.
    data[17] = 'Andrew'
    data[18] = 1.78

    # Prefer: named indices.
    name_idx = 17
    score_idx = 18
    data[name_idx] = 'Andrew'
    data[score_idx] = 1.78

Specifics
Don’t use global variables
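A small sketch of why globals cause trouble, using invented names (`running_total`, `add_to_total_global`, `add_to_total`). The first function depends on hidden module-level state, so its result changes with call history. The second takes its state as an argument and returns the new value.

```python
# Discouraged: the function silently reads and mutates module-level state.
running_total = 0


def add_to_total_global(value):
    global running_total
    running_total += value
    return running_total


# Preferred: state comes in as an argument and goes out as a return value.
def add_to_total(total, value):
    return total + value
```

Calling `add_to_total_global(5)` twice gives two different answers, while `add_to_total(0, 5)` always gives the same one, which makes it easier to test and to reuse.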

Specifics
Use sensible function names

    start()
    step1()
    step2()
    step3()
    wrapup()

Specifics
Use sensible function names

    initializeParameters()
    setPaths()
    calculateRHS()
    calculateLHS()
    writeResults()

Specifics
Use sensible variable names

    x1 = income / population
    ipc = income / population
    income_per_capita = income / population

Specifics
Serialize Frequently.

    main() {
        preprocessData()
        extractFeatures()
        runBaselineExperiment()
        runNewExperiment()
        evaluateResults()
    }

Specifics
Serialize Frequently.

    preprocess files.data > clean_files.data
    extractFeatures clean_files.data > features.csv
    runBaseline features.csv > baseline.results
    runNewExperiment features.csv > new.results
    evaluate baseline.results > baseline.report
    evaluate new.results > new.report
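The file-per-stage pipeline above can also be sketched inside a single Python program. This is only one possible arrangement: the `run_stage` helper, the JSON file format, and the two toy stages are assumptions made for this illustration, not part of the original slides.

```python
import json
from pathlib import Path


def run_stage(stage, in_path, out_path):
    """Run `stage` on the data saved at in_path and save the result to out_path.

    If out_path already exists, the stage is skipped and the saved result
    is reused, so a rerun only repeats the stages whose outputs are missing.
    """
    out_path = Path(out_path)
    if not out_path.exists():
        data = json.loads(Path(in_path).read_text())
        out_path.write_text(json.dumps(stage(data)))
    return out_path


# Hypothetical stages standing in for preprocessing and feature extraction.
def preprocess(records):
    return [r.strip().lower() for r in records if r.strip()]


def extract_features(records):
    return [{"text": r, "length": len(r)} for r in records]
```

A run would chain the stages through files, for example `clean = run_stage(preprocess, "files.json", "clean_files.json")` followed by `run_stage(extract_features, clean, "features.json")`. Every intermediate result stays on disk, where it can be inspected or reused when only a later stage changes.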

Specifics
When things get slow, use a profiler.
– Identify slow functions, and fix them.
– Some code needs to do a lot, so it can be slow.
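In Python, the standard library's `cProfile` and `pstats` modules are one way to do this. The sketch below profiles a toy workload. The functions `slow_function`, `fast_function`, `experiment`, and `profile_report` are hypothetical names invented for this example.

```python
import cProfile
import io
import pstats


def slow_function(n):
    # Deliberately does real work so it shows up in the profile.
    return sum(i * i for i in range(n))


def fast_function(n):
    return n


def experiment():
    slow_function(200_000)
    fast_function(200_000)


def profile_report(func, top=10):
    """Profile func() and return a text report sorted by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(top)
    return stream.getvalue()
```

Printing `profile_report(experiment)` lists the functions that account for most of the running time, which tells you where optimization effort is worth spending.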

Recap
Research Code should be released
– This is becoming more common, expected and, sometimes, required.
Research Code needs to be good code.
– So you can reuse it.
– So you can release it.