Better Science with Python
Nick Barnes at AMS, 2012-01-24, climatecode.org
Copyright Climate Code Foundation, license CC-BY.

What is the CCF?
- A UK non-profit founded in 2010, "to promote the public understanding of climate science..." ... through software activities.
- Continuing projects started in 2008.
- A few software consultants, currently unpaid and part-time.
- An advisory committee of a dozen experts.
- A growing network of climate scientists.

What is the problem?
Scientists have to write code, but:
- they aren't well trained;
- they aren't properly rewarded;
- there is no incentive to publish it.
So science code looks like the software industry of 30 years ago:
- no version control or configuration management;
- no issue systems or defect tracking;
- no automated testing or test-driven development.
Critically: code is being written for computers, not people.

Clear Climate Code
- Project started in 2008.
- The over-riding goal is clarity: code which interested members of the public can download, run, read and understand.
- Open source, of course.
- First target: NASA GISTEMP (ccc-gistemp.googlecode.com).
  - 12 KLOC of Fortran (etc.) became 3,678 lines of Python (including 1,500 lines of docstrings).
  - Fixed minor bugs.
  - Fosters new science: one paper out now, more in draft.

Why clarity?
The original motivation was to answer critics:
- Not the real code;
- Can't be run;
- Contains "obvious bugs";
- "divinci code written by the shortbus crew."
But clarity is also a key message of software engineering: your target audience is people, not compilers, and those people are, most often, yourselves.

What is clarity?

    def step1(record_source):
        """An iterator for step 1.  Produces a stream of
        `giss_data.Series` instances.

        :Param record_source:
            An iterable source of `giss_data.Series` instances
            (which it will assume are station records).
        """
        records = comb_records(record_source)
        helena_adjusted = adjust_helena(records)
        combined_pieces = comb_pieces(helena_adjusted)
        without_strange = drop_strange(combined_pieces)
        for record in alter_discont(without_strange):
            yield record
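
Each line of step1 hands a lazy iterator to the next stage, so records stream through the pipeline one at a time and each stage has a name that says what it does. A minimal self-contained sketch of the same generator-pipeline pattern follows; the stage names (drop_empty, to_degrees), the (station_id, values) record shape, and the tenths-of-a-degree units are all hypothetical and are not ccc-gistemp's actual code.

```python
from typing import Iterable, Iterator, List, Optional, Tuple

# Hypothetical record shape: (station_id, monthly values); None marks a missing value.
Record = Tuple[str, List[Optional[float]]]


def drop_empty(records: Iterable[Record]) -> Iterator[Record]:
    """Skip records that contain no valid values at all."""
    for station_id, values in records:
        if any(v is not None for v in values):
            yield station_id, values


def to_degrees(records: Iterable[Record]) -> Iterator[Record]:
    """Convert values from (assumed) tenths of a degree to degrees."""
    for station_id, values in records:
        yield station_id, [None if v is None else v / 10 for v in values]


def step(record_source: Iterable[Record]) -> Iterator[Record]:
    """Chain the stages; nothing is computed until the caller iterates."""
    without_empty = drop_empty(record_source)
    for record in to_degrees(without_empty):
        yield record


stations = [("A", [105, None, 98]), ("B", [None, None, None])]
print(list(step(stations)))  # [('A', [10.5, None, 9.8])]
```

Because every stage is a generator, adding, removing, or reordering a stage is a one-line change to step, which is much of what makes the original step1 easy to read.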

Unclear how?

    for m in range(12):
        sum_new = 0.0  # Sum of data in new
        sum = 0.0      # Sum of data in average
        count = 0      # Number of years where both new and average are valid
        for a, n in itertools.izip(average[first_year*12+m: last_year*12: 12],
                                   new[first_year*12+m: last_year*12: 12]):
            if invalid(a) or invalid(n):
                continue
            count += 1
            sum += a
            sum_new += n
        if count < min_overlap:
            continue
        bias = (sum-sum_new)/count
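
One way to make a fragment like this clearer is to give the calculation a name, a docstring, and variables that do not shadow the builtin `sum`. The sketch below assumes both series are flat lists of monthly values indexed from the same starting year, and that missing data is marked with None (the original uses an invalid() test instead); the function name monthly_biases is hypothetical. It computes the same per-month quantity, (sum - sum_new)/count, written as a mean of differences, and records None where the original would `continue`.

```python
from typing import List, Optional, Sequence


def monthly_biases(average: Sequence[Optional[float]],
                   new: Sequence[Optional[float]],
                   first_year: int,
                   last_year: int,
                   min_overlap: int) -> List[Optional[float]]:
    """For each calendar month, return the mean of (average - new) over the
    years where both series have data, or None if fewer than min_overlap
    such years exist.  Years run from first_year up to, but not including,
    last_year, matching the slice bounds of the original loop."""
    biases: List[Optional[float]] = []
    for month in range(12):
        # Every 12th value starting at this month: one value per year.
        average_values = average[first_year * 12 + month: last_year * 12: 12]
        new_values = new[first_year * 12 + month: last_year * 12: 12]
        # Keep only the years where both series are valid.
        overlap = [(a, n) for a, n in zip(average_values, new_values)
                   if a is not None and n is not None]
        if len(overlap) < min_overlap:
            biases.append(None)  # Too little overlap to estimate a bias.
            continue
        biases.append(sum(a - n for a, n in overlap) / len(overlap))
    return biases
```

The mean of the differences equals (sum - sum_new)/count up to floating-point rounding; the gain is that a reader can see what is being computed without tracing three accumulators.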

Clarity enables new science
By promoting "computational thinking" (Wing, NSF), clear code raises new questions...
- Airport-only trends?
- Effect of US data?
- Effect of restricting to long-record stations?
- Use of land data for ocean cells?
- Adding more data scraped from met sites?
...and helps answer them, for both the original authors and others.
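
When processing is written as a chain of clearly named stages, a question such as "airport-only trends?" or "effect of restricting to long-record stations?" can become one more small filter in the chain. A sketch, assuming a hypothetical StationRecord type (not ccc-gistemp's real giss_data.Series) carrying first and last years and an airport flag; the 50-year threshold in the usage line is arbitrary.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class StationRecord:
    # Hypothetical minimal station record, for illustration only.
    uid: str
    first_year: int
    last_year: int
    is_airport: bool


def long_records(records: Iterable[StationRecord], min_years: int) -> Iterator[StationRecord]:
    """Keep only stations whose record covers at least min_years years (inclusive span)."""
    for record in records:
        if record.last_year - record.first_year + 1 >= min_years:
            yield record


def airports_only(records: Iterable[StationRecord]) -> Iterator[StationRecord]:
    """Keep only stations flagged as airports."""
    for record in records:
        if record.is_airport:
            yield record


stations = [
    StationRecord("X1", 1880, 2011, False),
    StationRecord("X2", 1990, 2011, True),
    StationRecord("X3", 1900, 2011, True),
]
print([s.uid for s in airports_only(long_records(stations, 50))])  # ['X3']
```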

Why Python?
Syntax:
- Very small and simple core language;
- Clear syntax (compared with Perl, C++, Fortran, etc.);
- Indentation for blocks (a huge win, although often derided);
- No type declarations or decorations.
Semantics:
- Garbage collection: no code for memory management;
- First-class functions;
- "Duck typing" for maximum code flexibility and reuse;
- A simple object system.
Library ("batteries included"):
- A huge amount of useful functionality;
- Kept out of the way of the core language: explicit import;
- Great documentation;
- One great way to do it (not TMTOWTDI, "there's more than one way to do it").
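
Two of the semantic points, duck typing and first-class functions, fit in a few lines. In this sketch (the names summarize and mean are illustrative only), summarize accepts anything iterable and takes the statistic to compute as an ordinary function argument, with no type declarations required at runtime.

```python
from typing import Callable, Iterable, List


def mean(xs: List[float]) -> float:
    return sum(xs) / len(xs)


def summarize(values: Iterable[float], statistic: Callable[[List[float]], float]) -> float:
    # Duck typing: a list, tuple, generator, or any other iterable will do.
    # First-class functions: the statistic is just another argument.
    return statistic(list(values))


print(summarize([1.0, 2.0, 3.0], mean))           # 2.0
print(summarize((x * 2 for x in range(3)), max))  # 4
```

The builtin max can be passed exactly like the user-defined mean; the type hints are optional documentation and are not enforced by the interpreter.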

Wait, there's more:
- Open source: zero cost; no licensing trap, for you or your audience; future-proof.
- "Interpreted" (i.e. has a really good REPL).
- Long-lived and stable.
- Very portable (and easy to install).
- Easy interfaces to other languages and systems.
- Terrific ecosystem.
- A BDFL who is right much more often than he is wrong.
- And probably more.

So: Why not Python?
- Performance;
- Concurrency;
- Many things not in the library (and may never be), so there's more than one way to do it!
- Package management (TMTOWTDI!);
- Some unpleasant corners (**kwargs, old-style classes);
- Python 2 vs. Python 3;
- Stability not as good as traditional languages;
- Language direction (e.g. lambda was nearly removed in Python 3!).
Use a distribution? Of Python 3? Committed to compatibility. With a new implementation?
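
One common example of the **kwargs corner (an illustration, not necessarily the one the talk had in mind): a function that collects its options in **kwargs silently accepts a misspelled keyword, whereas an explicit keyword-only parameter, available in Python 3, rejects it. Both functions here are hypothetical.

```python
def plot_options(data, **kwargs):
    # Options are looked up by name; unknown keys are silently ignored.
    return kwargs.get("color", "black")


def plot_options_strict(data, *, color="black"):
    # Keyword-only parameter (Python 3 syntax): unknown keywords raise TypeError.
    return color


print(plot_options([1, 2], colour="red"))  # black (the typo went unnoticed)
try:
    plot_options_strict([1, 2], colour="red")
except TypeError as err:
    print("TypeError:", err)
```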

A great language is just the start
Vital software development skills and tools:
- Version control;
- Defect tracking;
- Code inspection;
- Automated testing;
- Automated building;
- Bundling and delivery;
- Documentation;
- Team-work;
- Publication.
Many free integrated suites of tools are available, online and offline.
Beware: "You can write FORTRAN in any language."
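
Automated testing needs very little machinery in Python: the standard-library unittest module is enough to start. A minimal sketch, assuming a hypothetical anomalies() helper that subtracts a baseline from each value and keeps missing values (None) as missing:

```python
import unittest


def anomalies(values, baseline):
    """Hypothetical helper: subtract baseline from each value; None stays None."""
    return [None if v is None else v - baseline for v in values]


class AnomaliesTest(unittest.TestCase):
    def test_subtracts_baseline(self):
        self.assertEqual(anomalies([10.0, 12.0], 11.0), [-1.0, 1.0])

    def test_missing_values_stay_missing(self):
        self.assertEqual(anomalies([None, 11.0], 11.0), [None, 0.0])

    def test_empty_input(self):
        self.assertEqual(anomalies([], 11.0), [])
```

Saved in a file whose name starts with test_, this can be run with `python -m unittest`, which discovers and runs the test cases; running it on every change is the first step towards test-driven development.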

Google Summer of Code
- Google pays students to write code ($5000 for 3 months), for any open-source project.
- Our 2011 projects:
  - Hannah Aizenman: Common Climate Project;
  - Filipe Fernandes: Extensions to ccc-gistemp;
  - Daniel Rothenberg: Homogenization.
  (These names might look familiar if you were here yesterday.)
- 2012? The program is to be announced soon (late January); we hope to be accepted as a mentoring organization (March); then we will welcome student proposals, or collaborations with scientists.

Open Science
An accelerating trend towards more openness in science:
- Redefining publication: Open Access; Open Data; Open Knowledge; Open Notebooks;
- Data-driven intelligence;
- Workshops, conferences, summits;
- There's a war on: PRISM, RWA (Research Works Act);
- Policy studies at AAAS, NSF, Royal Society, etc.
But there is no coherent message about open software in science.
Michael Nielsen: Reinventing Discovery.

Science Code Manifesto (sciencecodemanifesto.org)
- Code: All source code written specifically to process data for a published paper must be available to the reviewers and readers of the paper.
- Copyright: The copyright ownership and license of any released source code must be clearly stated.
- Citation: Researchers who use or adapt science source code in their research must credit the code's creators in resulting publications.
- Credit: Software contributions must be included in systems of scientific assessment, credit, and recognition.
- Curation: Source code must remain available, linked to related materials, for the useful lifetime of the publication.

Future Plans
- Changing policies: transparency; rewards for all research products.
- Training scientists: basic techniques (testing, version control, agile, etc.); code publication and reuse.
- Providing resources: white papers, blog posts; directories.
- Building networks, partnering with institutions.
- Leading by example: ccc-gistemp; ccf-homogenization; etc.

Questions?

Funding
I say "non-profit"; approximately "non-revenue". All accounts are open.
- Total revenue to date: £ (+ GSoC students).
- Total costs to date: £ (as of ).
- All work unpaid (not counting GSoC students).
- Personal lost income to date: probably £30-40K.
The funding model seeks £150K-£500K annually from corporate or NGO sponsorship (plus some project money from academic collaborations). Too much? Not enough? Depends who you ask. Open to suggestions!