Nick Barnes at AMS, climatecode.org
Better Science with Python
Copyright Climate Code Foundation, license CC-BY
What is the CCF?
  A UK non-profit founded in 2010;
  “to promote the public understanding of climate science…”
  … through software activities.
  Continuing projects started in 2008;
  A few software consultants, currently unpaid part-time;
  Advisory committee of a dozen experts;
  A growing network of climate scientists.
What is the problem?
  Scientists have to write code, but:
    They aren’t well-trained;
    They aren’t properly rewarded;
    There is no incentive to publish it.
  So science code looks like the industry 30 years ago:
    No version control or configuration management;
    No issue systems or defect tracking;
    No automated testing or test-driven development.
  Critically: code is being written for computers, not people.
Clear Climate Code
  Project started in
  Over-riding goal is clarity: code which interested members of the public can download, run, read and understand.
  Open-source, of course.
  First target NASA GISTEMP: ccc-gistemp.googlecode.com
    12 KLOC of Fortran (etc.) became 3678 lines of Python (including 1500 of docstrings);
    fixed minor bugs;
    fosters new science: one paper out now, more in draft.
Why clarity?
  Original motivation was to answer critics:
    Not the real code;
    Can’t be run;
    Contains “obvious bugs”;
    “divinci code written by the shortbus crew.”
  But also a key message of software engineering:
    Your target audience is people, not compilers.
    Those people are, most often, yourselves.
What is clarity? Clear how? Clear to whom? Unclear how?

    def step1(record_source):
        """An iterator for step 1.  Produces a stream of
        `giss_data.Series` instances.

        :Param record_source:
            An iterable source of `giss_data.Series` instances (which it
            will assume are station records).
        """
        records = comb_records(record_source)
        helena_adjusted = adjust_helena(records)
        combined_pieces = comb_pieces(helena_adjusted)
        without_strange = drop_strange(combined_pieces)
        for record in alter_discont(without_strange):
            yield record
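The shape of step1, a chain of iterators in which each stage consumes the previous one, can be tried in isolation. What follows is a minimal sketch and not ccc-gistemp code: `Series`, `drop_short`, `subtract_mean` and `pipeline` are hypothetical names invented for illustration, and the two stages are arbitrary stand-ins for the real adjustments.

```python
from typing import Iterable, Iterator, List, NamedTuple


class Series(NamedTuple):
    """Hypothetical stand-in for a station record (not giss_data.Series)."""
    station_id: str
    values: List[float]


def drop_short(records: Iterable[Series], min_length: int) -> Iterator[Series]:
    """Yield only the records that have at least min_length values."""
    for record in records:
        if len(record.values) >= min_length:
            yield record


def subtract_mean(records: Iterable[Series]) -> Iterator[Series]:
    """Yield each record with its own mean subtracted from every value."""
    for record in records:
        mean = sum(record.values) / len(record.values)
        yield Series(record.station_id, [v - mean for v in record.values])


def pipeline(record_source: Iterable[Series]) -> Iterator[Series]:
    """Chain the stages; no work is done until the result is iterated."""
    long_enough = drop_short(record_source, min_length=3)
    for record in subtract_mean(long_enough):
        yield record


stations = [
    Series("A", [1.0, 2.0, 3.0]),
    Series("B", [5.0]),
    Series("C", [10.0, 10.0, 10.0, 10.0]),
]
result = list(pipeline(stations))
```

Each intermediate name (`long_enough` here, `helena_adjusted` and `without_strange` in the slide) documents what the stream contains at that point, which is much of what makes the top-level function readable.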
Unclear how?

    for m in range(12):
        sum_new = 0.0    # Sum of data in new
        sum = 0.0        # Sum of data in average
        count = 0        # Number of years where both new and average are valid
        for a,n in itertools.izip(average[first_year*12+m: last_year*12: 12],
                                  new[first_year*12+m: last_year*12: 12]):
            if invalid(a) or invalid(n):
                continue
            count += 1
            sum += a
            sum_new += n
        if count < min_overlap:
            continue
        bias = (sum-sum_new)/count
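One way the fragment above might be made easier to read is to give the per-month calculation a name and a docstring. This is a sketch under stated assumptions, not the project's actual rewrite: `monthly_bias` and the `MISSING` sentinel are hypothetical, the real `invalid()` is not shown in the slide, and Python 3's built-in `zip` stands in for `itertools.izip`. Averaging the differences is algebraically the same as `(sum-sum_new)/count`, though floating-point results can differ in the last digits.

```python
MISSING = 9999.0  # Assumed "no data" sentinel; the slide does not define one.


def invalid(value):
    """Assumed validity test standing in for the original invalid()."""
    return value == MISSING


def monthly_bias(average, new, month, first_year, last_year, min_overlap):
    """Mean of (average - new) for one calendar month over the overlap years.

    Both series are flat lists of monthly values, 12 per year.  Returns
    None when fewer than min_overlap years are valid in both series.
    """
    this_month = slice(first_year * 12 + month, last_year * 12, 12)
    differences = [a - n
                   for a, n in zip(average[this_month], new[this_month])
                   if not (invalid(a) or invalid(n))]
    # Require at least one pair so the division below is always defined.
    if len(differences) < max(min_overlap, 1):
        return None
    return sum(differences) / len(differences)


# Two years of monthly data: "new" reads 0.5 lower than "average"
# throughout, except that its first January is missing.
average = [10.0] * 24
new = [9.5] * 24
new[0] = MISSING

january = monthly_bias(average, new, 0, 0, 2, min_overlap=2)
february = monthly_bias(average, new, 1, 0, 2, min_overlap=2)
```

With these inputs January has only one valid pair, so it falls below `min_overlap` and yields `None`, while February yields a bias of 0.5.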
Clarity enables new science
  By promoting “computational thinking” (Wing, NSF),
  Clear code raises new questions…
    Airport-only trends?
    Effect of US data?
    Effect of restricting to long-record stations?
    Use of land data for ocean cells?
    Adding more data scraped from met sites?
  …and helps answer them…
  …for both original authors and others.
Why Python?
  Syntax:
    Very small and simple core language;
    Clear syntax (compared with Perl, C++, Fortran, etc);
    Indentation for blocks (huge win although often derided);
    No type declarations or decorations;
  Semantics:
    Garbage collection: no code for memory management;
    First-class functions;
    “Duck-typing” for maximum code flexibility and re-use;
    A simple object system;
  Library (“batteries included”):
    A huge amount of useful functionality;
    Kept out of the way of the core language: explicit import;
    Great documentation;
    One great way to do it (not TMTOWTDI).
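Two of the semantic points above, first-class functions and duck typing, fit in a few lines. The names `mean`, `summarise` and `readings` are made up for this sketch.

```python
def mean(values):
    """Arithmetic mean of any iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


def summarise(values, statistic):
    """Apply a caller-supplied function: functions are ordinary values."""
    return statistic(values)


readings = [3.0, 1.0, 2.0]

# The same code accepts a list, a tuple, or a generator, with no type
# declarations: anything that can be iterated over will do.
as_list = summarise(readings, mean)
as_tuple = summarise(tuple(readings), max)
as_generator = summarise((r for r in readings), min)
```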
Wait, there’s more:
  Open-source:
    Zero cost;
    No licensing trap, for you or your audience;
    Future-proof.
  “Interpreted” (i.e. has a really good REPL);
  Long-lived and stable;
  Very portable (and easy to install);
  Easy interfaces to other languages and systems;
  Terrific eco-system;
  A BDFL who is right much more often than he is wrong;
  And probably more.
So: Why not Python?
  Performance;
  Concurrency;
  Many things not in the library (and may never be);
    … so there’s more than one way to do it!
  Package management (TMTOWTDI!);
  Some unpleasant corners (**kwargs, old-style classes);
  2 vs 3;
  Stability not as good as traditional languages;
  Language direction (e.g. lambda deprecated!).
  Use a distribution?
  Of Python 3?
  Committed to Compatibility.
  With a new implementation?
A great language is just the start
  Vital software development skills and tools:
    Version control;
    Defect tracking;
    Code inspection;
    Automated testing;
    Automated building;
    Bundling and delivery;
    Documentation;
    Team-work;
    Publication.
  Many free integrated suites of tools, online and offline.
  Beware: “You can write FORTRAN in any language.”
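Of the skills listed above, automated testing is perhaps the cheapest to start with, because Python's standard library already includes `unittest`. The sketch below tests a made-up function, `anomalies`, which is hypothetical and not taken from ccc-gistemp. Normally the tests would live in their own file and be run with `python -m unittest`; here the suite is run inline so the example is self-contained.

```python
import unittest


def anomalies(values, baseline):
    """Return each value minus the baseline mean (hypothetical example)."""
    base_mean = sum(baseline) / len(baseline)
    return [v - base_mean for v in values]


class AnomaliesTest(unittest.TestCase):
    def test_baseline_mean_is_removed(self):
        self.assertEqual(anomalies([1.0, 2.0, 3.0], [2.0, 2.0]),
                         [-1.0, 0.0, 1.0])

    def test_empty_baseline_is_an_error(self):
        # Documents current behaviour: an empty baseline cannot be averaged.
        with self.assertRaises(ZeroDivisionError):
            anomalies([1.0], [])


# Load and run the tests programmatically instead of via the command line.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(AnomaliesTest)
outcome = unittest.TextTestRunner(verbosity=0).run(suite)
```

Once tests like these exist, a change that silently alters a result is caught on the next run rather than after publication.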
Google Summer of Code
  Google pays students to write code ($5000 for 3 months);
  Any open-source project;
  Our 2011 projects:
    Hannah Aizenman: Common Climate Project;
    Filipe Fernandes: Extensions to ccc-gistemp;
    Daniel Rothenberg: Homogenization;
    (these names might look familiar if you were here yesterday).
  2012?
    Program to be announced soon (late Jan);
    we hope to be accepted as a mentoring org (March);
    then we will welcome student proposals, or collaborations with scientists.
Open Science
  Accelerating trend towards more openness in science.
  Redefining publication:
    Open Access;
    Open Data;
    Open Knowledge;
    Open Notebooks;
  Data-driven intelligence;
  Workshops, conferences, summits;
  There’s a war on: PRISM, RWA;
  Policy studies at AAAS, NSF, Royal Society, etc;
  But no coherent message about open software in science.
  Michael Nielsen: Reinventing Discovery
Science Code Manifesto (sciencecodemanifesto.org)
  Code: All source code written specifically to process data for a published paper must be available to the reviewers and readers of the paper.
  Copyright: The copyright ownership and license of any released source code must be clearly stated.
  Citation: Researchers who use or adapt science source code in their research must credit the code's creators in resulting publications.
  Credit: Software contributions must be included in systems of scientific assessment, credit, and recognition.
  Curation: Source code must remain available, linked to related materials, for the useful lifetime of the publication.
Future Plans
  Changing policies:
    Transparency;
    Rewards for all research products.
  Training scientists:
    Basic techniques (testing, version control, agile, etc);
    Code publication and reuse.
  Providing resources:
    White papers, blog posts;
    Directories.
  Building networks, partnering with institutions;
  Leading by example:
    ccc-gistemp;
    ccf-homogenization;
    etc….
Questions?
Funding
  I say “non-profit”. Approximately “non-revenue”.
  All accounts open.
  Total revenue to date £ (+ GSoC students).
  Total costs to date £ (as of ).
  All work unpaid (not counting GSoC students).
  Personal lost income to date probably £30-40K.
  Funding model seeks £150K-£500K annually from corporate or NGO sponsorship (plus some project money from academic collaborations).
  Too much? Not enough? Depends who you ask.
  Open to suggestions!