Communicating Scientific Thoughts and Data
Bryan Lawrence
Oxford, March 2006
2
2 Communicating Scientific Thought and Data Oxford March, 2006 Outline Intro to BADC Drivers for Change: –Open Access –Data as Evidence Handling Data – what is this metadata stuff? Improving our ability to find and utilise data: –The NERC DataGrid –Importance of Climate Forecast Conventions –NumSim Are we changing our methods of communicating? –Communication Timescales, blogging and preprints –CLADDIER
BADC Role
The BADC role is to assist UK researchers to locate, access and interpret atmospheric data and to ensure the long-term integrity of atmospheric data produced by NERC projects.
– Facilitation and Curation/Preservation!
BADC Data Holdings
A BADC dataset is an aggregation of data files, documents and metadata sharing common administrative policies. These policies could be file validation, access control or retention schemes.
Datasets vary from TBs in millions of files to a few MBs in a single file.
There are presently over 100 datasets.
User examples
Atmospheric chemistry models.
Pollution chemistry measurement campaigns.
Bird feeding habits.
Radio communication modelling.
Wind power research.
A & E influenza cases.
Castle mortar decay.
Discomfort indices.
Drivers for Change
Climate in 20010: A Graphic Illustration
Figures from Gary Strand, NCAR, ESG website March 2006, 2.5 PB
Typically, two-thirds of this data will never see the light of day. Why? No one can remember what it was or, if they can remember that, where it is!
Data as Evidence
http://www.realclimate.org/index.php?p=121
http://www.uoguelph.ca/~rmckitri/research/trcback.html
What McIntyre got right:
RCUK Position Statement on Access to Research Outputs
http://www.rcuk.ac.uk/access/statement.pdf

Research Data
8. RCUK also notes that one of the benefits of digitisation and publication in digital formats is the ability to provide access to primary research data alongside the traditional article; and it shares the Select Committee’s and the Government’s view that the data underpinning the published results of publicly-funded research should be made available as widely and rapidly as possible. For a number of years, Research Councils including the AHRB, ESRC and NERC have funded data centres and services which are responsible for preserving, managing and providing access to research data; and these Councils have well-established policies and procedures for preservation and access. CCLRC is currently leading cross-Council consideration of how policy and practice need to be developed with regard to the curation of the data created through the research projects they support. Further work is needed to develop a common framework of policies and procedures for determining what sets of data are collected, whether in university or in Council-run repositories or elsewhere; and how and on what terms they are made accessible to the research community and others.

New methods of publication
9. The development of web and associated Internet technologies providing access to a range of distributed information resources has enabled new possibilities for the delivery of research publications. This has also led to a change in expectations as to how and when research publications are accessed. E-print repositories (see paragraphs 10-15 below) and open access journals (see paragraphs 25-27 below) have both developed as part of this change in technology and expectation. Indeed, the economic model for open access journals depends on the web to provide a low-cost delivery mechanism. RCUK considers that both e-print repositories and open access journals can help improve access to the results of publicly funded research.
Data Retention Policies

University of Cambridge Research Division:
Data generated in the course of research should be kept securely in paper or electronic format, as appropriate. Back-up records should always be kept for data stored on a computer. The [AMRC] considers a minimum of ten years to be an appropriate period. However, research based on clinical samples or relating to public health may require longer storage to allow for long-term follow-up to occur. [AMRC: Association of Medical Research Charities]

University of Oxford Research Services Office:
A successful laboratory notebook allows for ready verification of quality and integrity of research data and enables another investigator to reproduce the procedure which has been documented and get the same result. …

Natural Environment Research Council:
… Scientists will frequently process the data they have collected selectively, or with specific application packages, in order to prepare material for publication in the scientific literature. But the full value of the data collected may only be realised if the entire dataset is subjected to generic processing (eg to ensure calibration and adequate quality control) and is sufficiently documented to allow others to re-use it at a later date. The original collector may be the only person in a position to undertake such work, and so to unlock the full potential of the data. Those holding data collected under NERC funding will be expected to cooperate in validating and publishing them in their entirety - when this can be justified in terms of their scientific value - rather than merely creaming off a subset for immediate publication in the literature. …
What is this stuff called metadata?
Preserving data is not just about backups!
One could argue that the writers of these documents did a brilliant job of preserving the bits-and-bytes of their time …
And yes they’ve both been translated … many times, it’s a shame the meanings are different …
Phaistos Disk, 1700 BC
Metadata Origins
[Diagram labels: Wider Internet; Research Group; Satellite; SuperComputer; Shared Resources; DB; Research Group]
Consider a hierarchy of data users beginning with an individual scientist, who may herself be part of a research group, itself part of a community sharing resources, lying in the wider internet …
To be well integrated the metadata should have a role at each level! (The data portal client and server interface may be different at each level).
At each level “extra” metadata will be required, probably produced by dedicated staff at the research group, or data centre.
NDG Metadata Taxonomy
… not one schema, not one solution!
[Diagram labels: CSML; NCML+CF; MOLES; THREDDS; DIF -> ISO19115; CLADDIER]
The NERC DataGrid
http://ndg.nerc.ac.uk
British Atmospheric Data Centre
British Oceanographic Data Centre
NCAR
Complexity + Volume + Remote Access = Grid Challenge
NetCDF + WMS
e.g.: ERA40 re-analysis surface air temperature, 2001-04-27
– deegree open-source WMS modified with netCDF connector
Overlaid with rainfall from globe.digitalearth.gov WMS server
NB: Now using Mapserver for Interoperability experiments
Climate Science Modelling Language
CSML feature types
– defined on basis of geometric and topologic structure

CSML feature type | Description | Examples
TrajectoryFeature | Discrete path in time and space of a platform or instrument. | ship’s cruise track, aircraft’s flight path
PointFeature | Single point measurement. | raingauge measurement
ProfileFeature | Single ‘profile’ of some parameter along a directed line in space. | wind sounding, XBT, CTD, radiosonde
GridFeature | Single time-snapshot of a gridded field. | gridded analysis field
PointSeriesFeature | Series of single datum measurements. | tidegauge, rainfall timeseries
ProfileSeriesFeature | Series of profile-type measurements. | vertical or scanning radar, shipborne ADCP, thermistor chain timeseries
GridSeriesFeature | Timeseries of gridded parameter fields. | numerical weather prediction model, ocean general circulation model
Climate Science Modelling Language
CSML feature types
– examples...
[Figure labels: ProfileSeriesFeature; ProfileFeature; GridFeature]
Climate Science Modelling Language
Application schema
– logical structure and semantic content of NDG ‘Dataset’
– Based on Geography Markup Language 3.1
Climate Science Modelling Language
Numerical array descriptors
– provide a ‘wrapper’ architecture for legacy data files
– are ‘connected’ to the data model’s numerical content through ‘xlink:href’
Three subtypes:
– InlineArray
– ArrayGenerator
– FileExtract (NASAAmes, NetCDF, GRIB)
Composite design pattern for aggregation
Climate Science Modelling Language
[XML examples; the markup was lost in extraction and only the element values survive]
Inline array: 5 2 | udunits.xml#degreeC | float | s/10/9/ge | +5 | 1 2 3 4 5 6 7 8 9 10
Array generator: 10001 | udunits.xml#minute | float | 0:5:50000
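The three array descriptor subtypes and the composite pattern for aggregation can be illustrated with a small Python sketch. This is not CSML tooling: the class names mirror the slide, but the `values()` method, the `AggregatedArray` class and the reading of the generator expression `0:5:50000` as "start:step:stop, stop inclusive" are assumptions made for illustration. That reading is at least consistent with the array size of 10001 shown in the example. `FileExtract` is omitted because it would need a real data file.

```python
from dataclasses import dataclass
from typing import List, Sequence


class ArrayDescriptor:
    """Common interface: every descriptor can yield its numeric values."""

    def values(self) -> List[float]:
        raise NotImplementedError


@dataclass
class InlineArray(ArrayDescriptor):
    """Values written directly in the document."""
    data: Sequence[float]

    def values(self) -> List[float]:
        return [float(v) for v in self.data]


@dataclass
class ArrayGenerator(ArrayDescriptor):
    """Values produced from an expression (assumed form: start:step:stop, integers)."""
    expression: str

    def values(self) -> List[float]:
        start, step, stop = (int(part) for part in self.expression.split(":"))
        # stop is treated as inclusive (an assumption)
        return [float(v) for v in range(start, stop + 1, step)]


@dataclass
class AggregatedArray(ArrayDescriptor):
    """Composite: a descriptor made of other descriptors (hypothetical name)."""
    parts: Sequence[ArrayDescriptor]

    def values(self) -> List[float]:
        out: List[float] = []
        for part in self.parts:
            out.extend(part.values())
        return out


inline = InlineArray([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
generated = ArrayGenerator("0:5:50000")
combined = AggregatedArray([inline, generated])
print(len(inline.values()), len(generated.values()), len(combined.values()))
# prints: 10 10001 10011
```

Because the composite exposes the same `values()` interface as its parts, client code does not need to know whether an array is stored inline, generated, or assembled from several sources.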
Climate Science Modelling Language
File extract
[XML examples; the markup was lost in extraction and only the element values survive]
526 | double | /data/BADC/macehead/mh960606.cf1 | CFC-12
10000 | radar_data.nc | az
320 160 | double | /e40/ggas1992010100rsn.grb | 203 | 5 | 289412
MarineXML Testbed
Data from different parts of the marine community conform to a variety of schema (XSD).
For each XSD (for the source data) there is an XSLT to translate the data to the Feature Types (FT) defined by CSML.
The FTs and XSLT are maintained in a ‘MarineXML registry’.
The result of the translation is an encoding that contains the marine data in weakly typed (i.e. generic) Features.
The FTs can then be translated to equivalent FTs for display in the ECDIS system.
Features in the source XSD must be present in the data dictionary.
Phenomena in the XSD must have an associated portrayal.
ECDIS acts as an example client for the data.
Features described using the S-57 v3.1 Application Schema can be imported and are equivalent to the same features in CSML.
[Diagram labels: Biological Species; Chl-a from Satellite; Modelled Hydrodynamics; Measured Hydrodynamics; XSD; XML; XSLT; Marine GML (NDG) Feature Types; S-57v3 GML; XML Parser; SeeMyDENC; Data Dictionary; S52 Portrayal Library; SENC]
Slide adapted from Kieran Millard (AUKEGGS, 2005)
MarineXML Testbed
Biological sampling station with attributes for the species sampled at each
Grid of Chl-a from the MERIS instrument on ENVISAT
Predicted and measured wave climate timeseries (height, direction and period)
Vectors of currents from instruments
Slide adapted from Kieran Millard (AUKEGGS, 2005)
Re-using Features
Here structured XML is converted to plain ascii text in the form required for a numerical model
HTML warning service pages are generated ‘on the fly’
XML can also be converted to SVG to display data graphically
Here the same XML is converted to the SENC format used in a proprietary tool for viewing electronic navigation charts.
All this requires agreement on standards
Slide adapted from Kieran Millard (AUKEGGS, 2005)
Climate Science Modelling Language
Status:
– Initial feature types defined
– First draft application schema complete
– Trial software tooling being coded (parser, netCDF instantiation)
– Initial deployment trial across BODC, BADC datasets
Future:
– Separate out wrapper implementation (array descriptors)
– Disallow ‘internal’ dictionaries
– More strongly-typed features? Complex features; Implicit Ensemble Support; Swathes
– Follow (and pursue!) GML evolution, enhance compliance
– Expand tooling
CSML Round Tripping: Managing semantics
[Diagram labels: conceptual model; UGAS; GML app schema (XML); GML dataset instance; Conforms to; New Dataset; Application; produces; parser (Under Development)]
CSML Round Tripping: Managing data - 1
[Diagram labels: CF Dataset; CF; Application; produces; scanner (Under Development); GML dataset instance; GML app schema (XML); parser (Under Development)]
Climate and Forecast (CF) Conventions
1. Data should be self-describing. No external tables are needed to interpret the file. For instance, CF encodings do not depend upon numeric codes (by contrast with GRIB).
2. The convention should be easy to use, both for data-writers and users of data.
3. The metadata, and the semantic meaning encoded through the metadata, should be readable by humans as well as easily utilized by programs.
4. Redundancy should be minimised as much as possible (because it reduces the chance of errors of inconsistency when writing data).
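The first principle, self-description, can be shown with a short Python sketch that needs no NetCDF library. The attribute names `Conventions`, `standard_name` and `units` are the ones CF uses; the dataset dictionary, the variable names and the `undescribed` helper are made up for illustration. The check is also stricter than CF itself, which does not make `standard_name` mandatory.

```python
# A dataset modelled as plain dictionaries (hypothetical content).
dataset = {
    "global_attributes": {"Conventions": "CF-1.0"},
    "variables": {
        # Self-describing: meaning and units travel with the data.
        "tas": {
            "standard_name": "air_temperature",
            "units": "K",
            "dimensions": ("time", "lat", "lon"),
        },
        # Not self-describing: a bare numeric code needs an external table.
        "code_203": {"dimensions": ("time", "lat", "lon")},
    },
}

# Attributes this sketch insists on (an assumption, stricter than CF).
REQUIRED = ("standard_name", "units")


def undescribed(ds):
    """Return {variable: [missing attribute names]} for variables lacking description."""
    problems = {}
    for name, attrs in ds["variables"].items():
        missing = [key for key in REQUIRED if key not in attrs]
        if missing:
            problems[name] = missing
    return problems


print(undescribed(dataset))
# prints: {'code_203': ['standard_name', 'units']}
```

A reader of `tas` can interpret it from the file alone, whereas `code_203` is only meaningful to someone holding the right lookup table, which is the contrast with GRIB drawn in principle 1.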
CF
CF consists of:
– Vocabulary management
– Semantic concepts (axes, cells etc), and
– Format specific conventions (NetCDF now)
CF is at the heart of:
– IPCC data comparison
– Academic earth system science data exploitation (and archival)
CF
Exploits 100s of man-years of effort on NetCDF evolution and tools
Is one of the means by which we can take NetCDF data and make meaningful feature types.
Helps future-proof your data!
Managing Data 2
[Diagram labels: CF Dataset; scanner; GML dataset; XSLT; ISO19115 XML; PUBLISH; DECISION PROCESSES; Define Dataset; Add Information]
http://ndg.nerc.ac.uk/discovery
Choose to return either data or “B-”Metadata
Look at DIFs in either HTML or XML
Can order responses by Title, Data Centre or Temporal coverage (default random)
Background activity being parallelised with GODIVA/CCLRC e-science collaboration (spectral -> gridpoint + CDMS + visualisation tools)
Download either plot or the data that went into the plot.
ERA40: All driven from one CDML file; 9 TB of online spherical harmonics, looking like 40 TB of “virtual” gridded data!
NumSim.xsd
http://proj.badc.rl.ac.uk/ndg/wiki/NumSim
See also: http://www.cgam.nerc.ac.uk/pmwiki/NMM/index.php/8
Changing Communications: Blogging, Trackback and CLADDIER
Blogging
Wednesday 15th of March: Google search on “climate blogs” yields 33,900,000 hits.
www.technorati.com is following 30 million blogs
– 269,404 have climate posts
– 1,953 climate posts in “environmental” blogs
– 131 posts about potential vorticity (mainly in weather/hurricane blogs)
Very few “professional” standard blogs in our field, but gazillions in others!
– Notwithstanding: http://www.realclimate.org and others
Traditional Scientific Publishing
Pluses:
– “Peer-Review”; the gold standard
– Copy-editing
– Reliable indexing (Web of Science etc)
– Paper is nice to read.
Minuses:
– “Peer-Review”; “support your mates”
– Often (very) slow to “print”
– Proprietary indexing (Roll on Google Scholar)
– Libraries can’t afford to buy copies!
– Limited Readership.

SELF PUBLISHING
Pluses:
– No Peer Review: say what you think, citation and annotation measure quality!
– Feedback: comments and trackback.
– Hyperlinks to publications AND data.
– Immediacy
– Reliable Accessible Indexing
– You can print things out to read …
– You can still publish in the traditional media (while it lasts).
Minuses:
– No Peer Review: plenty of garbage.
– Spam.

Conclusion: how can we do peer review without traditional journals? Because the days of traditional journals (apart from as formal records) are numbered!
What is trackback?
Hyperlinks forward in time!
If a web resource (paper, page, data) is configured correctly, software is able to accept trackback “pings” and update that web resource with annotations.
One such annotation type is effectively a citation:
– “I ( ) have cited with something with this found at this )
– Real Time citation of a resource so that it shows what people have said *after* it has been published!
– (Some blogging providers do it automatically, using search engines to find all links and enter them appropriately)
BNL just joined a working group to “standardise” trackback, and I’ll be working to make sure the format includes Academic Citation.
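For readers unfamiliar with the mechanics, a ping in the widely used TrackBack convention is an HTTP POST of form-encoded fields (`url`, `title`, `excerpt`, `blog_name`) to the target resource's ping address, answered by a small XML document whose `error` element is `0` on success. The Python sketch below only builds the request body and interprets a reply, without touching the network. The function names and the example.org addresses are hypothetical, and nothing here reflects the academic-citation extension mentioned above.

```python
from urllib.parse import urlencode, parse_qs
import xml.etree.ElementTree as ET


def build_ping(url, title, excerpt, blog_name):
    """Form-encode the fields that the citing resource sends to the cited one."""
    return urlencode(
        {"url": url, "title": title, "excerpt": excerpt, "blog_name": blog_name}
    )


def ping_succeeded(response_xml):
    """A reply whose <error> element contains 0 means the ping was accepted."""
    root = ET.fromstring(response_xml)
    return (root.findtext("error") or "").strip() == "0"


# Hypothetical citing paper announcing itself to a cited dataset.
body = build_ping(
    url="http://example.org/papers/42",
    title="A new analysis",
    excerpt="We reuse the dataset described at this address.",
    blog_name="Example Institutional Repository",
)

ACCEPTED = b'<?xml version="1.0" encoding="utf-8"?><response><error>0</error></response>'
REJECTED = (
    b'<?xml version="1.0" encoding="utf-8"?>'
    b"<response><error>1</error><message>Ping refused</message></response>"
)

print(parse_qs(body)["url"])      # prints: ['http://example.org/papers/42']
print(ping_succeeded(ACCEPTED))   # prints: True
print(ping_succeeded(REJECTED))   # prints: False
```

The cited resource stores each accepted ping and displays it, which is what makes the link point "forward in time" from the older resource to the newer one.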
Trackback Example
Institutional Repositories
E-print repositories (from the RCUK document cited earlier)
– For the purpose of this document, e-print repositories are always understood to be open access. RCUK believes that such institutional and subject-based repositories, where researchers deposit copies of the articles they publish (ie post-print), provide an opportunity significantly to enhance access to research publications … Importantly, there is a small but growing body of evidence demonstrating the increased impact and visibility of material made available in open access through e-print repositories.
(ignoring issues for the publishers for the moment)
RCUK further recommends:
– Where research is funded by the Research Councils and undertaken by researchers with access to an open access e-print repository (institutional or subject-based), Councils will make it a condition for all grants awarded from 1 October 2005 that a copy of all resultant published journal articles or conference proceedings (but not necessarily the underlying data) should be deposited in and/or accessible through that repository, subject to copyright or licensing arrangements … Such deposit requires relatively little effort and, for each published paper, should not take more than 15-20 minutes of an author’s or repository manager’s time. There is no reason why this should be seen as an infringement of researchers’ freedom …
IR Examples
CLADDIER
CLADDIER
Use Case Sequence:
1. Joanna reads paper
2. Joanna acquires data
3. Joanna analyses data
4. Joanna deposits data
   – Data Centre generates trackbacks to cited data and papers (in the metadata)
5. Joanna creates paper
6. Joanna deposits paper
   – Institutional Repository generates trackbacks to cited data and papers
7. Fred reads Joanna’s new paper
8. Fred directly acquires EXACTLY the same data she used for his own project
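The use case sequence can be walked through with a toy in-memory model. This is a sketch under stated assumptions, not CLADDIER software: the `Repository` class, the `deposit` function, the `registry` lookup table and all identifiers are hypothetical, and a real system would send pings over HTTP rather than call a method directly.

```python
from collections import defaultdict


class Repository:
    """A data centre or institutional repository (hypothetical model)."""

    def __init__(self, name):
        self.name = name
        self.items = {}                      # identifier -> identifiers it cites
        self.trackbacks = defaultdict(list)  # identifier -> identifiers citing it

    def receive_ping(self, target, source):
        """Record that `source` has cited `target`, which this repository holds."""
        self.trackbacks[target].append(source)


def deposit(repo, identifier, cites, registry):
    """Store an item, then notify the holder of every resource it cites."""
    repo.items[identifier] = list(cites)
    for target in cites:
        registry[target].receive_ping(target, identifier)
    registry[identifier] = repo


data_centre = Repository("Data Centre")
ir = Repository("Institutional Repository")

# Pre-existing resources that Joanna reads and acquires (steps 1-2).
registry = {"data:original": data_centre, "paper:original": ir}
data_centre.items["data:original"] = []
ir.items["paper:original"] = []

# Step 4: Joanna deposits her analysed data, citing what she used.
deposit(data_centre, "data:joanna", ["data:original", "paper:original"], registry)
# Step 6: Joanna deposits her paper, citing her data and the original paper.
deposit(ir, "paper:joanna", ["data:joanna", "paper:original"], registry)

# Steps 7-8: Fred follows the paper's citation to exactly the data Joanna used.
print(ir.items["paper:joanna"][0])              # prints: data:joanna
# A reader of the original paper can see everything that later cited it.
print(ir.trackbacks["paper:original"])          # prints: ['data:joanna', 'paper:joanna']
print(data_centre.trackbacks["data:joanna"])    # prints: ['paper:joanna']
```

The point of the design is that links are recorded at both ends: the new item lists what it cites, and each cited item accumulates a list of what later cited it.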
Summary
A bit of pot-pourri:
Data Reuse depends on metadata, and eventual reuse depends on the originator doing it right!
– Use CF, get involved in NumSim etc …
– NDG will hopefully make it easier to exploit data!
Timeliness of information is important and may become more relevant than quality (alone)!
Boundary between “papers” and “data” is blurring!
– The next but one RAE (if it happens) may reflect this!
Automated linking of resources will proliferate
– Use your IR and your data centre (BADC!)