Staying afloat in the sensor data deluge John H. Porter, Paul C. Hanson, Chau-Chin Lin Trends in Ecology & Evolution Volume 27, Issue 2, Pages 121-129 (February 2012) DOI: 10.1016/j.tree.2011.11.009 Copyright © 2011 Elsevier Ltd Terms and Conditions
Figure 1 A generic view of sensor data processing. Sensor data typically pass through a series of steps or levels, although what is required to reach a given level can vary across projects. A special challenge for sensor data is finding ways to uniquely identify versions of the data for citation, when, in principle, each new line of data can define a new version of the data. Most publicly available sensor data are Level 1 data, and not all projects create Level 2 data.
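The versioning challenge noted in the Figure 1 caption can be illustrated with a minimal sketch. It assumes one possible approach, a content-derived identifier computed with a SHA-256 hash, which is not a method described in the article; the function name `version_id` and the sample readings are hypothetical.

```python
import hashlib


def version_id(lines):
    """Return a short identifier derived from the content of the data lines.

    Hypothetical helper: hashing is one assumed way to label a data version,
    not the approach taken by the article.
    """
    digest = hashlib.sha256()
    for line in lines:
        # Normalize line endings so identical content always hashes identically.
        digest.update(line.rstrip("\r\n").encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()[:16]


# Made-up sample readings: timestamp, water temperature.
readings = [
    "2011-06-01T00:00,12.4",
    "2011-06-01T00:10,12.6",
]

v1 = version_id(readings)
# Appending a single new line of data yields a different identifier.
v2 = version_id(readings + ["2011-06-01T00:20,12.5"])
print(v1 != v2)
```

Under this assumption, every appended reading produces a new identifier, which shows why citing a continuously growing sensor data stream is harder than citing a static file.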
Figure 2 Scientific workflow tools provide graphical interfaces for capturing and executing complex data manipulations and analyses. This Kepler workflow uses EML metadata to produce and run an R statistical language program that produces quality assurance reports and graphs. It incorporates a variety of tools including an eXtensible Markup Language (XML) stylesheet processor, a text editor, R statistical programs and text and graphical display tools. These are a small subset of the capabilities built into Kepler, which also include remote processing, database, mathematical and data conversion tools. Encapsulation of such diverse capabilities within a single graphical environment reduces the need for external documentation and facilitates sharing. Such workflows can be easily transferred between users, used to replicate analyses or further customized to add new capabilities or analyses.
Figure I Sensor nodes and sensor networks come in different sizes and shapes. Sensor nodes include common components (a) but can vary in size from a ‘mote,’ which incorporates light sensors, processor and radio into a compact battery-powered unit (b), to a large installation such as a carbon flux tower (c), which incorporates temperature, wind, water level and CO2 sensors, data loggers and computers. Sensor nodes can be interconnected using star (d), mesh (e) and hierarchical (f) topologies. Sensor nodes are shown as circles and network links as dashed lines. Sensor nodes shown with a solid fill are used to transfer data out of the sensor network to researchers. Hierarchical topologies are frequently used for sensor networks where there are multiple study locations, each with its own sensor network. More powerful radios are used for the inter-site links whereas low powered radios can be used within a site.
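The three topologies named in the caption can be compared with a short sketch. The node numbering, link sets and the helper `hops_to_gateway` are illustrative assumptions, not taken from the figure; node 0 stands in for the solid-filled node that transfers data out of the network.

```python
from collections import deque


def hops_to_gateway(links, gateway=0):
    """Breadth-first search: number of links each node is from the gateway."""
    neighbours = {}
    for a, b in links:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    dist = {gateway: 0}
    queue = deque([gateway])
    while queue:
        node = queue.popleft()
        for nxt in neighbours.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


# Star: every node links directly to the gateway (node 0).
star_links = {(0, 1), (0, 2), (0, 3), (0, 4)}

# Mesh: nodes relay through neighbours, with more than one route available.
mesh_links = {(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (3, 4)}

# Hierarchical: nodes 1 and 2 are site hubs; nodes 3-6 are within-site sensors.
hier_links = {(0, 1), (0, 2), (1, 3), (1, 4), (2, 5), (2, 6)}

for name, links in [("star", star_links), ("mesh", mesh_links), ("hierarchical", hier_links)]:
    print(name, max(hops_to_gateway(links).values()))
```

In this toy layout, the hierarchical network keeps within-site links short while the hub-to-gateway links (0-1 and 0-2) play the role of the more powerful inter-site radios described in the caption.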
Figure I The ‘Cyberinfrastructure (CI) Ecosystem’ associated with GLEON. Sensor data from lake observatories stream to the data repository named Vega. Traditionally sampled data are collected in LakeBase. Data from both repositories can be exported to formats for use in common analysis software. Condor provides distributed computing to support complex analyses with long run times. Although data continually stream from sensing platforms to Vega, human intervention is required when platform changes are made. Because analysis models are often adapted to suit the science questions, data export to data analysis is manual, except for simple visualization, export to Web pages, common transformations, and synchronization of multiple variables.