Better Data, Better Science! [ Better Science through Better Data Management ]
Todd D. O'Brien, NOAA – NMFS – COPEPOD


BETTER DATA is …
– Easily Accessible
– Well Documented
– Integrated / Interlinked
– The Best Quality possible

Oops! (When Data Management Fails)

WHY QC?
To find errors in the data …
– To detect instrument failure or sampling problems
– To detect phenomena of scientific interest: natural physical or biological events, or something new

WHY QC?
To find errors in the data … that were not present in the original data ?!
– Data Pathway errors: human error, computer error

WHAT TO QC?
– Individual values (the measurements)?
– Profile of multiple values?
– Cruise of multiple profiles?
– Project of multiple cruises?
– Region or Ocean of multiple Projects?
– Entire World of multiple Regions?

What software, tools, and skills are available?

Let's get started …

QC OF THE WHAT & HOW
First understand the methods, variables, and units of the data before trying to QC the data:
– Are all labels clear and unambiguous?
– Are methods provided (or a reference)?
– What are the value units?

QC OF THE WHEN & WHERE
Primary Data:
– First, check the master ship record
– Then check PI files
Simple Range Checks:
– Time (0-23? 1-24?) What is the time zone?
– Lat +/- 90, Lon +/- 180. Are hemisphere signs present (E/W) or described?
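The simple range checks above can be sketched in a few lines of Python. This is only an illustration, not the presenter's tooling: the function names and the `convention` argument are hypothetical, and positions are assumed to be signed decimal degrees.

```python
def check_position(lat, lon):
    """Return a list of problems found in a signed decimal-degree position."""
    problems = []
    if not -90.0 <= lat <= 90.0:
        problems.append(f"latitude {lat} outside +/-90")
    if not -180.0 <= lon <= 180.0:
        problems.append(f"longitude {lon} outside +/-180")
    return problems


def check_hour(hour, convention="0-23"):
    """Check an hour value against the clock convention the data claim to use.

    `convention` is a hypothetical label: "0-23" or "1-24".
    """
    low, high = (0, 23) if convention == "0-23" else (1, 24)
    return low <= hour <= high
```

An hour of 24 fails under the 0-23 convention but passes under 1-24, which is why the convention (and the time zone) has to be known before the check means anything.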

QC OF THE WHEN & WHERE
Map the Cruise Track:
– sorted by station sequence
– sorted by sampling time
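Before plotting, the two sort orders can be compared directly. The sketch below assumes each station is a dict with hypothetical keys `station` (sequence number) and `time` (any sortable timestamp, here ISO 8601 strings); it lists the stations whose position differs between the two orderings.

```python
def order_mismatches(stations):
    """Return station numbers that sit at different positions when the cruise
    is sorted by station sequence versus by sampling time."""
    by_seq = sorted(stations, key=lambda s: s["station"])
    by_time = sorted(stations, key=lambda s: s["time"])
    return [a["station"] for a, b in zip(by_seq, by_time)
            if a["station"] != b["station"]]


# Made-up example: stations 2 and 3 have swapped times
cruise = [
    {"station": 1, "time": "2005-06-01T08:00"},
    {"station": 2, "time": "2005-06-01T14:00"},
    {"station": 3, "time": "2005-06-01T11:00"},
    {"station": 4, "time": "2005-06-01T17:00"},
]
mismatched = order_mismatches(cruise)
```

A non-empty result does not say which field is wrong, only that the station numbers and the times disagree and the ship record should be consulted.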

QC OF THE WHEN & WHERE Calculate ship speed (distance/time) between stations
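A minimal sketch of the speed check, assuming positions in signed decimal degrees and times as `datetime` objects. The great-circle (haversine) distance and the spherical Earth radius are choices made for this example; the slides only specify distance divided by time.

```python
import math
from datetime import datetime

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (spherical approximation)
KM_PER_NAUTICAL_MILE = 1.852


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two positions in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def speed_knots(lat1, lon1, t1, lat2, lon2, t2):
    """Implied ship speed between two stations, or None if time does not advance."""
    hours = (t2 - t1).total_seconds() / 3600.0
    if hours <= 0:
        return None  # identical or reversed times are themselves a QC flag
    return haversine_km(lat1, lon1, lat2, lon2) / KM_PER_NAUTICAL_MILE / hours


# One degree of latitude covered in six hours is about 10 knots
knots = speed_knots(40.0, -70.0, datetime(2005, 6, 1, 0, 0),
                    41.0, -70.0, datetime(2005, 6, 1, 6, 0))
```

A speed far above what the vessel can do usually points to a wrong position, date, or time at one of the two stations; the cutoff to use depends on the ship and is not given in the slides.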

QC OF THE HOW MUCH
First, look at the background environment:
– Check for depth inversions
– Check for density inversions
– Look at T vs. S plot
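The two inversion checks can be sketched as follows, assuming a profile ordered from the surface downward and a density value already computed for each depth (the equation of state is outside this example). The `tolerance` default is a hypothetical value, not a recommended standard.

```python
def depth_inversions(depths):
    """Indices where depth fails to increase relative to the previous level."""
    return [i for i in range(1, len(depths)) if depths[i] <= depths[i - 1]]


def density_inversions(densities, tolerance=0.03):
    """Indices where density drops by more than `tolerance` relative to the
    level above (denser water should lie below lighter water)."""
    return [i for i in range(1, len(densities))
            if densities[i] < densities[i - 1] - tolerance]
```

For example, `depth_inversions([0, 10, 20, 15, 30])` flags index 3, the level recorded as shallower than the one before it.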

QC OF THE HOW MUCH Look at the variable vs. depth

QC OF THE HOW MUCH
– Check against basic value ranges
– Check for excessive gradients (spikes) between values at adjacent depths
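Both checks reduce to short loops over a profile. The sketch below is illustrative only: the function names are hypothetical, and the range limits and gradient threshold in the example are made-up numbers, since appropriate values depend on the variable, region, and depth.

```python
def range_flags(values, low, high):
    """Indices of values falling outside the accepted [low, high] range."""
    return [i for i, v in enumerate(values) if not low <= v <= high]


def gradient_flags(depths, values, max_gradient):
    """Indices where the change per unit depth from the previous level
    exceeds `max_gradient`."""
    flagged = []
    for i in range(1, len(values)):
        dz = depths[i] - depths[i - 1]
        if dz <= 0:
            continue  # depth inversions are a separate check
        if abs(values[i] - values[i - 1]) / dz > max_gradient:
            flagged.append(i)
    return flagged


# Made-up temperature profile with a spike at 20 m
depths = [0, 10, 20, 30]
temps = [15.0, 14.8, 25.0, 14.5]
out_of_range = range_flags(temps, -2.5, 35.0)
spikes = gradient_flags(depths, temps, 0.5)
```

Here the spike passes the range check, because 25.0 is a plausible temperature in general, but both the jump into it and the jump out of it exceed the gradient threshold. That is the reason to run both checks rather than the range check alone.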

Expert / Specialist Data Centers
Can provide guidance on:
– Metadata (standards, minimum requirements)
– Data Formats (format suggestions / review)
– Tools and Methods
May have advanced visualization or QC methods available for your data.

Empirical Comparisons with Historical Observations (ECHO)

Expert / Specialist Data Centers (just a few examples)
– CCHDO: CLIVAR Carbon & Hydrographic Data Office
– BCO-DMO: Biological and Chemical Oceanography Data Management Office
– BODC: British Oceanographic Data Centre
– COPEPOD: Coastal & Oceanic Plankton Ecology, Production & Observation Database

The Conclusions

Some Conclusions
– Each additional layer of QC and examination may highlight issues that were previously undetected.
– Each instance of transfer or reformatting the data has a chance of introducing new errors (or data loss).
– The comprehensiveness of the co-stored metadata will determine the extent to which the data are still usable/understandable 10+ years after the project.

BETTER DATA is …
– Easily Accessible
– Well Documented
– Integrated / Interlinked
– The Best Quality possible