LTER Information Management Training Materials LTER Information Managers Committee Introduction to LTER Information Management John Porter.



“If you want to understand life, don’t think about vibrant throbbing gels and oozes, think about information technology” Richard Dawkins (1986, “The Blind Watchmaker”)

Scientists in a number of disciplines are recognizing that our ability to manage and assimilate massive quantities of data is key to understanding our world.

Scientific Use of Data
• The traditional model of using data

Scientific Use of Data
• A new model incorporates sharing and archiving (Michener et al. 2011, Ecological Informatics)

Scientific Use of Data Archiving and sharing data provides new opportunities for better understanding our environment

LTER Network Vision, Mission and Goals
The LTER Executive and Coordinating Committee have developed a set of Network Goals and are creating a prioritized set of Objectives, Tasks and Metrics under each of those Goals.
• Understanding: To understand a diverse array of ecosystems at multiple spatial and temporal scales.
• Synthesis: To create general knowledge through long-term, interdisciplinary research, synthesis of information, and development of theory.
• Information: To inform the LTER and broader scientific community by creating well-designed and well-documented databases.
• Legacies: To create a legacy of well-designed and documented long-term observations, experiments, and archives of samples and specimens for future generations.
• Education: To promote training, teaching, and learning about long-term ecological research and the Earth’s ecosystems, and to educate a new generation of scientists.
• Outreach: To reach out to the broader scientific community, natural resource managers, policymakers, and the general public by providing decision support, information, recommendations and the knowledge and capability to address complex environmental challenges.
Network Vision: A society in which exemplary science contributes to the advancement of the health, productivity, and welfare of the global environment that, in turn, advances the health, prosperity, welfare, and security of our nation.
Network Mission: To provide the scientific community, policy makers, and society with the knowledge and predictive understanding necessary to conserve, protect, and manage the nation's ecosystems, their biodiversity, and the services they provide.

LTER Information Management
• Enabling NEW SCIENCE
  • Beyond the single investigator
  • Global and Regional Studies
  • Long-Term Studies
• Resources for LTER Science
• Resources for the larger scientific community
• Posterity – leaving behind a legacy of resources for future researchers

Increasing value of data over time
[Figure: data value plotted against time. Labels: Serendipitous Discovery; Inter-site Synthesis; Gradual Increase In Data Equity; Methodological Flaws, Instrumentation Obsolescence; Non-scientific Monitoring]
Slide from James Brunt

Long-Term Data
• The Invisible Present (John Magnuson, ersonnel/magnuson/articles /magnuson_biosci_v pdf)
• A single data point from the spring of 1980
• Charles D. Keeling established a station for continuous CO2 monitoring on Mauna Loa in 1958

The Invisible Present

Challenges for LTER Information Management
Keeping information organized is a fight against Entropy – the tendency for systems to become disorganized (2nd law of thermodynamics)
• Technological Challenges
• Semantic Challenges
• Cultural Challenges

Challenge: How do you deal with technological change?
• Text: ASCII, EBCDIC & Unicode
• Applications and formats: Lotus 1-2-3, VisiCalc, WordPerfect, WordStar, dBase III, Quattro Pro, Word, Excel, Access, XML
• Operating systems: MacOS, Windows, DOS, Linux

LTER Solutions
• When possible employ widely-used, generic forms for archival storage of data
  • Data tables in comma-separated-value files using ASCII or Unicode text
• Periodically convert older proprietary formats that can’t be stored in a generic form (e.g. GIS data)
• Periodically migrate physical media (cards → tape → DVD)
• Forge relationships with other organizations (e.g. DataONE)
• Add “energy” to the system: Invest in information managers and information management systems that continuously manage data
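The recommendation to archive data tables as comma-separated text can be sketched as follows. This is a minimal illustration using only the Python standard library; the column names, the sample records, and the helper `table_to_csv_text` are hypothetical and do not come from the presentation.

```python
import csv
import io

# Hypothetical observations; column names and values are invented for illustration.
rows = [
    {"site": "Plot A", "date": "1980-04-15", "species": "Spartina alterniflora", "count": 12},
    {"site": "Plot B", "date": "1980-04-15", "species": "Juncus roemerianus", "count": 7},
]


def table_to_csv_text(records, columns):
    """Return the records as comma-separated text with a header row."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


csv_text = table_to_csv_text(rows, ["site", "date", "species", "count"])

# Encoding explicitly as UTF-8 keeps the archived bytes independent of the
# platform default; pure-ASCII content is byte-identical in UTF-8.
archive_bytes = csv_text.encode("utf-8")
print(csv_text)
```

Writing the header row and fixing the encoding up front means the file can be read decades later by any tool that understands plain text, which is the point of choosing a generic form.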

Challenge: Understanding Data
Without metadata, the usable information content of data declines over time (Michener et al., Ecological Applications)
[Figure: information content plotted against time. Labels: Time of publication; Specific details; General details; Accident; Retirement or career change; Death]

LTER Solutions
• Standardized Metadata – Ecological Metadata Language (EML)
• Site and Network Tools for creation of EML
• Network-Wide Data Catalog
• PASTA system for provenance-aware metadata for derived data products

Web forms allow us to create standard “Ecological Metadata Language” (EML) data using a metadatabase
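As a rough sketch of what generating structured metadata from stored fields might look like, the snippet below builds a small XML record with Python's standard library. The element names are chosen to resemble EML's dataset structure, but this is a simplified, EML-like record that has not been validated against the EML schema; the function `build_metadata` and all field values are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_metadata(title, creator_surname, abstract):
    """Build a simplified, EML-like dataset record (not schema-validated)."""
    dataset = ET.Element("dataset")
    ET.SubElement(dataset, "title").text = title
    creator = ET.SubElement(dataset, "creator")
    name = ET.SubElement(creator, "individualName")
    ET.SubElement(name, "surName").text = creator_surname
    ET.SubElement(dataset, "abstract").text = abstract
    return dataset


# Hypothetical values, as they might be pulled from a metadata database row.
record = build_metadata(
    title="Example marsh vegetation counts",
    creator_surname="Example",
    abstract="Counts of plants per plot, invented for illustration.",
)
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

A production system would emit the full EML document with its namespaces and required elements and would validate the result before submission.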

“Cultural” Challenges
• Unfamiliarity with Sharing Data
• Incentives for sharing data
• Lack of expertise in:
  • Advanced tools for managing and integrating data
  • Quality Control and Assurance
  • Creating archival-grade datasets

Data Sharing and Archiving

LTER Solutions – Data Sharing
• The LTER Network Data Policy dictates that almost all data should be made available within 2 years
  • Exceptions must be justified
• NSF and Renewal Panels pay close attention to whether sites are adhering to the policy.
  • Data Availability → Funding!

Additional Incentives
• NSF now requires Data Management Plans for non-LTER data as well
  • A better plan increases your chance of funding
• Journals are increasingly requiring data submission as a condition of publication for papers (e.g., evolution, genomics journals)
• Increasingly data is citable
  • Allows you to tally the citations of your data as well as citations of your publications
• Data can even be published: e.g., Ecological Archives publishes “data papers” that are peer-reviewed

Challenge
• The ways researchers typically use data are frequently not compatible with best practices for archiving

LTER Solutions
• Site IMs help vet or prepare data
• Help communicate best practices to students and investigators
• Use of improved tools that encourage good practices
[Figure annotations: “Don’t Ever Sort this!!!!!!” and “Complete lines are OK to Sort”]
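The sorting warning on this slide refers to a figure that is not preserved in the transcript. Assuming it contrasts sorting part of a table with sorting whole rows, the difference can be sketched as below; the records are invented for illustration.

```python
# Hypothetical records: (date, site, temperature in degrees C).
records = [
    ("2011-06-03", "Plot C", 14.2),
    ("2011-06-01", "Plot A", 11.8),
    ("2011-06-02", "Plot B", 13.1),
]

# Safe: sort complete rows, so each date stays with its own site and value.
sorted_rows = sorted(records, key=lambda row: row[0])

# Unsafe: sort the date column alone and leave the other columns in place,
# which is what happens when only part of a spreadsheet is selected.
sorted_dates = sorted(row[0] for row in records)
scrambled = [
    (date, site, temp)
    for date, (_, site, temp) in zip(sorted_dates, records)
]

rows_intact = set(sorted_rows) == set(records)
rows_broken = set(scrambled) != set(records)
print(rows_intact, rows_broken)
```

After the partial sort every date is paired with another row's measurement, and nothing in the resulting table signals that the pairing is wrong.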

Useful Tools
• Databases (e.g., MySQL, Access, SQLite, PostgreSQL)
• Geographical Information Systems (GIS)
• Statistical Packages (e.g., R, SAS, SPSS, MATLAB)
• Metadata Editors (e.g., Morpho)
• Programming Languages (e.g., Python, C++, Java, FORTRAN)
• Scientific Workflow Systems (e.g., Kepler, VisTrails, Taverna)

The DataONE Data Life Cycle
Plan, Collect, Assure, Describe, Preserve, Discover, Integrate, Analyze

The DataONE Data Life Cycle
Plan, Collect, Assure, Describe, Preserve, Discover, Integrate, Analyze
• Design of forms, databases or other data structures
• Capture of digital information

The DataONE Data Life Cycle
Plan, Collect, Assure, Describe, Preserve, Discover, Integrate, Analyze
• Quality Control
• Quality Assurance
• Avoid “Garbage In, Garbage Out”
In the “traditional” model, we would jump to Analyze here…
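One simple form of quality control is to flag values that are missing or fall outside a plausible range before any analysis is run. The sketch below assumes hypothetical air-temperature readings and an assumed plausible range of -40 to 50 degrees C; the function `flag_value` and the flag names are illustrative, not part of any LTER or DataONE tool.

```python
def flag_value(value, lower, upper):
    """Return a quality flag for one measurement."""
    if value is None:
        return "missing"
    if value < lower or value > upper:
        return "out_of_range"
    return "ok"


# Hypothetical readings in degrees C; 999.0 mimics a sensor error code.
air_temps_c = [12.4, None, 13.1, 999.0, -4.0]

# Keep the flags alongside the data instead of deleting suspect values,
# so later users can see what was questioned and decide for themselves.
flags = [flag_value(v, lower=-40.0, upper=50.0) for v in air_temps_c]
print(list(zip(air_temps_c, flags)))
```

Range checks catch only gross errors; real quality assurance also covers instrument calibration, duplicate detection, and review by someone who knows the site.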

The DataONE Data Life Cycle
Plan, Collect, Assure, Describe, Preserve, Discover, Integrate, Analyze
• Production of Metadata
  • Who, what, when, where, why and how
  • Form of data
• Submission to an Archive

The DataONE Data Life Cycle
Plan, Collect, Assure, Describe, Preserve, Discover, Integrate, Analyze
• Reuse of data to produce new scientific insights

Data Reuse
• For data reuse, the greatest opportunities will be presented by exceptional data
  • High quality
  • Useful transformations
  • Excellent metadata
  • Integration with other data
    • Similar data from other places or times
    • Different kinds of data that add value when interpreting data
  • Gap-filled, extensive QA/QC

Archiving and Publishing Data Porter, Hanson and Lin, TREE 2012

Next Steps
• Learn one or more advanced tools for manipulating data
  • Databases
  • GIS
  • Statistical software
  • Computer languages
• Collect some data and conduct a quality assurance analysis on it
• Prepare Metadata and submit data to an archive
• Search data archives for related data that can be integrated with your data to reach a wider array of conclusions

Questions?
“Applied computer science is now playing the role which mathematics did from the seventeenth century through the twentieth century; providing an orderly, formal framework and exploratory apparatus for other sciences.” - George Djorgovski, Professor of Astronomy, Caltech