The Analytic Potential of Long-Tail Data: Sharable Data and Re-use Value. Carole L. Palmer, Center for Informatics Research in Science & Scholarship.

Presentation transcript:

The Analytic Potential of Long-Tail Data: Sharable Data and Re-use Value
Carole L. Palmer
Center for Informatics Research in Science & Scholarship
Graduate School of Library & Information Science
University of Illinois at Urbana-Champaign
Wolfram Data Summit, 6 September 2012

Illinois - Data Practices team
Collaborator: Melissa Cragin
Doctoral students: Nic Weber, Tiffany Chao, Karen Baker, Andrea Thomer
Qualitative studies of data production and use:
- long tail: complex, heterogeneous data
- re-use value across disciplines
- implications for curation of research data

Preserve * Share * Discover
PI: Sayeed Choudhury
Promoting data preservation and re-use across disciplines.

The "big tail" (Heidorn, 2009)
12,025 NSF grants awarded in 2007 = $2,865,388,605

Number of grants    20%                       80%
Range               $300,000 - $38,131,952    $579 - $300,000
Total dollars       $1,747,957,451            $1,117,431,154
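The arithmetic behind the "big tail" can be checked directly from the figures on the slide. The sketch below is illustrative only: the variable names are hypothetical, and the per-group grant counts assume the 20% / 80% split applies exactly to the 12,025 grants, which the slide does not state.

```python
# Figures as given on the slide (Heidorn, 2009): NSF grants awarded in 2007.
TOTAL_GRANTS = 12_025
TOP_DOLLARS = 1_747_957_451   # total awarded to the largest 20% of grants
TAIL_DOLLARS = 1_117_431_154  # total awarded to the remaining 80% of grants

# The two group totals should add up to the overall figure on the slide.
total_dollars = TOP_DOLLARS + TAIL_DOLLARS

# Share of all dollars that went to the long tail of smaller grants.
tail_share = TAIL_DOLLARS / total_dollars

# Assumption: the 20% / 80% split is exact, so counts are derived, not sourced.
top_grants = round(TOTAL_GRANTS * 0.20)
tail_grants = TOTAL_GRANTS - top_grants

print(f"Total dollars: ${total_dollars:,}")
print(f"Tail share of dollars: {tail_share:.1%}")
print(f"Derived grant counts: {top_grants:,} large, {tail_grants:,} small")
print(f"Mean tail grant (derived): ${TAIL_DOLLARS / tail_grants:,.0f}")
```

Under these assumptions, the smaller grants make up four-fifths of the awards and well over a third of the dollars, which is the sense in which the tail is "big".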

Earth & life sciences:
- Oceanography
- Climate science - modern
- Climate science - paleo
- Soil ecology
- Volcanology
- Stratigraphy
- Mineralogy
- Microbiology
- Sensor network science
- Environmental engineering
- Photonics
Earth and life science intersection: systems geobiology as exemplar

Curation Profiles Project:
- Anthropology
- Plant sciences
- Kinesiology
- Speech and Hearing
- Earth and Atmospheric

Methods
Talking shop about data: efficient exchange with the right researchers about the right dimensions.

Lead scientists - research context, sharing, access, discovery, re-use
1) Pre-interview worksheets
2) Semi-structured interviews
3) Follow-up sessions with selected participants

Researchers managing data - stages, versions, standards, tools
4) Data deposit & sharing worksheet
5) Data samples, related documentation

Interpreting perspectives and practices
What researchers say:
- “My data will never be of use to anyone else.”
- “Of course I'm willing to share my data publicly.”
- “There are no standards in my field.”
How the data may be re-used: as raw materials of research, for application in other fields, in aggregation or integration with other data.
Forms most easily or willingly shared may not have the most re-use value.

Sharing variations across fields

Field: Agronomy
  Research area: water quality, drainage, and plant growth
  Form to be shared: cleaned, reviewed
  Type: sensor; hand-collected samples
  Formats: .xls
  Size: approx. 100 files, ~1 MB each, up to 20 MB
  Shared when? After publication

Field: Geology
  Research area: rock, water and microbes
  Form to be shared: averaged
  Type: sensor; hand-collected samples; photographs
  Formats: .xls; jpg
  Size: 1 file; images < 1 MB
  Shared when? After publication

Field: Civil Eng
  Research area: traffic movement
  Form to be shared: cleaned, normalized
  Type: sensor data
  Formats: MySQL (postgre SQL)
  Size: 1 database, approx K/day
  Shared when? 1 month to 1 year embargo

Analytic potential
Value beyond original intended use.
Long-term utility:
- user communities
- fit for purpose for new applications
- preservation ready: representation information, context, metadata, fixity, etc., so that someone (or some machine) other than the original data producer can use and interpret the data
High AP: applicable and functional to multiple communities / high-priority problems.

Utility for producers - compound units

Geobiology
  Data unit: site-specific time series
    - spreadsheets: averaged rock, water chemistry measures
    - microscopy images
    - annotated field photos
    - microbial genomic data
  Sharing conventions: by request; no repository

Volcanology
  Data unit: rock profile
    - physical rock
    - thin section
    - chemical analysis
    - photographs
    - field notes
  Sharing conventions: by request; no repository

Soil ecology
  Data unit: database
    - multiple abiotic soil measures
    - associated metadata
  Sharing conventions: public resource collection

Sensor science
  Data unit: database
    - soil data
    - sensor data
  Sharing conventions: reference data

Limits: customization, “vertical” dev.

Utility for reuse - components of compound units
“…somebody more knowledgeable about isotopes can take the data that I produced and do a whole different series of investigations.”
“…there are people who might work on little iron and titanium oxides which I don’t really care about.”
“…there’s a lot of geochemical work that’s done that relies less on field context.”

Curation of functional units
- Scholarly record of data collected / analyzed
- Preservation of research assets
- Raw materials for research
- Searching, browsing, chaining, filtering, retrieving…
Optimal organizational groupings, especially beyond data associated with papers: collections, sites, producers.

User communities

Geobiology (time series)
  Designated community: Microbiology, Geobiology, Geology
  Potential communities: Chemistry, Evolutionary biology, Bioprospecting, U.S. Park Service, Public Health
  Reuse applications (parts of unit): microbial data - assess presence and extent of disease

Volcanology (rock profile)
  Designated community: Igneous petrology, Geophysics, Geochemistry
  Potential communities: Glaciology
  Reuse applications (parts of unit): field photos - assess spatio-temporal glacier change

Value and use - ecosystem or data economy
“A classic example is the NSIDC glacier photo collection, which 10 years ago no one had heard of, and no one thought was worth digitization. It is now NSIDC's 2nd most popular data set.” (Ruth Duerr, National Snow & Ice Data Center)
“The value of data increases with their use.” (Uhlir, 2010)
- How do we predict what data will become highly valuable?
- How do data gain in value through use?
- What data do we invest in?

[Figure: General Popularity Curve for Earth Science Data. Axes: Time (horizontal), Popularity (vertical). Labeled points along the curve: end of data collection; public release; old enough for comparison studies; useful for long-term trends.] (Ruth Duerr, NSIDC, personal communication, 4 September 2012)

Value indicators
- Reputation of data collector
- Spatial coverage
- Longitudinal coverage
- Site factors: unique conditions, rarely studied, politically volatile, permitting requirements
- Multiple sources for triangulation and context
- Documentation of workflows and provenance
Fields: Climate / Ocean modeling, Soil Ecology, Volcanology, Stratigraphy, Sensor and Network Engineering

Value gains with shared data

Gathering complementary evidence: richness & verification
  Ocean modelers with field campaign data
  (weather during plane flight pattern, satellite serial numbers, irregularities in open sea mooring)

Transforming for multiple audiences: refinement & fit
  Sensor engineers reworking water measurements to share
  (for search and rescue, triathlon organizer, fishermen, industry ship maneuvering)

Recalibration and feedback: accuracy
  Rainforest researchers with sensor block temperatures
  (improved instrument-level calibrations and original climate science group’s measurements)

Implications
- Recruit data with multiple value indicators
- Preservation imperative for long re-use cycles
- Promote sharing for value gains
- Support capture and representation of work with shared data
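One way to read "recruit data with multiple value indicators" is as a simple count of which indicators a dataset exhibits. The sketch below is only an illustration of that reading, not a method from the talk: the indicator keys paraphrase the slide's list, and the `Dataset` class, the `recruitment_order` function, the threshold of two, and the example datasets are all hypothetical.

```python
from dataclasses import dataclass

# Indicator keys paraphrase the "Value indicators" slide; the names are assumptions.
VALUE_INDICATORS = frozenset({
    "collector_reputation",
    "spatial_coverage",
    "longitudinal_coverage",
    "site_factors",
    "multiple_sources",
    "documented_provenance",
})


@dataclass(frozen=True)
class Dataset:
    """Hypothetical record of a candidate dataset and the indicators it shows."""
    name: str
    indicators: frozenset = frozenset()


def indicator_count(dataset: Dataset) -> int:
    """Count only recognized indicators; unknown labels are ignored."""
    return len(dataset.indicators & VALUE_INDICATORS)


def recruitment_order(datasets, minimum=2):
    """Keep datasets with at least `minimum` indicators, most indicators first."""
    qualifying = [d for d in datasets if indicator_count(d) >= minimum]
    return sorted(qualifying, key=lambda d: (-indicator_count(d), d.name))


# Invented examples for illustration only.
examples = [
    Dataset("hypothetical_glacier_photos",
            frozenset({"longitudinal_coverage", "spatial_coverage", "site_factors"})),
    Dataset("hypothetical_lab_run",
            frozenset({"documented_provenance"})),
    Dataset("hypothetical_hot_spring_series",
            frozenset({"longitudinal_coverage", "site_factors",
                       "multiple_sources", "documented_provenance"})),
]

ordered = [d.name for d in recruitment_order(examples)]
print(ordered)
```

A plain count treats every indicator as equally important; a real selection policy would likely weight them by field and by the curating institution's priorities.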

Future work: Site-based curation for Geobiology
Yellowstone National Park - mecca for data collection
Key to research questions ranging from origin of life on Earth to the search for life on other planets.
Value indicators: special permitting, site uniqueness, longitudinal coverage, politically volatile (bioprospecting), multiple sources for triangulation
Collaboration with:
- Bruce Fouke, U of I, Geology, Microbiology, Genomic Biology
- Ann Rodman, National Park Service
[Image used with permission from B. Fouke]

Thank you Center for Informatics Research in Science and Scholarship -- Dataconservancy.org