Published by Ernest McKinney Modified over 9 years ago
1
CDI: DATA MANAGEMENT WORKING GROUP Heather Henkel and Viv Hutchison
2
Outline Data Management in USGS : Why? Evolution of the Data Management Working Group Monthly Presentations Sub-Team Accomplishments Powell Center Proposal and Alternative Results FY12 Proposals
3
Data Management in USGS : Why? About the Data Rescue Program: “It is both a great and terrible thing that we have such a program at the USGS” (J. Faundeen, August 16, 2011) CDI Recognizes: Good data management is a prerequisite for data integration.
4
Data Management in USGS : Why? Credit: DataONE
5
Data Management Working Group Goals The Data Management Working Group will: seek mechanisms for incorporating data management into USGS science develop ways to educate USGS scientists about its value The group seeks to elevate the practice of data management such that it is seen as a critical partner in the pursuit of science in USGS
6
What is Data Management? “The business function that develops and executes plans, policies, practices and projects that acquire, control, protect, deliver and enhance the value of data and information.” Source: DAMA Dictionary of Data Management, 1st Ed.
7
Evolution of the Data Management Working Group 2010 CDI Meeting: formation of group Monthly telecons with ~50 Working Group participants from across the USGS Mission Areas and partner agencies:
8
Data Management Working Group Wiki my.usgs.gov/confluence/display/cdi/Data+Management+Working+Group Monthly meeting notes, presentations, sub-teams, membership information
9
DMWG: Presentations Basics of Using Mendeley - Natalie Latysh, USGS EROS Scientific Records Appraisal Process - John Faundeen, USGS Ocean Biodiversity Information System (OBIS) - Phillip Goldstein, University of Colorado-Boulder USGS Professional/Profile Pages and Sharepoint – James Sayer, USGS
10
DMWG: Presentations National Geological and Geophysical Data Preservation Program's Best Practices for Data Preservation Project - Brian Buczkowski, USGS Data Dissemination thru Cloud Computing: the Next Generation of Data.gov - Ray Obuch, USGS USGS Survey Manual Policy Development and Status on Policy of Interest to the CDI - Carolyn Reid, USGS Presentation on DataBasin - Denny Grossman, Jim Strittholt, and Brendan Ward, Data Basin
11
DMWG: Monthly Meetings and Presentations Topics covered during CDI DM WG Calls: Charter for group Powell Center Proposal Coordination between Tech Stack Group Development of three sub-teams (Policy (RGE/EDGE), Best Practices, Data Management Workshop/Meeting) Discussion of abilities and specialties among CDI DM working group members Encouraging people to create USGS Professional Pages to highlight data management work and experience FY 12 proposals
12
DMWG Sub-team: Data Policy
13
DMWG Sub-Team: Data Policy Goals Work toward formal incorporation of data management into the Survey Manual Explore opportunities to relate good data management to the Research Grade Evaluation (RGE) and Equipment Development Grade Evaluation (EDGE) Partner with Office of Science Quality and Integrity to: review existing USGS policies on data management help write new policies provide feedback to RGE-EDGE processes
14
DMWG Sub-Team: Data Policy Leads Terry D'Erchia Sally Holl Participants Christina Bartlett John Faundeen Robin Fegeas Heather Henkel Viv Hutchison Scott McEwen Carol Reiss Elizabeth Sellers
15
DMWG Data Policy Sub-Team: Survey Manual Chapters New Policy Chapters in Progress, in varying stages of review: Survey Manual Chapter 502.x – Fundamental Science Practices: Metadata for Datasets and Information Products Survey Manual Chapter 502.x - Fundamental Science Practices: Safeguarding Unpublished U.S. Geological Survey Data and Information Survey Manual Chapter.XXX - Release of Computer Databases and Computer Programs
16
DMWG Data Policy Sub-Team: RGE Review Where and how do we enable our scientists to report data management activities in RGE? Actions: Attended peer-review panel (March 2011) Reviewed USGS RGE Enhancement Team charter (2007) Interviewed RGE scientists Reviewed RGE/EDGE Guides/Evaluation Forms Presented to the RGE Panel (August 4, 2011)
17
DMWG Data Policy Sub-Team Questioned 4 RGE scientists about data management practices and needs Short-term Result: Input form created to discover and track data management needs of scientists
18
DMWG Data Policy Sub-Team “Never quite sure a dataset is the most recent. Some datasets don't have an author/creator listed so tracking down if it is the most recent is a challenge.” “Our data are located all over and it is a real effort just to locate data.” “It should be obvious to a researcher how to cite a dataset. Peer-reviewed papers that point to how scientists find out about a dataset are critical.” ******************************************************************** Sub-Team learned from Scientists about their needs and could make recommendations to the RGE Panel based on findings
19
DMWG Data Policy Sub-Team: Recommendations to RGE Panel Make it easier for USGS scientists to do data management (CDI) Incentivize good data management thru RGE (RGE) Make it easier to document data (CDI) Allow easier reporting of data management in RGE– modify self-evaluation documents (RGE) Develop criteria for RGE panel to recognize and reward good data management (CDI-RGE)
20
DMWG Data Policy Sub-Team: Next Steps Continue work with RGE Coordinators on recommendations and on feedback from their input Explore opportunities to include informatics professionals in the RGE/EDGE process Assist in completion of Survey Manual chapters to publication Assist in ‘refresh’ of new scientist Orientation Checklist and Exit Survey Review Survey Manual for relevant data management language
21
DMWG Sub-team: Data Best Practices
22
DMWG Sub-team: Data Best Practices Goals The Best Practices Sub-team was formed in early 2011 to: compile a suite of best practices, lessons learned, and learning opportunities, regarding data management organize this information and make it available through a website or portal
23
Participants John Faundeen (Lead) Brian Buczkowski Tom Burley Jennifer Carlino Robin Fegeas Dave Govoni Heather Henkel Sally Holl Donn Holmes Richard Huffine Viv Hutchison Tim Kern Tim Mancuso Elizabeth Martin Scott McEwen Ellyn Montgomery Cassandra Ladino Daniel Sandhaus Steve Tessler Jessica Thompson Lisa Zolly Joseph Kalfsbeek
24
Participants
25
Work Approach Monthly Webex Sessions March 9 April 6 May 4 June 1 July 6 August 3 Workshop July 26-27 Reston
26
Beginning Steps Step 1: Data Lifecycle Model… Develop/adopt a data lifecycle model that accurately reflects how USGS science data does or should travel through its life. Foundational for Sub-Team Goals: Simplicity, Intuitive, Identify Roles “As the government looks to its plan for open government through the development of tools such as Data.gov, it is important to integrate these tools into the overall federal architecture and project lifecycle.” Harnessing the Power of Digital Data: Taking the Next Step. Scientific Data Management (SDM) for Government Agencies: report from the Workshop to Improve SDM held June 29 – July 1, 2010, Washington, DC.
27
Work Item: Data Life Cycle Model “Literature Search” Compilation Review NSF Workshop USGS Workshop
28
Guidance “The business function that develops and executes plans, policies, practices and projects that acquire, control, protect, deliver and enhance the value of data and information.” Source: DAMA Dictionary of Data Management, 1st Ed.
29
Draft Data Lifecycle Model PLAN ACQUIRE & PROCESS ANALYZE PRESERVE PUBLISH/ SHARE
30
Output: Plan Business Requirements Data Management Plan Metadata Inception Propose Documentation
31
Output: Acquire & Process Ingest Gather Collection Acquire Discover Evaluate Assemble Get Generate Create Record Monitor Observe Measure QA/QC Appraise Transcribe Organize Process Prepare Integrate QA/QC Normalize Transform Evaluate Transcribe Format Resample Select Sample Organize Package Combine Improve Enhance Structure
32
Output: Analyze Analyze Experiment Interpret Model Test Visualize Appraise Evaluate Review Conclude Deduce Question Normalize Synthesis Discovery Knowledge Transfer Add Value Understand Enhance
33
Output: Preserve Preserve Transform Transcribe Migrate Save Protect Store Archive Manage Replicate Package Curate Transfer Rights Management Control Planning Embargo Rescue Appraise Select Repository Backup Deposit
34
Output: Publish/Share Share Release Submit Knowledge Transfer Prepare Write Author Produce Disseminate Create Distribute Transfer Present Communicate Upload Package Data Deposit Deliver Embargo Web Serve Select Repository Access Produce Share Embargo Release Disseminate Distribute Share Web Serve Select Repository Replicate Submit Discover Access Publish
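The five stages of the draft model and the activity terms collected under each can be read as a simple lookup. The sketch below is only an illustration: the `Stage` enum, the `ACTIVITIES` table, and the `stages_for` helper are hypothetical names, and the table holds a small subset of the terms from the output slides rather than the full lists. It shows that some activities, such as "Appraise" and "Embargo", appear under more than one stage.

```python
from enum import Enum


class Stage(Enum):
    """Stages of the draft data lifecycle model, in slide order."""
    PLAN = "Plan"
    ACQUIRE_PROCESS = "Acquire & Process"
    ANALYZE = "Analyze"
    PRESERVE = "Preserve"
    PUBLISH_SHARE = "Publish/Share"


# Illustrative subset of the activity terms listed on the output slides.
ACTIVITIES = {
    Stage.PLAN: {"Data Management Plan", "Metadata", "Documentation"},
    Stage.ACQUIRE_PROCESS: {"Ingest", "QA/QC", "Appraise", "Transcribe", "Package"},
    Stage.ANALYZE: {"Model", "Visualize", "Appraise"},
    Stage.PRESERVE: {"Archive", "Backup", "Appraise", "Transcribe", "Package", "Embargo"},
    Stage.PUBLISH_SHARE: {"Release", "Disseminate", "Package", "Embargo"},
}


def stages_for(activity: str) -> list:
    """Return every stage whose term list contains the activity."""
    return [stage for stage in Stage if activity in ACTIVITIES[stage]]
```

Calling `stages_for("Appraise")` returns the Acquire & Process, Analyze, and Preserve stages, which is one way to see where roles may need to be assigned more than once in the model.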
35
CDI Data Blast Poster “Write-On” Poster
36
DMWG Sub-team: Data Best Practices Next Steps Digest Data Blast Comments Receive CDI Sponsor Feedback Assign Roles to Model Finalize Graphic Establish Science Review FY12 Validation Beginning of Outreach Effort Communicate Final Model to USGS Start Aligning Best Practices to Model Determine Gaps
37
Funded FY11 Data Management Projects
38
Group convened December, 2011 to put together a data management proposal to the Powell Center: Heather Henkel, Sally Holl, Viv Hutchison, Steve Tessler, Jessica Thompson, Lisa Zolly Proposal not funded, but instead received support from Powell Center to have proposal funded at a higher (enterprise) level Proposal modified and resubmitted June 20th; funding received from Core Science Systems (CSS) and CDI Work begun in July
39
Funded Data Management Projects Creation of data management website: Provide one place for best practices, tools, education, key points, recommended reading, checklists Internal initially, plans to expand to external site FY12
40
Funded Data Management Projects Categorization of existing data management materials: Creation of bibliography Content for website Purchase of Enterprise-wide license for DAMA Dictionary of Data Management DAMA Data Management Body of Knowledge
41
Funded Data Management Projects DM training for team: Expose team to same DM background Build upon same core training Intent to provide focused DM training to others, based upon initial training DM Education Products: Educate and encourage data management practices Repurpose existing materials created through DataONE Make available on website DM Planning Tool: Template to guide users through the creation of a DM plan Build upon existing work done by DataONE and USGS
42
FY12: Proposals from the Working Group
43
Moving Forward Proposals requested from anyone within the CDI-DM working group Initial discussion during monthly telecons 20 proposals submitted Presentation of draft submissions during Tuesday afternoon’s working group session Work done on combining similar proposals, tasks Identification of cross-cutting tasks Creation of slides for this presentation FY12 Proposals
44
Data Management Website (Phase 2) Summary: A critical activity needed for data integration is well-managed data. Enhancement to the Phase 1 data management website will provide USGS researchers with the information they need about how to implement data management practices in their work. Deliverables: Internal (eventually migrating to a public-facing), usability-tested, data management website to underscore the Bureau’s understanding of the importance of data management. USGS researchers have easy access to the standards, tools, and best practices that will ensure adherence to data management. FY12 Proposals
45
Data Management Framework Summary: A critical activity needed for data integration is well-managed data. With a framework for USGS researchers to use to guide planning to preservation of their data, the USGS can offer better access to data ready for integration. Deliverables: Cross-Mission Area, agreed-upon, framework for standardizing data management planning, ultimately resulting in improved access to and integration of research data products. Outreach and training materials will accompany the framework to facilitate communication about the framework to USGS scientists and science managers. FY12 Proposals
46
USGS Science Center Adaptable Data Management Plan Framework Summary: Devise an adaptable baseline framework for science center data management plans: Conduct analysis of existing USGS and external DMPs Address project, data, and business model variations Refine and test proposed framework through implementations at the Alaska (Integrated) Science Center and Texas Water Science Center Deliverables: Publish a wiki version of the DMP framework to enhance future participation and development FY12 Proposals
47
Validate Data Life Cycle Model Summary: Because this model is intended to be the conceptual foundation from which our data management best practices, tools, policies and procedures will emanate, it is vital that it be reviewed extensively… Directly Engage our Scientists & Management Deliverables: Formal Science Review (all Regions & Mission Areas) USGS-Wide Opportunity to Comment Through Data Management Website Town Hall Sessions (Reston & Denver as largest USGS numbers) Communicate Final Model to Bureau (outreach element) FY12 Proposals
48
Data Preservation Mechanisms for USGS Researchers Summary: Identify and provide information to USGS researchers about available data preservation mechanisms they can use and where they can submit their critical data for preservation. Deliverables: Summary reports of potential data preservation mechanisms and what USGS researchers need to participate in data preservation activities. A data preservation webpage in the USGS Data Management website. Identified most feasible data preservation mechanism(s) that USGS researchers can use to preserve their critical data and information on how to participate in those efforts. FY12 Proposals
49
National Vegetation Classification Standard Summary: In this proposal we have identified 3 possible sub-tasks related to the implementation of the National Vegetation Classification Standard (FGDC 2008). The content for the NVC is currently being developed through a variety of FGDC and Ecological Society of America Vegetation Panel efforts. Each of the sub-tasks is a critical component of the full cyber-infrastructure needed to support the standard. Currently, prototypes for several of these components exist, but they have each been developed independently and ultimately need to be linked in a common framework. FY12 Proposals
50
National Vegetation Classification Standard (cont.) Deliverables: Vegetation classification – interim database design Supports content (community types and descriptions) being developed through grant funding and linking to the NVC website. Vegetation Plot Database - migration plan Provide a centralized database of vegetation plots (currently housed at NCEAS – VegBank) and linkages to existing plot databases in partner agencies (distributed network) Peer Review Infrastructure – workplan A prototype software exists – a data management work flow and document management system is needed. FY12 Proposals
51
Data Mgt – CHA CHA “like” Proposal Summary: A texting, Internet, and Chat based service for rapid response and networking USGS Data Management questions, activities, and support. Deliverables: Network of <10 USGS Data Managers Mechanisms to submit text, e-mail, chat, and web Q&A Integration with USGS Data Management Site Mobile Submission Application Training for CHA CHA Experts Promotion/Outreach/Education Materials 9 Month Evaluation of effectiveness, next steps, etc. Outcomes: Network of USGS Data Managers to support Data Mgt. Development of Architecture & Services for USGS CHA CHA Service. An easy to use, multi-submission method, for rapid response to USGS Data Management questions & issues More effective Data Mgt practices, awareness, & leveraging expertise FY12 Proposals
52
Data Integration Potential from Linking Monitoring Protocols Summary: Efforts to identify, collect, and characterize online monitoring protocol libraries will provide a valuable reference resource to USGS scientists and foster coordinated science and integration opportunities. Deliverables: Centralized access to existing tools that collect documented monitoring protocols through the Data Management website Leverage existing resources, expertise, technology, and content of existing efforts such as Natural Resources Monitoring Partnership, Pacific Northwest Aquatic Monitoring Partnership, and National Environmental Methods Index Common elements identified that will enable interoperability among the systems Leverage the Data Management website as a mechanism for collecting USGS scientist needs for specific protocols and promote additional content into the monitoring library resources FY12 Proposals
53
Quick Response Team to Web-enable Data Summary: High-level (OSTP, NSTC, DOI Secretary, USGS Director, USGS Assistant Director) initiatives require a timely response from the agency with relevant data and tools. Mobilizing a team of a metadata creation expert and a Web/map service IT expert will assist scientists to address these data requests in a timely manner, and demonstrate USGS relevance and competency to meet the information needs of the Department and higher. Deliverables: Process developed for handling data-release activities that could be transferred to other data management activities under development Undetermined number of datasets broadly available for specific purpose as well as ancillary benefits showcased Leverage development of the GOS to Data.gov migration to develop a process that will be sustainable for future initiatives Leverage thesauri and other metadata standards and existing tools Leverage Document Production process of the Records Management Office FY12 Proposals
54
Develop A Data Standard Process For USGS, Using A TIME Standard As The Pilot Summary: Data Standards are generally lacking across the USGS landscape and the inconsistencies in how we name, describe, and populate various common data elements are impediments to effective data integration. There is currently no process in place on how to establish a data standard. How we name TIME fields and characterize our temporal data is critical to fostering data integration across the enterprise. Also, TIME is not a simple data element as temporal data can represent a full date-time, or only a year, month, day, time interval (range), or a timestamp in a data system, and concepts of ‘valid time’ also need to be considered (the time interval over which a value is valid). Deliverable: Establish a formal process for proposing, evaluating, approving, and implementing a data standard within the USGS. TIME is a ubiquitous data element made up of date and time components, and can serve as the pilot data standard. FY12 Proposals
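To make the point about TIME concrete, here is a minimal sketch of how values recorded at different precisions (year only, year and month, full date, full date-time) might be normalized to a common form. It is an assumption for illustration, not a proposed USGS standard: the `TemporalValue` class, the `parse_temporal` and `covers` functions, and the choice of ISO 8601-style strings are all hypothetical. Each value is stored as a half-open interval, which also gives a simple way to express the 'valid time' over which a value holds.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical precision levels: (name, pattern, strptime format).
_FORMATS = [
    ("year", re.compile(r"^\d{4}$"), "%Y"),
    ("month", re.compile(r"^\d{4}-\d{2}$"), "%Y-%m"),
    ("day", re.compile(r"^\d{4}-\d{2}-\d{2}$"), "%Y-%m-%d"),
    ("second", re.compile(r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}$"),
     "%Y-%m-%dT%H:%M:%S"),
]


@dataclass(frozen=True)
class TemporalValue:
    """A time value kept as the interval [start, end) it represents."""
    precision: str
    start: datetime  # inclusive
    end: datetime    # exclusive


def _interval_end(start: datetime, precision: str) -> datetime:
    """Return the first instant after the period that begins at start."""
    if precision == "year":
        return start.replace(year=start.year + 1)
    if precision == "month":
        if start.month == 12:
            return start.replace(year=start.year + 1, month=1)
        return start.replace(month=start.month + 1)
    if precision == "day":
        return start + timedelta(days=1)
    return start + timedelta(seconds=1)


def parse_temporal(text: str) -> TemporalValue:
    """Parse a year, year-month, date, or date-time string."""
    for precision, pattern, fmt in _FORMATS:
        if pattern.match(text):
            start = datetime.strptime(text, fmt)
            return TemporalValue(precision, start, _interval_end(start, precision))
    raise ValueError(f"unrecognized temporal value: {text!r}")


def covers(value: TemporalValue, instant: datetime) -> bool:
    """True if the instant falls inside the value's interval."""
    return value.start <= instant < value.end
```

With this representation, a dataset that records only "2011" and one that records "2011-08-04" can be compared directly, because both carry an explicit start and end rather than an ambiguous point in time.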
55
Write A ‘How To…’ Publication On How To Identify And Resolve Issues Involving Non-uniform (Mixed) Time Scales When Integrating Data For Research Use Summary: Facilitate best practices for the integration of data in temporal dimensions. Deliverables/Work: Organize subject matter experts to outline and discuss the problems, existing solutions, and use cases at project and program levels. Prepare a publication on how to identify and resolve these issues in order to use data from various sources and studies for USGS research. FY12 Proposals
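One common mixed-time-scale problem is combining a daily series with a monthly one. The sketch below shows one possible resolution, coarsening the daily data to monthly means before joining. The function names and the use of the mean as the aggregate are assumptions for illustration; the proposal does not prescribe a method, and the right aggregate (mean, sum, maximum) depends on the variable being measured.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def to_monthly_mean(daily: dict) -> dict:
    """Collapse {date: value} into {(year, month): mean of that month's values}."""
    buckets = defaultdict(list)
    for day, value in daily.items():
        buckets[(day.year, day.month)].append(value)
    return {key: mean(values) for key, values in buckets.items()}


def join_on_month(daily: dict, monthly: dict) -> dict:
    """Pair the coarsened daily series with a monthly series.

    Only months present in both inputs are returned, so gaps in either
    source stay visible instead of being silently filled.
    """
    coarse = to_monthly_mean(daily)
    shared = sorted(coarse.keys() & monthly.keys())
    return {key: (coarse[key], monthly[key]) for key in shared}
```

Coarsening to the coarser of the two scales avoids inventing detail; going the other way (spreading a monthly value across days) would require an interpolation assumption that should be documented alongside the data.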
56
Write A ‘How To…’ Publication On How To Identify And Resolve Issues Involving Non-uniform (Mixed) Spatial Scales When Integrating Data For Research Use Summary: Facilitate best practices for the integration of data in spatial dimensions. Deliverables/Work: Organize subject matter experts to outline and discuss the problems, existing solutions, and use cases at project and program levels. Prepare a publication on how to identify and resolve these issues in order to use data from various sources and studies for USGS research. FY12 Proposals
57
Survey of USGS Scientists about DM Summary: In order to inform our future actions to assist USGS in the management of its data, a survey of current practices will help us to identify where USGS is performing really well, and where some gaps may exist that we can look to improve. Deliverables: A USGS survey, leveraged from DataONE survey of scientists, with results compiled and analyzed. FY12 Proposals
58
Data Exit Survey for USGS Scientists Summary: To prevent loss of information about data from exiting employees, an exit interview about the data is necessary in our administrative processes. Deliverables: A USGS exit survey/interview, given to exiting employees, that asks such questions as “Has your data been archived?”, “Is the metadata complete?”, “Where is the data located?” FY12 Proposals
59
Thank you! Questions? Comments?
60
Titles of Proposals Data Management Website (Phase 2) Data Management Framework for USGS USGS Science Center Adaptable Data Management Plan Framework Validate Data Life Cycle Model Data Preservation Mechanisms for USGS Researchers National Vegetation Classification Standard Data Mgt – CHA CHA “like” Proposal Data Integration Potential from Linking Monitoring Protocols Quick Response Team to Web-enable Data Develop A Data Standard Process For USGS, Using A TIME Standard As The Pilot Write A ‘How To…’ Publication: Time Scales Write A ‘How To…’ Publication: Spatial Scales Survey of USGS Scientists about DM Data Exit Survey for USGS Scientists