“Reverse Engineering” Statistical Metadata through User Studies Carol A. Hert Syracuse University January 23, 2003.


Presentation Overview
Defining metadata (yet again)
Rationale for user studies: reverse engineering of metadata
Two studies of users
  – Users of statistical tables
  – Users during statistical integration tasks
Implications for system design

Definition of Metadata
Metadata are information entities preserved in artifacts that perform the task of providing context designed to help the user create, locate, understand, and use* the entities/data to which the metadata refer
* Help the user manipulate the entity throughout the entity’s lifecycle

The Metadata Challenge
What information entities are metadata (and what aren’t)?
Which metadata are necessary, essential, optimal for which tasks (and can we acquire them)?
How can we understand metadata use and creation to improve our metadata systems (and other tools for user understanding)?

A Viable Approach
Reverse engineer metadata elements by investigating how users interact with statistical information and determining what information is necessary to support them

The Viability of User Metadata Studies
Plethora of potential metadata
Cost of creating or harvesting, maintaining metadata and metadata systems
Uncertain utility of some metadata

Rationale for User Studies
Examination of users in situ can provide insight into which metadata are used, when, in what formats, etc.
Accepted strategy in social informatics, sociology of technology and work

The User Studies
Study 1: Metadata needs during usage of statistical tables
Study 2: Metadata needs during tasks requiring integration of statistical information
Both funded by U.S. National Science Foundation and Bureau of Labor Statistics

Exploring Metadata for Understanding Statistical Tables
Task concerned understanding statistical tables
Identified user questions/uncertainties about specific tables
  – Yielding potential metadata elements
Searched for answers in existing metadata sources
  – Investigating potential for harvesting metadata

Exploring Metadata for Understanding Statistical Tables
11 respondents, each of whom worked with 3 tables (mix of electronic and paper)
170 uncertainties in total, grouped into 5 major categories

Findings about Metadata for Tables
Most common questions concerned definitions, followed by rationales
Questions related to statistical domain, general table structure, and interface
Rationale questions difficult to answer with existing metadata

Types of Uncertainties
Definitions (of terms, categories, abbreviations, universe) (97 of 170)
Rationales (28 of 170)
Table structure (e.g., format, layout, link structure) (24 of 170)
Lack of information on
  – Data collection and sources (4 of 170)
  – Computational methods (4 of 170)
  – Comparability/relationship of information (6 of 170)
  – Others (5 of 170)
Other (2 of 170)
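
The counts on this slide can be turned into percentage shares with a few lines of Python. This is only an illustrative sketch: the counts are the ones reported on the slide, while the dictionary keys, the `shares` helper, and the one-decimal rounding are assumptions made for the example, not part of the original study.

```python
# Counts as reported on the slide (170 uncertainties in total).
# The key names are shortened labels chosen for this sketch.
uncertainty_counts = {
    "Definitions": 97,
    "Rationales": 28,
    "Table structure": 24,
    "Data collection and sources": 4,
    "Computational methods": 4,
    "Comparability/relationship of information": 6,
    "Others (lack of information)": 5,
    "Other": 2,
}

total = sum(uncertainty_counts.values())


def shares(counts):
    """Return each category's percentage share, rounded to one decimal."""
    n = sum(counts.values())
    return {name: round(100 * c / n, 1) for name, c in counts.items()}


percentages = shares(uncertainty_counts)
for name, pct in percentages.items():
    print(f"{name}: {uncertainty_counts[name]} of {total} ({pct}%)")
```

Under these counts, definitional questions make up somewhat more than half of all recorded uncertainties, which matches the slide's observation that definitions were the most common question type.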

Insights about Metadata
Metadata often difficult to retrieve (due to unstructured format)
Metadata duplicated in multiple places (often manually and with editorial changes)
Metadata needed were agency-, table-, or statistics-specific

And a Tension
What is the relationship among metadata and other types of information, and when and how do these sources interact to support particular tasks? (a.k.a. what are metadata?)

Metadata During Integration Tasks
What problems/uncertainties do specific types of users have during tasks involving integration of statistical data?
For the same tasks, what problems/uncertainties do experts perceive as being relevant to usage of the data by the user populations?
How do problems experienced by end-users compare to those identified by experts?
What metadata or other information can be identified to resolve user problems?

Metadata During Integration Tasks
Goals of Study
  – Extend our knowledge of metadata usage
  – Inform design of tools that incorporate metadata
  – Consider metadata tools in conjunction with larger set of statistical literacy tools

Metadata During Integration Tasks
Methodology
  – Five tasks requiring integration across sources
  – Users did 1-2 of the tasks
  – Think aloud protocols used with follow-up interview
  – To date, 14 expert users; second round of data collection about to begin

The Tasks
3 variants of “Find 4-6 economic indicators for a particular county and compare the county’s economic status to its state and the United States as a whole”
While looking at the economic indicators for Nebraska you notice that the unemployment numbers are not the same at the BLS site and at the Nebraska site. Try to determine why.

The Tasks
You are interested in building a soybean crushing plant in either Nebraska or South Dakota. Examine natural gas and electricity prices in the states to determine an appropriate location.

The Tasks
You have become increasingly concerned about urban sprawl in North Carolina. You are looking for statistics on loss of farming lands and farming income in Orange, Durham, and Wake counties. Has the loss of farmland in these counties been greater than 50% since 1992? How does the loss of farmland and farm income in the Raleigh-Durham area compare to the loss of farmland and farm income across the nation as a whole?
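
The arithmetic behind the farmland task (a percentage loss since a base year, compared against a threshold and against the national figure) might be sketched as follows. The acreage values are made-up placeholders, not real statistics, and the `percent_loss` helper and variable names are assumptions introduced only for illustration.

```python
def percent_loss(start_value, end_value):
    """Percentage decline from start_value to end_value (negative if it grew)."""
    if start_value <= 0:
        raise ValueError("start_value must be positive")
    return 100.0 * (start_value - end_value) / start_value


# Made-up placeholder figures, NOT real statistics.
county_acres_1992, county_acres_now = 1000.0, 400.0
nation_acres_1992, nation_acres_now = 2000.0, 1800.0

county_loss = percent_loss(county_acres_1992, county_acres_now)
nation_loss = percent_loss(nation_acres_1992, nation_acres_now)

# The task's first question: has the loss exceeded 50% since the base year?
exceeds_half = county_loss > 50.0

print(f"County loss: {county_loss:.1f}%  exceeds 50%: {exceeds_half}")
print(f"National loss: {nation_loss:.1f}%")
```

Even this small calculation presumes that the county and national figures use the same definition of farmland and the same base year, which is exactly the kind of metadata the study found users needed.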

Findings to Date
Integrating activities of users
  – Making comparisons
  – Noting discrepancies (between data, in presentation approach, etc.) and/or asking what the difference is due to
  – Manipulations (e.g., mathematical, exporting to spreadsheets)
Barriers to integration

More Findings
Strategies used to find and integrate sources, data, to understand scope of task
Knowledge used
Types of questions/uncertainties
Terminology used
Aspects of data that matter to the user during the task

Findings to Date
Comparisons are a critical aspect of integration
Comparison types identified:
  – Geographic units
  – Definitional differences in concepts and variables
  – Across time
  – Data from different sources (websites, surveys)
  – Index value comparisons

Barriers to Successful Integration
Definition, source information lacking
User lack of knowledge of appropriate strategies (e.g., using time series data, types of calculations to perform)
User lack of knowledge about usage of index values, statistical activity purpose and approach
Interface design problems (such as scrolling row and column headers)

Further Barriers
Inconsistent data across sources
Inconsistent interfaces
Inability to determine whether data wanted for comparison are available
Lack of domain knowledge
Lack of knowledge of how to handle inflation, seasonal adjustment
Terminology differences

Other Findings
Terminological variants within/across agencies and between users and agencies
Different approaches suggest different statistics to users
Experts use agency and domain knowledge extensively

Using the Results
Incorporate specific metadata into a variety of tools
  – Provide answers from metadata sources for specific presentations, tasks, etc.
  – Issues are specificity of answer, uniqueness of answer
Identifying metadata elements and sources of metadata
Determine tools appropriate to a particular user situation

Tools/Approaches under Development
Glossary lookup
Ontology for cross walking
Relationship browser
  – Enables a person to preview websites and datasets by specifying particular relationships (e.g., show me datasets that include unemployment variables and come from surveys of households)
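
The kind of query a relationship browser answers can be sketched as a filter over dataset descriptions. This is a minimal sketch and not the project's implementation: the `Dataset` record, the `browse` function, the `collection_unit` attribute, and the catalog entries are all hypothetical names invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    """Hypothetical metadata record describing one dataset."""
    name: str
    variables: frozenset
    collection_unit: str  # e.g. "household" or "establishment"


def browse(datasets, variable=None, collection_unit=None):
    """Return the datasets that satisfy every relationship the user specified."""
    results = []
    for ds in datasets:
        if variable is not None and variable not in ds.variables:
            continue
        if collection_unit is not None and ds.collection_unit != collection_unit:
            continue
        results.append(ds)
    return results


# Illustrative catalog; the entries are placeholders, not real datasets.
catalog = [
    Dataset("example_household_survey", frozenset({"unemployment", "earnings"}), "household"),
    Dataset("example_establishment_survey", frozenset({"employment", "earnings"}), "establishment"),
    Dataset("example_price_survey", frozenset({"prices"}), "establishment"),
]

# "Show me datasets that include unemployment variables
#  and come from surveys of households."
matches = browse(catalog, variable="unemployment", collection_unit="household")
print([ds.name for ds in matches])
```

Leaving a criterion unset returns everything along that dimension, which is one simple way to support the previewing behavior the slide describes.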

Tools/Approaches Under Development
Relationship browser that will modify itself based on the underlying object classes/variables available
Embedded help via “sticky notes”
Online communities of interest (via communication tools)
Tutorials, scenarios of use

Mapping Needs to Tools
Definitional information: glossary, mappings of agency terminology to user terminology, ontologies
Scoping problem (e.g., what is an economic indicator): example indicators, general definitions
Non-linked explanatory information: mouse-overs at point of linkage, additional linkings
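
A glossary lookup that first maps user terminology onto agency terminology could look roughly like the sketch below. The term pairs, the placeholder definitions, and the `lookup` function are assumptions for illustration only; a real tool would draw both tables from agency metadata sources.

```python
# Hypothetical mapping from user vocabulary to an agency's preferred term.
USER_TO_AGENCY = {
    "jobless rate": "unemployment rate",
    "out of work": "unemployed",
}

# Hypothetical glossary keyed by agency term; the definitions are placeholders.
GLOSSARY = {
    "unemployment rate": "<agency definition of unemployment rate>",
    "unemployed": "<agency definition of unemployed>",
}


def lookup(term):
    """Resolve a user term to an agency term, then fetch its glossary entry.

    Returns (agency_term, definition); definition is None when the
    glossary has no entry for the resolved term.
    """
    key = term.strip().lower()
    agency_term = USER_TO_AGENCY.get(key, key)
    return agency_term, GLOSSARY.get(agency_term)


print(lookup("Jobless Rate"))
print(lookup("gdp"))
```

Returning `None` for an unknown term keeps the gap visible, so a calling tool can fall back to other help such as example indicators or general definitions.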

Mapping Needs to Tools
Managing data collected: access to table builders, word processing, spreadsheets
Finding comparable numbers: relationship browser (e.g., geographic, time unit by indicator)
Confusion of large number of text links: relationship browser (show me pages/parts of site that have economic indicators)

Integrating Metadata Systems with Other Tools
Metadata are one component of a statistical information network
  – Metadata systems important
  – Metadata as “organizers, content” of other systems
Metadata systems need to pass metadata to other tools and vice versa
A New Question: How do our metadata systems and repositories interact with other tools?

Further Information
Carol A. Hert
The overall project: