Research Records and Artifact Ecologies Natasa Milic-Frayling Principal Researcher Microsoft Research Cambridge, UK The Evolving Scholarly Record and the.

Presentation transcript:

Research Records and Artifact Ecologies Natasa Milic-Frayling Principal Researcher Microsoft Research Cambridge, UK The Evolving Scholarly Record and the Evolving Stewardship Ecosystem OCLC Workshop, Amsterdam 10 June, 2014

Supporting Scientific Work How to support reuse of scientific data, tools, and resources to facilitate new scientific discoveries?

Research on Scientific Practices (1)
The process of scientific discovery and ‘universalizing knowledge’ is an inherently social enterprise.
Van House, N. A., Butler, M. H., and Schiff, L. R. Cooperative knowledge work and practices of trust: sharing environmental planning data sets. In Proc. of CSCW '98. ACM Press (1998),
Ways of gathering and validating shared data bind the researchers into distinct communities of practice.
Birnholtz, J. P., and Bietz, M. J. Data at work: supporting sharing in science and engineering. In Proc. of GROUP '03. ACM Press (2003),

Research on Scientific Practices (2)
Gathering and propagation of scientific information
There is a difference between the scientific work conducted in the labs and the reports communicated to the scientific community. Data passes through a complex, multi-stage social journey, from the laboratory experiments to the written paper.
Latour, B. Science in Action, Harvard University Press, Cambridge MA,
The scientific record stands as an intermediary between the raw data and the formal scientific paper. The more ‘annotation, augmentation, deletion and imposed structure’ are added to raw data, the more the data moves towards a record.
Shankar, K. Order from chaos: The poetics and pragmatics of scientific recordkeeping. J. Am. Soc. Inf. Sci. Technol. (2007) 58, 10,

Research on Scientific Practices (3)
Collaboratories enable teams of distributed scientists to collaborate on scientific problems using tools for shared data access, data analysis, and communication.
Olson et al. studied 10 major collaboratories and see them as ‘a challenge to human organizational practices’. Pre-specifying data sharing rules and having a clear understanding of the common benefits are essential for the success of a collaboratory.
Olson, G. M., Teasley, S., Bietz, M. J., and Cogburn, D. L. Collaboratories to support distributed science: the example of international HIV/AIDS research. In Proc. of SAICSIT '02 (2002),

Research on Scientific Practices (4)
Ownership of data and sharing
Bly [4] shows that scientists can be reluctant to share data for fear of losing their ‘monopoly rent’ on that data.
Vertesi and Dourish found that the methods of producing and acquiring data in a scientific collaboration influence the manner in which the data is shared. In collaborative and inter-dependent research, there is a sense of group ownership of data. In more independent research, where researchers compete for equipment, time, and resources, there is a feeling that data is personally earned and owned by individuals.
Bly, S. Special section on collaboratories, Interactions. ACM Press (1998), 5, 3, 31.
Vertesi, J. and Dourish, P. The value of data: considering the context of production in data economies. In Proc. of CSCW '11, ACM Press (2011),

Observations
Research has dealt with important factors:
- Technical infrastructure (data repositories, tools)
- Collaborative practices (sharing rules, adopting tools, etc.)
- Information artifacts (scientific records including metadata that contextualizes data, lab books, publications)
What is the inter-relationship of the technologies, practices, and artifacts that emerge as part of scientific activities?

Approach
Adopt the ecology metaphor, inspired by the information ecology introduced in 1999 by Nardi and O’Day.
“Information Ecology is a system of people, practices, values and technologies in a particular local environment”.
Nardi, B. A., and O'Day, V. L. Information ecologies: Using technology with heart. MIT Press (1999).

Research Objectives
- Study the artifact ecology of a successful collaborative scientific environment
- Understand the interdependencies of the technologies, practices, and artifacts within scientific discovery
- Identify advantages and drawbacks of the observed technologies and practices
- Consider enhancements
- Inform the design of the support required for collaborative scientific work

SCIENTIFIC DISCOVERY IN THE NANOTECHNOLOGY LAB user observation study

University NanoPhotonics Research Centre Complex and dynamic research environment Internationally recognized within the highly competitive area Technologically highly advanced

Research in Optical Properties of Materials

Research Environment
Electronic Lab Book: HP Tablets and MS OneNote
Sophisticated lab environment
Software: OneNote, Office production tools, Igor analysis tool, Groove data sharing

Physical vs. Electronic Lab Book Laboratory Notebook, Yale University, , p. 245 (June 19, 1946).

Physical vs. Electronic Lab Book

Observed Practices
Work practices optimised for rapid sharing of data and information with the research leader and the group.
Diverse digital artefact ecology, comprising material samples, data, notes, and summaries.
Issues: bridge information silos, bridge the gap between individual and collective record keeping.
Diagram labels: Experiments and data collection; Analysis and synthesis; Interpretation and validation; Lab notebook; Summary; Shared notebook.

Data Collection Lab books (OneNote Notebook)

Distillation―From Notes to Summaries Individual researcher notes (OneNote Notebook) Summary of findings (PowerPoint slide)

Interpretation and Validation Gaining collective insights and establishing common ground

Evolution of Knowledge & Digital Artefacts

Inter-weaving of Digital Artifacts
Uncovered the complex nature of the artefact ecology.
Scientific work produces a chain of interrelated and complementary artifacts to enable interpretation of scientific data.
Artifacts are interrelated:
- Lab notes taken during experiments give context to the data.
- Summaries, derived from the notes, synthesize intermediary findings.
- During meetings, content from summaries (e.g., images) is embedded into meeting notes.
- Graphs and images are used and reused from one artefact to another, contextualized in new ways as new interpretations emerge.

What does this all mean? Providing access to data is a prerequisite but is not sufficient to support successful reuse of scientific data. We need to design rich environments that can give rise to artifacts that facilitate interaction and crystallization of experimental data and insights. We need to maintain and share not only the data but the artifact ecology that supports scientific work.

REPRESENTATION OF RESEARCH PROJECTS technology probe

How to Create Overviews of Projects? Linking artefacts Overcome the limitations of physical interaction

Meta Surfacing
- Replace piles of papers with iconic and digital representations.
- Enable search and data mining.
- Create conceptual maps for individual topic, project, and researcher, linking relevant artefacts.
- Enable rich interaction and real time manipulation of maps and objects.

Co-design Workshop Representing information and data in shared resource maps

Co-design Workshop Desire for improved information linking Space for viewing, arranging, annotating and creating new links between data sources Collaborative space for making connections between projects.

Co-design Workshop Desire for visual project spaces Enable drill down from presentations and summaries to raw data Support tagging and automatic data collection and association

Visualization Ideas

Support for Linking and Sense Making
Key functions:
- Import any information type
- Enable annotation
- Enable linking of resources
- Link back to original file and folder place
Platform:
- Microsoft Surface to help enable collaboration
- Synchronisation between tablet and Surface to support current practices
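The key functions listed above (import, annotate, link, link back to the source) can be pictured as a small data model. The following is only a minimal sketch under stated assumptions: the class names `MapItem` and `ProjectMap`, the field names, and the example paths are hypothetical and are not taken from the probe described in the talk.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class MapItem:
    """One imported piece of information placed on a project map (hypothetical model)."""
    item_id: str
    kind: str          # any information type, e.g. "note", "graph", "slide"
    source_path: str   # link back to the original file and folder
    annotations: list[str] = field(default_factory=list)


@dataclass
class ProjectMap:
    """Imported items plus undirected links between them."""
    items: dict[str, MapItem] = field(default_factory=dict)
    links: set[tuple[str, str]] = field(default_factory=set)

    def import_item(self, item: MapItem) -> None:
        self.items[item.item_id] = item

    def annotate(self, item_id: str, text: str) -> None:
        self.items[item_id].annotations.append(text)

    def link(self, a: str, b: str) -> None:
        if a not in self.items or b not in self.items:
            raise KeyError("both items must be imported before they can be linked")
        # Store each pair in sorted order so (a, b) and (b, a) are the same link.
        self.links.add((min(a, b), max(a, b)))

    def linked_to(self, item_id: str) -> list[str]:
        """Return the ids of all items linked to item_id, sorted."""
        return sorted(y if x == item_id else x
                      for x, y in self.links if item_id in (x, y))


# Illustrative usage with made-up identifiers and paths.
project = ProjectMap()
project.import_item(MapItem("notes-1", "note", "labbook/run1.one"))
project.import_item(MapItem("graph-1", "graph", "analysis/run1.png"))
project.link("notes-1", "graph-1")
project.annotate("graph-1", "context for this graph is in the lab notes")
```

Keeping `source_path` on every item is what lets a map entry lead back to the original file, which is the function the slide singles out.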

User Tasks
Individual knowledge crystallisation: Sessions 1, 2, 3
Collaborative knowledge crystallisation: Session 4
Active review: Session 5

Spatial Chunking of Maps
Map regions: Commercial work; Most recent data; Progress; Scientific work; Separate scientific work; High level map.
Sessions 1, S1

Spatial Chunking and Linking within Maps
Blue: the results of experiments on stretched samples. Well understood area.
Red: areas of uncertainty. Nano-chasms and sample cross sections are incongruous. Results of diffraction experiment not understood. Solutions needed.
Orange: notes illustrate the interconnections and dependencies between different areas of the graph.
Sessions 2, S2

Project Maps

Learnings: Decoupling information units from documents
Participants imported sub-parts of the documents. Extracting content was not fully supported across file types; participants used workarounds such as cut and paste. The document file is too coarse-grained a unit for creating project maps.

Learnings: Spatial and explicit linking
The participants used space, links, and annotations to express relationships among information items in the map. The semantic regions within the map could be ambiguous to third parties without a digital trace of the interaction that led to the map.

REFERENCES COMPOSITION COLLECTIONS Information Architecture

REPRESENTATION OF RESEARCH PROJECTS long term access to digital

DIGITAL ARTEFACT
Diagram labels: FILE (digital object; the persisted part of the digital artefact); SOFTWARE (decoder); Hardware to process and DISPLAY; DIGITAL CONTENT / EXPERIENCE; APPLICATION; Persisted; Ephemeral.
PRESERVATION = Persistence + Connection with the contemporary ecosystem.

Paradox: we are concerned about storage, yet digital is inherently about processing bits, not about storing bits.

Symbiosis of Files and Applications
The objective of preservation is to ensure that the persisted digital content and applications remain connected with the contemporary computing ecosystem.
PRESERVATION = Persistence + Connection with the contemporary ecosystem.
Diagram labels: FILE; DIGITAL CONTENT; APPLICATION; Persisted; Ephemeral.

What do you want to keep ‘unchanged’?
Diagram labels: FILE; DIGITAL CONTENT; APPLICATION.
If the application is not running in the contemporary environment:

What do you want to keep ‘unchanged’?
If the application is not running in the contemporary environment:
- Migrate the files and run them with contemporary software (give up on both the original files and the application).

What do you want to keep ‘unchanged’?
If the application is not running in the contemporary environment:
- Retain the files and port the application to the new environment (retain the content files but give up on the application, at least partially).

What do you want to keep ‘unchanged’?
If the application is not running in the contemporary environment:
- Create a virtual machine with the old computing stack and run the original files and software (retain the original files and the original application; maintain scaffolding).
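The three options on these slides differ in what they leave unchanged. As a rough illustration only, the trade-off can be written down as a lookup table; the names `Strategy`, `RETAINED`, and `strategies_keeping` are hypothetical, and treating porting as giving up the original application is a simplification of the slide's "at least partially".

```python
from enum import Enum
from typing import Dict, List


class Strategy(Enum):
    MIGRATE = "migrate files, run with contemporary software"
    PORT = "retain files, port the application"
    VIRTUALIZE = "run original files and software in a virtual machine"


# What each option keeps unchanged, as stated on the slides.
# Assumption: porting counts as giving up the original application,
# although the slide says this happens "at least partially".
RETAINED: Dict[Strategy, Dict[str, bool]] = {
    Strategy.MIGRATE: {"original_file": False, "original_application": False},
    Strategy.PORT: {"original_file": True, "original_application": False},
    Strategy.VIRTUALIZE: {"original_file": True, "original_application": True},
}


def strategies_keeping(*, original_file: bool, original_application: bool) -> List[Strategy]:
    """List the strategies that retain everything the caller asks to keep."""
    return [
        strategy
        for strategy, kept in RETAINED.items()
        if (kept["original_file"] or not original_file)
        and (kept["original_application"] or not original_application)
    ]
```

Asking for both the original file and the original application leaves only virtualization, which is the direction the following slides take.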

Sustain and increase the value of digital through virtualization of legacy software + bridging services
Computational cradles: individual computational ‘cells’ for different generations of software stacks.
Bridging services: format translators, content extractors, etc.
Diagram labels: Contemporary Computing Ecosystem; VM-Gen1; VM-Gen2; VM-Gen3; VM-Gen4.
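One way to picture the combination of computational cells and bridging services is a registry that records, per legacy format, which virtual machine generation hosts its original software and which translators lead out to contemporary formats. This is a sketch under stated assumptions, not the architecture from the talk: the class `PreservationEcosystem`, the format labels, and the placeholder translator are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# A bridging service is modelled as a function from bytes to bytes.
Translator = Callable[[bytes], bytes]


@dataclass
class PreservationEcosystem:
    """Hypothetical registry of VM 'cells' and bridging services."""
    cells: Dict[str, str] = field(default_factory=dict)  # format -> VM generation
    bridges: Dict[Tuple[str, str], Translator] = field(default_factory=dict)

    def cell_for(self, fmt: str) -> Optional[str]:
        """VM generation that runs the original software for this format, if any."""
        return self.cells.get(fmt)

    def targets_for(self, fmt: str) -> List[str]:
        """Formats this legacy format can be translated into."""
        return sorted(dst for (src, dst) in self.bridges if src == fmt)

    def translate(self, data: bytes, src: str, dst: str) -> bytes:
        return self.bridges[(src, dst)](data)


def placeholder_translator(data: bytes) -> bytes:
    # Stand-in only: a real format translator would parse the legacy format.
    return b"[converted]" + data


# Illustrative usage with made-up format labels.
ecosystem = PreservationEcosystem(
    cells={"legacy-doc": "VM-Gen1"},
    bridges={("legacy-doc", "contemporary-doc"): placeholder_translator},
)
```

The design point is that both routes stay available for the same file: open it in its cell with the original software, or pass it through a bridge into the contemporary ecosystem.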

Connecting Legacy with Contemporary Ecosystem
ICT: SOFTWARE AND HARDWARE INNOVATION
Bridging Technologies and Methods
A digital artifact always requires some (software) computation. No need to give up on the original software!
Diagram labels: Contemporary Computing Ecosystem; VM-Gen1; VM-Gen2; VM-Gen3; VM-Gen4.

VIRTUALIZATION OF LEGACY SOFTWARE preserving computation

Virtual Machine with Windows 2000 (left) and Windows XP (right), running on Microsoft Cloud (Azure)

Start menus for Windows 2000 (left) and Windows XP (right).

MS MapPoint application running on Windows 2000 (left) and MS Money 2003 running on Windows XP (right).

FORMAT TRANSFORMATION Increasing value of legacy content

Word document shown in Microsoft Word 2.0 (from 1992), running in the virtual machine with Windows XP.

Word document in MS Word 2.0 (from 1992) (left) and converted to Open XML format, shown in Office 2007 (right).

WordPerfect document shown in WordPerfect 5.2 (from 1994), running in the virtual machine with Windows XP.

WordPerfect document in WordPerfect 5.2 (from 1994) (left) and converted to Open XML format, shown in Office 2007 (right).

Concluding Remarks
Research results in a complex ecology of digital artefacts.
- This includes a computing infrastructure, software, and digital artefacts.
Snapshots of scientific research records can be preserved through virtualization of the artefact ecology.
- That ensures that all the original artefacts can be accessed.
Services around research ecology snapshots can provide added value.
- These include curation services, beyond the project descriptions provided by the specialist as part of research practices.

Thank you Natasa Milic-Frayling Integrated Systems Microsoft Research Cambridge UK

©2013 Microsoft Corporation. All rights reserved.