The Rise of Informatics as-a Research Domain WIRADA Science Symposium August 2, 2011, Melbourne Peter Fox (RPI and WHOI)

Presentation transcript:

The Rise of Informatics as-a Research Domain WIRADA Science Symposium August 2, 2011, Melbourne Peter Fox (RPI and WHOI) Tetherless World Constellation

What’s ahead (today)
Do you need motivation?
–If so - Data Science and Informatics
An example
Rising = maturity = repeating it – from technology to methodology
–Use cases, information models and more …
Research topics
Where is informatics rising to?

Working premise
Scientists – actually ANYONE – should be able to access a global, distributed knowledge base of scientific data that:
–appears to be integrated
–appears to be locally available
Data – volume, complexity, mode, scale, heterogeneity, …

Mind the Gap!
There is/was still a gap between science and the underlying infrastructure and technology that is available.
Cyberinfrastructure is the new research environment(s) that support advanced data acquisition, data storage, data management, data integration, data mining, data visualization and other computing and information processing services over the Internet.
Informatics (information science) includes the science of (data and) information, the practice of information processing, and the engineering of information systems. Informatics studies the structure, behavior, and interactions of natural and artificial systems that store, process and communicate (data and) information. It also develops its own conceptual and theoretical foundations. Since computers, individuals and organizations all process information, informatics has computational, cognitive and social aspects, including study of the social impact of information technologies. (Source: Wikipedia)

Data integration and assimilation South Esk Flow Forecast (see talks by: )

Application integration! Smart faceted search Biological and chemical oceanography

Modern informatics enables a new scale-free** framework approach Use cases Stakeholders Distributed authority Access control Ontologies Maintaining Identity

Huh? Scale free? Citation networks, the Web, semantic networks

Use Case … is a collection of possible sequences of interactions between the system under discussion and its actors, relating to a particular goal.

Real use cases: Marine habitat - change
Image labels: Scallop (number, density); Scallop (size, shape, color, place); Scallop, shell fragment; Rock; What is this? Flora or fauna?; Dirt/mud
One person’s noise is another person’s signal
Several disciplines: biology, geology, chemistry, oceanography
Several applications: science, fishing, habitat change, climate and environmental change, data integration
Complex inter-relations, questions
Use case: What is the temperature and salinity of the water and are these marine specimens usual or part of an ecosystem change?
Src: WHOI and the HabCam group

Information Modeling: Conceptual, Logical, Physical

Socio-technical system(s) Refers to the joint social and technical aspects of ‘systems’ Sociological – people and groups of people Technical – more than technology but the two are often conflated – of organization and process

Informatics efforts: ‘These members assume well defined roles and status relationships within the context of the virtual group that may be independent of their role and status in the organization employing them’ (Ahuja et al., 1998). Technology Communication Patterns Organizational Structure

Research domain
Pulling apart the data/information/knowledge ecosystem
Capturing and representing knowledge
–Closed world/open world
Standards – a socio-technical system
What, why, how – knowledge provenance ecosystem (yes, another one)
Working with multiple information models

Data-Information-Knowledge Ecosystem
[Diagram labels: Data, Information, Knowledge; Producers, Consumers; Context, Presentation, Organization, Integration, Conversation, Creation, Gathering, Experience]

[Diagram labels: Producers, Consumers; Quality Control, Fitness for Purpose, Fitness for Use, Quality Assessment; Trustee, Trustor; Others…]

Working with knowledge Expressivity Maintainability/ Extensibility Implementability

Unit of exchange – the triple - example (linked data) Heath (2009) Closed World Open World
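The triple and the closed-world/open-world contrast can be sketched in a few lines of Python. This is only an illustration, not the example from the slide: the `example.org` URIs, the `Triple` type, and the helpers `ask_closed` and `ask_open` are all hypothetical names introduced here.

```python
from typing import NamedTuple, Optional


class Triple(NamedTuple):
    """One statement: subject, predicate, object."""
    subject: str
    predicate: str
    obj: str


# Hypothetical data; the example.org URIs are placeholders, not real resources.
GRAPH = {
    Triple("http://example.org/scallop42", "http://example.org/foundAt", "http://example.org/siteA"),
    Triple("http://example.org/scallop42", "http://example.org/hasColor", "orange"),
}


def ask_closed(graph, triple) -> bool:
    """Closed world: a statement that is not recorded is treated as false."""
    return triple in graph


def ask_open(graph, triple) -> Optional[bool]:
    """Open world: a recorded statement is true; an absent one is unknown (None)."""
    return True if triple in graph else None


missing = Triple("http://example.org/scallop42", "http://example.org/foundAt", "http://example.org/siteB")
closed_answer = ask_closed(GRAPH, missing)  # absent, so false
open_answer = ask_open(GRAPH, missing)      # absent, so unknown
```

The only difference between the two helpers is how absence is read, which is the point of the contrast: the data are identical, the assumption about what is not stated differs.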

Working with knowledge Query Rule execution Inference
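As a rough sketch of how these three operations relate, the Python below runs a pattern query over plain tuples, then executes one rule repeatedly until no new triples appear, and queries again. The predicate names `type` and `subClassOf`, the facts, and both function names are assumptions made for illustration; no particular triple store or reasoner is implied.

```python
def query(graph, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return {
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    }


def apply_subclass_rule(graph):
    """Rule: (x type C) and (C subClassOf D) implies (x type D).

    The rule is re-applied until a pass adds nothing new (a fixpoint).
    """
    inferred = set(graph)
    changed = True
    while changed:
        changed = False
        for (x, p1, c) in list(inferred):
            if p1 != "type":
                continue
            for (c2, p2, d) in list(inferred):
                if p2 == "subClassOf" and c2 == c and (x, "type", d) not in inferred:
                    inferred.add((x, "type", d))
                    changed = True
    return inferred


# Hypothetical facts, loosely echoing the "flora or fauna?" question.
facts = {
    ("scallop42", "type", "Scallop"),
    ("Scallop", "subClassOf", "Bivalve"),
    ("Bivalve", "subClassOf", "Fauna"),
}

before = query(facts, p="type", o="Fauna")                      # nothing stated directly
after = query(apply_subclass_rule(facts), p="type", o="Fauna")  # found after inference
```

The same query returns nothing on the stated facts and one triple on the inferred graph, which is the practical difference between querying alone and querying with inference.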

Expressivity/ Implementation Declarative Procedural Linked open data URI/http/RDF * Ontology encoded

Ontology Spectrum
An ontology specifies a rich description of the
–Terminology, concepts, nomenclature
–Properties explicitly defining concepts
–Relations among concepts (hierarchical and lattice)
–Rules distinguishing concepts, refining definitions and relations (constraints, restrictions, regular expressions)
relevant to a particular domain or area of interest.
Slide from Kendall/McGuinness SemTech Tutorial

Standards - technical Credit: B. Rouse (BEVO) 2008 Data Systems

The social side Credit: B. Rouse (BEVO) 2008 User Group

What is the ecosystem?
Many elements, and they are scattered
But these are what enable scientists to explore/confirm/deny their research
Accountability, Proof, Explanation, Justification, Verifiability, ‘Transparency’ -> Translucency, Trust, ‘Provenance’, Identity

Provenance
Origin or source from which something comes, intention for use, who/what generated for, manner of manufacture, history of subsequent owners, sense of place and time of manufacture, production or discovery, documented in detail sufficient to allow reproducibility; or who, what, where, why, when…
Knowledge provenance: enrich with ontologies and ontology-aware tools
Provenance presentation is a challenge

Provenance Distance Computation
Based on provenance “distance”, we tell users how different data products are.
Issues:
–Computing the similarity of two provenance traces is non-trivial
–Factors in provenance have varied weight on how comparable the results of processing are
–Factors in provenance are interdependent in how they affect the final results of processing
–Need to characterize similarity of external (vs. internal) provenance
–The number of dimensions/factors that affect comparability is quickly overwhelming
–Not all of these dimensions are independent; most of them are correlated with each other
–Numerical studies comparing datasets can be used, when available and where applicable to the analysis
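The slide does not say how the distance is computed. One simple possibility, sketched below in Python, is a weighted share of the provenance factors on which two records differ. The factor names, the weights, and the function `provenance_distance` are hypothetical, and the sketch deliberately ignores the interdependence and correlation between factors listed among the issues above.

```python
def provenance_distance(trace_a, trace_b, weights):
    """Weighted share of factors on which two provenance records differ.

    Returns 0.0 when the records agree on every factor and 1.0 when they
    differ on all of them. Factors without an explicit weight count as 1.0.
    A factor present in only one record counts as a difference.
    """
    factors = set(trace_a) | set(trace_b)
    total = sum(weights.get(f, 1.0) for f in factors)
    if total == 0:
        return 0.0
    differing = sum(
        weights.get(f, 1.0) for f in factors if trace_a.get(f) != trace_b.get(f)
    )
    return differing / total


# Hypothetical provenance records for two data products.
product_a = {"sensor": "S1", "algorithm": "v2", "grid": "1deg"}
product_b = {"sensor": "S1", "algorithm": "v3", "grid": "1deg"}
weights = {"sensor": 2.0, "algorithm": 1.0, "grid": 1.0}

d = provenance_distance(product_a, product_b, weights)  # 1.0 / 4.0 = 0.25
```

Treating factors as independent keeps the arithmetic transparent, but it is exactly the simplification the slide cautions against; a real measure would need to model how factors interact.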

Quality, Uncertainty, Bias
Quality
–Is in the eyes of the beholder – worst case scenario… or a good challenge
Uncertainty
–Has aspects of accuracy (how accurately the real-world situation is assessed; this also includes bias) and precision (down to how many digits)
Bias has at least two aspects:
–Systematic error resulting in the distortion of measurement data caused by prejudice or faulty measurement technique
–A vested interest, or strongly held paradigm or condition, that may skew the results of sampling, measuring, or reporting the findings of a quality assessment:
Psychological: for example, when data providers audit their own data, they usually have a bias to overstate its quality.
Sampling: sampling procedures that result in a sample that is not truly representative of the population sampled. (Larry English)
Semantics – all about meaning in context (see diagram!)
Provenance = enabler but knowledge provenance = transformative
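To make the systematic-error sense of bias concrete, here is a small Python sketch that separates the offset of repeated measurements from a reference value (bias) from their scatter. The readings and the reference value are made up, and using the sample standard deviation as a stand-in for precision is an assumption; the slide describes precision only as "down to how many digits".

```python
from statistics import mean, stdev


def bias_and_spread(measurements, reference):
    """Return (bias, spread) for repeated measurements of a known reference.

    bias   : mean offset from the reference (systematic error)
    spread : sample standard deviation (scatter between repeats)
    """
    bias = mean(measurements) - reference
    spread = stdev(measurements)
    return bias, spread


# Hypothetical readings of a quantity whose reference value is 10.0.
readings = [10.4, 10.6, 10.5, 10.5]
bias, spread = bias_and_spread(readings, 10.0)
```

Here the readings agree closely with each other yet sit about 0.5 above the reference: small scatter does not rule out a large systematic error.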

Quality Control vs. Quality Assessment
Quality Control (QC) flags in the data (assigned by the algorithm) reflect the “happiness” of the retrieval algorithm, e.g., all the necessary channels indeed had data, not too many clouds, the algorithm has converged to a solution, etc. (producer)
Quality assessment is done by analyzing the data “after the fact” through validation, intercomparison with other measurements, self-consistency, etc. It is presented as bias and uncertainty. It is rather inconsistent and is scattered across papers and validation reports. (consumer)
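The producer/consumer split can be illustrated with a short Python sketch: a QC flag is derived from the retrieval algorithm's own checks, while an assessment compares the flagged-good values with independent validation data. The field names, the cloud-fraction threshold of 0.2, and the numbers are all hypothetical.

```python
from statistics import mean


def qc_flag(retrieval):
    """Producer side: 0 (good) only if the algorithm's own checks passed."""
    ok = (
        retrieval["channels_present"]
        and retrieval["converged"]
        and retrieval["cloud_fraction"] < 0.2  # assumed threshold
    )
    return 0 if ok else 1


def assess(retrievals, validation):
    """Consumer side: mean difference between good-flag retrievals and
    independent validation values at the same sites (None if no overlap)."""
    diffs = [
        r["value"] - validation[r["site"]]
        for r in retrievals
        if qc_flag(r) == 0 and r["site"] in validation
    ]
    return mean(diffs) if diffs else None


# Hypothetical retrievals and independent validation measurements.
retrievals = [
    {"site": "A", "value": 15.0, "channels_present": True, "converged": True, "cloud_fraction": 0.1},
    {"site": "B", "value": 12.5, "channels_present": True, "converged": True, "cloud_fraction": 0.0},
    {"site": "C", "value": 30.0, "channels_present": True, "converged": False, "cloud_fraction": 0.1},
]
validation = {"A": 14.0, "B": 12.0, "C": 11.0}

flags = [qc_flag(r) for r in retrievals]
mean_difference = assess(retrievals, validation)
```

A good QC flag only says the algorithm was satisfied with its inputs; the assessment step is what reveals that the accepted values still sit above the validation data on average.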

Information models

Integrating, mediating… At the conceptual level and under an open world assumption Conceptual modeling ontology (McCusker et al. 2011) -> bridging properties to SKOS, IAO,..

Where to?
Balancing research and application
–Increase emphasis and presence in educational organizations
Confront the differences in incentives and inhibitions in different disciplines
Further develop peer communities and organizations
–Journal impact factors have to go up
Explore the shift into open-world semantics and data frameworks

Thanks…