The Role of Virtual Observatories and Data Frameworks in an Era of Big Data NIST bIG dATA June 14, 2012, Gaithersburg, MD Peter Fox (RPI and WHOI)

Presentation transcript:

The Role of Virtual Observatories and Data Frameworks in an Era of Big Data
NIST bIG dATA, June 14, 2012, Gaithersburg, MD
Peter Fox (RPI and WHOI), Tetherless World Constellation and AOP&E
http://tw.rpi.edu

Virtual Observatories
Working premise: scientists (actually ANYONE) should be able to access a global, distributed knowledge base of scientific data that:
– appears to be integrated
– appears to be locally available
Data intensive: volume, complexity, mode, discipline, scale, heterogeneity, ...

Technical advances From: C. Borgman, 2008, NSF Cyberlearning Report

Prior to 2005, we built systems
Rough definitions
– Systems have very well-defined entry and exit points. A user tends to know when they are using one. Options for extensions are limited and usually require engineering.
– Frameworks have many entry and use points. A user often does not know when they are using one. Extension points are part of the design.
– Modern platforms are built on frameworks.
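The distinction between a system and a framework can be made concrete with a small sketch. The Python below is illustrative only: the names `Framework`, `register` and `run`, and the two sample handlers, are hypothetical and are not taken from any VO software. It shows extension points as part of the design, where a new capability is added by registering a function rather than by re-engineering the core.

```python
from typing import Callable, Dict, List, Optional

Record = Dict[str, Optional[float]]
Handler = Callable[[List[Record]], List[Record]]


class Framework:
    """Hypothetical minimal framework: extension happens by registration."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Handler] = {}

    def register(self, name: str) -> Callable[[Handler], Handler]:
        """Extension point: attach a new processing step under a name."""
        def decorator(func: Handler) -> Handler:
            self._handlers[name] = func
            return func
        return decorator

    def run(self, names: List[str], records: List[Record]) -> List[Record]:
        """Apply the named steps in order; callers choose their own entry path."""
        for name in names:
            records = self._handlers[name](records)
        return records


vo = Framework()


@vo.register("drop_missing")
def drop_missing(records: List[Record]) -> List[Record]:
    # Remove records whose flux value is absent.
    return [r for r in records if r.get("flux") is not None]


@vo.register("to_kilo")
def to_kilo(records: List[Record]) -> List[Record]:
    # Rescale flux by a factor of 1000 (units are made up for the example).
    return [{**r, "flux": r["flux"] / 1000.0} for r in records]


data = [{"flux": 2500.0}, {"flux": None}, {"flux": 500.0}]
result = vo.run(["drop_missing", "to_kilo"], data)
```

A "system" in the sense above would hard-code the two steps inside `run`; here a third party could add a step without touching the `Framework` class.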

Early days of VxOs
[Diagram: VO 1, VO 2, VO 3, …; DB 1, DB 2, DB 3, …, DB n; ?]

The Astronomy approach: data-types as a service
[Diagram: VO App 1, VO App 2, VO App 3, …; DB 1, DB 2, DB 3, …, DB n]
– VOTable
– Simple Image Access Protocol
– Simple Spectrum Access Protocol
– Simple Time Access Protocol
VO layer
Limited interoperability
Lightweight semantics
Limited meaning, hard coded
Limited extensibility
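To illustrate what "data-types as a service" means in practice, here is a minimal sketch that reads a small VOTable document with Python's standard library. VOTable is an XML table format; the sample table, its field names and its values are invented for this example, and the helper `read_votable` is hypothetical. Real code would more likely use a dedicated library such as `astropy.io.votable`. Note how the only meaning carried by the format is the field name, datatype and unit, which is the "lightweight semantics" noted above.

```python
import xml.etree.ElementTree as ET

# Namespace of VOTable 1.3 documents.
VOTABLE_NS = {"vo": "http://www.ivoa.net/xml/VOTable/v1.3"}

# Invented sample data, for illustration only.
SAMPLE = """<VOTABLE version="1.3" xmlns="http://www.ivoa.net/xml/VOTable/v1.3">
  <RESOURCE>
    <TABLE name="observations">
      <FIELD name="target" datatype="char" arraysize="*"/>
      <FIELD name="ra" datatype="double" unit="deg"/>
      <FIELD name="dec" datatype="double" unit="deg"/>
      <DATA>
        <TABLEDATA>
          <TR><TD>Sun</TD><TD>10.5</TD><TD>-4.25</TD></TR>
          <TR><TD>M31</TD><TD>10.68</TD><TD>41.27</TD></TR>
        </TABLEDATA>
      </DATA>
    </TABLE>
  </RESOURCE>
</VOTABLE>
"""


def read_votable(xml_text):
    """Return (field names, rows) for the first TABLE in a VOTable string."""
    root = ET.fromstring(xml_text)
    table = root.find("vo:RESOURCE/vo:TABLE", VOTABLE_NS)
    fields = table.findall("vo:FIELD", VOTABLE_NS)
    names = [f.get("name") for f in fields]
    numeric = {"double", "float"}
    rows = []
    for tr in table.findall("vo:DATA/vo:TABLEDATA/vo:TR", VOTABLE_NS):
        cells = [td.text for td in tr.findall("vo:TD", VOTABLE_NS)]
        row = {}
        for fld, cell in zip(fields, cells):
            # Convert numeric columns; leave everything else as text.
            is_num = fld.get("datatype") in numeric
            row[fld.get("name")] = float(cell) if is_num else cell
        rows.append(row)
    return names, rows


names, rows = read_votable(SAMPLE)
```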

VO Standards
Creation – largely technical activity
Adoption – largely cultural activity

Means of conduct of research*
Induction: Observation → Pattern → Tentative hypothesis → Theory
Deduction: Theory → Hypothesis → Observation → Confirmation

Fundamentally though
We’ve built capabilities in VOs to support induction or deduction and sometimes both, but does this really enable the breadth of science discoveries we seek in the ERA of bIG dATA?
Edges? In-betweens? Discipline mashups? Accidental? …

For real discovery, we need abduction!
– A method of logical inference introduced by C. S. Peirce, which comes prior to induction and deduction, and for which the colloquial name is to have a “hunch”.
Importantly: human intuition is needed in interacting with large-scale data.

What should a VO do? Circa 2006
Make “standard” scientific research much more efficient.
– Even the principal investigator (PI) teams should want to use them.
– Must improve on existing services (mission and PI sites, etc.). VOs will not replace these, but will use them in new ways.
– Access for young researchers, non-experts, other disciplines.
Enable new, global problems to be solved.
– Rapidly gain integrated views, e.g. from the solar origin to the terrestrial effects of an event.
– Find meaningful data related to any particular observation or model.
– (Ultimately) answer “higher-order” queries such as “Show me the data from cases where a large coronal mass ejection observed by the Solar and Heliospheric Observatory was also observed in situ” (science-speak) or “What happens when the Sun disrupts the Earth’s environment?” (general public).
Fox, AGU Dec. 2008, Next Generation Virtual Observatories

What should a VO do? Circa 2011
Data science research is routine.
– Anyone can and does use them.
– Resource requisition is transparent, anything on demand.
– Are part of the research infrastructure and data providers do not have to work hard to have many uses of their data.
– There is a new breed of data publishers.
– Support full life cycle of data.
Enable new, global problems to be solved.
– Services are easily assembled to analyze multiple data streams.
– Pose questions without an initial detailed knowledge of data sources or origin. Verification and validation is built in.
– Society relevant use of science data and applications such as planning and decision support are easily supported.
Fox, AGU Dec. 2008, Next Generation Virtual Observatories

Issues for Next Generation (2-3 yrs) Virtual Observatories
– Unintended and non-specialist use (appropriate and inappropriate)
– Scaling to large numbers of data providers and redefining the role(s) and relations
– Data publication
– Crossing discipline boundaries
– Security, access to resources, policy awareness and enforcement
– Need to survive ‘user testing’, evaluation
Fox, AGU Dec. 2008, Next Generation Virtual Observatories

Issues for Next Generation Virtual Observatories
– Branding and attribution (where did this data come from and who gets the credit, is it the correct version, is this an authoritative source?)
– Provenance/derivation (propagating key information as it passes through a variety of services, copies of processing algorithms, …)
– Role in data quality, acquisition, curation and preservation
Fox, AGU Dec. 2008, Next Generation Virtual Observatories

Science ecosystems
Elements are what enable scientists to explore/confirm/deny their research
Abduction versus induction and deduction
Accountability
Proof, Explanation, Justification, Verifiability
‘Transparency’ -> Translucency
Trust
‘Provenance’
Identity
Citability
Integrateability

Provenance
Origin or source from which something comes, intention for use, who/what generated for, manner of manufacture, history of subsequent owners, sense of place and time of manufacture, production or discovery, documented in detail sufficient to allow reproducibility; or who, what, where, why, when…
Knowledge provenance: enrich with ontologies and ontology-aware tools
Provenance presentation is a challenge
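One way to picture the "who, what, where, why, when" view of provenance, and the idea of propagating it as data passes through a chain of services, is the sketch below. It is an assumption-laden illustration, not a description of any VO implementation: `ProvenanceStep`, `Dataset`, `apply_step` and the example values are all hypothetical, and a real system would more likely record provenance against a shared vocabulary or ontology.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class ProvenanceStep:
    """One entry in a dataset's history (hypothetical structure)."""
    who: str
    what: str
    where: str
    why: str
    when: datetime


@dataclass
class Dataset:
    values: List[float]
    history: Tuple[ProvenanceStep, ...] = ()


def apply_step(
    ds: Dataset,
    func: Callable[[List[float]], List[float]],
    *,
    who: str,
    what: str,
    where: str,
    why: str,
) -> Dataset:
    """Run one processing step and carry the accumulated history forward."""
    step = ProvenanceStep(who, what, where, why, datetime.now(timezone.utc))
    return Dataset(values=func(ds.values), history=ds.history + (step,))


raw = Dataset([1.0, 2.0, 3.0])
calibrated = apply_step(
    raw,
    lambda v: [x * 2 for x in v],
    who="instrument team",
    what="apply gain of 2",
    where="ingest service",
    why="calibration",
)
averaged = apply_step(
    calibrated,
    lambda v: [sum(v) / len(v)],
    who="analyst",
    what="mean of all samples",
    where="analysis service",
    why="summary product",
)
steps = [s.what for s in averaged.history]
```

The final product keeps the full chain of steps, so a consumer can ask how a value was derived, while the input `raw` is left unchanged.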

VOs and modern curation
[Diagram labels: Producers, Consumers; Quality Control, Quality Assessment; Fitness for Purpose, Fitness for Use; Trustee, Trustor; Others…]

Skill/ tools?

VOs Architectures – Multi-tiered interoperability used by

VO framework…

Discussion
– Significant opportunities for VOs and data as service approaches to ‘scale’ for big data
– Focus on delivering ‘products’ allows analytics on the back end, but tools to plug into a framework are lacking
– Encapsulation is good: hides a lot of inner workings, and bad: opaque, hampers transparency
– Next generation VOs must accommodate: abduction, transparency, interactivity and retain what they do well!