e-Science Data Information and Knowledge Transformation. BinX: An edikt Project Testbed. Ted Wen, Robert Carroll, Denise Ecklund, Bob Gibbins, Davy Virdee, Rob Baxter

Presentation transcript:

e-Science Data Information and Knowledge Transformation
BinX: An edikt Project Testbed
Ted Wen, Robert Carroll, Denise Ecklund, Bob Gibbins, Davy Virdee, Rob Baxter

2 Presentation outline
Edikt project
A data problem
BinX – today
–language
–library
–applications
BinX – future

3 What is edikt?
e-Science Data, Information and Knowledge Transformation
–a research development activity designed to bridge the gap between applications science and computer science in the realms of Grid-scale data
take prototypes from CS and Grid research…
…engineer them into robust tools…
…for real application science problems…
…test them under extreme science conditions…
…and keep an eye on the commercial possibilities
Team of 8 professional engineers, mgmt & staff
Funded by SHEFC; project start was May 2002

4 Current activities
edikt::Eldas
–proving GGF's GDSS for virtual organisations
–developing scalable data access technologies
edikt::BinX
–data interchange for astronomy & PP
edikt::Giggle and RLS
–evaluation of data replication technology for PP
Bioinformatics
–data mediation to integrate multiple data sources
–data versioning to manage changing schemas

eScience Data: Real-World and In Silico Experiments

6 Research and discovery
Workflow support tools
–Format converter
–Model builder
Existing tools: XML processors
New tools: Perl script generators, Model description generators
[Diagram labels: Real-world Experiments; In silico Experiments; Data; Analysis; Result Data; Results; Workflow; Abstract Model; Generic Tools; App areas 1 to 4]

7 Data integration & mediation
[Diagram labels: Distributed Geo-sensors (S1 to S6); Public Biochemical Signalling DBs (D1, D2, D3; Reaction 1, Reaction 2, … Reaction n); Real-world Experiments; Data Integrator/Mediator; Integrated Data]
–One sensor type with overlapping observation regions
–Resolve conflicting values in the overlap
–Compute total space: min or max? If max, define missing values
–Match the input records
–Build integrated records
–Detect data value conflicts
–Resolve data value conflicts
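The four integration steps listed on this slide (match, build, detect, resolve) can be sketched roughly as follows. This is only an illustration under stated assumptions: the record keys, the sample readings and the `integrate` function are hypothetical, and taking the maximum is just one possible resolution policy, not the edikt mediator's method.

```python
from collections import defaultdict

def integrate(sources, resolve=max):
    """Match records by key, build integrated records, detect and resolve conflicts.

    `sources` is a list of dicts mapping a record key to an observed value.
    `resolve` is the (assumed) conflict policy, e.g. min or max.
    """
    # Step 1: match input records that share a key across sources.
    matched = defaultdict(list)
    for records in sources:
        for key, value in records.items():
            matched[key].append(value)

    integrated, conflicts = {}, {}
    for key, values in matched.items():
        # Step 3: a conflict exists when sources disagree on the value.
        if len(set(values)) > 1:
            conflicts[key] = values
        # Steps 2 and 4: build the integrated record, resolving any conflict.
        integrated[key] = resolve(values)
    return integrated, conflicts

# Hypothetical readings from two sensors with one overlapping cell.
s1 = {"cell_a": 10.0, "cell_b": 12.5}
s2 = {"cell_b": 13.0, "cell_c": 9.0}
integrated, conflicts = integrate([s1, s2], resolve=max)
print(integrated)
print(conflicts)
```

Reporting the conflicts separately keeps the detection step visible, so a different policy can be swapped in without changing how records are matched.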

8 Data subsets
Legacy data was not organized for the new analysis
–Extract a data subset
–Define the subset by queries
Structural metadata query: What is the minimum geo-space data coverage?
Simple semantic query: What reactions require 2 or more inhibitor agents to prevent the reaction?
Complex semantic query: What objects are contained in a 3-dimensional image?
[Diagram labels: Real-world Experiments 1953; Legacy Data; Real-world Experiments today; New Data; Analysis; New Analysis; Data; Results; New Results]
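As a rough sketch of defining a subset by a query, the simple semantic query on this slide can be written as a predicate over records. The record layout, the sample reactions and the `subset` helper are all made up for illustration; nothing here comes from the edikt tools.

```python
def subset(records, predicate):
    """Return the records for which the query predicate holds."""
    return [r for r in records if predicate(r)]

# Made-up reaction records; "inhibitors" is a hypothetical field name.
reactions = [
    {"name": "R1", "inhibitors": ["A"]},
    {"name": "R2", "inhibitors": ["A", "B"]},
    {"name": "R3", "inhibitors": ["B", "C", "D"]},
]

# "What reactions require 2 or more inhibitor agents to prevent the reaction?"
needs_two = subset(reactions, lambda r: len(r["inhibitors"]) >= 2)
names = [r["name"] for r in needs_two]
print(names)
```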

9 BinX for binary data
BinX is a foundation tool for these problems when the data is a structured binary file.
Workflow – format conversion
[Diagram labels: BinX XML1; Binary data1; BinX-based format conversion; BinX XML2; Binary data2]
Data Subsets
[Diagram labels: BinX XML description; R-W Exper Binary data]
Data Integration
[Diagram labels: Exp1, Exp2, Exp3 Binary data; Integrated Binary data; D1, D2, D3; I-D; S1, S2, S3]
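The idea of description-driven access and format conversion can be illustrated with Python's standard `struct` module. This is a minimal sketch, not the BinX language or library: the `FIELDS` description, the field names and the helper functions are hypothetical stand-ins for what a BinX XML description would supply.

```python
import struct

# Hypothetical record description: (field name, struct type code).
# This stands in for a description document; it is not BinX syntax.
FIELDS = [("id", "i"), ("ra", "d"), ("dec", "d")]

def record_format(byte_order):
    """Build a struct format string; '<' is little-endian, '>' is big-endian."""
    return byte_order + "".join(code for _, code in FIELDS)

def read_records(data, byte_order):
    """Decode a binary buffer into a list of dicts using the description."""
    layout = struct.Struct(record_format(byte_order))
    names = [name for name, _ in FIELDS]
    return [dict(zip(names, values)) for values in layout.iter_unpack(data)]

def convert(data, src_order, dst_order):
    """Re-encode every record from one byte order to another."""
    src = struct.Struct(record_format(src_order))
    dst = struct.Struct(record_format(dst_order))
    return b"".join(dst.pack(*values) for values in src.iter_unpack(data))

# Two big-endian records, converted to little-endian and read back.
big = struct.pack(">idd", 1, 10.5, -3.25) + struct.pack(">idd", 2, 20.0, 4.0)
little = convert(big, ">", "<")
print(read_records(little, "<"))
```

Because the layout lives in one description rather than in the reading code, the same description serves both the reader and the converter, which is the property the slide attributes to a BinX description.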