NEESgrid Data Effort Jean-Pierre Bardet, Amr Elnashai, Charles Severance, Joe Futrelle.

1 NEESgrid Data Effort Jean-Pierre Bardet, Amr Elnashai, Charles Severance, Joe Futrelle

2 Goals
- Data is online and persistent
- Data and metadata are supported together
- Data migrates transparently, including security and metadata
- Data is completely secure with access controls, but security does not get in the way
- Data provenance: how was it gathered, and how has it been manipulated?
- Data in support of research publication
- Support for repeatable experiments
- Data-oriented research computation support
- Support for workflow

3 Vision: data on the Grid (diagram labels: studies, Data Gathering, Repository, Extracting, Mapping, Meta Data)

4 Objectives of Data Effort in NEESgrid through 09/04
- Develop, deploy, and extend the NEESgrid toolset
- Educate the community on the tools and how to use them
- Work with sites to help in the adoption and localization of the NEESgrid toolset
- Work with the community to define and implement a basic NEES-wide data model to enable some basic sharing; much of this will be based on adopting existing work

5 NEESgrid Data – Core Elements
- Local Repository
- Central Repository
- Java APIs, which run locally on the same system as a repository or over OGSA Web Services
  – NEES File Management Services
  – NEES Meta Data Services
- Data Viewers
  – Streaming (numeric, X/Y graph)
  – Stored (X/Y graph, 2-D structure, video)

6 Core Elements (diagram labels: NEESdata, NEESpop, Local Repository, Central Repository, API, Data Teamlets, Data Acquisition Workstation, Data/MD Ingest Tools, Data tools, Data viewers, Grid and Web Services)

7 NEESgrid Data – Technologies
- Grid
  – GridFTP is used for data transport
  – Grid Web Services are used to ensure security and provide access control between systems over the Internet; they also provide credential passthrough
  – Grid credentials are used as part of login, providing a single sign-on framework
- CHEF
  – Provides a flexible mechanism for deploying GUI tools such as the data viewers and data browsers

8 A Simple Experimental Scenario (diagram labels: DAQ System, Glue, Test Specimen, Labview, Developer System, Researcher System)

9 A Simulation Scenario (diagram labels: Simulation System, Code, Developer System)

10 The MOST Scenario
- Part of the run-up to NEESpop 2.0
  – Used the beta of NEESpop and the beta of CHEF
  – Tested the data ingestion
  – Tested the metadata capabilities
  – Developed sample metadata
  – Tested mapping capabilities
- System still available at https://cee-nees.cee.uiuc.edu/chef/

11 MOST Data Flows (diagram)
- Sites and systems: NCSA NEESpop (1.1), Colorado NEESpop (1.1), UIUC/Newmark NEESpop (2.0), NEESMost (Win XP)
- Components: LabView DAQ, MatLab host and real-time target, control system, Sim Controller, Shore-Western, test specimens, Matlab computational model, incoming FTP, NSDS, repository
- Legend: site / location, computer, process; meta, series of files, complete file (aggregated)
- Link labels: NTCP, Matlab, Ftp, Wires, NFMS/NMDS, NSDS, File I/O, Plug In, Ingest

12 Oregon State: Experiment Based Deployment looking at Synchronized Video

13 Experiment Setup
- An LED, visible in the video frame, was connected to a button that signals the “start of the experiment” (1)
- This signal was also patched into the DAQ (channel 15) (2)
- One person was stationed at the DAQ and another at the video camera to manually start both processes (3)
- Both DAQ and video capture were manually started about 10 seconds before the experiment start
- The experiment was run for about 20 seconds, at which time both the video and data acquisition were manually stopped
- Done using NEESgrid 2.0 beta without any changes
(Diagram: DAQ setup with callouts 1, 2, 3, 3)

14 Manual processing of the data
- MPEG video data was moved to a PC using a memory stick, and the DAQ data was transferred into Excel (1)
- The video data was trimmed using Pinnacle Studio to discard the frames before the trailing edge of the LED signal (2)
- The DAQ data was discarded through the trailing edge of the LED signal (3)
- A time channel (100 Hz) was added, and the data channels were extracted and placed in the NEESpop (4)
- We put metadata into the NEESpop (neesevent.xml) describing the event and channels (this was done before the experiment was started)
(Diagram labels: NEESpop 2.0 Alpha, DAQ, Metadata, Time, ch01, ch02; callouts 1 to 4)
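The manual trimming and time-channel steps on this slide can be sketched in a few lines of Java. This is only an illustration under stated assumptions: the class name `TriggerTrim`, the 2.5 threshold, and the sample values are hypothetical, and the slide does not say the processing was ever automated this way.

```java
import java.util.Arrays;

public class TriggerTrim {
    /** Index of the first sample after the trigger falls below threshold, or -1 if it never does. */
    public static int trailingEdge(double[] trigger, double threshold) {
        for (int i = 1; i < trigger.length; i++) {
            if (trigger[i - 1] >= threshold && trigger[i] < threshold) {
                return i;
            }
        }
        return -1;
    }

    /** Drops every sample recorded before index start. */
    public static double[] trim(double[] channel, int start) {
        return Arrays.copyOfRange(channel, start, channel.length);
    }

    /** Generates a time channel in seconds for n samples taken at rateHz. */
    public static double[] timeChannel(int n, double rateHz) {
        double[] t = new double[n];
        for (int i = 0; i < n; i++) {
            t[i] = i / rateHz;
        }
        return t;
    }

    public static void main(String[] args) {
        // Hypothetical LED trigger channel and one data channel.
        double[] trigger = {0, 0, 5, 5, 5, 0, 0, 0};
        double[] ch01 = {0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8};

        int start = trailingEdge(trigger, 2.5);
        double[] kept = trim(ch01, start);
        double[] time = timeChannel(kept.length, 100.0); // 100 Hz, as on the slide

        System.out.println("start index: " + start);
        System.out.println("ch01: " + Arrays.toString(kept));
        System.out.println("time: " + Arrays.toString(time));
    }
}
```

The same start index would be applied to every data channel so that all channels stay aligned with the generated time column.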

15 The experiment was viewed using the standard NEES stored viewer, with synchronized video and data and the ability to move back and forth.
(Diagram labels: NEESpop 2.0 Alpha, DAQ, Metadata, Time, ch01, ch02)

16

17 NEES Metadata Effort
- NEES Markup Language (NEESML)
  – Provides an RDF-like structure capable of representing semantic information
  – XML is the syntax used
  – Logic is more “object oriented”: it can define objects, create objects, and reference objects
- Metadata is many different things
- Goal: if we EVER want to build reusable data tools, we have to represent the semantics inside the metadata rather than just the information

18 Which one is Semantic Metadata and Why?
(Two XML examples; only the text values survive: phillips, phillips, second, 0.13, second, phillips)

19 Instant Tutorial: Semantics / RDF
This is an XML document. We can do some things with this document, like ask the following questions:
- What string is the “which-beam” attribute of “sensor-info”?
- Does this document meet the syntax requirements of a DTD (i.e., does it “compile”)?
That is it. There are some unanswered questions:
- What does “phillips” mean?
- What does “second” mean?
With enough effort, we could write software that makes intelligent choices about the meaning of the elements in each of these two “sub-document types”. Effectively, to build “knowledge” from data, software must evolve to “understand” each new document type.
(XML example not preserved; surviving values: phillips, phillips, second, 0.13, second, phillips)
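A small Java sketch shows how little a plain XML parser can answer. The element and attribute names `sensor-info`, `man`, and `which-beam` come from the slides, but the document layout in `SAMPLE` is an assumption, since the original XML did not survive in this transcript.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class PlainXmlQuery {
    // Hypothetical document: names follow the slide text, the layout is assumed.
    public static final String SAMPLE =
        "<experiment>"
      + "<sensor-info man=\"phillips\" which-beam=\"second\"/>"
      + "<time-step>0.13</time-step>"
      + "</experiment>";

    /** Returns the raw string value of an attribute on the first matching element. */
    public static String attribute(String xml, String element, String name) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Element e = (Element) doc.getElementsByTagName(element).item(0);
            return e.getAttribute(name);
        } catch (Exception ex) {
            throw new IllegalStateException("could not parse document", ex);
        }
    }

    public static void main(String[] args) {
        // The parser returns a string; nothing in the document says what it denotes.
        System.out.println(attribute(SAMPLE, "sensor-info", "which-beam"));
    }
}
```

The query succeeds, yet the caller still has to know out of band whether “second” is a unit, an ordinal, or a name, which is the gap the slide points at.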

20 Instant Tutorial: Semantic Information
(Diagram of linked objects; fields with no value shown are links to other objects)
- type: sensor-info; man: phillips; owner: ; which-beam: Second; width: 45.33
- type: Person; title: chuck
- type: Unit; title: second
- type: nees-experiment; time-step: 0.13; pi:
This is a semantic (or at least “more semantic”) document. It is best represented by a picture. Instead of thinking of things as strings and having to parse lots of documents, we can learn about seconds and then build components which understand “seconds” or “people” and ALWAYS know when to use those components. To “understand” a new “object type”, software must only understand the new elements introduced in that document type.
Important: Semantic representation of information is necessary but not sufficient for understanding.
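The idea of typed objects that reference each other can be sketched in Java as below. The type and field names are taken from the slide; the `Obj` class, the `describe` helper, and the choice of which object each link points to are assumptions made for illustration, not the NEESML API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SemanticObjects {
    /** A typed object whose fields hold either literals or references to other objects. */
    public static final class Obj {
        public final String type;
        public final Map<String, Object> fields = new LinkedHashMap<>();

        public Obj(String type) {
            this.type = type;
        }

        public Obj set(String name, Object value) {
            fields.put(name, value);
            return this;
        }
    }

    /** Handles a value by its declared type instead of by parsing strings. */
    public static String describe(Object value) {
        if (value instanceof Obj) {
            Obj o = (Obj) value;
            return o.type + "(" + o.fields.get("title") + ")";
        }
        return "literal(" + value + ")";
    }

    /** Builds the sensor-info object from the slide; the link targets are assumed. */
    public static Obj buildSensorInfo() {
        Obj chuck = new Obj("Person").set("title", "chuck");
        Obj second = new Obj("Unit").set("title", "second");
        return new Obj("sensor-info")
            .set("man", "phillips")
            .set("owner", chuck)        // assumed link target
            .set("which-beam", second)  // assumed link target
            .set("width", 45.33);
    }

    public static void main(String[] args) {
        Obj sensor = buildSensorInfo();
        for (Map.Entry<String, Object> f : sensor.fields.entrySet()) {
            System.out.println(f.getKey() + " -> " + describe(f.getValue()));
        }
    }
}
```

Because `owner` points at a `Person` object rather than holding a bare string, a component written once for `Person` can be reused wherever one is referenced.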

21 Links can encode semantic structure

22 Tools are coded so as to “understand” a particular semantic structure – this becomes “meaning” and something useful

23 The Slide
(Diagram labels: Data, Metadata, Concepts, Data Viewers, Data Mappers, Data Ingestors, Search)
There is a layer where we develop tools that take advantage of, and begin to depend on, the “meaning” of the data: where we begin to depend on the meaning of a second, and where we make a viewer capable of viewing a certain type of object. This is where we build things that make use of knowledge. This layer will never be complete, but it is a large focus of the coming months.

24

25 Partial ORST Data Model as Types
http://www.nees.org/md/ns/orst-example
http://www.nees.org/md/ns/md

26

27 Evolution of Data Technologies (diagram labels: SGML, HTML, XML, RDF/XML, NEESml, CSS/DHTML, RDBMS, Flat File, Data Dictionaries, Validation, Relations, Objects, Data Models, Concepts, Storage, Formats/Representation, Data, Presentation)

28 Relational DBs versus RDF-Style Stores
- Both are ways to implement a relation-oriented data model
- RDBMS-style repository
  – A DBA “implements” data relationships and tunes important relationships for high performance
  – Able to handle very large amounts of data with proper tuning
  – Not flexible: a new relationship requires DBA intervention
  – Query performance depends on building joins that exploit the hard-coded relationships
  – Ideal in a high-transaction-load environment
- RDF-style repository
  – Relations are not determined a priori
  – Data model can be easily extended by any user as they insert data into the store
  – Ideal for an archival situation where transaction performance is not critical and the ultimate use of the data is not fully known in advance
  – Allows flexibility and change over time: new models can always be put in and mappings then developed for them
  – Ideal for a situation where data may need to migrate between repositories

29 RDF/XML Versus NEESML
- NEESML is topologically equivalent to RDF but more straightforward to use
  – A compromise between usability and functionality
  – Focused on solving the problems of ingesting types and data rather than “cross-server ontology webs”
  – Used to build a reference set of ingestion tools
  – RDF is a moving target
- The repository does not store either RDF or NEESML; it is a relational database tuned to store “three-tuples”
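To make the “three-tuples” idea concrete, here is a minimal in-memory sketch in Java. It is not the NEESgrid repository, which the slide describes as a tuned relational database; the class `TripleStore`, its methods, and the sample statements are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class TripleStore {
    /** One (subject, predicate, object) statement. */
    public static final class Triple {
        public final String subject;
        public final String predicate;
        public final String object;

        public Triple(String subject, String predicate, String object) {
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
        }
    }

    private final List<Triple> rows = new ArrayList<>();

    public void add(String subject, String predicate, String object) {
        rows.add(new Triple(subject, predicate, object));
    }

    /** Returns rows matching the pattern; a null argument acts as a wildcard. */
    public List<Triple> match(String s, String p, String o) {
        List<Triple> out = new ArrayList<>();
        for (Triple t : rows) {
            if ((s == null || s.equals(t.subject))
                    && (p == null || p.equals(t.predicate))
                    && (o == null || o.equals(t.object))) {
                out.add(t);
            }
        }
        return out;
    }

    /** Hypothetical sample data loosely based on names used in the slides. */
    public static TripleStore sample() {
        TripleStore store = new TripleStore();
        store.add("exp1", "type", "nees-experiment");
        store.add("exp1", "time-step", "0.13");
        store.add("exp1", "pi", "chuck");
        store.add("chuck", "type", "Person");
        // A new kind of relationship needs no schema change, only another row.
        store.add("exp1", "video", "nacse_sample_01.avi");
        return store;
    }

    public static void main(String[] args) {
        for (Triple t : sample().match("exp1", null, null)) {
            System.out.println(t.subject + " " + t.predicate + " " + t.object);
        }
    }
}
```

The trade-off matches the comparison on the previous slide: adding the `video` relationship costs nothing up front, but every query is a scan or self-join over one generic table rather than a join over purpose-built ones.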

30 NEESgrid Data - Value Proposition
- An RDF-like store: referential integrity, long-term flexibility
- Seamless data and metadata transport
- Smooth integration of data with metadata
- Cool set of extensible tools
- Willingness to support efforts to adapt and/or build new tools
  – Data tools
  – Metadata tools
  – Data viewer(s)

31 (lots of) Directions
- In the past 45 days, we have gotten a lot of input on data directions
  – Form SWAT teams of interested sites to build consensus (Data workshop)
  – Investigate the ORST model and other models, looking for low-hanging fruit (NEESgrid summit)
  – Coordinate with the Consortium data committee (EAB Meeting)
- These are all good ideas and we need to do them all; I would prefer that they be one effort for a while
- We will discuss plans in the second meeting; I would like “one” direction, at least on the model

32 General Go Forward Plan on Technical Elements
- Two words: “Experiment-Based Deployment”
  – Invest effort where sites are ready to produce results
  – Core SI team will focus on documenting and hardening the NEESgrid software
  – EBD team will bring well-understood requirements back to the Core SI team
- Liaison with other data efforts for best practice
  – GriPhyN: physics data effort (Grid-based automated storage and workflow)
  – CMCS: chemistry collaboration (notebook, automated mapping, and provenance)
- Between releases, engage sites that have skills that can help: discuss@neesgrid.org

33 Go Forward – Core Elements
- Investigate RDF and its relationship to NEESML
- Investigate provenance; would like to adopt from another project
- Investigate mapping; would like to adopt from another project
- Make notebook information available as metadata

34 Go Forward - Tools
- Evaluate the ORST interface and use it to implement an experiment-based interface to the metadata repository
- Extend and improve viewers; publish the API so that sites can extend the viewers
- Improve notebook
  – Single sign-on using CHEF/Grid credentials
  – Integration with metadata
  – Smoother integration with CHEF
- Explore synchronized video and data capture using DAQ and after-experiment replay of synchronized video and data (ORST, UMinn)
- Explore the capture of high-quality still images as data (UMinn)
- Investigate adopting a data-editing tool (XMLSpy)

35 Go Forward – Data “Dictionary”
- Analyze the ORST model, determine the core, convert it to NEESML, pre-populate repositories with types, and develop usage documentation
- Form a core group between SI, ES, and CS to push data model issues forward; once the groundwork is better defined, we can disperse into distributed teams
- Use experiment-based deployment to help us encounter new data needs over time

36 What is in Release 2.0? October 7
- Groovy look and feel
- Local Data Repository
- Repository Browser in CHEF
  – Browse
  – Create objects
  – Upload / download data
- API documentation
- NEESML user documentation
- Extensible data mapping in Java
- Data Viewer in CHEF
  – Improved visually
  – Configured by XML
  – Can read data from the repository or from URLs
  – Pre-populated with sample video and data formats
- Local repositories pre-populated with
  – SAC data
  – MOST data
  – ORST data model (subset)

37 NEESML
Table 1: Primitive types in NEESML
Name | Description | Examples
string | Text | “Hello, world.”; “BN# 493-2584x”
int | Integer | 3; -2; 2147483647
long | Long integer. Can exceed the size of an integer. | -5782347562427; 9223372036854775807
double | Double-precision floating-point number. | 523425.4568574636; -0.0000000435234
date | A moment in time, represented as a date and time stamp in UTC with 1 ms resolution. | 2002-10-27 15:40:32.048; 1969-01-12 00:03:48.774
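One way to read the table is as a mapping from NEESML primitive names to Java values. The sketch below is an assumption rather than the NEESML implementation: the class `NeesmlPrimitives`, its `parse` method, and the choice of Java types are hypothetical, and the date pattern is inferred from the two examples in the table.

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class NeesmlPrimitives {
    // Pattern inferred from the table's examples: date, time, and three millisecond digits.
    private static final DateTimeFormatter DATE =
        DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSS");

    /** Converts text to a Java value according to the named primitive type. */
    public static Object parse(String type, String text) {
        switch (type) {
            case "string":
                return text;
            case "int":
                return Integer.parseInt(text);
            case "long":
                return Long.parseLong(text);
            case "double":
                return Double.parseDouble(text);
            case "date":
                // The table states that dates are in UTC.
                return LocalDateTime.parse(text, DATE).toInstant(ZoneOffset.UTC);
            default:
                throw new IllegalArgumentException("unknown primitive type: " + type);
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("int", "2147483647"));
        System.out.println(parse("long", "9223372036854775807"));
        System.out.println(parse("double", "-0.0000000435234"));
        System.out.println(parse("date", "2002-10-27 15:40:32.048"));
    }
}
```

The largest `int` and `long` examples in the table are exactly Java's `Integer.MAX_VALUE` and `Long.MAX_VALUE`, which suggests the ranges follow Java's 32-bit and 64-bit integers.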

38 Repository Browser

39 Ingestor

40 API Documentation

41 Configuring Events in XML
<event id="oregon" desc="Oregon Large Tank Test September 8, 2003" host="/chef/org.nees.repo.data/retrieve-data?lfn=nacse_sample_01.txt&static=yes&mapping=nacse-" type="stored">
<video id="01" desc="Video of cylinder" url="/chef/retrieve-data/static/nacse_sample_01.avi" />
We may be able to get a patch out to switch this to NEESml and provide a simple entry tool.

42 Mappings and the Data Viewer
- NSDS (ISO 8601 time channel)
- Column data with time recorded as a column
- Column – generate time
- Column – generate time – trigger filter
Channel units: g,g,in,kip
Time ATL1 ATT1
2002-11-13T15:48:55.26499 -0.006409 0.004272
2002-11-13T15:48:55.36499 -0.005798 -0.003662
100.000
0.435 0.161 -1.016 -0.981
0.430 0.161 -1.016 -0.977
0.435 0.161 -1.016 -0.977
import java.io.File;

public class NEESDataMap {
    // Entry point for a data mapping; returns whether the mapping succeeded.
    public static boolean repoMap(File mainFile, File mappingFile, String mapping) {
        // Code here
        return false; // placeholder until the mapping is implemented
    }
}
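As an illustration of the NSDS case, where each row carries its own ISO 8601 timestamp, the following sketch parses one sample row. The class `NsdsRow` and its `parse` method are hypothetical and are not part of the `NEESDataMap` API; the sketch also assumes whitespace-separated columns, as in the sample rows.

```java
import java.time.Duration;
import java.time.LocalDateTime;

public class NsdsRow {
    public final LocalDateTime time;
    public final double[] values;

    private NsdsRow(LocalDateTime time, double[] values) {
        this.time = time;
        this.values = values;
    }

    /** Parses "timestamp v1 v2 ..." where the timestamp is ISO 8601 without a zone. */
    public static NsdsRow parse(String line) {
        String[] parts = line.trim().split("\\s+");
        LocalDateTime time = LocalDateTime.parse(parts[0]);
        double[] values = new double[parts.length - 1];
        for (int i = 1; i < parts.length; i++) {
            values[i - 1] = Double.parseDouble(parts[i]);
        }
        return new NsdsRow(time, values);
    }

    public static void main(String[] args) {
        NsdsRow a = parse("2002-11-13T15:48:55.26499 -0.006409 0.004272");
        NsdsRow b = parse("2002-11-13T15:48:55.36499 -0.005798 -0.003662");
        // Spacing between consecutive rows, in milliseconds.
        System.out.println(Duration.between(a.time, b.time).toMillis());
        System.out.println(a.values[0] + " " + a.values[1]);
    }
}
```

For the “generate time” mappings, no timestamp column exists, so a mapper would instead synthesize the time axis from a known sample rate.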

43 Release 2.1 Data Aspects December 2003
- NEESpop
  – Notebook to metadata repository connection made
  – Closer integration of notebook into CHEF
  – First release of experiment tool (based on ORST)
  – Retool data viewers to be completely driven by metadata objects rather than their own objects
  – More fine-grained access control
  – Enhanced data models
- Tools
  – Ingestion tools released
  – A limited set of pre-release video/image tools

44 Further releases
- Release 2.2 – March 04
  – Driven by your needs as we encounter them
  – Perhaps some “nice to haves” from the SI team
- Release 3.0 – June 04
  – Very limited new functionality; maybe almost nothing new in the core components of the NEESpop

45 My Skunkworks Project (diagram label: DAQ)

46 Detailed Session
- Amr Elnashai and JP Bardet will lead
- Discussion of metadata used in MOST and the SAC data sets
- Discussion of Cosmos, SensorML, and the ORST model
- Discussion of the effort on the ORST data model to date
- Discussion of the ORST IT tool and how to get its functionality into NEESgrid
- Discussion of the relationship between Consortium DSAC and SI data effort

47 Summary
- Once NEESpop 2.0 is released, we will have a powerful set of data tools that the sites can take and use out of the box
- Experiment-based deployment means that the SI team will work hand-in-hand with EBD sites to use the tools
- This will identify new needs and requirements over the next year
  – The SI team will extend the NEESpop (2.1, 2.2, and 3.0) within resource constraints
- This is really starting to be fun

48 NEESdata Worktools Site
- neespop.si.umich.edu

