Glast Collaboration Data Server and Data Catalog

Glast Collaboration Data Server and Data Catalog
Tony Johnson
DC2 Planning Meeting, June 2005

Contents
- What Exists
  - Ntuple Pruner/Peeler
  - Data Server (for Internal Collaboration Use)
- What is Planned
- What is Wanted?

Data Server Portal
- Web portal provides access to existing data server functionality
- Currently:
  - NTuple Pruner (Tom Glanzman)
    - Selection of data via cuts on Merit tuple
    - Works with datasets in pipeline data catalog
    - Download of data via FTP after submission of batch job
    - Allows access to Root Merit tuple
  - Event Peeler (coming very soon) (Tom Glanzman)
    - Selection of data via run/event number (uploaded file)
    - Access to Root Merit tuple and/or full Root tuple
  - "Data Server" (Jean-Paul LeFevre)
    - Allows rapid selection of events based on Energy, Origin (decl, ra), Time, Gamma Quality
      - Stored in "meta-data" database
    - Additional MeritTuple cuts
    - Supports adding cuts to personal "favorites" list
    - Currently configured to work with DC1 Root merit tuple only
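The pruner and the peeler differ mainly in how events are chosen: the pruner applies cuts to Merit tuple columns, while the peeler matches an uploaded list of run/event numbers. The Python sketch below illustrates the two selection modes under stated assumptions: the `Event` record, its fields (`run`, `event`, `energy`), the one-pair-per-line upload format, and all function names are hypothetical stand-ins, not the actual Merit tuple schema or the tools' code.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List, Set, Tuple


@dataclass(frozen=True)
class Event:
    # Hypothetical, simplified stand-in for one row of a Merit tuple.
    run: int
    event: int
    energy: float  # assumed to be in MeV


Cut = Callable[[Event], bool]


def prune(events: Iterable[Event], cuts: Iterable[Cut]) -> Iterator[Event]:
    """Pruner-style selection: keep events that pass every cut."""
    cut_list: List[Cut] = list(cuts)
    for ev in events:
        if all(cut(ev) for cut in cut_list):
            yield ev


def parse_run_event_list(text: str) -> Set[Tuple[int, int]]:
    """Parse an uploaded file assumed to hold one 'run event' pair per line."""
    pairs: Set[Tuple[int, int]] = set()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks and comments
            continue
        run, event = line.split()[:2]
        pairs.add((int(run), int(event)))
    return pairs


def peel(events: Iterable[Event], wanted: Set[Tuple[int, int]]) -> Iterator[Event]:
    """Peeler-style selection: keep events whose (run, event) was requested."""
    for ev in events:
        if (ev.run, ev.event) in wanted:
            yield ev
```

For example, `prune(events, [lambda e: e.energy > 100.0])` keeps only high-energy events, while `peel(events, parse_run_event_list(uploaded_text))` fetches a specific set of events regardless of their properties. A set makes each run/event lookup constant-time, so peeling scales with the number of events scanned rather than the length of the uploaded list.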

Screen Shots http://glast-ground.slac.stanford.edu/DataServer/

Data Server Issues
- Currently trying both Oracle (10g) and MySQL (5) using "spatial" extensions
  - Do not fully support spherical geometry
    - Forced to make rectangular selections rather than circular
    - Problems at poles
  - Performance seems OK, at least for 50 million events
    - Selection performance scales with the number of events selected, rather than the total number of events in the database
  - Indexes seem very slow to build
    - Many hours to add 1,000,000 events, and this seems to scale with total database size
    - Still under investigation; building may be improvable by tuning
- Need to decide very soon how much effort to put into a database vs. a custom solution
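One common way to get circular sky selections out of a rectangle-only spatial index is to query a bounding box first and then apply an exact great-circle test to the candidates. The Python sketch below shows that technique under stated assumptions: RA in [0, 360) and Dec in [-90, 90] degrees, a cone radius below 90 degrees, and hypothetical function names. It is not the Data Server's implementation. It handles the two problem cases named above: RA wrap-around (split into two boxes) and cones containing a pole (all RA values allowed).

```python
import math
from typing import Iterable, Iterator, List, Tuple

Range = Tuple[float, float]
Box = Tuple[Range, Range]  # ((ra_lo, ra_hi), (dec_lo, dec_hi)) in degrees


def angular_distance_deg(ra1: float, dec1: float, ra2: float, dec2: float) -> float:
    """Great-circle separation in degrees (haversine formula)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def cone_bounding_boxes(ra0: float, dec0: float, radius: float) -> List[Box]:
    """Rectangles that together enclose a circular cone (radius < 90 deg assumed)."""
    dec_lo = max(-90.0, dec0 - radius)
    dec_hi = min(90.0, dec0 + radius)
    if dec0 + radius >= 90.0 or dec0 - radius <= -90.0:
        # The cone contains a pole, so every RA value is possible.
        return [((0.0, 360.0), (dec_lo, dec_hi))]
    # Exact RA half-width of the cone's bounding box at declination dec0.
    half_width = math.degrees(math.asin(
        math.sin(math.radians(radius)) / math.cos(math.radians(dec0))))
    ra_lo, ra_hi = ra0 - half_width, ra0 + half_width
    if ra_lo < 0.0:  # box wraps past RA = 0
        return [((0.0, ra_hi), (dec_lo, dec_hi)),
                ((ra_lo + 360.0, 360.0), (dec_lo, dec_hi))]
    if ra_hi > 360.0:  # box wraps past RA = 360
        return [((ra_lo, 360.0), (dec_lo, dec_hi)),
                ((0.0, ra_hi - 360.0), (dec_lo, dec_hi))]
    return [((ra_lo, ra_hi), (dec_lo, dec_hi))]


def cone_search(points: Iterable[Tuple[float, float]],
                ra0: float, dec0: float, radius: float) -> Iterator[Tuple[float, float]]:
    """Box prefilter (what an index could answer), then exact circular cut."""
    boxes = cone_bounding_boxes(ra0, dec0, radius)
    for ra, dec in points:
        in_box = any(r[0] <= ra <= r[1] and d[0] <= dec <= d[1] for r, d in boxes)
        if in_box and angular_distance_deg(ra0, dec0, ra, dec) <= radius:
            yield (ra, dec)
```

In a database setting, each box would become one rectangular spatial query (combined with OR or UNION), and only the final distance test runs in application code. Because the boxes are tight, the exact test discards only the corners, so cost still tracks the number of events selected.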

Data Catalog Plans
- Working on new "Glast Data Catalog"
  - Less tightly coupled to pipeline than current catalog
  - Allows domain-specific, user-defined, hierarchical "meta-data" to be associated with each dataset, e.g.
    - Simulation physics, test setup parameters
    - Pointers to pipeline task
    - Pointers to e-logbook entries
  - Web interface will allow browsing data hierarchy or searching based on meta-data
  - Implementation based on earlier "Grid" data catalog developed at SLAC
    - Uses XML for import/export of data (stored in Oracle XML database)
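To make "hierarchical meta-data with XML import/export" concrete, here is a minimal Python sketch using the standard `xml.etree.ElementTree` module. The XML layout (`folder`, `dataset`, `meta` elements with `key`/`value` attributes), the rule that datasets inherit meta-data from their enclosing folders, the sample paths, and all class and function names are hypothetical assumptions for illustration; they are not the Glast Data Catalog's actual schema or the Oracle XML database format.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional, Tuple

# Hypothetical sample catalog; names, values, and paths are illustrative only.
SAMPLE_XML = """<folder name="DC2">
  <meta key="source" value="simulation"/>
  <folder name="gamma">
    <meta key="physics" value="gamma"/>
    <dataset name="run001" location="/data/dc2/gamma/run001.root"/>
  </folder>
  <folder name="background">
    <meta key="physics" value="background"/>
    <dataset name="run002" location="/data/dc2/bkg/run002.root">
      <meta key="task" value="bkg-sim"/>
    </dataset>
  </folder>
</folder>"""


@dataclass
class Dataset:
    name: str
    location: str
    meta: Dict[str, str] = field(default_factory=dict)


@dataclass
class Folder:
    name: str
    meta: Dict[str, str] = field(default_factory=dict)
    folders: List["Folder"] = field(default_factory=list)
    datasets: List[Dataset] = field(default_factory=list)


def _read_meta(elem: ET.Element) -> Dict[str, str]:
    return {m.get("key"): m.get("value") for m in elem.findall("meta")}


def _write_meta(elem: ET.Element, meta: Dict[str, str]) -> None:
    for key, value in meta.items():
        ET.SubElement(elem, "meta", key=key, value=value)


def folder_from_xml(elem: ET.Element) -> Folder:
    """Import: build the folder tree from a <folder> element."""
    folder = Folder(elem.get("name"), _read_meta(elem))
    folder.folders = [folder_from_xml(child) for child in elem.findall("folder")]
    folder.datasets = [Dataset(d.get("name"), d.get("location"), _read_meta(d))
                       for d in elem.findall("dataset")]
    return folder


def folder_to_xml(folder: Folder) -> ET.Element:
    """Export: inverse of folder_from_xml."""
    elem = ET.Element("folder", name=folder.name)
    _write_meta(elem, folder.meta)
    for sub in folder.folders:
        elem.append(folder_to_xml(sub))
    for ds in folder.datasets:
        d = ET.SubElement(elem, "dataset", name=ds.name, location=ds.location)
        _write_meta(d, ds.meta)
    return elem


def walk(folder: Folder, path: str = "",
         inherited: Optional[Dict[str, str]] = None
         ) -> Iterator[Tuple[str, Dataset, Dict[str, str]]]:
    """Yield (path, dataset, effective meta-data); inner values override outer."""
    merged = {**(inherited or {}), **folder.meta}
    here = f"{path}/{folder.name}"
    for ds in folder.datasets:
        yield f"{here}/{ds.name}", ds, {**merged, **ds.meta}
    for sub in folder.folders:
        yield from walk(sub, here, merged)


def search(root: Folder, **criteria: str) -> List[str]:
    """Return paths of datasets whose effective meta-data match all criteria."""
    return [p for p, _, meta in walk(root)
            if all(meta.get(k) == v for k, v in criteria.items())]
```

With the sample, `search(root, physics="gamma")` finds only the gamma dataset, while `search(root, source="simulation")` finds both, because `source` is set once on the top folder and inherited. Inheritance keeps per-dataset XML small, at the cost of having to walk the tree to compute effective meta-data.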

DC2 Data Catalog?

Data Server Plans
- Continue to enhance pruner/peeler
  - Add TCut capability to peeler
  - Add access to other data types (SVAC tuple)
- Data Server
  - Enhance ergonomics of web interface
  - Support search using sky catalog
  - Work on handling larger data volumes
  - Add ability to download events in different formats
    - FITS, run/event #, different Root tuples
  - Add ability to browse events using event display
- Use xrootd server to stream data
  - Eliminate waiting for batch job and FTP transfer
  - Experiment with SLAC "Peta-Cache" system
  - Initially use xrootd to serve existing Root tuples
  - Highest performance may require storing tuples in some other format
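Supporting several download formats suggests a small registry of format converters keyed by name, so a new format can be added without touching the selection code. The Python sketch below assumes events arrive as plain dictionaries and registers two illustrative text formats, a run/event list and CSV; the format keys, field names, and function names are hypothetical. FITS and Root output would need external libraries and are not shown.

```python
import csv
import io
from typing import Callable, Dict, Iterable, Mapping, TextIO

Row = Mapping[str, object]
Converter = Callable[[Iterable[Row], TextIO], None]


def write_run_event(rows: Iterable[Row], out: TextIO) -> None:
    """One 'run event' pair per line, e.g. for re-uploading to a peeler."""
    for row in rows:
        out.write(f"{row['run']} {row['event']}\n")


def write_csv(rows: Iterable[Row], out: TextIO) -> None:
    """All columns as CSV, header taken from the first row's keys."""
    rows = list(rows)
    if not rows:
        return
    writer = csv.DictWriter(out, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)


# Hypothetical registry: format name -> converter function.
CONVERTERS: Dict[str, Converter] = {
    "run/event": write_run_event,
    "csv": write_csv,
}


def export(rows: Iterable[Row], fmt: str) -> str:
    """Render selected events in the requested format."""
    try:
        converter = CONVERTERS[fmt]
    except KeyError:
        raise ValueError(f"unsupported format: {fmt!r}") from None
    buf = io.StringIO()
    converter(rows, buf)
    return buf.getvalue()
```

Writing to a `TextIO` rather than returning a string lets the same converters later write directly to a network stream instead of a temporary file.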

Data Pump: Streaming data directly to users
[Diagram: Root files served via xrootd to the Data Server, which runs multiple threads, each applying a TCut and a Format Converter]
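The data pump diagram can be read as a worker pool: several threads each take a chunk of input, apply a cut, convert the passing events, and hand results to a stream the user consumes as they arrive. The Python sketch below illustrates that pattern with the standard `threading` and `queue` modules. In-memory lists stand in for Root files read through xrootd, and a Python predicate stands in for a ROOT `TCut`. These substitutions and all names are assumptions, not the planned implementation.

```python
import queue
import threading
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, float]
_DONE = object()  # sentinel: one worker has finished


def data_pump(chunks: Iterable[List[Row]],
              cut: Callable[[Row], bool],
              convert: Callable[[Row], str],
              n_threads: int = 4) -> Iterator[str]:
    """Stream converted events to the caller as soon as any thread produces them.

    Output order across chunks is not preserved.
    """
    work: queue.Queue = queue.Queue()
    for chunk in chunks:  # stand-in for a list of Root files to read
        work.put(chunk)
    out: queue.Queue = queue.Queue(maxsize=1000)  # bounded: slow consumers apply backpressure

    def worker() -> None:
        try:
            while True:
                try:
                    chunk = work.get_nowait()
                except queue.Empty:
                    return  # no more input
                for row in chunk:
                    if cut(row):
                        out.put(convert(row))
        finally:
            out.put(_DONE)  # always signal completion, even on error

    threads = [threading.Thread(target=worker, daemon=True) for _ in range(n_threads)]
    for t in threads:
        t.start()
    finished = 0
    while finished < n_threads:
        item = out.get()
        if item is _DONE:
            finished += 1
        else:
            yield item
    for t in threads:
        t.join()
```

Compared with the current batch job plus FTP transfer, the user starts receiving events as soon as the first chunk is processed. The bounded output queue keeps memory use flat if the user reads more slowly than the threads produce.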

Conclusions
- Initial data server available
  - Would like some people to try it and give feedback
- Lots of work to do
  - Need to set goals/priorities for DC2 work
  - Understand timescales
  - Understand what the data volume will be
  - Understand what typical queries will be

Hierarchical Data Catalog