Published by Emerald Maude Dennis. Modified over 9 years ago.
1
GLAST Collaboration Meeting, March 2008 T.Johnson 1/22
GLAST Large Area Telescope Data Access
Tony Johnson
Stanford Linear Accelerator Center
tonyj@slac.stanford.edu
Gamma-ray Large Area Space Telescope
http://glast-ground.slac.stanford.edu/
2
Outline
Topics Covered
  –xrootd
  –LAT Data Catalog
      Features
      Web Interface
      Tools
  –Download Manager
  –Skimmer
  –WIRED
  –Astro Server
  –Miscellaneous
3
xrootd
xrootd
  –System developed at SLAC to manage large datasets
  –Distributes files across disks
      Maximizes throughput
      Minimizes manual disk management
Automates archiving datasets to (and restoring from) tape
Provides more reliability and scalability than NFS
Supports access control based on GLAST collaborator list
Has been in use for OpsSim2 and “Big MC Run”
  –Mostly working smoothly
      Miscellaneous idiosyncrasies that need to be understood
      Timeout problems when reading files
4
LAT Data Catalog
Data catalog is a database designed for tracking LAT datasets
  –Can be used with
      Disk files in AFS, NFS, or XROOTD servers, or tape archives
      Data created inside or outside of processing pipeline
      Data created/stored at SLAC or elsewhere
      One or more locations per dataset
  –Simplifies access to data by providing a uniform view of files irrespective of their physical location
  –Allows data to be organized into a tree of “virtual” folders
      Folders don’t have to correspond to physical location of data
  –Allows data to have associated “meta-data”
      Some meta-data is required and verified by catalog
        –size, location, run range, creation date
      Other meta-data is user-defined and arbitrarily extensible
  –Data can be
      Browsed using virtual folders and “groups”
        –Folders contain arbitrary sub-folders, datasets and groups
        –Groups contain homogeneous list of datasets
      Searched using meta-data
        –E.g. DatasetType=MC && RunMin > 50 && RunMin < 100
  –Data crawler
      As new datasets are registered, crawler validates files and extracts meta-data (file size, number of events, etc.)
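The meta-data search can be pictured with a small sketch. This is not the catalog's implementation: the `Dataset` class, the `search` helper, the host names, and all entries below are hypothetical, chosen only to mirror the example query DatasetType=MC && RunMin > 50 && RunMin < 100 and the idea that virtual folders are independent of physical locations.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """One catalog entry (hypothetical model, not the real schema)."""
    name: str
    folder: str          # virtual folder, independent of physical location
    locations: list      # one or more physical locations
    metadata: dict = field(default_factory=dict)


def search(datasets, predicate):
    """Return the names of datasets whose meta-data satisfies the predicate."""
    return [d.name for d in datasets if predicate(d.metadata)]


# Invented entries for illustration only.
catalog = [
    Dataset("run-a", "/MC-Tasks/demo", ["root://host-a//data/run-a.root"],
            {"DatasetType": "MC", "RunMin": 10}),
    Dataset("run-b", "/MC-Tasks/demo",
            ["root://host-a//data/run-b.root", "/nfs/data/run-b.root"],
            {"DatasetType": "MC", "RunMin": 75}),
    Dataset("run-c", "/Data/demo", ["/afs/data/run-c.root"],
            {"DatasetType": "DATA", "RunMin": 60}),
]

# Equivalent of: DatasetType=MC && RunMin > 50 && RunMin < 100
matches = search(
    catalog,
    lambda m: m.get("DatasetType") == "MC" and 50 < m.get("RunMin", -1) < 100,
)
print(matches)
```

Only the second entry satisfies both conditions; it also shows one dataset carrying two locations.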
5
LAT Data Catalog - Web Interface
Browsable tree of datasets
Events, file size, run range automatically set by “crawler”
Access/Authentication handled by web
Meta-data added by creator
Supports mirroring at multiple sites
http://glast-ground.slac.stanford.edu/DataCatalog/
Dataset Description
6
LAT Data Catalog - Tools
Pipeline Tools
  –From within “Pipeline Scriptlet” datasets can be
      registered together with meta-data and multiple locations
      located using meta-data and passed to subsequent processing stages
Command Line Tools
  –Available now
      registerDataset
        –Wildcards supported for registering many datasets at once
      find
        –List/search for files
      addLocation
      addMetadata
  –Coming soon
      remove
      move
Java API
  –Programmatic access to full functionality
More Info
  –Data catalog User’s Guide
  –http://confluence.slac.stanford.edu/display/ds/Data+Catalog+Users+Guide
7
Recent Improvements
Line-mode client find command
  –datacat find -G merit /MC-Tasks/OpsSim/opssim2-GR-v13r9/runs -s RunMin
      root://glast-rdr//glast/mc/OpsSim/opssim2-GR-v13r9/merit/opssim2-GR-v13r9-000000-merit.root
      root://glast-rdr//glast/mc/OpsSim/opssim2-GR-v13r9/merit/opssim2-GR-v13r9-000001-merit.root
      root://glast-rdr//glast/mc/OpsSim/opssim2-GR-v13r9/merit/opssim2-GR-v13r9-000002-merit.root
      root://glast-rdr//glast/mc/OpsSim/opssim2-GR-v13r9/merit/opssim2-GR-v13r9-000003-merit.root
      root://glast-rdr//glast/mc/OpsSim/opssim2-GR-v13r9/merit/opssim2-GR-v13r9-000004-merit.root
      root://glast-rdr//glast/mc/OpsSim/opssim2-GR-v13r9/merit/opssim2-GR-v13r9-000005-merit.root
      root://glast-rdr//glast/mc/OpsSim/opssim2-GR-v13r9/merit/opssim2-GR-v13r9-000006-merit.root
  –datacat find --recurse --search-groups -F 'DataType=="MERIT"&&nMetStart>=257731200 && nMetStart<=257731202' -S SLAC_XROOT -s TaskName -s Name /MC-Tasks/OpsSim/
      root://glast-rdr//glast/mc/OpsSim/opssim2-GR-HEAD1-1041-2-6/merit/opssim2-GR-HEAD1-1041-2-6-000000-merit.root
      root://glast-rdr//glast/mc/OpsSim/opssim2-GR-HEAD1-1041-2-6/merit/opssim2-GR-HEAD1-1041-2-6-000001-merit.root
      root://glast-rdr//glast/mc/OpsSim/opssim2-GR-v13r9/merit/opssim2-GR-v13r9-000000-merit.root
      root://glast-rdr//glast/mc/OpsSim/opssim2-GR-v13r9/merit/opssim2-GR-v13r9-000001-merit.root
      root://glast-rdr//glast/mc/OpsSim/opssim2-GR-v13r9p1/merit/opssim2-GR-v13r9p1-000000-merit.root
      root://glast-rdr//glast/mc/OpsSim/opssim2-GR-v13r9p1/merit/opssim2-GR-v13r9p1-000001-merit.root
      root://glast-rdr//glast/mc/OpsSim/opssim2-GR-v13r9p2/merit/opssim2-GR-v13r9p2-000000-merit.root
      root://glast-rdr//glast/mc/OpsSim/opssim2-GR-v13r9p2/merit/opssim2-GR-v13r9p2-000001-merit.root
      root://glast-rdr//glast/mc/OpsSim/opssim2-GR-v13r9p2-np/merit/opssim2-GR-v13r9p2-np-000000-merit.root
      root://glast-rdr//glast/mc/OpsSim/opssim2-GR-v13r9p2-np/merit/opssim2-GR-v13r9p2-np-000001-merit.root
      root://glast-rdr//glast/mc/OpsSim/opssim2-GR-v13r9p3/merit/opssim2-GR-v13r9p3-000000-merit.root
      root://glast-rdr//glast/mc/OpsSim/opssim2-GR-v13r9p3/merit/opssim2-GR-v13r9p3-000001-merit.root
      root://glast-rdr//glast/mc/OpsSim/opssim2-nocel/merit/opssim2-nocel-000000-merit.root
      root://glast-rdr//glast/mc/OpsSim/opssim2-nocel/merit/opssim2-nocel-000001-merit.root
  –Available now in DEV, feedback encouraged
      Dan is preparing an addition to the data catalog user’s guide
Enhancements to data catalog access in pipeline
  –Access meta-data from search results
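Since the find command prints one root: URL per line, its output is easy to post-process in a script. The sketch below groups URLs by task name. The URL pattern is inferred from the sample output on this slide, not from a documented format, and `group_by_task` is a hypothetical helper, not part of datacat.

```python
import re
from collections import defaultdict

# Pattern inferred from the sample output above (an assumption):
# root://<host>/<path>/<task>/merit/<task>-<6-digit sequence>-merit.root
URL_RE = re.compile(
    r"^root://(?P<host>[^/]+)/"
    r"(?P<path>/.*/(?P<task>[^/]+)/merit/(?P=task)-(?P<seq>\d{6})-merit\.root)$"
)


def group_by_task(lines):
    """Map each task name to the list of file sequence numbers found for it."""
    groups = defaultdict(list)
    for line in lines:
        match = URL_RE.match(line.strip())
        if match:  # silently ignore lines that do not look like merit URLs
            groups[match.group("task")].append(int(match.group("seq")))
    return dict(groups)


# Three lines copied from the output shown on the slide.
sample = """\
root://glast-rdr//glast/mc/OpsSim/opssim2-GR-v13r9/merit/opssim2-GR-v13r9-000000-merit.root
root://glast-rdr//glast/mc/OpsSim/opssim2-GR-v13r9/merit/opssim2-GR-v13r9-000001-merit.root
root://glast-rdr//glast/mc/OpsSim/opssim2-nocel/merit/opssim2-nocel-000000-merit.root
""".splitlines()

groups = group_by_task(sample)
print(groups)
```

In practice the lines would come from the command's standard output rather than a literal string.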
8
Recent Improvements
New faster crawler
  –Original crawler was not able to keep up with MC running at full throttle.
  –New crawler processes files in parallel and can easily keep up
  –During Ops Sim2 problems discovered with files >2GB in length
      Now fixed
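The idea of a parallel crawler can be sketched as follows. The slide does not say how the real crawler is built, so the thread pool, the CRC-32 checksum, and the `extract_metadata` and `crawl` names are all assumptions. Counting events would need ROOT, so this sketch records only file size and a checksum.

```python
import os
import tempfile
import zlib
from concurrent.futures import ThreadPoolExecutor


def extract_metadata(path):
    """Read one file and return the meta-data a crawler might record."""
    size = 0
    checksum = 0
    with open(path, "rb") as f:
        # Read in 1 MiB chunks so files larger than 2 GB never sit in memory.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            size += len(chunk)
            checksum = zlib.crc32(chunk, checksum)
    return {"path": path, "size": size, "crc32": checksum}


def crawl(paths, workers=4):
    """Process newly registered files in parallel, preserving input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(extract_metadata, paths))


# Demonstration with throwaway files of 1000, 2000 and 3000 bytes.
with tempfile.TemporaryDirectory() as tmp:
    paths = []
    for i in range(3):
        p = os.path.join(tmp, f"file{i}.dat")
        with open(p, "wb") as f:
            f.write(b"x" * ((i + 1) * 1000))
        paths.append(p)
    results = crawl(paths)

print([r["size"] for r in results])
```

Threads suit this kind of work because it is dominated by file I/O. A crawler that did heavy per-event processing might prefer separate processes.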
9
Status/Problems/Plans
Problems
  –Can be painfully slow (with 5,000,000 datasets)
      New Oracle database being tested now
      Karen working on adding “materialized views”
      Further optimization of queries needed
      Sensible pagination of large datasets
  –Web interface needs to allow selection of data based on
      Run number range
      Time range
      Meta-data search (c.f. line-mode client)
  –File versions
      As of Ops Sim 2 L1Proc registers multiple versions of files
        –r0257998848_v001_merit.root
        –r0257998848_v002_merit.root
      Data catalog does not know these are multiple versions of the same file
        –Sends them both to the skimmer, resulting in duplicate events
      Propose to add versioning to data catalog (show only latest by default)
  –Need Custom Views of data
      E.g. All ASP products for run nnn source abc
Plan
  –Fix problems
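The proposed "show only latest by default" behaviour can be sketched from the file names alone. This is an illustration of the idea, not the planned catalog design: the `_vNNN_` naming pattern is inferred from the two example files above, `latest_versions` is a hypothetical helper, and the third file name in the demonstration is invented.

```python
import re

# Naming pattern inferred from the examples above (an assumption):
# <run>_v<version>_<type>.root
VERSIONED = re.compile(r"^(?P<base>.+)_v(?P<version>\d+)_(?P<rest>.+)$")


def latest_versions(filenames):
    """Keep only the highest version of each (run, file type) pair."""
    best = {}
    for name in filenames:
        match = VERSIONED.match(name)
        if match:
            key = (match.group("base"), match.group("rest"))
            version = int(match.group("version"))
        else:
            key, version = (name, ""), -1  # unversioned files pass through
        if key not in best or version > best[key][0]:
            best[key] = (version, name)
    return sorted(name for _, name in best.values())


files = [
    "r0257998848_v001_merit.root",
    "r0257998848_v002_merit.root",
    "r0257998849_v001_merit.root",  # invented second run for illustration
]
latest = latest_versions(files)
print(latest)
```

Passing only the filtered list to the skimmer would avoid the duplicate events described above.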
10
Download Manager
One-click download of multiple files
Inherits authorization from web login
  –Note: no anonymous FTP in future; a SLAC account will be required for data access
Works with ftp:, http: and root:
  –Validates files (length, checksum) against data catalog
Supports simultaneous download of multiple files
Does not download files which already exist in target dir
  –So easy to fetch recently added files
Can resume download of partially downloaded files
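The skip, resume and validate behaviour can be sketched as a per-file decision. This is not the Download Manager's code. The slide does not name the checksum algorithm, so MD5 is an assumption, and `plan_action` and the file names are hypothetical.

```python
import hashlib
import os
import tempfile


def md5_of(path, chunk_size=1 << 20):
    """Checksum a file in chunks so large files are not read in one go."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def plan_action(target, expected_size, expected_md5):
    """Decide what to do for one catalog entry: download, resume or skip."""
    if not os.path.exists(target):
        return "download"
    size = os.path.getsize(target)
    if size < expected_size:
        return "resume"    # partial file: continue from the current length
    if size == expected_size and md5_of(target) == expected_md5:
        return "skip"      # already present and matches the catalog
    return "download"      # wrong length or checksum: fetch again


# Demonstration with throwaway files standing in for a target directory.
payload = b"merit" * 1000
expected = hashlib.md5(payload).hexdigest()
with tempfile.TemporaryDirectory() as tmp:
    contents = {
        "complete.root": payload,
        "partial.root": payload[:100],
        "corrupt.root": b"x" * len(payload),
    }
    for name, data in contents.items():
        with open(os.path.join(tmp, name), "wb") as f:
            f.write(data)
    names = list(contents) + ["missing.root"]
    actions = {
        name: plan_action(os.path.join(tmp, name), len(payload), expected)
        for name in names
    }

print(actions)
```

Checking the length before the checksum keeps the common cases cheap, since only files of the right size need to be read in full.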
11
Status/Problems/Plans
Several problems discovered during Ops Sim 2
  –100% CPU usage after file recovery (fixed)
  –Bad error message if checksum inconsistent (fixed)
  –Problems downloading files >2GB (almost fixed)
New feature
  –Start/Pause download requested (now available)
Feature requests pending
  –Ability to download select run/time ranges
      This will work automatically once this feature is added to data catalog web application
  –Non-GUI version for automated download/sync of data
  –Ability to select files to download from GUI (without web)
12
LAT Data Skimmer
Allows data to be selected using “TCut” on tuple columns
  –Can output either Root or Fits (FT1) files
  –Uses Pipeline II for data processing
      Allows parallel processing for large tasks
  –Output available for download for 10 days
  –Complete skim history maintained for later reuse
13
3 Ways to Access Data Skimmer
Directly from Data Portal
  –http://glast-ground.slac.stanford.edu/DataPortal/
  –click on “Simple Skimmer”
Data Processing Page(s)
From the Data Catalog
14
LAT Data Skimmer
15
Status/Problems/Plans
Problems
  –Backend/root crashes
      new (compiled) backend available soon
  –E-mail notification should include data dir even if failed
      Need to be able to navigate from pipeline > data dirs
Skimmer improvements in progress
  –Ability to skim more types of files
      “svac”, “cal” and “gcr” added by David Chamont
        –Web interface needs to catch up
  –Ability to output more event types
      Full Recon, Digi, MC trees
      “Extended Event” (intermediate between FT1 and Merit)
      Event Lists
        –CompositeEventLists (CEL) files
  –Access to more “expert” options
16
Event Display (WIRED)
WIRED allows quick look at detector response
  –Can be installed directly from Web with no additional GLAST software required
  –Uses “HepRep” interchange format/infrastructure (shared with FRED)
17
Event Display (WIRED)
18
Status/Problems/Plans
According to rumour doesn’t work outside my office
  –Actually it doesn’t work in my office either
  –But it did work fine for DC2 data
      Invariant under spatial translations/rotations
Now being hooked up to data catalog/xrootd
  –Issue related to CEL files in gleam being investigated
  –Should be working again in next few days
  –“Event Display” link will appear in data catalog
      Will support browsing events or selection of specific events
19
Astro Data Server
Similar to skimmer, allows events to be selected using cuts
  –Cuts can only be on position in the sky, energy, time, and event category
  –Works much faster than Skimmer
  –Currently loaded with DC2 data
Currently being refurbished for use with Service Challenge data and beyond
  –Will load all events as soon as they are produced by L1Proc
      User will be able to select
        –all data including partial runs
        –only “complete” runs
      Loose event cuts CTBClassLevel>1
        –User can select CTBClassLevel category
      Able to output FT1, FT2, Extended event files, Merit root files
  –API for programmatic event selection
      Will be used by ASDC tools
  –Closer integration with data catalog, skimmer
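The four kinds of cut listed above (sky position, energy, time, event category) can be sketched as a cone search combined with range checks. This is not the Astro Data Server's API. The `select_events` function, the dictionary keys, the units, and the sample events are all hypothetical; only the CTBClassLevel > 1 cut comes from the slide.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (spherical law of cosines)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    cos_sep = (math.sin(dec1) * math.sin(dec2)
               + math.cos(dec1) * math.cos(dec2) * math.cos(ra1 - ra2))
    # Clamp to [-1, 1] to guard against rounding just outside acos's domain.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_sep))))


def select_events(events, ra, dec, radius, e_range, t_range, min_class):
    """Apply position, energy, time and event-category cuts to a list of events."""
    e_min, e_max = e_range
    t_min, t_max = t_range
    return [
        ev for ev in events
        if angular_separation_deg(ev["ra"], ev["dec"], ra, dec) <= radius
        and e_min <= ev["energy"] <= e_max
        and t_min <= ev["time"] <= t_max
        and ev["ctb_class_level"] > min_class
    ]


# Invented events; field names and units are assumptions.
events = [
    {"id": 1, "ra": 10.0, "dec": 0.0, "energy": 500.0, "time": 100.0, "ctb_class_level": 2},
    {"id": 2, "ra": 40.0, "dec": 0.0, "energy": 500.0, "time": 100.0, "ctb_class_level": 3},
    {"id": 3, "ra": 0.0, "dec": 5.0, "energy": 50.0, "time": 100.0, "ctb_class_level": 3},
    {"id": 4, "ra": 0.0, "dec": 5.0, "energy": 500.0, "time": 100.0, "ctb_class_level": 1},
    {"id": 5, "ra": 355.0, "dec": -5.0, "energy": 1000.0, "time": 200.0, "ctb_class_level": 3},
]

selected = select_events(
    events, ra=0.0, dec=0.0, radius=15.0,
    e_range=(100.0, 10000.0), t_range=(0.0, 300.0), min_class=1,
)
print([ev["id"] for ev in selected])
```

Event 2 falls outside the cone, event 3 fails the energy cut, and event 4 fails CTBClassLevel > 1. Event 5 shows that the separation formula handles right ascension wrapping through 0 degrees.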
20
Astro Data Server
Astro data server will remember the last set of parameters you used
Astro Server also has a “Favorites” page
  –Keeps a list of your “favorite” search parameters
21
Status/Problems/Plans
Was used for SC2 55 day run
Not used in Ops Sim 2
Still plan to
  –Load data from L1Proc
  –Add programmatic interface for use by ASP/ASDC tools
  –Better integration with Data Portal
Bottom of priority list
22
Miscellaneous
Data Access Restrictions
  –Starting very soon (this week hopefully) you will need to be a “glast collaborator” to access files from xrootd
  –You will need to log in to access data catalog/download manager
Need to define standard skims
  –Automate their production
      Part of RSP?
  –Automate their registration in data catalog
Access to ASP/RSP data has not been discussed here
  –But is in the plan
Feedback from Ops Sim2 has been very useful
  –Not all digested yet
Need more/better documentation
  –Data Access frequently asked questions
      http://confluence.slac.stanford.edu/x/zgAz
      Please suggest more FAQs
More feedback welcome
  –http://glast-ground.slac.stanford.edu/DataPortal/